Move data modeling upstream

We will see a lot of data modeling move upstream: away from batch modeling in a cloud data warehouse (the Modern Data Stack) and toward producers that continuously generate domain events. Why?

“Source-aligned business events are not modeled or structured like the source application’s transactional database; an anti-pattern is often observed, particularly when events are sourced through Change Data Capture tooling or Data Virtualization on top of the application’s database.”

Zhamak Dehghani describes this very well in the fourth chapter of Data Mesh (a great read, I recommend it), and it resonates with me.

There is also a great recording of a webinar with Zhamak.

IMO, this transition will benefit more operational use cases, especially streaming analytics, because you model once at the source instead of multiple times at each destination. Another effect I expect is that wide, nested tables will replace star schemas in the DWH to better represent the domain events, which also enables (1) better query performance, (2) a higher degree of automation, and (3) schema evolution.
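To make "model once at the source" a bit more concrete, here is a minimal sketch of a nested domain event. Everything in it is an assumption for illustration: the `OrderPlaced` event, its fields, and the `order_total` consumer are hypothetical and not taken from Data Mesh or any specific tool.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class LineItem:
    sku: str
    quantity: int
    unit_price: float


@dataclass
class OrderPlaced:
    # Hypothetical domain event, modeled once by the producing team.
    order_id: str
    customer_id: str
    # Line items are nested in the event instead of living in a separate table.
    items: List[LineItem] = field(default_factory=list)
    # Schema evolution: a new optional field with a default does not break
    # consumers that ignore it.
    coupon_code: Optional[str] = None


def order_total(event: dict) -> float:
    # A consumer reads the nested record directly, with no joins.
    return sum(i["quantity"] * i["unit_price"] for i in event["items"])


event = OrderPlaced(
    order_id="o-1",
    customer_id="c-42",
    items=[LineItem("sku-a", 2, 10.0), LineItem("sku-b", 1, 5.5)],
)
payload = json.dumps(asdict(event))  # one wide, nested record per event
decoded = json.loads(payload)
print(order_total(decoded))
```

In a star schema the same question would typically need a join between an order fact table and its dimensions; here the event already carries the shape the domain uses.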

P.S. You can publish domain events with CDC plus the outbox pattern to address the dual-write challenge.
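A rough sketch of the outbox idea, using an in-memory SQLite database so it runs anywhere. The table names, the `place_order` function, and the polling `relay` are all hypothetical; in a real setup a CDC tool would tail the outbox table from the database log rather than polling it like this.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL
    );
    """
)


def place_order(order_id: str) -> None:
    # The state change and the event are committed in ONE local transaction,
    # so there is no dual write to a database and a message broker.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id})),
        )


def relay(publish) -> int:
    # Stand-in for CDC (an assumption of this sketch): read committed outbox
    # rows in order, publish each one, then remove it.
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)


published = []
place_order("o-1")
place_order("o-2")
relay(lambda event_type, payload: published.append((event_type, payload)))
```

Because the relay publishes before it deletes, a crash in between would publish the same event again, so consumers of this sketch should be prepared for at-least-once delivery.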