Serverless dbt on Google Cloud Platform

Data Build Tool (dbt) is an awesome open source tool created by dbt Labs, which also offers a managed service for working with dbt. But you can also host dbt yourself, fully serverless, with a collaborative setup that follows GitOps practices. Best of all, it is easy to set up and very cost efficient. Managed service or self-hosted? Before jumping into the setup, you should ask yourself which option suits you best. »

Move data modeling upstream

We will see a lot of data modeling move upstream, away from batch modeling in a cloud data warehouse (Modern Data Stack) and toward the producer continuously generating domain events instead. Why? “Source-aligned business events are not modeled or structured like the source application’s transactional database; an anti-pattern is often observed, particularly when events are sourced through Change Data Capture tooling or Data Virtualization on top of the application’s database.” Zhamak Dehghani describes this very well in the fourth chapter of Data Mesh (a great read, I recommend it), and it resonates strongly with me. »

Is data mesh only for large organisations?

Is data mesh only for large organisations? Many data mesh authorities argue that is the case. But I disagree: it isn’t primarily about company size at all. In fact, a data mesh can be even more suitable for a scale-up than for an enterprise. Why? IMO, the maturity needed for a data mesh depends rather on: 1. Pace of change in the analytical system. 2. Degree of decentralization of the operational system. »

Two data platform KPIs

If I had to choose two Key Performance Indicators for a data platform team:
- Time to market for new datasets/pipelines
- Data Downtime (periods of time when data is inaccurate)
Why?
- Producing side: Scaling for data volumes is solved; now scaling is about adding new datasets/pipelines, especially as feature teams take on data ownership and microservices deliver data as a first-class deliverable in addition to the service API. »
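To make the two KPIs concrete, here is a minimal sketch of how they could be measured. It is an illustration under assumptions, not a prescribed method: the function names, the incident log format, and the dates are all hypothetical, and incidents are assumed not to overlap.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical helper: an incident is a (start, end) period during which
# a dataset was inaccurate. Assumes incidents do not overlap.
def data_downtime(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Total time the data was inaccurate, summed over all incidents."""
    return sum((end - start for start, end in incidents), timedelta())

# Hypothetical helper: time from a dataset being requested to its first delivery.
def time_to_market(requested: datetime, first_delivered: datetime) -> timedelta:
    """Lead time for a new dataset/pipeline."""
    return first_delivered - requested

# Made-up example values, for illustration only.
incidents = [
    (datetime(2024, 1, 3, 8, 0), datetime(2024, 1, 3, 11, 30)),
    (datetime(2024, 1, 9, 22, 0), datetime(2024, 1, 10, 1, 0)),
]
downtime = data_downtime(incidents)
lead_time = time_to_market(datetime(2024, 1, 2), datetime(2024, 1, 16))
```

Tracking both as plain durations keeps them comparable across teams and datasets; how incidents and requests are actually recorded will depend on your platform.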

Streaming Analytics affects both tracking and analytics!

Streaming Analytics will change the way you should think about both tracking and analytics in digital analytics. How?
Tracking: Since tracking has been focused on decision making, it has mainly captured performance metrics rather than signals that can be used to personalize the user experience. That has to shift with the advent of streaming analytics.
Analytics: In batch analytics you let the query run over your data, but in streaming analytics you let your data run over your query. »
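The batch versus streaming contrast can be sketched in a few lines of plain Python. This is only an illustration of the idea, not a real streaming engine: the event fields and function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical page-view events; the field names are illustrative only.
EVENTS: List[Dict[str, str]] = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/pricing"},
    {"user": "a", "page": "/pricing"},
]

def batch_pricing_views(stored_events: List[Dict[str, str]]) -> int:
    """Batch: the data is at rest and the query scans it once, on demand."""
    return sum(1 for e in stored_events if e["page"] == "/pricing")

def streaming_pricing_views(event_stream: Iterable[Dict[str, str]]) -> Iterator[int]:
    """Streaming: the query is standing state that each arriving event updates."""
    count = 0
    for e in event_stream:
        if e["page"] == "/pricing":
            count += 1
        yield count  # an up-to-date answer after every event

batch_result = batch_pricing_views(EVENTS)
running_results = list(streaming_pricing_views(iter(EVENTS)))
```

In the batch function the answer exists only after the scan finishes; in the streaming function an answer is available after every event, which is what makes it usable for in-session personalization.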