Serverless dbt on Google Cloud Platform

Data Build Tool (dbt) is an awesome open source tool created by dbt Labs, which also offers a managed service for working with dbt. But you can also host dbt yourself, fully serverless, with a collaborative setup that follows GitOps practices. Best of all, it is easy to set up and very cost efficient. Managed service or self-hosted? Before jumping into the setup, you should ask yourself which option suits you best....

November 25, 2021 · 9 min · Robert Sahlin

Streaming Analytics affects both tracking and analytics!

Streaming Analytics will change the way you should think about both tracking and analytics in digital analytics. How? Tracking: Since tracking has been focused on decision making, it has mainly captured performance metrics rather than signals that can be used to personalize the user experience. That has to shift with the advent of streaming analytics. Analytics: In batch analytics you let the query run over your data, but in streaming analytics you let your data run over your query....

July 29, 2021 · 2 min · Robert Sahlin

Inmon vs Kimball vs Data Vault vs Wide tables

Inmon vs Kimball vs Data Vault? Personally, I prefer wide, nested and denormalized tables as a data warehouse architecture. Why? Cloud data warehouses are designed as distributed systems with columnar storage that is separated from compute. Hence, you can efficiently query specific fields over a huge number of records, but you want to avoid joins, as they introduce overhead when shuffling data between compute instances. Also, I prefer to keep my data immutable, and if something is wrong I would rather replay the data with the new logic than perform mutations on existing data....

July 29, 2021 · 2 min · Robert Sahlin

Validate and monitor your BigQuery data

Data observability has gained huge momentum, and data quality is essential for any kind of analytical system, whether it is plain old reporting or advanced machine learning. I’ve seen reports stating that data engineers spend more than 30% of their time manually chasing data quality issues! That is not only a cost in terms of precious resources’ time but also missed opportunities or, even worse, a loss of trust in your data and your data team....

February 6, 2021 · 3 min · Robert Sahlin

Automatic builds and version control of your BigQuery views

We (MatHem) have finally moved our BigQuery view definitions to GitHub and automated the builds, so that whenever someone in the data team modifies or adds a view definition and pushes or merges it to the master or develop branch, it triggers a build of our views in our production or test environment, respectively. Hence we get version control, and the view definitions are always in sync with the views deployed in BigQuery. Below are two ways to set it up; both require a GitHub repo, Cloud Build and BigQuery....

February 19, 2020 · 4 min · Robert Sahlin