Configure Firebase Analytics and Google Analytics App + Web BigQuery export to EU region

In January 2019, when I tried to set up BigQuery export on our Firebase Analytics projects, I found out that I couldn’t choose a region for the export and that it defaulted to the US. Since my employer is a European company, I prefer to store the data in the EU. This is exactly the same issue I had previously with the GA360 BigQuery Export, and I thought I would try to solve it in a similar manner (which has since become part of the GA360 BigQuery Export documentation on how to geolocate your data in the EU)....

August 14, 2019 · 2 min · Robert Sahlin

Fast and flexible data pipelines with protobuf schema registry

MatHem is growing quickly, and so are the requirements for fast and reliable data pipelines. Since I joined the company a little more than a year ago, I’ve been developing an event streaming platform (named DataHem) to meet those requirements. 1 Background: Before jumping into the solution architecture, I thought I would give you some background from a business perspective that has influenced the design choices. 1.1 Context: MatHem is the biggest online grocery store in Sweden, and to give some brief context, this is how the business works:...

May 31, 2019 · 8 min · Robert Sahlin

BigQuery efficient access management

A strategic decision we’ve made at MatHem is to let users connect to our data warehouse (BigQuery) with whatever tool they prefer (Tableau, Data Studio, Colab, etc.) while still being certain that they can only access data they have permission to. That turned out to be a challenge with BigQuery’s current access management capabilities, since you grant users/roles (or authorized views) access at the dataset level, not at the view/table level....

February 14, 2019 · 2 min · Robert Sahlin

BigQuery Training Resources for Digital Analysts test

In this post I’v

July 15, 2018 · 1 min · Robert Sahlin

DataHem: open source, serverless, real-time and end-2-end ML pipeline on Google Cloud Platform

I’m excited to say that the project I’ve been working on for the last year is now released as open source (MIT license). DataHem is a serverless, real-time, end-2-end ML pipeline built entirely on Google Cloud Platform services: App Engine, Pub/Sub, Dataflow, BigQuery, Cloud ML Engine, Deployment Manager, Cloud Build and Cloud Composer. When building ML/data products, your most valuable asset is your data. Hence, the purpose of DataHem is to give you:...

June 1, 2018 · 2 min · Robert Sahlin