Fast and flexible data pipelines with protobuf schema registry

MatHem is growing quickly and so are the requirements for fast and reliable data pipelines. Since I joined the company a little more than a year ago, I’ve been developing an event streaming platform (named DataHem) to meet those requirements. 1 Background: Before jumping into the solution architecture, I thought I would give you some background from a business perspective that has influenced the design choices. 1.1 Context: MatHem is the biggest online grocery store in Sweden, and to give some brief context, this is how the business works: »

BigQuery efficient access management

A strategic decision we’ve made at MatHem is to enable users to connect to our data warehouse (BigQuery) with whatever tool they prefer (Tableau, Data Studio, Colab, etc.) and still be certain that they can only access data they have permission to see. That turned out to be a challenge with BigQuery’s current access management capabilities, since you grant users/roles (or authorized views) access at the dataset level and not at the view/table level. »

DataHem: open source, serverless, real-time and end-2-end ML pipeline on Google Cloud Platform

I’m excited to say that the project I’ve been working on for the last year is now released as open source (MIT license). DataHem is a serverless, real-time, end-2-end ML pipeline built entirely on Google Cloud Platform services: App Engine, Pub/Sub, Dataflow, BigQuery, Cloud ML Engine, Deployment Manager, Cloud Build and Cloud Composer. When building ML/data products, your most valuable asset is your data. Hence, the purpose of DataHem is to give you:
- Full control and ownership of your data and data platform
- Unsampled data
- Data in real time
- The ability to replay/reprocess your data unlimited times
- Data synergies, i. »

BigQuery Training Resources for Digital Analysts

In this post I’ve tried to collect different training resources that I’ve found useful, some free and some for a fee. The focus is on using BigQuery for digital analytics. Are you one of the lucky digital analysts who work for an organisation with the 360 version of Google Analytics or Firebase Blaze, but haven’t started using BigQuery yet? Then don’t wait: enable the BigQuery Export (read this post if you are operating in the EU) and learn how to use BigQuery. »