DataHem: open source, serverless, real-time and end-2-end ML pipeline on Google Cloud Platform

I’m excited to say that the project I’ve been working on for the last year is now released as open source (MIT license). DataHem is a serverless, real-time, end-2-end ML pipeline built entirely on Google Cloud Platform services: App Engine, Pub/Sub, Dataflow, BigQuery, Cloud ML Engine, Deployment Manager, Cloud Build and Cloud Composer. When building ML/data products, your most valuable asset is your data. Hence, the purpose of DataHem is to give you:
- Full control and ownership of your data and data platform
- Unsampled data
- Data in real time
- The ability to replay/reprocess your data unlimited times
- Data synergies, i. »

BigQuery Training Resources for Digital Analysts

In this post I’ve tried to collect the training resources that I’ve found useful myself, some free and some for a fee. The focus is on using BigQuery for digital analytics. Are you one of the lucky digital analysts who work for an organisation with the 360 version of Google Analytics or Firebase Blaze, but haven’t started using BigQuery yet? Then don’t wait: enable the BigQuery Export (read this post if you operate in the EU) and learn how to use BigQuery. »

Flatten Firebase Properties and Parameters in BigQuery

At Google I/O in May 2017, Firebase announced Google Analytics for Firebase, a fantastic tool that automatically captures data on how people are using your iOS and Android app and lets you define your own custom app events. Like Google Analytics 360, it offers the ability to export raw data to Google BigQuery for custom analysis. There are a few posts on the Google Cloud Platform Blog and the Firebase Blog on how to query the Firebase dataset, but none of them gives much advice on how to analyze multiple properties and parameters at the same time. »

Google Analytics Custom Dimension Alias in BigQuery

Second to being able to export your Google Analytics data to Google BigQuery, the feature I value the most in the premium version of GA is that you are not limited to 20 custom dimensions but have 200 to play with! However, if you have many custom dimensions, it quickly becomes hard to remember what each index represents, since the value isn’t always self-describing. Hence, being able to give a custom dimension a more descriptive identifier than an index could be useful. »