Schema evolution in streaming Dataflow jobs and BigQuery tables, part 1

In the previous post, I gave an overview of MatHem’s streaming analytics platform DataHem. This post will focus on how we manage schema evolution without sacrificing real-time data or having downtime in our data ingestion. The streaming analytics platform is built entirely on Google Cloud Platform and use services such as Dataflow, BigQuery and PubSub extensively. Another important component are protobuf schemas. 1 Protocol buffers There are many different frameworks for serialization/deserialization of data....

November 10, 2019 · 5 min · Robert Sahlin