Modern data workflow orchestration, part 2

The previous post gave some background on why you should try out GCP Workflows. This post is more technical and shows how to use Infrastructure as Code to set up GCP Workflows that also support re-runs and backfills. Solution Architecture. The requirements are: We want re-run capabilities (idempotency) and backfill capabilities. We will use a custom Cloud Run service built with FastAPI. We want to set up workflow orchestration and scheduling of batch jobs with Infrastructure as Code (IaC)....

February 25, 2022 · 3 min · Robert Sahlin

Modern data workflow orchestration, part 1

Do we really need full-blown orchestration services like Composer (managed Airflow) in a modern GCP data stack? Bundling vs Unbundling Airflow. There’s been a lot of discussion lately about Airflow’s role in the Modern Data Stack and how different tools and services are unbundling Airflow’s responsibilities. It is a very interesting discussion, but to be fair to Airflow, it was built as a workflow manager. Its flexibility has invited users to add further responsibilities, resulting in an anti-pattern that fills the gap left by a missing control plane across data tools/services....

February 25, 2022 · 4 min · Robert Sahlin