Bigquery efficient access management

A strategic decision we’ve made at MatHem is to enable users to connect to or data warehous (BigQuery) with whatever tool (tableau, data studio, collab, etc.) they prefer and still be certain that they only can access data that they have permission to. That turned out to be a challenge in BigQuery with the current access management capabilities, since you give users/roles (or authorized views) access on the dataset-level and not views/table-level. In order to address that challenge at MatHem (while waiting for a native BQ-solution) I developed datahem.viewer. It is open source and released as part of DataHem.

Datahem.viewer is a python script that reads view definitions from json-files and then generate datasets and views and give correct access rights. The solution is a bit rough and could make use of dependency mapping (when a view queries another view) instead of the current solution iterating views until all views are generated or the script can’t generate anymore views (broken dependencies). The script generates two datasets per group, one with READER (suffix ‘_ro’) and one with WRITER (suffix ‘_rw’) to make sure that views generated by data team can’t be modified and at the same time give a sandbox for the group. The group should be an e-mail group and the name of datasets will be the local-part of the email address. The script is executed by google cloud build (CI/CD) and triggered by a push to our github repo.

Read the stackoverflow post if you want some more background on the topic.