Google Analytics Data Import From Google Datalab Part 1

I’ve been doing some analytics with Google Datalab (an easy-to-use interactive tool for data exploration, analysis, visualization and machine learning) over the last year. My focus has been exploratory analytics of the Google Analytics data that we export to BigQuery. But what is data science if we don’t incorporate the data back into the real world to enrich the user experience? Hence I thought I would show how to import your data insights back into Google Analytics.

First we need to set up Datalab in the cloud accordingly. I’ll assume you already have a project set up in Google Cloud Platform and have enabled the required APIs/services. There is a video giving a good introduction, but it misses steps such as Datalab for teams and setting up scopes for pushing data back to Google Analytics. Also, I run all commands from the Google Cloud Shell, which already has the necessary components installed and provides a built-in code editor, Orion. If you run gcloud from the command line locally, you need to install the datalab component first.
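Concretely, the local preparation could look like the following sketch. The project id is a placeholder and the exact set of services to enable is my assumption; adjust to your own project:

# Point gcloud at your project ("myproject" is a placeholder)
gcloud config set project myproject

# Enable the services used in this post (service names assumed)
gcloud services enable compute.googleapis.com \
    bigquery.googleapis.com \
    analytics.googleapis.com

# Install the datalab component (already present in Cloud Shell)
gcloud components install datalab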

Since I’m mostly working in teams using Google Datalab, I prefer using a common service account to limit access to services and set scopes. Hence I create a service account and give it the required roles. Then I add each user to the project and give them IAM roles for the service account that we will attach to the user’s Cloud Datalab instance.

# Create service account
gcloud iam service-accounts create myserviceaccount \
    --display-name "my service account"

# Give required roles to service account
gcloud projects add-iam-policy-binding myproject \
    --member serviceAccount:myserviceaccount@myproject.iam.gserviceaccount.com \
    --role roles/bigquery.user

gcloud projects add-iam-policy-binding myproject \
    --member serviceAccount:myserviceaccount@myproject.iam.gserviceaccount.com \
    --role roles/bigquery.jobUser

gcloud projects add-iam-policy-binding myproject \
    --member serviceAccount:myserviceaccount@myproject.iam.gserviceaccount.com \
    --role roles/bigquery.dataViewer

# Add user and give required roles
gcloud projects add-iam-policy-binding myproject \
    --member user:myname@mycompany.com \
    --role roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding myproject \
    --member user:myname@mycompany.com \
    --role roles/compute.instanceAdmin.v1
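To sanity-check the bindings before moving on, you can print the roles granted to the service account. The flatten/filter expressions below are one way to do it:

# List roles bound to the service account in the project
gcloud projects get-iam-policy myproject \
    --flatten="bindings[].members" \
    --format="table(bindings.role)" \
    --filter="bindings.members:myserviceaccount@myproject.iam.gserviceaccount.com"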

Thereafter, create a Google Datalab instance. Here I create an instance called datalab-myname for the user myname@mycompany.com in the project myproject.

datalab create "datalab-myname" \
    --disk-size-gb "20" \
    --for-user myname@mycompany.com \
    --project "myproject" \
    --zone "europe-west1-b"

But that is not enough; we need to change scopes as well, since the default scopes don’t include Analytics. In order to change scopes you first need to stop the instance:

gcloud beta compute instances stop "datalab-myname" \
    --zone="europe-west1-b"

Then we set the scopes to also include the Google Analytics service:

gcloud beta compute instances set-scopes datalab-myname \
    --service-account "myserviceaccount@myproject.iam.gserviceaccount.com" \
    --scopes "https://www.googleapis.com/auth/cloud-platform,\
https://www.googleapis.com/auth/analytics.edit,\
https://www.googleapis.com/auth/analytics,\
https://www.googleapis.com/auth/bigquery,\
https://www.googleapis.com/auth/devstorage.read_write,\
https://www.googleapis.com/auth/compute"

Then restart the instance:

gcloud beta compute instances start "datalab-myname" \
    --zone="europe-west1-b"

Then give the service account (myserviceaccount@myproject.iam.gserviceaccount.com) edit privileges for each account/property in the Google Analytics admin settings and you’re done!

Now the user (myname@mycompany.com) can connect to Google Datalab by going to Cloud Shell and typing:

datalab connect "datalab-myname" \
    --zone "europe-west1-b"

Stop the instance when you are finished to avoid unnecessary charges:

datalab stop "datalab-myname" \
    --zone "europe-west1-b"
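If you are done with an instance entirely, you can also delete it. By default the persistent disk holding your notebooks is kept; pass --delete-disk to remove it as well:

# Delete the instance; add --delete-disk to also remove the notebook disk
datalab delete "datalab-myname" \
    --zone "europe-west1-b"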

In the next posts we’ll go through the code to query BigQuery and prepare the results for Google Analytics Data Import, and finally upload the data and save the notebook to Google Cloud Source Repositories using git.

Robert Sahlin
