Dashboards are only useful if they are visible and easy to access. Consequently, many clients display theirs on communal monitors or projectors and need their data to update in real time. Google Data Studio is a fantastic data visualisation tool but treats some connection types (e.g. Google Sheets) as static data. Data Studio reports that include these connection types currently require a user to manually refresh the data within a dashboard in order to display the most up to date information. This is somewhat of a problem when it comes to communal displays. Let’s face it, nobody wants to be the person manually clicking refresh every minute!
This blog details how our cloud team came up with an innovative solution to the problem using a airflow and the Auto Refresh Chrome browser extension.
A Closer Look at the Problem
The first key point to understand is how Data Studio accesses and stores the data used to generate its reports. Essentially, Data Studio pulls the information from ‘connections’ and holds this data in temporary storage (we’ll refer to this as cached data) which is then used to populate the dashboard widgets:
The second key point to understand is that Data Studio processes the connection differently dependent on the type.
The final point to understand is which user actions are linked with the automatic and manual updates detailed above. The browser refresh button effectively re-opens a report i.e. it therefore only automatically updates the cached data for “auto” data sources.
Amending the date range within a report also automatically updates the cached data for “auto” data sources only.
In order to update the cached data for “static” data sources, the refresh button within Google Data Studio itself must be used (or the connection can be edited, essentially reconnecting the data source):
Without this manual refresh, the information displayed within the report widgets would still be the data available at the point the connection was originally configured/the last time the button above was pressed. Critically, for our purposes, a manual data refresh updates the cached data at a global level and makes it available to all report users.
The following table summarises how different user actions update the data displayed for the different data types:
|Report Type||Browser Refresh||Date Range Change||Data Studio (Manual) Refresh|
|Auto Data Sources Only||Auto Data Sources Only||Auto Data up to date||Auto Data up to Date|
|Static Data Sources Only||Static Data from point of last manual refresh||Static Data from point of last manual refresh||Static Data up to Date|
|Mixed Data Sources||
Auto Data up to date
Static Data from point of last manual refresh
Auto Data up to date
Static Data from point of last manual refresh
|Auto Data Up to Date
Static Data up to Date
This means that refreshing the page or even the use of control and date range filters doesn’t refresh the data for static source types.
So if we are in need of a real-time or updating report that includes a Google Sheet as a data source, we’re stuck! The only options available to refresh the data are the Data Studio refresh data button, which is tedious.
(Credit to Giphy for the perfect gif)
Introducing Automation Tools
Automation tools are a key part of the analytics and marketing landscape. They enable businesses to map, then complete a repeatable predictive process, without the need to manually complete the process themselves. This frees up valuable time for insights and can enable advanced solutions.
There are many automation tools available; the following provides a quick summary of the tools used for this process.
Airflow is an open sourced automation platform, that allows you to write out a process in python and have it executed on regular intervals. Airflow is easy to use, free, has tonnes of built in integrations, and the ability to schedule processes using a simple Cron syntax.
For newbies to Cron, the following diagram should help you to understand the syntax:
For example, if you want the project to run each minute during business hours on week days, the syntax would be:
* 9-16 * * 1-5 *
GitHub is a code hosting platform for version control and collaboration. This enables team members to work together on a project from anywhere. You create a repository from which branches can be started to manage/pull/push the recent/old commits of your code. You can also merge the code with another version of the code that is pushed by another user. Simple command line arguments help to achieve the version control on your project.
Building our Solution
Our solution will be based across two machines:
- A display screen machine
- A secondary machine running the Airflow server and automation scripts
To create our automation, we first have to map the virtual user journey. The journey to click the refresh button and update the dashboard data can be broken down as follows:
- Login to Data Studio
- Navigate to & open the report
- Click on refresh data button
This flow is then translated into code using Selenium/Python and tested locally to confirm whether it is running successfully or not.
We initially ran the automation script on an infinite 1-minute loop which worked okay but wasn’t very efficient. This is where Jenkins came to our assistance. The Cron-based scheduling feature allowed us to run our updates within business hours only.
Say Hello to Airflow
By using Airflow, we can simply just specify a Python Operator which will simply execute a python function. We can specify out script inside of this function and supply Airflow with the other arguments to run DAG (directed acyclic graph) such as the how often it should run via the CRON expression. I am not going to go into detail on how Airflow works as there is already a plethora of guides, documentation and tutorials on how to use Airflow. The code should look something like the following:
After you add in the DAG declarations plus relevant arguments and imports you can place this code inside the dags/ folder of your airflow installation and you should see it populate inside of the airflow webserver UI.
Adding the Auto Refresh Extension
Since the automation is not running on the display machine, we need a way in which to sync the data displayed on that screen with the most recent refresh generated by the automation. This is something that can be handled by an extension within the Chrome browser. There are several available, but we like the Auto Refresh extension which enables us to automate page refreshes at set regular intervals (from seconds to hours). We recommend to set the interval to update at the 40 seconds rate, as our automation script runs each minute.
The End Result
A multi-client configuration, with Jenkins running remotely from a display computer running Chrome and the Auto Refresh extension, enables Data Studio to become a fully automated dashboard solution that is ready for prime-time communal viewing. Note that whilst we ran our tests on the free, locally hosted version of Jenkins, it would be possible to run this via a Cloud-based hosting solution if you have that available.
Please email [email protected] if you would like to learn more about this solution or would like us to help you implement something similar. Please also reach out if you are interested to learn more about QA and automation in general.