cortex.labs

February 17, 2025

Free Data Extraction from Google Analytics Universal Analytics to Google BigQuery using Airbyte

As the sunset of Universal Analytics by Google approaches, it becomes essential to transfer old data from Universal Analytics (UA) to a data warehouse for further analysis. This article presents a free and efficient method to extract data from Google Analytics: Universal Analytics and load it into Google BigQuery using Airbyte, an open-source data integration platform. By following the steps outlined below, you can seamlessly transfer your historical data from UA to BigQuery, enabling you to leverage advanced analytics techniques and gain valuable insights.

Before diving into the integration process, it’s important to consider the available options for integrating Google Analytics (GA) with BigQuery (BQ). Here are the commonly used methods:

Understanding the Options:

  • Native BigQuery Integration (Google Analytics 360 only): Expensive, limited to GA 360 users.
  • Google Sheets Export: Not recommended due to manual processes.
  • Code-based Integration: Requires technical knowledge and coding skills.
  • Manual Export/Import: Cumbersome process prone to human error.

Understanding the Tools:

Before diving into the integration process, let’s have a clear understanding of the tools involved:

  • Google Analytics: Universal Analytics — A web analytics tool that tracks and measures user interactions on websites or mobile applications, providing insights into user behavior, traffic sources, conversions, and more.
  • Google BigQuery — A fully-managed, serverless data warehouse designed for handling large datasets and performing high-speed, ad hoc queries for analysis.
  • Airbyte — An open-source data integration platform that enables connecting and replicating data from various sources to a preferred data destination, providing a user-friendly interface for configuring and orchestrating data pipelines.

Integration Steps:

Follow the steps below to integrate Google Analytics: Universal Analytics with Google BigQuery using Airbyte:

01. Install Docker:

Docker is a prerequisite for initiating the data extraction process. Begin by downloading Docker. If this is your first time installing Docker, follow the provided steps to resolve any potential errors and ensure a successful installation.

  1. Open the Control Panel on your Windows machine.
  2. Click on “Programs” and then select “Turn Windows features on or off”.
  3. Check the boxes for “Windows Subsystem For Linux,” “Windows Hypervisor Platform,” and “Virtual Machine Platform,” then click “OK”.
  4. Open PowerShell as an Administrator.
  5. Type the following command: dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart, then press Enter to enable WSL 1.
  6. Download the WSL 2 update from Microsoft using this link: https://aka.ms/wsl2kernel.
  7. Install the downloaded update by double-clicking on the file.
  8. Open PowerShell as an Administrator.
  9. Type the following command: wsl — set-default-version 2, then press Enter to set WSL 2 as the default version.
  10. Restart your computer to ensure the changes take effect.

02. Set Up Airbyte:

a. Download and install the git app

b. Open Docker and terminal.

c. Execute the following commands in the terminal to install and run the Airbyte containers.

d. Access Airbyte through the provided URL, entering the username and password.

http://localhost:8000

Username: airbyte

Password: password

e. Application start window will popup as follows,

03. Add Google Analytics as a Source:

a. Click on the “Sources” tab in Airbyte and select “+ New source.”

b. Choose “Google Analytics (Universal Analytics)” and select “Service Account Key Authentication” under the credential section.

c. Create a service account in Google Cloud, obtain the JSON key, and add the key’s content to Airbyte’s service account credentials.

Select “Service Account Key Authentication” under the credential section for a secure and automated connection between Google Cloud and Google Analytics. This option eliminates the need for manual authentication during each connection setup, ensuring a streamlined and secure process.

Create a Google Cloud Account: Start by creating an account on the Google Cloud platform. Google offers $300 worth of cloud activities for a 30-day trial, allowing you to explore its features. Once you sign up, you will be directed to an interface that looks like the following:

Set Up a New Project: Within the Google Cloud interface, create a new project with a unique name. This project will serve as the container for your service account and associated resources.

Create a Service Account: Navigate to the “IAM & Admin” panel in the Google Cloud interface. From there, create a new service account by providing a name and defining its access permissions. It’s essential to exercise caution when granting access to a service account. For demonstration purposes, we will use the “Editor” access level, but it is generally recommended to assign more granular permissions like “BigQuery Admin” or “BigQuery User.”

Obtain the JSON Key: After setting up the service account, you can obtain the key in JSON format. This key establishes a secure and automated connection between Google Cloud and Google Analytics. Download the JSON key file and open it. Copy the content of the file as you will need to paste it into the Airbyte service account credentials.

Go to your Google Analytics account and add the created service account as a user with the “Editor” permission.

By following these steps, you can create a service account in Google Cloud and obtain the necessary JSON key for authentication. This allows for a secure and seamless connection between Google Analytics and Airbyte during the data transfer process.

d. Provide the view ID of your Google Analytics account.

e. Provide the desired date range

We are almost ready to add the data source, but before that, we need to enable the following APIs: Google Analytics Reporting API and Google Analytics API in Google Cloud.

Once the APIs are enabled, click on “Set up source.” After a short period of time, you will see a success window as shown below:

This indicates that the setup process was successful and you can proceed to the next step.

04. Set Up Google BigQuery as a Destination:

a. Select Google BigQuery as the destination in Airbyte.

b. Retrieve the Project ID from Google Cloud project.

After setting the dataset location to your desired location, it’s important to note that this location should align with the dataset location in BigQuery. Let me explain this further.

c. Set the dataset location and configure the loading method (recommended: GCS Staging using Google Cloud Storage).

The next step is to set up the loading method. There are two options available: “Standard Insert” or “GCS Staging.” The recommended method is GCS Staging, which utilizes Google Cloud Storage as a staging location before loading data into BigQuery. This approach offers several advantages, including:

  • Faster Data Loading: The process of loading data becomes incredibly fast when using GCS Staging.
  • Flexibility in Table Usage: With GCS Staging, you have the option to remove BigQuery tables when they are not in use and import them as needed. This is particularly beneficial as BigQuery tables tend to be larger compared to compressed data in Cloud Storage.
  • Preservation of Local Storage: GCS Staging allows you to save local storage by utilizing Google Cloud Storage as an intermediate storage solution.
  • Improved Table Creation Reliability: By using GCS Staging, the chances of failures during table creation are minimized. This is because issues related to networking or local computer resources, which can occur when using local storage, are avoided.

By selecting GCS Staging as the loading method, you can take advantage of these benefits and ensure a smooth and efficient data loading process into BigQuery.

To create a Google Cloud Storage bucket, follow these steps, keeping the following points in mind:

  1. Provide a unique name for the bucket.
  2. Set the location type to match the BigQuery and Airbyte dataset location discussed earlier.
  3. Disable the retention policy for the bucket.
  4. Enable the “Object version” feature for the bucket, as shown in the image.

d. Get the HMAC keys

e. Complete the remaining details on the destination page, including the desired query job type.

Additionally you can choose to run interactive query jobs or run batch queries.

Interactive query jobs — Interactive queries are executed as soon as possible and count towards daily concurrent quotas and limits

Batch Query Jobs — executed as soon as idle resources are available in the BigQuery shared resource pool. If BigQuery hasn’t started the query within 24 hours, BigQuery changes the job priority to interactive. Batch queries don’t count towards your concurrent rate limit, making it easier to start many queries at once.

For the demonstration purposes i will hoose Interactive query jobs. Then you can select Big query client chunk size. Where it demonstrate if there is a networking or memory management problems with the sync (specifically on the destination), you can try decreasing the chunk size. In that case, the sync will be slower but more likely to succeed. I will set as the for the default on 15MiB.

05. Establish the Connection:

a. Set up the connection between the source (Google Analytics) and the destination (Google BigQuery) in Airbyte.

b. Choose the desired data format (normalized tabular data or raw data JSON format).

c. Once the data transfer is complete, stop the Docker container.

Conclusion:

By following the outlined steps, you can effectively transfer data from Google Analytics: Universal Analytics to Google BigQuery using Airbyte. This free solution allows you to save your marketing budget while gaining access to powerful analytics capabilities. Whether it’s extracting data from Facebook Ads, TikTok Ads, Instagram Analytics, Amazon Ads, or other sources, Airbyte provides a seamless data transfer process to your desired location. Leverage these techniques to generate unique and impactful results, taking your business to the next level.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top