Summary
Extract-transform-loaded (ETL): GA data (1.4 million records) ➜ BigQuery ➜ Google Cloud Storage ➜ Tableau Server Connection.
BigQuery was used to calculate desirable metrics (Page views, time on page, bounces, sessions) for use as KPIs in Tableau. Created filters and parameters for segmentation and comparison analysis.
What is this data?
The sample dataset contains obfuscated Google Analytics 360 data from the Google Merchandise Store. It consists of 900k records (4.2 million unnested), 338 variables, and includes the following kinds of information:
- Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
- Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
- Transactional data: information about the transactions that occur on the Google Merchandise Store website.
You can query the data in GCP BigQuerry by date in the following way:
FROM 'bigquery-public-data.google_analytics_sample.ga_sessions_20170101'
The schema can be found here
SQL Breakdown
Collect 8 descriptors and 5 Measures for import
Descriptors:
- hit_year
- hit_month
- pagePath
- source
- channel_grouping
- device_category
- country
- city
Measures:
- pageviews
- unique_pageviews
- total_time_on_page_combined
- total_bounces
- total_sessions
Constant Categories
The descriptor variables will be selected for all measure queries.
SUBSTR(date, 1, 4) AS hit_year,
SUBSTR(date, 5, 2) AS hit_month,
hits.page.pagepath AS pagePath,
trafficsource.source AS source,
channelgrouping as channel_grouping,
device.deviceCategory as device_category,
geonetwork.country as country,
geonetwork.city as city
Tableau Calculations
Parameters:
- Comparison Selector: M/M (current month vs previous month), Y/Y (current month vs month from previous year), YTD (current YTD vs previous YTD)
- Date Selector: Select the Month/Year of interest
Categorical Descriptors:
- Geo Location
- Device Category
- Traffic Source
- Channel Grouping
Metrics:
- Sessions
- Pageviews
- Unique Pageviews
- Time on Page
- Bounces
- Average Unique Pageviews per Session
- Average Time on Page per Session
- Percentage of Sessions that Bounce
Sessions:
- Total Session Comparison: SUM(current session range) - SUM(previous session range) / SUM(previous session range)
- Session YTD (Year to Date): Return [Total Sessions] for selected Month and prior, within same Year.
- Session CM (Current Month): Return [Total Sessions] for selected Month and Year
CM: Current Month PM: Previous Month PY: Previous Year YTD: Year-to-Date PYTD: Previous Year-to-Date LTD: Life-to-Date (for grey charts)
For Percent of Total (Avg Unique Pageviews CM for Channel Grouping), the calculation must be modified from TOTAL() to WINDOW_SUM(). This is because the TOTAL() function does not work properly on measures that are already an aggregate function. ref