Query your Google Analytics data with BigQuery
Query your Google Analytics data with BigQuery
1. Decide which type of Google Analytics connection to use in BigQuery: Google Analytics 360, Google Analytics 4 native, or a third-party option.
Third-party connectors include: Supermetrics using flat tables in Batch mode. Owox using tables with nested or repeated fields, similar to GA360 tables, in Streaming mode For now, we’ll use the GA public dataset. The sample dataset provides an obfuscated Google Analytics 360 dataset that can be accessed via BigQuery.
2. Click + Add Data > Explore public datasets to access the GA360 sample data tables in BigQuery public datasets.
Search for Google Analytics Sample in the public datasets. Click on blue View Dataset button to explore the added dataset. You’ll see one table that contains multiple tables per day, with a table suffix as a date. Some lines and fields are empty; this is caused by nested fields.
3. Differentiate between a sharded and partitioned table.
A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. An alternative to date/timestamp/datetime partitioned tables, date-sharded tables use a time-based naming approach such as [PREFIX]_YYYYMMDD. GA360 uses date-sharded tables: To query a sharded table you can add * to replace the date in the table name, to query all the data from all the sharded tables: SELECT count(visitId)FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` Or add the date if you want to query a data for the specific day: SELECT * FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` You can also filter by using the table suffix: SELECT date, SUM(totals.pageviews) AS pageviews FROM bigquery-public-data.google_analytics_sample.ga_sessions_* WHERE _TABLE_SUFFIX BETWEEN '20160729' and '20170801' GROUP by date
4. Spot the nested fields in Google Analytics 360 data, and perform some queries on this data using the UNNESTED function.
Each row in GA360 data represents a single session and contains many fields, some of which can be repeated and nested. Hits contains a repeated set of fields, representing the page views and events during the session. Use UNNEST function to query the nested fields. Try this query: SELECT hits.hitNumber AS hitNumber, hits.page.pagePath AS pagePath FROM bigquery-public-data.google_analytics_sample.ga_sessions_*, UNNEST(hits) as hits where hits.hitNumber = 1 group by hitNumber, pagePath
5. Query GA360 data using UNNEST and TABLE SUFFIX.
Try this query to get pages per session for every visitor and every visit: SELECT fullVisitorId, visitId, visitNumber, hits.hitNumber AS hitNumber, hits.page.pagePath AS pagePath FROM bigquery-public-data.google_analytics_sample.ga_sessions_*, UNNEST(hits) as hits WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731' AND hits.type="PAGE" ORDER BY fullVisitorId, visitId, visitNumber, hitNumber
6. Continue to explore the Google Analytics data.
Explore the Google Analytics table schema to understand what are fields available for your queries. Find what are the nested fields to query them correctly; search for RECORD type in the schema, to find a sub-table with nested fields. Use the Standard SQL, and query your data as you would do with any other table, just take into consideration Table Suffix and Nested fields.
7. Work on a real use case, and write a GA360 query using what you've learned in the previous steps.
Create a table with one line per user/date of visit with the following variables, only the last 90 days of the data: fullVisitorId date total number of page views on the last 30 days prior this date average time spent during a session on the last 30 days prior this date It will give you the quality of visit per user, per date, and you can add any other variables to include more quality indicators in your analysis. Query: DECLARE maxDate DATE; SET maxDate = ( SELECT PARSE_DATE('%Y%m%d', MAX(date)) FROM bigquery-public-data.google_analytics_sample.ga_sessions_*); WITH features_table AS ( SELECT fullVisitorId, date, trafficSource.source, SUM(totals.pageviews) AS pageviews, IFNULL(SUM(totals.timeOnSite), 0) AS timeOnSite, CASE WHEN device.deviceCategory = "tablet" THEN '3' WHEN device.deviceCategory = "mobile" THEN '2' ELSE '1' END AS deviceCategory, CASE WHEN SUM(totals.transactions) IS NULL THEN '0' ELSE '1' END AS TRANSACTION FROM bigquery-public-data.google_analytics_sample.ga_sessions_* WHERE PARSE_DATE('%Y%m%d', date) >= DATE_ADD(maxDate, INTERVAL -90 DAY) GROUP BY fullVisitorId, date, device.deviceCategory, trafficSource.source ORDER BY fullVisitorId, date ) SELECT *, IFNULL(SUM(pageviews) OVER(PARTITION BY fullVisitorId ORDER BY UNIX_DATE(PARSE_DATE('%Y%m%d', date)) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW), 0) AS _30DaysSumPageviews, IFNULL(ROUND(AVG(timeOnSite) OVER(PARTITION BY fullVisitorId ORDER BY UNIX_DATE(PARSE_DATE('%Y%m%d', date)) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW),1), 0) AS _30DaysAverageTimeOnSite FROM features_table