It’s important to understand how Google Analytics interprets users’ data and uses it to prepare reports. Without this knowledge, you may create erroneous custom reports and draw wrong conclusions.
Google Analytics is a ubiquitous tool for web and digital analytics. According to statistics vendor Datanyze, Google Analytics and allied technologies account for 74.06% of the web analytics market.
That means there are millions of Google Analytics practitioners worldwide.
But here’s an interesting question. How many of them clearly understand the most fundamental concept of traffic data analysis with Google Analytics?
The fact is that many Analytics practitioners do not understand how Google Analytics makes sense of the data it collects: how it categorizes the data and prepares all those reports that you take for granted.
Understanding the fundamental concept of scope is essential for you to gather accurate insights from Google Analytics reports. Otherwise, your conclusions from GA reports may be flawed.
Let’s understand the concept of scope and how Google Analytics captures users’ data in this blog post.
You may come across a lot of reports on Google Analytics, across user data, traffic, user behavior, pages visited, search queries, advertising performance, etc.
Google Analytics captures a lot of data, but how it categorizes this data deluge is what matters. When you see a report, such as the one given below, what you see are essentially dimensions and metrics.
Understanding the dimensions and metrics is essential before we understand the underlying concept of scope.
Every report is made up of at least one dimension and one metric. In the browser information report shown above, the dimension is Browser and metrics are Users, New Users, Sessions, etc.
Dimension: The dimension indicates an attribute of the data presented within a report. So, the operating system of a visitor is a dimension with values such as Android and iOS. The traffic source is another example of a dimension with values such as Google.com and LinkedIn.com. Besides the standard dimensions given by GA, you can also create custom dimensions. On most Analytics reports, the dimensions are arranged into rows.
Metric: A metric is a numerical measurement reported for each value of a dimension. For instance, the number of new users is a metric. The bounce rate is another metric. There are built-in metrics provided by Google Analytics, calculated metrics that you can create using formulae, and custom metrics. On most Analytics reports, the columns constitute the metrics.
However, dimensions and metrics don’t give you the whole picture. You need to also know how Google Analytics captures the data and compiles its standard reports.
If you believe that every dimension can be coupled with every metric, you would be making erroneous judgments on your data.
Let’s look at a couple of examples.
Consider the All Pages report filed under Behavior > Site Content. This report by default shows you the page path and the number of page views, unique page views, bounce rate, etc., as specific metrics.
Consider whether a metric such as Session Duration would make sense here. A session in Google Analytics represents a user’s browsing session, which could include many page views on other pages and other actions such as goal completions, events, etc.
Essentially, a browsing session may not be limited to only one page. So, adding the Session Duration as a metric in the All Pages report would lead you to erroneous conclusions. On the other hand, what you really need is the Avg. Time on Page metric, as shown below.
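To make the difference concrete, here is a minimal sketch in Python. The page paths and timestamps are made up, and the two calculations (gap to the next page view, and first hit to last hit) are a simplified model of how these metrics behave, not Google Analytics' actual implementation.

```python
# Hypothetical single session: (page path, seconds since the session started).
pageviews = [("/home", 0), ("/blog", 60), ("/pricing", 300)]


def time_on_page(views):
    """Time on a page = gap until the next page view.

    The last page has no following hit, so it gets no value here.
    """
    return {
        page: next_ts - ts
        for (page, ts), (_, next_ts) in zip(views, views[1:])
    }


def session_duration(views):
    """Session duration = time between the first and last hit of the session."""
    return views[-1][1] - views[0][1]


per_page = time_on_page(pageviews)
duration = session_duration(pageviews)

print(per_page)   # {'/home': 60, '/blog': 240}
print(duration)   # 300
```

If the 300-second session duration were reported against the /home row, it would overstate the 60 seconds actually spent on that page.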
Obviously, the red-marked Session Duration data is too good to be true. This is because the Session Duration is not about the time a user spends on one page.
Here is another example. This time, let’s analyze the Source/Medium report given below along with the number of users and sessions.
At a cursory glance, the following report may look fine.
What’s wrong with counting the number of users and sessions from specific traffic sources and media, right? But the report hides errors in plain sight.
Are you able to spot any error here?
A visitor could access your site using an organic search on Google. He would be counted once in the “google/organic” category. That would be his first session on your website. Later, he could also visit you from a different source/medium such as “quora.com/referral”. This visit is a legitimate second session by the visitor, but the number of users hasn’t actually gone up, has it? Still, the metric Users in the report above will count that visitor again. So, the user is counted twice in two different source/medium categories. This makes the numbers you see above erroneous.
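The double counting is easy to reproduce with a few lines of Python. This is only a sketch: the Client IDs and the session list are invented for illustration.

```python
from collections import defaultdict

# Hypothetical session log: (client_id, source/medium of that session).
sessions = [
    ("111.222", "google/organic"),       # first visit of visitor A
    ("111.222", "quora.com/referral"),   # visitor A returns from another source
    ("333.444", "google/organic"),       # visitor B
]

# Build the "Users per Source/Medium" rows the way the report does.
users_by_source = defaultdict(set)
for client_id, source_medium in sessions:
    users_by_source[source_medium].add(client_id)

report = {source: len(ids) for source, ids in users_by_source.items()}
sum_of_rows = sum(report.values())
unique_users = len({client_id for client_id, _ in sessions})

print(report)        # {'google/organic': 2, 'quora.com/referral': 1}
print(sum_of_rows)   # 3
print(unique_users)  # 2
```

The rows add up to three users even though only two distinct visitors exist, because visitor A appears under both source/medium values.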
The above scenarios illustrate why we should have a clear understanding of how Google Analytics tracks and reports data. It does so through the mechanism mentioned earlier, known as “scope”.
The scope of data capturing by Google Analytics involves four levels — products, hits, sessions, users.
The hit is the most basic level of data captured by Google Analytics in the case of most websites. It’s akin to a page in a book or a drop in a bottle of water. In the Google Analytics world, it’s the smallest possible packet of data.
A view on a page is an example of a hit. Another example could be an event (custom actions conducted by users): Scrolling on a page, if tracked, is an example of an event.
Here is a nugget of information: Bounce rates are not calculated on the basis of subsequent page views, as many people believe, but subsequent hits. So, if a page view hit has happened and it is followed by a scroll on the same page, which is captured as another hit (an event), then that particular visit is not regarded as a bounce.
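A rough sketch of this rule in Python follows. The hit dictionaries are a made-up representation, and the sketch assumes the scroll event is sent as a regular interaction hit.

```python
def is_bounce(hits):
    """A session counts as a bounce when it contains exactly one hit."""
    return len(hits) == 1


def bounce_rate(sessions):
    """Share of sessions that bounced, between 0.0 and 1.0."""
    return sum(is_bounce(hits) for hits in sessions) / len(sessions)


# Hypothetical sessions, each a list of hits.
pageview_only = [{"type": "pageview", "page": "/blog"}]
pageview_then_scroll = [
    {"type": "pageview", "page": "/blog"},
    {"type": "event", "action": "scroll"},  # second hit on the same page
]

print(is_bounce(pageview_only))                           # True
print(is_bounce(pageview_then_scroll))                    # False
print(bounce_rate([pageview_only, pageview_then_scroll]))  # 0.5
```

Both sessions viewed a single page, yet only the first is a bounce, because the tracked scroll adds a second hit.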
The session is the second level in the Google Analytics scope. A session represents a browsing session by a user that could go on for many minutes and include multiple page views or other activities. Essentially, a session includes multiple hits.
A session typically starts with a pageview hit. So every session begins with a hit, but not every hit begins a session, because any number of additional hits can follow within the same session.
By default, a session is deemed terminated after 30 minutes of inactivity. A session also ends at midnight or when the user returns through a different campaign source. Closing the browser does not, by itself, end the session.
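The inactivity rule can be sketched as a small grouping function. This is a simplified model that only looks at the 30-minute gap between consecutive hits; the timestamps are invented and other ways a session can end are ignored.

```python
from datetime import datetime, timedelta

# Assumed default inactivity window of 30 minutes.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(hit_times):
    """Group hit timestamps into sessions separated by long inactivity gaps."""
    sessions = []
    for ts in sorted(hit_times):
        # Continue the current session if the gap since its last hit is short.
        if sessions and ts - sessions[-1][-1] < SESSION_TIMEOUT:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical hits from one visitor on one morning.
hits = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 9, 50),  # 40 minutes after the previous hit
    datetime(2024, 1, 1, 9, 55),
]

sessions = split_into_sessions(hits)
print([len(s) for s in sessions])  # [2, 2]
```

The 40-minute pause splits the four hits into two sessions of two hits each.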
The user is the highest level in the Google Analytics scope. A user groups together all the sessions that come from the same visitor, which could be spread across several days.
How does Google Analytics identify a user across multiple browsing sessions? It’s with the help of a first-party cookie that Google Analytics sets on the user’s browser when he/she visits the website for the first time.
This cookie is named “_ga”. The cookie once set remains on the browser for about two years unless manually cleared by the user. This cookie sets a unique identifier on the browser known as the Client ID. This ID is a numerical ID in the format: “xxxxxxxxx.xxxxxxxxx”. This is used to uniquely identify the user when he/she creates multiple sessions on your website.
Every time a user visits your website, Google Analytics looks for the _ga cookie to see if that user could be identified. If the visitor has no cookie set, he/she is regarded as a new user and a cookie is set on the browser with a unique Client ID.
Along with the first hit by a new user, the Client ID of the user is also sent to GA servers.
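Here is a hedged sketch of that lookup in Python. The function names are hypothetical, the cookie store is a plain dictionary, and the "GA1.2." prefix and the way a new ID is minted (a random number plus a timestamp) are assumptions for illustration rather than a description of Google's code.

```python
import random
import time


def client_id_from_cookie(ga_cookie):
    """Return the last two dot-separated fields of a _ga value, or None."""
    if not ga_cookie:
        return None
    parts = ga_cookie.split(".")
    if len(parts) < 2:
        return None
    return ".".join(parts[-2:])


def identify(cookies):
    """Return (client_id, is_new_user) for a dict of browser cookies."""
    client_id = client_id_from_cookie(cookies.get("_ga"))
    if client_id is not None:
        return client_id, False  # returning user: cookie already present
    # No cookie found: treat the visitor as new and store a fresh Client ID.
    client_id = f"{random.randint(1, 2**31 - 1)}.{int(time.time())}"
    cookies["_ga"] = f"GA1.2.{client_id}"  # assumed cookie value layout
    return client_id, True


browser_cookies = {}
first_id, first_is_new = identify(browser_cookies)
second_id, second_is_new = identify(browser_cookies)

print(first_is_new, second_is_new)  # True False
print(first_id == second_id)        # True
```

Clearing the cookie, or switching to another browser or device, would make the same person look like a new user.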
The product as a scope level was introduced after ecommerce website tracking became ubiquitous. The product level is touted as the lowest level of the scope, and a hit could comprise multiple products. However, the product is only relevant in cases where ecommerce tracking is enabled, so it’s not a generic scope level that applies to other types of websites.
The scope in Google Analytics is illustrated in the image below. It shows various examples of hits and how they roll up into the different sessions of a user.
The timeline of a visitor could look like the example below:
- Date 1: A new user visits the home page and exits. Here, a new hit, session, and user are recorded for the first time.
- Date 2: The user returns and this time visits the home and blog pages and fills out a subscription form. At this time, Google Analytics identifies the returning user with the help of the Client ID. A new session is recorded and three hits are recorded, corresponding to the two page views and one form fill.
In the example above, a single user is responsible for two sessions and four hits.
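The same timeline can be written down as a list of hits and counted at each scope level. The field names and the Client ID below are made up for this sketch.

```python
# Hypothetical hit log for the two-day timeline described above.
hits = [
    # Date 1: one page view on the home page.
    {"client_id": "123.456", "session": 1, "type": "pageview", "page": "/"},
    # Date 2: two page views and one form submission.
    {"client_id": "123.456", "session": 2, "type": "pageview", "page": "/"},
    {"client_id": "123.456", "session": 2, "type": "pageview", "page": "/blog"},
    {"client_id": "123.456", "session": 2, "type": "event", "action": "form_submit"},
]

# Each scope level is a coarser grouping of the same hits.
users = {hit["client_id"] for hit in hits}
sessions = {(hit["client_id"], hit["session"]) for hit in hits}

print(len(users), len(sessions), len(hits))  # 1 2 4
```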
Each metric and dimension is bound to a specific level of scope. Essentially, this means that you cannot combine a metric from one scope level with a dimension from another. Doing so brings about errors like the ones we saw in the above scenarios.
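One way to picture this rule is a lookup table that tags every field with its scope. The assignments below are illustrative only, inferred from the examples in this post rather than taken from Google's official catalog, and the function name is hypothetical.

```python
# Illustrative scope assignments (assumptions, not an official list).
SCOPE = {
    "Page": "hit",
    "Pageviews": "hit",
    "Avg. Time on Page": "hit",
    "Source/Medium": "session",
    "Sessions": "session",
    "Session Duration": "session",
    "Users": "user",
}


def same_scope(dimension, metric):
    """Return True when the dimension and the metric share a scope level."""
    return SCOPE[dimension] == SCOPE[metric]


print(same_scope("Page", "Avg. Time on Page"))   # True
print(same_scope("Page", "Session Duration"))    # False
print(same_scope("Source/Medium", "Sessions"))   # True
print(same_scope("Source/Medium", "Users"))      # False
```

Running a check like this before building a custom report would flag both of the problematic combinations shown earlier.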
Why is scope so important?
The reports that you saw earlier were custom reports. Google’s default traffic reports do not make errors across different scopes. But when it comes to custom reports, you are free to experiment as you please. This could lead you to make reporting mistakes and draw flawed conclusions.
To help you prepare flawless custom reports, Google provides the Dimensions & Metrics Explorer. It lets you select any dimension and highlights the metrics that can be combined with it. If a metric cannot be combined with the selected dimension, its checkbox is disabled.