by Huiyan

25/07/2018

BigQuery, Google Analytics

BigQuery Caveat – Two exits in one Google Analytics session?

Yesterday, during a BigQuery analysis of Google Analytics data, we ran into something odd: a tiny proportion of sessions had more than one exit (two at most, to be exact). An exit here means a hit with “hits.isExit = True”.

 

As shown, all of those exit hits are pageview hits, so none of them are non-interaction hits. The two exit hits usually happen on different pages, and their hitNumber and hits.time values suggest they happened far enough apart to rule out a race condition.
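
The check that surfaced the anomaly can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the query we actually ran. It uses Python's built-in sqlite3 module with a flattened, made-up hits table, whereas in the real export hits is a repeated field that would need UNNEST. The sample rows, the table name hits_flat and the function name are all hypothetical.

```python
import sqlite3

# Hypothetical, flattened sample rows: one row per hit.
# Columns: fullVisitorId, visitId, hitNumber, type, isExit (1 = True).
SAMPLE_HITS = [
    ("A", 1001, 1, "PAGE", 0),
    ("A", 1001, 2, "PAGE", 1),  # first exit
    ("A", 1001, 3, "PAGE", 0),
    ("A", 1001, 4, "PAGE", 1),  # second exit under the same visitId
    ("B", 2002, 1, "PAGE", 0),
    ("B", 2002, 2, "PAGE", 1),  # ordinary session: a single exit
]


def sessions_with_multiple_exits(rows):
    """Return (fullVisitorId, visitId, exits) for sessions with more than one exit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits_flat ("
        "fullVisitorId TEXT, visitId INTEGER, hitNumber INTEGER, "
        "type TEXT, isExit INTEGER)"
    )
    conn.executemany("INSERT INTO hits_flat VALUES (?, ?, ?, ?, ?)", rows)
    # A session is keyed on fullVisitorId + visitId, as in the analysis.
    result = conn.execute(
        """
        SELECT fullVisitorId, visitId, SUM(isExit) AS exits
        FROM hits_flat
        GROUP BY fullVisitorId, visitId
        HAVING SUM(isExit) > 1
        ORDER BY fullVisitorId, visitId
        """
    ).fetchall()
    conn.close()
    return result


print(sessions_with_multiple_exits(SAMPLE_HITS))
```

With the sample rows above, only visitor A's session is returned, because it is the only one with two exit hits under a single fullVisitorId and visitId pair.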

So what happened?

Turns out, it happens when the clock turns midnight. (Thanks Kai!)

As the highlighted time column in the table above suggests, all of those exceptions occur in sessions that span 00:00. It therefore appears that hits.isExit resets when a new day begins, even within the same session.

But actually, the issue comes down to how we define a session. Our approach is to use the combination of fullVisitorId and visitId. Although the BigQuery export schema documentation suggests this combination as the unique session identifier, there is a caveat in the difference between visitId and visitStartTime. At a glance, the two fields appear to hold exactly the same numbers. However, visitStartTime changes when a session passes midnight, whereas visitId stays the same. hits.isExit appears to treat visitStartTime as the session identifier, which is why a session keyed on visitId can end up with two exits.
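
To make the effect of the session key concrete, here is a small sketch in plain Python. It assumes, as described above, that a session crossing midnight keeps its visitId but gets a new visitStartTime. The tuples, the timestamps and the function name max_exits_per_session are made up for illustration and are not taken from our data.

```python
from collections import Counter

# Hypothetical exit hits: (fullVisitorId, visitId, visitStartTime).
# Visitor A's session crosses midnight, so its two exit hits share a
# visitId but carry different visitStartTime values (made-up numbers).
EXIT_HITS = [
    ("A", 1532473000, 1532473000),  # exit before midnight
    ("A", 1532473000, 1532476800),  # exit after midnight, same visitId
    ("B", 1532480000, 1532480000),  # ordinary session
]


def max_exits_per_session(exit_hits, include_start_time):
    """Count exit hits per session key and return the largest count."""
    counts = Counter(
        (visitor, visit_id, start) if include_start_time else (visitor, visit_id)
        for visitor, visit_id, start in exit_hits
    )
    return max(counts.values())


# Keyed on fullVisitorId + visitId, visitor A shows two exits.
print(max_exits_per_session(EXIT_HITS, include_start_time=False))
# Adding visitStartTime to the key brings it back to one exit per session.
print(max_exits_per_session(EXIT_HITS, include_start_time=True))
```

Whether visitStartTime belongs in the session key depends on the analysis: including it lines the key up with how hits.isExit behaves, but it also counts a session that crosses midnight as two.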
