by Doug

05/04/2016

Google Analytics

Google Analytics – Querystring parameter INCLUSION is the new black

Problem summary

As sites grow in scale they grow in complexity.  Multiple development teams get involved, some from third party agencies, some internal. Temporary features get added that are good enough. This is reality and as analysts, we all face the same challenge to maintain data quality in line with progress and change.

As sites grow in scale and complexity, more query string parameters are employed on page paths. Some are useful. Some are not and pollute your data.

Failure to manage the query string parameter exclusion list in your GA views can result in artificially inflated cardinality in pages reports.

Problem update

You know you need to manage the query string parameter exclusion list in your GA views but you have 200 views to manage and the list of query string parameters in use grows weekly and you spend vast amounts of time weeding out pollution whilst keeping the right parameters in the data. You end up with a variety of exclusion lists that all resemble something like this:

Nasty isn’t it?

Solution – query string parameter INCLUSION

You probably found already that the set of useful parameters is much smaller than the list you need to exclude. Rather than chase the bad guys, why not just welcome in the good guys you know and love? Less maintenance required, guaranteed data quality and reduced cardinality.

We’ve seen pages reports shrink from 250,000 rows to fewer than 100,000 for clients with no metrics loss using this technique.

How?

We use a series of filters that are easily managed and deployed via the management API to include only the query string parameters we want – all others are excluded. Here’s the basic flow:

  1. Clear the Custom Field 1 value
  2. Extract the first query string name/value pair into Custom Field 1
  3. Extract the second query string name/value pair. Append it to Custom Field 1
  4. Repeat step 3 as many times as required for each useful query string parameter to include
  5. Clean ALL query string parameters from the Request URI field
  6. Append Custom Field 1 to the cleaned Request URI

Bingo – your page paths in your pages report will now only include the query string values you want. Chasing the exclusion list is a thing of the past and data quality is in the palm of your hand. Go and do some serious analysis instead of house keeping!

Lets assume we have three query string parameters – a, b, and c. No other query string parameters are of interest and are just pollution. We may not always have these values on our URLs. They can appear in different orders and we will often have no parametrisation at all. Our filters have to work in all these cases…and indeed they do. Here’s how:

Filter 1 – Blank CF1

Any filters that use Custom Field 1 or Custom Field 2 always need to sanitise them before use. We’re only using CF1 so let’s sanitise that first:

Filter 2 – Extract the first query string parameter into CF1

Our filter needs to find a query string parameter called a that can have a value including letters and numbers. It’ll be after a ‘?’ or ‘&’ (we don’t know the order of the parameters) and it might be the first, last or only parameter. We’ll use a regular expression to lift the right name value pair. You might choose to change the reg ex to match the type of value you’re expecting – words only, number only – exercise your reg ex skills as need be but change the ‘a’ to match the name of your first parameter as required:

(\?|&)(a=(\d|\w)+)(&|$)

Notice that we’ve set the value of CF1 to $A2&

If there is a query string value for parameter named a (a=abc123) then we set CF1 to a=abc123&

This makes CF1 ready to take the next value…if there is one. As Field A is required by the filter, if there is no parameter ‘a’ then CF1 is unmolested. On to the next query string parameter.

Filter 3 – Extract the second query string parameter into CF1

We use the same reg ex as filter 2 but with the name changed as required:

Notice we extract everything from CF1 as Field B but it’s not a required field. If the first parameter wasn’t present on the URL, we still need this filter to fire. Whether CF1 has a value or not, we output $B1 (whatever was extracted from CF1) back to CF1 with the name value pair extracted from the request URI appended immediately after. Here’s the potential set of values for the parameters and CF1 so far:

Parameters Value of CF1
a=abc123&b=xyz456 a=abc123&b=xyz456&
a=abc123 a=abc123&
b=xyz456&a=abc123 a=abc123&b=xyz456&
b=xyz456 b=xyz456&

Of course, if there are no parameters, then CF1 remains blank. This is the filter you would ‘rinse and repeat’ if you wanted to extend the range of parameters to be included. The first and last filters differ slightly.

Now for the last parameter filter.

Filter 4 – Extract the last query string parameter into CF1

Same reg ex used here. The only difference is that we know this is the last parameter we’re extracting from the Request URI field so the value we append to CF1 doesn’t have the ‘just in case’ ampersand:

Filter 5 – Trim unnecessary &

About that ‘just in case’ ampersand…We use this assuming that we have all three (in this case) parameters on the URL so CF1 can be a=abc123&b=xyz456&c=conversionworks That’s the ideal case but what if ‘c’ is not present? What if we only have a? CF1 will be a=abc123& which is messy so we make sure there is no trailing ampersand on CF1:

Filter 6 – Clean the Request URI field

Now we’ve extracted all the query string parameters we need (if they’re present) into a nice clean CF1, we can use a pretty simple search and replace filter to clean any and all query string parameter pollution from the Request URL field:

Filter 7 – Append CF1 to Request URI

Finally having extracted, cleaned, and stripped various fields, we can now stitch the Request URI back together with only the query string parameters we want in our data:

Remember, we only molest the Request URI if any of our favoured query string parameters have been found. Field B is required so if CF1 is still blank at this point, we don’t touch the Request URI – it stays clean and devoid of query string pollution.

Wrap up

If you use these filters, a few important points need to be made.

First is that filter order is vital:

Next, if you try this technique, do it on a test view – not straight to production…on a Friday…at 17:00. No, test carefully in a safe place and protect your production data.

Always maintain a RAW view to see what data is actually getting stripped out.

You don’t have to append the allowed query string parameters to the Request URI. If you want the ultimate in clean page paths, you could always decorate the pageview with these values using custom dimensions.

You may wonder why we use a filter per parameter rather than one reg ex. You could…possibly. If you (and you team…and your future team) can write and maintain the reg ex then go ahead – not my thing though thanks. I prefer simple. Actually, it’d be great if we had an infinite number of Custom Fields to play with rather than all this extract and append stuff but we don’t so here we are.

All being well, you’ll have cleaner data with less effort and therefore, more time to spend on the analysis and action that REALLY adds value!

Comments

Leave a comment

Your email address will not be published. Required fields are marked *