Why we’re not blogging about Referral Spam
The sky is not falling in. The end of the world is NOT nigh. Keep calm and keep measuring (using good data practices). No, referral spam is not the disaster many so-called experts would have you think.
One (true) expert, Brian Clifton, has blogged about solid practices you should employ (we do!) on your Google Analytics accounts to protect yourselves from the irritating referral spam – https://goo.gl/r2Hqa6
This is a solid article with reasonable comments too, although my preference is to keep the ‘fix’ in the GA config using the filters described, rather than via .htaccess mods. The problem is specific to GA, so don’t pollute other systems – keep the fix decoupled from them.
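To make the GA-side approach concrete: one of the recommended filters is a valid-hostname include filter, which keeps only hits reporting a hostname you actually own. The sketch below mimics that logic in Python – the domain `example.com` and the regex are illustrative assumptions, not taken from Brian’s article:

```python
import re

# Hypothetical 'include hostname' filter, mirroring what a GA view filter does:
# only hits whose hostname matches your real domain(s) survive.
# 'example.com' is a placeholder -- substitute your own domains.
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$")

def keep_hit(hostname):
    """Return True if the hit's hostname looks legitimate."""
    return bool(VALID_HOSTNAME.match(hostname))

print(keep_hit("www.example.com"))      # legitimate hit -> True
print(keep_hit("spam-domain.example"))  # spoofed/injected hit -> False
```

The same whitelist logic, expressed as a regex in a GA view’s include filter, is what stops spam hits that never touched your site from polluting your reports.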
Whilst the suggested filters are good practice and representative of a solid GA install, they are not going to stop ALL referral spam. We’re still vulnerable to Measurement Protocol injection.
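Why is Measurement Protocol injection hard to stop? Because a spammer never needs to visit your site – they only need your property ID to fabricate hits server-to-server. A minimal sketch of what such a fabricated hit looks like (the property ID and referrer are made up; we only build the payload, we don’t send it):

```python
from urllib.parse import urlencode

def build_spam_hit(property_id, fake_referrer):
    """Build a GA Measurement Protocol payload for a fabricated pageview."""
    params = {
        "v": "1",               # protocol version
        "tid": property_id,     # target GA property -- guessable or scrapable
        "cid": "555",           # arbitrary client ID
        "t": "pageview",        # hit type
        "dp": "/",              # page path
        "dr": fake_referrer,    # the spam 'referral' that lands in your reports
    }
    # A real spammer would POST this to
    # https://www.google-analytics.com/collect; no access to your site needed.
    return urlencode(params)

payload = build_spam_hit("UA-12345-6", "http://spam-domain.example/")
print(payload)
```

Because these hits never load your pages, .htaccess rules and JavaScript-side defences can’t see them – which is why GA-side filtering remains the sensible place to act.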
Is this a bad thing? Does this reflect badly on GA as a vulnerability?
We consider event tracking, virtual pageviews and transaction tracking carefully as part of ‘data quality’. The measurements we take are not trivial decisions, and such care is required in all aspects of data capture – both intentional and accidental.
In striving for high quality data we consider how we use the data, what it’s for, who will use it and how to deliver it. We make choices to normalise and sanitise data to make it actionable and fit for purpose. If we have data pollution, we act to mitigate. Referral spam falls under the banner of pollution. We deal with it.
The effort required to deal with the pollution isn’t massive. The urgency you apply to the fix will depend on the impact on your data. If referral spam is high enough to impact your data to an appreciable degree, it’s quite possible spam is less of an issue than the data volume you’re collecting anyway. By that, I mean if your data volume is small enough to be impacted by spam, was the data actionable in the first place? Was it rich enough to base reliable business decisions on? Were you making calls on insignificant data?
Now, I’d be ASTONISHED if the Google Analytics team weren’t aware of the issue. Indeed, the addition of the ‘Exclude all hits from known bots and spiders’ functionality is evidence that Google DO take ‘automatic’ data quality seriously enough to act on it.
Take the advice Brian gives in his article. Act on it. Use your data wisely and appropriately. Be aware of changes in GA that can help you. I’ve no doubt Brian will add some notes to his post when Google add further support – we will too.
In the meantime, be ‘grown up’ about your data. Don’t spread unnecessary hysteria.