3 Reasons Your Web Analytics Data is Wrong

Website traffic is a difficult source of data.

Consider the way a chemist collects data while testing a chemical reaction. They use known chemicals dissolved at precise concentrations and mix a precise volume of each into sterilized flasks. They weigh the resulting precipitate on a precise scale. The chemist has full control over the inputs in this experiment, and the outputs are measured very carefully. They collect only the data they need to test their hypothesis about how the chemicals will interact. If they mess up, they can measure the same reaction again.

Now consider how a web analyst collects data. Web data is collected automatically, en masse, from many different “people” in many different technological environments. Analysts often collect more web data than their current hypotheses require, because web visit data cannot be collected retroactively.

Compared to the chemist’s experimental data, web data is messy. It’s easy to accidentally collect compromised data because there are many ways to be wrong.

Here are three of the most common reasons your web analytics data might be inaccurate.

You have errors in your JavaScript

Every client-side web analytics tool relies on JavaScript. If the JavaScript isn’t working, the data is likely wrong.

There are two main ways JavaScript errors mess up your data. The rudest are syntax errors and runtime errors (uncaught exceptions), which stop the analytics code from executing, so data is lost. These errors ruin data integrity and can potentially affect the user experience. Luckily, this type of error is easy to spot: it usually shows up in the browser’s console during testing, and tag management systems like Google Tag Manager or Ensighten can catch such errors and provide reports.

The other, more elusive type of JavaScript error is the logic error. An analytics implementation might appear to be free of bugs, but still record bad data due to incorrect conditional logic. For example, imagine a site at www.coolsite.com that has a mirror site translated to French at www.coolsite.com/fr. The web analytics code has logic that checks the first two letters of the URL path to determine language. If the path starts with “fr,” the language is recorded as French. Unfortunately, Cool Site has a page at www.coolsite.com/freestuff. This page is in English, but the analytics logic will incorrectly record the language as French.
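The Cool Site bug can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the real implementation could look quite different. The first function reproduces the faulty "starts with fr" check, and the second compares the whole first path segment instead.

```javascript
// Buggy version: treats any path whose first two letters are "fr" as French.
function detectLanguageNaive(pathname) {
  // Drop the leading slash, then look only at the start of the path.
  return pathname.replace(/^\//, "").startsWith("fr") ? "fr" : "en";
}

// Safer version: the entire first path segment must equal "fr".
function detectLanguage(pathname) {
  const firstSegment = pathname.split("/").filter(Boolean)[0] || "";
  return firstSegment.toLowerCase() === "fr" ? "fr" : "en";
}

console.log(detectLanguageNaive("/freestuff")); // "fr", which is wrong
console.log(detectLanguage("/freestuff"));      // "en"
console.log(detectLanguage("/fr/produits"));    // "fr"
```

Neither version throws an error, which is exactly why this kind of bug goes unnoticed until someone checks the recorded values against known pages.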

Another common logic error involves looking for data in the wrong place. Imagine a case where there has long been code that looks for an element with the ID “userState” and records its value. However, a month ago, someone felt like that ID looked ugly and renamed it to “state.” The web analysts did not know this change was going to happen, so the analytics code was not changed. In this case, the state data is lost.
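One way to make this kind of silent loss visible is to have the lookup report a missing element instead of quietly recording nothing. The sketch below assumes a hypothetical helper, readTrackedValue, and uses a small stand-in for the browser's document object so it can run outside a browser; the stored value is made up.

```javascript
// Reads a value from the page and reports when the element is missing,
// instead of silently recording nothing.
function readTrackedValue(doc, elementId, onMissing) {
  const element = doc.getElementById(elementId);
  if (!element) {
    onMissing(elementId); // e.g. log a warning or send a diagnostic hit
    return null;
  }
  return element.value !== undefined ? element.value : element.textContent;
}

// Minimal stand-in for `document`: the element was renamed to "state".
const fakeDocument = {
  elements: { state: { value: "MO" } },
  getElementById(id) {
    return this.elements[id] || null;
  },
};

const missingIds = [];
const userState = readTrackedValue(fakeDocument, "userState", (id) => missingIds.push(id));
// userState is null, and missingIds now contains "userState".
console.log(userState, missingIds);
```

In a browser, the first argument would be the real document object. The point is only that a failed lookup leaves a trace that someone can notice.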

Timing is off

Uncontrolled timing is one of the most common reasons that web analytics data is not captured correctly. It’s also one of the most difficult issues to find and fix.

Timing issues involve network requests or asynchronous JavaScript firing in an unintended order. Many timing issues manifest as JavaScript errors that only happen some of the time. For instance, a file with tracking logic and a file with data might be loading at the same time, and if the file with logic happens to load first, it won’t find any data to track.
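A common way to remove this kind of race is to put a small queue between the data and the tracking logic, so that either one can arrive first. The sketch below is a simplified, hypothetical version of that idea (createDataLayer, onData, and setData are invented names, not any vendor's API).

```javascript
// Holds page data and the callbacks waiting for it, so load order stops mattering.
function createDataLayer() {
  let data = null;
  const waiting = [];
  return {
    // Called by the tracking logic: runs now if data exists, otherwise waits.
    onData(callback) {
      if (data !== null) callback(data);
      else waiting.push(callback);
    },
    // Called by the data file: stores the data and flushes waiting callbacks.
    setData(value) {
      data = value;
      waiting.splice(0).forEach((callback) => callback(data));
    },
  };
}

const trackedPages = [];
const track = (pageData) => trackedPages.push(pageData.pageName);

// Order 1: the tracking logic loads before the data.
const firstLoad = createDataLayer();
firstLoad.onData(track);
firstLoad.setData({ pageName: "home" });

// Order 2: the data loads before the tracking logic.
const secondLoad = createDataLayer();
secondLoad.setData({ pageName: "pricing" });
secondLoad.onData(track);

console.log(trackedPages); // both pages are tracked, whichever file arrived first
```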

It’s a good practice to load analytics code asynchronously, so it doesn’t affect how fast the user sees the page load. However, asynchronous execution means you can’t control precisely when the code will run.

Some timing issues occur only when reloading a page that has already been visited. Browsers cache resources they have loaded before, so those resources load almost instantly the next time. This can produce a different load order than the original page load.

You can’t control your users

Your users don’t care about your data. They have no obligation to use the latest browser, nor to browse your site in ways that you intend. They have no reason to behave in a way that makes data collection easy.

Here are some examples of user behavior that can muddy up your data:

  • Peter is having connection troubles while browsing on his phone. He gets to your site, but leaves the page before the analytics code loads.
  • Gerald is concerned about privacy, so he blocks third-party cookies, and he blacklists common analytics servers.
  • Sally is still using Internet Explorer 8. Her visit is not tracked because of JavaScript errors.
  • Stephen visits your site ten thousand times (Stephen might be a robot).
  • The Adams family has just one computer, which they share. They are tracked as one visitor, but they are multiple people with differing behaviors.
  • Megan visits your site on her laptop and her phone. Megan is tracked as two visitors, but she is one person.
  • Danny clears his cookies. He is tracked as a new visitor next time he visits your site.
  • Stacey knows about web analytics and thinks it would be fun to mess with your data, so she inserts a fake campaign ID when browsing your site.
  • Gretchen first comes to your site through an email campaign. She bookmarks the URL from the email, so every time she comes to your site, it looks like she’s coming through the same email campaign.
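Some of these behaviors can at least be flagged after the fact. As a rough illustration of the Stephen case, the sketch below counts hits per visitor ID and lists the IDs above a threshold. The function name, the hit format, and the threshold of 1,000 are all assumptions made up for this example, not recommended values.

```javascript
// Returns the visitor IDs whose hit count exceeds maxHits.
// maxHits is an arbitrary, hypothetical threshold.
function flagHeavyVisitors(hits, maxHits) {
  const counts = new Map();
  for (const hit of hits) {
    counts.set(hit.visitorId, (counts.get(hit.visitorId) || 0) + 1);
  }
  const suspicious = [];
  for (const [visitorId, count] of counts) {
    if (count > maxHits) suspicious.push(visitorId);
  }
  return suspicious;
}

// Made-up sample: Stephen's ten thousand visits plus Megan's two devices.
const sampleHits = [
  ...Array.from({ length: 10000 }, () => ({ visitorId: "stephen" })),
  { visitorId: "megan-laptop" },
  { visitorId: "megan-phone" },
];

console.log(flagHeavyVisitors(sampleHits, 1000)); // flags only "stephen"
```

Note that a check like this does nothing for Megan, who still appears as two visitors. Most of the behaviors in the list cannot be corrected from the data alone.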

Summary

Most web analytics data collection issues can be avoided with enough expertise and attention to detail. JavaScript errors and timing errors can usually be found and fixed through thorough testing and debugging. User behavior, however, cannot be controlled. We have to accept that measuring human behavior will never be as accurate as measurement in the hard sciences.

It isn’t possible to account for everything a user might do. For this reason, it is impossible for web analytics data to be perfectly accurate. It’s common to find differences between data sources – Adobe Analytics might report different numbers than Google Analytics, which might report different numbers than your web app’s backend database.

Imperfect data isn’t useless, though. Far from it! As long as the data source is consistently off in the same way – as long as it is precise – it can be used to detect real effects. In web analytics, precision is more important than accuracy.
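A little arithmetic shows why. Suppose, purely as an assumption for illustration, that a tool consistently records 85% of real visits. The absolute numbers are wrong, but the relative change between two periods comes out the same as the true change. All figures below are made up.

```javascript
// Hypothetical capture rate: the tool always records 85 of every 100 visits.
const capturePercent = 85;
const measure = (trueVisits) => (trueVisits * capturePercent) / 100;

// Relative change between two periods.
const lift = (before, after) => (after - before) / before;

const trueBefore = 1000;
const trueAfter = 1200;
const measuredBefore = measure(trueBefore); // 850
const measuredAfter = measure(trueAfter);   // 1020

console.log(lift(trueBefore, trueAfter));         // 0.2, the true 20% increase
console.log(lift(measuredBefore, measuredAfter)); // 0.2, despite the undercount
```

If the capture rate drifted between the two periods, the measured lift would no longer match the true one, which is why consistency matters more than the exact totals.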

You should also consider working with experienced web analysts and developers who have created solutions to these common data collection issues. Stay tuned for the next post in this series explaining how to fix these issues.

Written By


Evolytics

This post is curated content from the Evolytics staff, bringing you the most interesting news in data and analysis from around the web. The Evolytics staff has proven experience and expertise in analytics strategy, tagging implementation, data engineering, and data visualization.