Website traffic is a difficult source of data.
Consider the way a chemist collects data while testing a chemical reaction. They use known chemicals dissolved at precise concentrations and mix a precise volume of each into sterilized flasks. They weigh the resulting precipitate on a precise scale. The chemist has full control over the inputs in this experiment, and the outputs are measured very carefully. They collect only the data they need to test their hypothesis about how the chemicals will interact. If they mess up, they can measure the same reaction again.
Now consider how a web analyst collects data. Web data is collected automatically, en masse, from many different “people” in many different technological environments. Web data is often collected in excess of any hypothesis an analyst wants to test at the moment, because visit data cannot be collected retroactively: if a visit wasn’t recorded when it happened, it can never be measured again.
Compared to the chemist’s experimental data, web data is messy. It’s easy to accidentally collect compromised data because there are many ways to be wrong.
Here are three of the most common reasons your web analytics data might be inaccurate.
One common logic error involves looking for data in the wrong place. Imagine that your analytics code has long looked for an element with the ID “userState” and recorded its value. A month ago, a developer decided that ID looked ugly and renamed it to “state.” The web analysts didn’t know the change was coming, so nobody updated the analytics code, and the state data has been silently lost ever since.
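One way to keep a renamed ID from silently erasing data is to record an explicit marker when the expected element is missing, so the gap shows up in reports instead of looking like a quiet drop-off. This is a minimal TypeScript sketch under stated assumptions: `readTrackedValue`, `MISSING`, `makeDoc`, the `DocumentLike` interface, and the sample value "Ohio" are all hypothetical. It doesn't recover the lost data; it only makes the loss visible.

```typescript
// Minimal stand-ins for the DOM types so the sketch runs outside a browser;
// in a real page you would pass `document` itself (hypothetical setup).
interface ElementLike {
  textContent: string | null;
}
interface DocumentLike {
  getElementById(id: string): ElementLike | null;
}

// Hypothetical marker value recorded when the expected element is missing.
const MISSING = "(element not found)";

// Reads the text of the element with the given ID. If the element is gone
// (for example, after a rename), it logs a warning and returns an explicit
// marker instead of silently recording nothing.
function readTrackedValue(doc: DocumentLike, id: string): string {
  const el = doc.getElementById(id);
  if (el === null) {
    console.warn(`Analytics: no element with id "${id}" on this page`);
    return MISSING;
  }
  return el.textContent ?? "";
}

// Builds a fake page from an ID -> text map.
function makeDoc(elements: Map<string, string>): DocumentLike {
  return {
    getElementById: (id) => {
      const text = elements.get(id);
      return text === undefined ? null : { textContent: text };
    },
  };
}

// The page after the rename: "userState" is gone and "state" exists.
const pageAfterRename = makeDoc(new Map([["state", "Ohio"]]));
console.log(readTrackedValue(pageAfterRename, "userState")); // (element not found)
console.log(readTrackedValue(pageAfterRename, "state")); // Ohio
```

A spike of "(element not found)" values in a report is much easier to notice than a field that quietly stops being filled in.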
Timing is off
Uncontrolled timing is one of the most common reasons that web analytics data is not captured correctly. It’s also one of the most difficult issues to find and fix.
It’s good practice to load analytics code asynchronously, so it doesn’t slow down how fast the user sees the page load. However, asynchronous execution means you can’t control precisely when the code will run relative to the rest of the page.
Some timing issues occur only when reloading a page that has already been visited. Browsers cache resources that have been loaded before, so on a repeat visit those resources load almost instantly while others still come over the network. The scripts can then run in a different order than they did on the original page load.
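A common way to tolerate unpredictable load timing is a command queue: page code records events into a queue immediately, and the asynchronously loaded analytics library sends them once it is ready, in their original order. Many analytics snippets work along these lines, but the TypeScript below is only a simplified sketch; `Tracker`, `track`, and `libraryLoaded` are hypothetical names, not any vendor’s API.

```typescript
type AnalyticsEvent = { name: string; value?: string };
type Sender = (e: AnalyticsEvent) => void;

// Hypothetical tracker that buffers events until the library is ready.
class Tracker {
  private pending: AnalyticsEvent[] = [];
  private send: Sender | null = null;

  // Safe to call at any time, before or after the library has loaded.
  track(e: AnalyticsEvent): void {
    if (this.send) {
      this.send(e);
    } else {
      this.pending.push(e); // library not ready yet: hold the event
    }
  }

  // Called by the async library once it has loaded, whenever that is.
  libraryLoaded(send: Sender): void {
    this.send = send;
    // Flush queued events in the order they were recorded.
    for (const e of this.pending.splice(0)) send(e);
  }
}

// Usage: the library happens to load after two events were recorded.
const delivered: string[] = [];
const tracker = new Tracker();
tracker.track({ name: "pageview" });
tracker.track({ name: "userState", value: "Ohio" });
tracker.libraryLoaded((e) => delivered.push(e.name));
tracker.track({ name: "click" });
console.log(delivered.join(", ")); // pageview, userState, click
```

Queueing keeps events in order no matter when the library finishes loading, whether on a first visit or a cached reload. It cannot save events from a visitor who leaves before the library loads, since the queue lives only in the page.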
You can’t control your users
Your users don’t care about your data. They have no obligation to use the latest browser, nor to browse your site in ways that you intend. They have no reason to behave in a way that makes data collection easy.
Here are some examples of user behavior that can muddy up your data:
- Peter is having connection troubles while browsing on his phone. He gets to your site, but leaves the page before the analytics code loads.
- Gerald is concerned about privacy, so he blocks third-party cookies, and he blacklists common analytics servers.
- Stephen visits your site ten thousand times (Stephen might be a robot).
- The Adams family has just one computer, which they share. They are tracked as one visitor, but they are multiple people with differing behaviors.
- Megan visits your site on her laptop and her phone. Megan is tracked as two visitors, but she is one person.
- Danny clears his cookies. He is tracked as a new visitor next time he visits your site.
- Stacey knows about web analytics and thinks it would be fun to mess with your data, so she inserts a fake campaign ID when browsing your site.
- Gretchen first comes to your site through an email campaign. She bookmarks the URL from the email, so every time she comes to your site, it looks like she’s coming through the same email campaign.
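To make the visitor-counting problems concrete, here is a small TypeScript sketch with made-up visits. It assumes the tool identifies a visitor by a cookie ID (a simplification; real tools differ), and it pairs each visit with the real person behind it, which no analytics tool actually sees.

```typescript
// Each visit as the tool sees it (a cookie ID), alongside the person
// actually behind it. All data here is made up for illustration.
type Visit = { cookieId: string; person: string };

const visits: Visit[] = [
  { cookieId: "c1", person: "Megan" }, // Megan on her laptop
  { cookieId: "c2", person: "Megan" }, // Megan on her phone
  { cookieId: "c3", person: "Danny" }, // Danny before clearing cookies
  { cookieId: "c4", person: "Danny" }, // Danny after clearing cookies
  { cookieId: "c5", person: "Adams parent" }, // the shared family computer
  { cookieId: "c5", person: "Adams child" },
];

// What the tool reports: one "visitor" per distinct cookie.
const reportedVisitors = new Set(visits.map((v) => v.cookieId)).size;
// What actually happened: one visitor per distinct person.
const actualVisitors = new Set(visits.map((v) => v.person)).size;

console.log(`reported ${reportedVisitors}, actual ${actualVisitors}`); // reported 5, actual 4
```

Megan and Danny are each counted twice while two members of the Adams family are counted once, so the errors pull in opposite directions. The reported total looks plausible next to the real one, which is part of why these problems are hard to spot in aggregate numbers.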
It isn’t possible to account for everything a user might do. For this reason, it is impossible for web analytics data to be perfectly accurate. It’s common to find differences between data sources – Adobe Analytics might report different numbers than Google Analytics, which might report different numbers than your web app’s backend database.
Imperfect data isn’t useless, though. Far from it! As long as the data source is consistently off in the same way (that is, as long as it is precise), it can be used to detect real effects. In web analytics, precision is more important than accuracy.
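A quick arithmetic check shows why a consistent bias is tolerable. In this hypothetical TypeScript sketch, real traffic grows 20%. One tool always captures 85% of visits; another captures 85% in the first period but only 60% in the second, as might happen after an unnoticed ID rename. All numbers are made up for illustration.

```typescript
// Relative change between two periods.
const relativeChange = (before: number, after: number): number =>
  (after - before) / before;

// Hypothetical true visit counts: traffic really grew 20%.
const trueBefore = 10000;
const trueAfter = 12000;

// Precise but inaccurate: the tool always misses about 15% of visits,
// so both periods are scaled by the same factor.
const captureRate = 0.85;
const measuredBefore = trueBefore * captureRate;
const measuredAfter = trueAfter * captureRate;

// Imprecise: capture drops to 60% in the second period.
const brokenAfter = trueAfter * 0.6;

console.log(relativeChange(trueBefore, trueAfter).toFixed(2)); // 0.20
console.log(relativeChange(measuredBefore, measuredAfter).toFixed(2)); // 0.20
console.log(relativeChange(measuredBefore, brokenAfter).toFixed(2)); // -0.15
```

The constant 85% factor cancels out of the ratio, so the precise-but-inaccurate tool reports the same 20% growth as the truth. The tool whose capture rate changed reports a drop of about 15% that never happened.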
You should also consider working with experienced web analysts and developers who have created solutions to these common data collection issues. Stay tuned for the next post in this series explaining how to fix these issues.