How to Minimize Type I and Type II Error

Understanding risk in A/B Testing

There’s much more subjectivity in statistics than we give it credit for. All of the mathematical variables put in place are based upon assumptions. One of the most important assumptions made during the A/B Testing process is the risk tolerance. 0% risk is an impossibility in the world of statistics, and, generally speaking, less risk equates to more time and money. Plus, you’re generally asking your statistician to predict the future, so it’s unrealistic to expect a 100% guarantee. It’s a careful balancing act between minimizing risk and maximizing resources.

If you’re not familiar with statistics, the error terms being thrown around can be confusing.

  • Alpha Error = Type I Error | Most commonly referred to as a false positive
  • Beta Error = Type II Error | Most commonly referred to as a false negative

Let’s detour for a moment to discuss how a hypothesis is designed.

If you’re launching an A/B Test and want to see Version B convert better than Version A, your null hypothesis would be:

  • A = B, or A ≥ B.

The reason? In statistical terms, rejecting a hypothesis is a stronger statement than accepting it, because acceptance is merely a failure to find sufficient evidence against it. You can reject the null hypothesis, or you can fail to reject it, but you never prove it.

If you reject the null hypothesis above, you accept its opposite, the Alternative Hypothesis:

  • A < B
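For readers who want to see the mechanics, here is a minimal sketch of how this hypothesis could be tested with a one-sided two-proportion z-test. The function name, the visitor counts, and the choice of a pooled z-test are illustrative assumptions, not a description of any particular calculator's method.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Test the null hypothesis A >= B against the alternative A < B.

    conv_a, conv_b: number of conversions for each version (hypothetical inputs)
    n_a, n_b: number of visitors for each version
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate, assuming both versions convert equally (the null).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # One-sided p-value: chance of a z at least this large if the null were true.
    p_value = 1 - NormalDist().cdf(z)
    reject_null = p_value < (1 - confidence)
    return z, p_value, reject_null


# Made-up example: 500 of 10,000 visitors convert on A, 570 of 10,000 on B.
z, p_value, reject_null = one_sided_z_test(500, 10_000, 570, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject null: {reject_null}")
```

If the p-value falls below one minus the confidence level, the null hypothesis is rejected in favor of A < B. Otherwise, the test has only failed to find sufficient evidence.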

Understanding Error, utilizing the example above

Type I Error, or rejecting when you should not have, occurs when, by chance, your evidence shows you that A < B even though Version B is not actually better. You would spend resources to roll out the new version of the website, Version B, assuming it converts better than your current version, Version A. Unfortunately, if you made a Type I Error, you would be incorrect: conversions may remain static or, in a worst-case scenario, decrease.

Type II Error, or failing to reject when you should have, occurs when, by chance, your evidence shows you that A ≥ B even though Version B actually converts better. You would not spend the resources to roll out the new, better-performing version of your website when, in fact, it would improve your conversions.

It isn’t difficult to quickly discern the negative business effect of making one of these errors.

How to Minimize Type I Error Risk

The term confidence level is associated with Type I Error. Being 95% confident means that you are allowing a 5% chance of a false positive, or one in 20, when there is truly no improvement. If the risk associated with a false positive is very high, you may increase your confidence level to 99%. This must be balanced with business resources, as a higher level of confidence means either increasing the sample size or increasing the minimum difference you can detect, which puts you at risk of failing to find small gains.

It is not uncommon for some conversion rate optimization programs to lower the confidence level to something like 85% or 90%. After all, those are still good odds, right? Perhaps, but by dropping your confidence level from 95% to 90%, your chance of error doubles from one in 20 to one in 10. The likelihood of making the wrong call should not be taken lightly here.
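The relationship between confidence level and false positive risk is simple arithmetic, sketched below. The function name is hypothetical, and the critical value assumes a one-sided z-test, which matches the A ≥ B null hypothesis used in this article.

```python
from statistics import NormalDist


def false_positive_odds(confidence):
    """Return (alpha, 'one in N' odds, one-sided critical z) for a confidence level."""
    alpha = 1 - confidence                      # allowed Type I Error rate
    one_in_n = round(1 / alpha)                 # e.g. 0.05 -> one in 20
    z_critical = NormalDist().inv_cdf(confidence)  # evidence needed to reject
    return alpha, one_in_n, z_critical


for level in (0.99, 0.95, 0.90, 0.85):
    alpha, one_in_n, z_critical = false_positive_odds(level)
    print(f"{level:.0%} confidence: about one in {one_in_n} false positives, "
          f"critical z = {z_critical:.2f}")
```

Lowering the confidence level lowers the bar of evidence needed to reject, which is exactly why the false positive odds worsen.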

Evolytics offers free online calculators to evaluate necessary sample sizes and confidence levels.

How to Minimize Type II Error Risk

A lesser-known term associated with hypothesis testing is power. In fact, if you’re using an easy online calculator and not a statistics tool such as R, you likely don’t even need to enter the power assumptions. Like confidence level, power denotes the level of risk you are willing to accept, but it is associated with Type II Error. 80% power, which allows a 20% chance of a false negative when a real difference of the planned size exists, is an accepted standard in A/B Testing.
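To make power less abstract, the sketch below estimates it for a planned test using a common normal approximation. The function name, the conversion rates, and the one-sided test are assumptions chosen for illustration; a dedicated statistics tool may use a slightly different formula.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(rate_a, rate_b, n_per_version, confidence=0.95):
    """Approximate chance of detecting a true lift from rate_a to rate_b.

    Uses a normal approximation for a one-sided two-proportion test
    with the same number of visitors in each version.
    """
    std_error = sqrt(
        (rate_a * (1 - rate_a) + rate_b * (1 - rate_b)) / n_per_version
    )
    z_alpha = NormalDist().inv_cdf(confidence)
    return NormalDist().cdf((rate_b - rate_a) / std_error - z_alpha)


# Hypothetical plan: 5.0% baseline, 5.5% for Version B, 25,000 visitors each.
power = approximate_power(0.050, 0.055, 25_000)
print(f"power = {power:.1%}, Type II Error risk = {1 - power:.1%}")
```

Type II Error risk is simply one minus power, so anything that raises power lowers the chance of missing a real improvement.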

The best way to control Type II Error is to plan ahead and base your sample size on the level of risk you’re comfortable with. For a given size of difference you want to detect, there are only two ways to reduce Type II Error:

  1. Increase risk for Type I Error
  2. Increase sample size
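Both levers show up in a sample size calculation. The sketch below uses a widely used normal-approximation formula for a one-sided test comparing two conversion rates; the function name, the baseline rate, and the lift are hypothetical values for illustration only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_version(base_rate, relative_lift, confidence=0.95, power=0.80):
    """Visitors needed in EACH version to detect the given relative lift."""
    rate_a = base_rate
    rate_b = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(confidence)  # controls Type I Error risk
    z_beta = NormalDist().inv_cdf(power)        # controls Type II Error risk
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return ceil((z_alpha + z_beta) ** 2 * variance / (rate_b - rate_a) ** 2)


# Hypothetical test: 5% baseline conversion rate, 10% relative lift.
for confidence, power in [(0.95, 0.80), (0.95, 0.90), (0.90, 0.80)]:
    n = sample_size_per_version(0.05, 0.10, confidence, power)
    print(f"confidence {confidence:.0%}, power {power:.0%}: {n:,} visitors per version")
```

With these made-up inputs, the default 95% confidence and 80% power call for roughly 24,600 visitors per version. Asking for more power raises that number, while accepting more Type I Error risk lowers it.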

Understanding the potential costs of making a mistake here will be valuable. For instance, if your monthly online revenue is $100,000 and an optimized experience would lift it by 2%, but you fail to recognize that due to Type II Error and do not roll out the experience, you are leaving $2,000 a month, or $24,000 over the course of a year, on the table.
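The arithmetic behind that figure is easy to adapt to your own numbers. This is a minimal sketch that assumes a flat, non-compounding lift and steady monthly revenue; the function name is hypothetical.

```python
def annual_revenue_missed(monthly_revenue, lift, months=12):
    """Revenue left on the table if a real lift is never rolled out."""
    return monthly_revenue * lift * months


# The example from the text: $100,000 per month and a 2% lift.
print(f"${annual_revenue_missed(100_000, 0.02):,.0f}")
```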

Written By

Krissy Tripp

Krissy Tripp, Director of Decision Science, strives to empower her clients to make use of their data, drawing from a variety of disciplines: experimentation, data science, consumer psychology, and behavioral economics. She has supported analytic initiatives for brands such as Sephora, Intuit, and Vail Resorts.