It’s all fun and games until your A/B Test is flat-tastic.
Ideally, your A/B testing process looks like this: a test hits its sample size, you see the lift you were hoping for, you tie a bow on it, call it a win, and roll it out to your audience. The implementation now benefits the company, the testing team can move forward with its strategy, and you have conclusive measurements.
Unfortunately, the ideal isn’t always real, and inconclusive A/B test results happen. Sometimes you look at your completed test data and can’t make sense of it. Maybe your KPIs were all flat, or you had a mixed bag of positive and negative results. What do you do next?
I’m going to talk about how to define an inconclusive A/B test, what to do if your test appears inconclusive, and how to avoid an inconclusive test in the future. After all, it’s all fun and games until your A/B test is flat.
What defines an inconclusive A/B Test?
Even with ideal data, it can be difficult to draw a conclusion. Say the experiment hits its required sample size, but there is no clear winner: the lift, if any, is smaller than desired, and there is no benefit as you funnel down to secondary and behavioral metrics.
What should you do? The pressure is on. Stakeholders are looking to you to be the expert and take leadership. Do you roll it out to 100% and hope for the best? You already put time, energy, and possibly development resources into it, right? Do you end the test and try something new, pretending this never happened? Better safe than sorry, right? Or do you start over and run it again, hoping this time will be better?
Well, all of those could be possible options but first, we have to deep dive into the data to see what action we need to take.
How do you unlock information from inconclusive A/B Test results?
There are four primary ways to approach your inconclusive test data.
1. Segment your data:
Breaking up your test by various segments may help you find the golden nugget you are looking for. Some examples of segmented data could be:
- New vs. returning visitors
- Desktop vs. mobile device
- Affinity group
- Acquisition channel
- Age, gender, or other demographic data
- Guest vs. logged in
If the total population shows flat results, it is rare for a single segment to hit statistical significance, and the more segmentation that occurs, the smaller each population becomes. Check out our A/B Testing Calculator to see if your test hit significance, or our Chi-Squared Calculator to test for statistically significant performance across segments. This is a great way to see how one segment reacts to an experience vs. another.
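As a sketch of the per-segment significance check described above, here is a minimal two-proportion z-test in Python (equivalent to a 2×2 chi-squared test for each segment). The segment names and conversion counts are hypothetical, purely for illustration:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Normal CDF via the error function; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical (conversions, visitors) per arm, by segment
segments = {
    "new":       {"control": (500, 10_000), "variant": (560, 10_000)},
    "returning": {"control": (900, 10_000), "variant": (1_020, 10_000)},
}

for name, arms in segments.items():
    z, p = two_proportion_z_test(*arms["control"], *arms["variant"])
    verdict = "significant at 95%" if p < 0.05 else "inconclusive"
    print(f"{name}: z={z:.2f}, p={p:.4f} -> {verdict}")
```

In this made-up example the pooled result might look flat, while the returning-visitor segment alone clears the 95% bar, which is exactly the kind of hidden signal segmentation can surface.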
Mini Case Study
An ecommerce site was changing its homepage CTAs. We hit sample size, and the primary KPI was indexing at 105 without statistical confidence. When we segmented the data by new and returning visitors, we saw that new visitors were down, while returning visitors were up with 95% confidence. This led us to iterate: we backtested the experience with returning visitors only, while showing a different experience to first-time visitors.
If you do not see any promising data after breaking it down by segments, continue moving forward.
2. Remove multivariate impacts from your test data:
Isolate the visitors that were in multiple tests, or if possible target and omit a step in your data flow that another test was measuring. You’ll know where to start by asking yourself:
Are there multiple tests running? You can take a swim-lane approach, but this can be limiting. You can also allow multiple tests at any given time, assuming the impact will still be random, equal, and representative across recipes. Of course, theoretical math doesn’t always play out in the real world, hence the need to measure the multivariate impact of experiment crossover.
An example of this would be two tests running simultaneously with conversion rate (CR) as the primary KPI. You can isolate one test’s effect by building a cross-tabulation of results. This helps confirm that the lift or drop you are experiencing is directly attributable to test #1. One issue with this is the division of the population into smaller cells.
| | Experiment 1: Control | Experiment 1: Version B |
|---|---|---|
| Experiment 2: Control | 10.1% | 9.8% |
| Experiment 2: Version B | 12.2% | 11.0% |
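A cross-tab like the one above can be produced from visitor-level data. A minimal sketch, assuming hypothetical per-visitor records of arm assignments and conversions:

```python
from collections import defaultdict

# Hypothetical per-visitor records: which arm each visitor saw in each
# concurrent experiment, and whether they converted
visitors = [
    {"exp1": "control",   "exp2": "control",   "converted": True},
    {"exp1": "version_b", "exp2": "control",   "converted": False},
    {"exp1": "control",   "exp2": "version_b", "converted": True},
    {"exp1": "version_b", "exp2": "version_b", "converted": True},
    {"exp1": "control",   "exp2": "control",   "converted": False},
]

# cell (exp1 arm, exp2 arm) -> [conversions, visitors]
counts = defaultdict(lambda: [0, 0])
for v in visitors:
    cell = (v["exp1"], v["exp2"])
    counts[cell][0] += v["converted"]
    counts[cell][1] += 1

for (arm1, arm2), (conv, n) in sorted(counts.items()):
    print(f"Exp1={arm1:9s} Exp2={arm2:9s} CR={conv / n:.1%} (n={n})")
```

With real data, each cell’s conversion rate fills one entry of the cross-tab, letting you read Experiment 1’s effect within each Experiment 2 arm.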
3. Move Upstream:
After looking at the various segments and removing crossover test data, you still may not know if the test is a success or failure. Conversion rates still appear flat. At this point you may want to look upstream to see how users were interacting with the larger part of your funnel. There is a possibility that the KPI being analyzed could be too far down the funnel to see the statistical impact.
An example of this would be an optimization in the hero of the home page and the KPI being monitored is conversion rate. If a store’s funnel is:
Home page > Category > Product > Checkout > Order
Looking at the final step of the funnel may be too narrow a focus. Take a look at a metric one step higher in the conversion funnel. Did more people reach a product page even though conversion saw no lift? This is still valuable information. You know that your homepage hero can be working harder for you, but you also know that getting more users into the funnel doesn’t mean they’re as qualified, and they may get stuck on later steps. This is where you can pivot your roadmap to begin optimizing the next step in the funnel and see if you can move the needle.
4. Remove Biases:
Were there biases in the timeframe of your A/B test? Was there a promotion running during the test period? Was there a big marketing campaign driving specific user segments to the page? Did you test across two different buying seasons? One time we saw this occur was when visitors who historically spent more per purchase were unevenly distributed into the variant group. Failing to catch this type of bias could result in rolling out a test based on bad data. If you realize you have tested with a biased audience group, some tools may allow you to remove biased data segments, but more than likely it is best to retest without the biases.
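One simple bias check is comparing a pre-experiment covariate, such as historical spend per visitor, across the assigned groups. The numbers below are made up, and the 10% imbalance threshold is an assumption, not a standard:

```python
from statistics import mean

# Hypothetical prior 12-month spend per visitor, by assigned group;
# the variant group skews toward historically bigger spenders
prior_spend = {
    "control": [40, 55, 38, 60, 45, 52, 48, 41],
    "variant": [95, 88, 102, 91, 40, 55, 97, 85],
}

means = {group: mean(v) for group, v in prior_spend.items()}
ratio = means["variant"] / means["control"]
print(f"means={means}, ratio={ratio:.2f}")

# Crude rule of thumb (assumption): flag >10% imbalance for investigation
if abs(ratio - 1) > 0.10:
    print("Pre-experiment covariate imbalance: results may be biased")
```

If the groups were randomized properly, pre-experiment covariates should look roughly equal; a large imbalance like this one is a signal to dig deeper or retest.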
How do you avoid inconclusive data in the future?
Find your purpose
Define a purpose for the test. Are we trying to mitigate risk, drive revenue, gain insight, reduce cost, or improve customer satisfaction? A great test may include one, multiple, or all of these objectives, but it helps to have a primary objective. For example, will you roll out an experience that is revenue flat if it receives better customer NPS scores?
One KPI – just one
Once we know our goal for a particular test, we need to define our primary KPI. This is a critical step that can be easily overlooked during the measurement planning of a test. One thing to remember: tracking everything is like tracking nothing. Having a primary KPI keeps analysts, marketers, and product owners focused on the primary goal and makes a result based on a single metric attainable.
We recognize that it’s unrealistic, and potentially irresponsible, to look at only one metric. We’re not advocating that. What we are advocating is one primary metric: your true north star. You can still have secondary metrics (those you monitor to do no harm) and even behavioral metrics (those you monitor to better understand your customers’ experience), but they’re not considered in your rollout decision-making.
Make use of swim lanes
Utilize swim lanes if traffic volumes allow, and be aware of other changes that may happen during the course of the test. Avoiding crossover tests keeps the data as clean as possible, and if crossover is unavoidable, at least measure the multivariate impact of the multiple experiences. Oftentimes it is unrealistic to segment traffic into such small samples. For this reason, at Evolytics we often discuss the pros and cons of this option specific to your business situation.
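Swim lanes are commonly implemented with deterministic hash bucketing, so each concurrent test draws from a mutually exclusive slice of traffic. A minimal sketch; the lane count and user IDs are assumptions for illustration:

```python
import hashlib

def swim_lane(user_id: str, lanes: int = 4) -> int:
    """Deterministically assign a user to one of `lanes` mutually
    exclusive traffic lanes, so concurrent tests never overlap.
    The same user always lands in the same lane."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % lanes

# Each concurrent test recruits only from its own lane
assignments = {uid: swim_lane(uid) for uid in ("u1", "u2", "u3", "u4")}
print(assignments)
```

Because the assignment is a pure function of the user ID, it is stable across sessions and devices that share the ID, with no lookup table to maintain.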
Build a testing roadmap
A testing roadmap helps you plan tests for the foreseeable future, clearly laying out the order and sequence of upcoming experiments. To learn more about how to avoid inconclusive tests through roadmapping, watch Unlocking Information from Inconclusive A/B Test Results, a webinar presented by Senior Manager of Decision Science Kenya Davis in partnership with VWO.
A/B Testing can be exciting
You never really know what you’re going to get until you test it. With preparation and planning, though, you can maintain control even when the results surprise you.