The Whole is Greater than the Sum of its Parts

The results of your latest A/B/n test are in hand: they’re statistically significant, and the test reached its planned statistical power, so you consider it a success! But how does this success fit in with the rest of the tests you ran this quarter? With tests from throughout the year? It’s important to outline and track a combination of metrics for a holistic evaluation of your experimentation program.

A Mix of Metrics is Key 

Three key metrics to use for evaluating your program are Test Velocity, Success Rate, and Agility.

Test Velocity

Test Velocity is the number of tests your experimentation program runs in a given period (e.g., a year). It would be simple, yet naive, to focus on increasing this metric and this metric alone. You shouldn’t test just for testing’s sake. Instead, deploy tests backed by what you see in the data. This metric will naturally increase as your experimentation program matures.
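For teams that keep an experiment log, a minimal sketch of a Test Velocity calculation might look like the following. The field names and sample records are illustrative, not tied to any particular platform:

```python
from collections import Counter
from datetime import date

# Hypothetical experiment log; in practice this would come from your testing
# platform's export or your team's tracking sheet.
experiments = [
    {"name": "hero_banner_copy", "launch_date": date(2023, 2, 14)},
    {"name": "checkout_button_color", "launch_date": date(2023, 5, 3)},
    {"name": "pricing_page_layout", "launch_date": date(2023, 9, 21)},
]

# Test Velocity: number of tests launched per year.
velocity_by_year = Counter(exp["launch_date"].year for exp in experiments)
print(velocity_by_year)  # Counter({2023: 3})
```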

Success Rate

Success Rate, also known as “Win Rate,” is the proportion of experiments that reach statistical significance with a positive outcome out of all the experiments run in a given time period (e.g., per quarter). This metric provides insight into the quality and balance of the ideas you’re testing. If the rate is too low, your team is likely not spending enough time on research; if it’s too high, your team is likely spending too much. Interested in reading more about what your program’s optimal Success Rate should be? Check out our blog post on the subject: “How to find the Optimal Success Rate for your Testing Program.”
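As a rough illustration, assuming each experiment records whether it reached significance and whether the winning variant moved the primary metric in the right direction, Success Rate reduces to a simple ratio. The field names here are hypothetical:

```python
# Illustrative records; "significant" and "positive" are hypothetical flags
# your team would populate when an experiment concludes.
experiments = [
    {"name": "hero_banner_copy", "significant": True, "positive": True},
    {"name": "checkout_button_color", "significant": True, "positive": False},
    {"name": "pricing_page_layout", "significant": False, "positive": False},
]

# Success Rate: wins (significant and positive) out of all experiments run.
wins = sum(1 for e in experiments if e["significant"] and e["positive"])
success_rate = wins / len(experiments)
print(f"Success Rate: {success_rate:.0%}")  # Success Rate: 33%
```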

Agility

Agility is measured by the number of days an experiment takes to go from hypothesis to a productionized test on your website. Essentially, how quickly can you pivot from an idea to a live test? Agility reflects two related components: (1) the speed of your development and deployment pipeline; and (2) the amount of time spent researching the hypothesis. Much like Test Velocity, it would be easy to improve this metric by quickly pushing out meritless, bug-laden tests. But again, it’s inadvisable to do so.
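One way to put a number on Agility, assuming you log the date a hypothesis is written and the date its test goes live, is the median of the day counts between the two. The median is used in this sketch so a single long-running project doesn’t dominate the picture; your team may prefer the mean:

```python
from datetime import date
from statistics import median

# Hypothetical hypothesis and launch dates pulled from an experiment log.
experiments = [
    {"hypothesis_date": date(2023, 1, 5), "launch_date": date(2023, 1, 26)},
    {"hypothesis_date": date(2023, 3, 1), "launch_date": date(2023, 3, 10)},
    {"hypothesis_date": date(2023, 6, 12), "launch_date": date(2023, 7, 24)},
]

# Agility: typical number of days from hypothesis to a live, productionized test.
agility_days = median(
    (e["launch_date"] - e["hypothesis_date"]).days for e in experiments
)
print(f"Agility: {agility_days} days from hypothesis to live test")
```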

How to Implement Program Health Metrics

A significant amount of time in analytics is spent tracking metrics and sharing them with the broader team. Experimentation program metrics are no different. Take these three steps to set the benchmarks you want your program to meet:

  1. Evaluate your program in its current state. 
  2. Once you have data to inform your decision, strategize how to improve Test Velocity, Success Rate, and Agility. 
  3. Finally, add the metrics to a dashboard your whole team can access. This dashboard should be the pulse of your program. It can give you insight into what type of ideas (i.e., conservative or bold) you’re churning out, how quickly you’re generating those ideas, and whether you’re spending a disproportionate amount of time in the research phase. One way to compute such a summary is sketched below.
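As a sketch of step 3, and assuming your experiment log can be loaded into a DataFrame with the columns shown (the column names and output file are placeholders), the three metrics can be rolled up by quarter and written somewhere a BI tool such as Tableau can read:

```python
import pandas as pd

# Placeholder experiment log; swap in an export from your testing platform.
experiments = pd.DataFrame({
    "name": ["hero_banner_copy", "checkout_button_color", "pricing_page_layout"],
    "hypothesis_date": pd.to_datetime(["2023-01-05", "2023-03-01", "2023-06-12"]),
    "launch_date": pd.to_datetime(["2023-01-26", "2023-03-10", "2023-07-24"]),
    "significant": [True, True, False],
    "positive": [True, False, False],
})

experiments["quarter"] = experiments["launch_date"].dt.to_period("Q")
experiments["days_to_launch"] = (
    experiments["launch_date"] - experiments["hypothesis_date"]
).dt.days
experiments["win"] = experiments["significant"] & experiments["positive"]

# One row per quarter: Test Velocity, Success Rate, and Agility side by side.
summary = experiments.groupby("quarter").agg(
    test_velocity=("name", "count"),
    success_rate=("win", "mean"),
    agility_days=("days_to_launch", "median"),
)
summary.to_csv("program_health_by_quarter.csv")  # point your dashboard at this file
print(summary)
```

A dashboard built on a summary like this makes it easy to spot, for example, a quarter where velocity climbed but the win rate fell.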

When you go to a medical appointment, your doctor creates a holistic view of your health by checking your heart rate, listening to your lungs, and taking your temperature. You should take a similar holistic approach to examining the performance of A/B tests. Leveraging multiple metrics and applying them in tandem will keep your testing and experimentation program healthy and on track for success.

Before You Go

If your testing program is ready to evolve, or if you’re unsure where to turn with the sunset of Google Optimize later this year, get in touch with us. Evolytics has a team of A/B Testing and Experimentation experts who can guide you through the process of migrating your testing program to a new testing platform that fits your needs and stack requirements. With the proper implementation, our recommended testing platforms can drive powerful insights for your entire testing program—from tracking win rate and testing velocity to improving oversight across projects with intuitive and robust reporting solutions.

Contact Us

Written By


Tracy Burns-Yocum

Tracy Burns-Yocum is an Analyst I on our Experimentation & Strategy team. She conducts analyses and identifies trends that inform strategic business decisions for clients. She is Google Analytics and Amplitude certified, with additional training in Tableau, SQL, and Python. Tracy also has a background in research on human behavior.