July 23, 2025

How to Successfully Run Multiple A/B Tests at the Same Time

Learn how to successfully run multiple A/B tests concurrently without encountering bad data or a poor user experience.

Adam Ritchie
Ecommerce Contributor

Running multiple A/B tests at the same time is a common practice in the ecommerce industry. Whenever you go to a major online store like Amazon or eBay, you’re likely participating in dozens of A/B tests (if not more) as a visitor without even knowing it. 

Jeff Bezos said it himself: “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day…”

Any technique that is critical to the success of the biggest ecommerce site in the world is certainly worth trying out for your own store — but you should be aware that there is some risk involved in running A/B tests simultaneously. If you’re not careful, you could end up generating poor user experiences and misleading test results.

In this comprehensive guide, we’ll go over exactly how you can benefit from this strategy while minimizing the associated risks as much as possible.

Optimize your storefront without breaking the bank: Shogun A/B Testing offers an easy-to-use tool for conducting experiments on your Shopify store, and it’s much more affordable than most of the other options available. Get started now.

Isolated Tests vs. Overlapping Tests

First, let’s start with a quick definition of A/B testing, just in case you’re not too familiar with the concept.

An A/B test is a type of experiment aimed at improving user experience. It involves publishing two different versions of something on your store, such as a page, page template, or theme. 

Typically, the experiment has a control, which is the version that existed on your site before the test (you can think of this as the “A” in “A/B test”), as well as a variant that has the changes you want to try out, such as a new CTA button style or a rewritten product description (the “B” in “A/B test”). Visitors are randomly assigned to one of these two variants, and you can then compare how they perform against each other to determine whether you should fully roll out the new variant or revert back to the original. 

A/B tests involve publishing two versions of your live storefront. 
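To make the mechanics concrete, here is a minimal sketch of how a testing tool might bucket visitors into a control and a variant. The hash-based approach and the function names are illustrative assumptions, not a description of any particular tool’s implementation.

```typescript
// Minimal sketch of deterministic A/B assignment (illustrative only).
// The visitor ID is hashed together with the test ID so the same visitor
// always lands in the same variant for a given test.

import { createHash } from "crypto";

type Variant = "control" | "variant";

function assignVariant(visitorId: string, testId: string): Variant {
  // Hash visitor + test so assignments stay stable across page loads
  const digest = createHash("sha256").update(`${testId}:${visitorId}`).digest();
  // Use the first byte of the hash to split traffic roughly 50/50
  return digest[0] < 128 ? "control" : "variant";
}

// Example: the same visitor always sees the same version in this hypothetical test
console.log(assignVariant("visitor-123", "homepage-cta-test"));
```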

There are two ways to run multiple A/B tests at the same time: 

  • Isolated tests: If a visitor is involved in one test (regardless of whether they’re assigned to the control or the experimental variant), then they will be excluded from any other test that you happen to also be running.
  • Overlapping tests: Experiments are run without regard to how many tests a visitor is involved in, which means the same visitor could be exposed to multiple tests simultaneously (a rough sketch of both enrollment approaches follows this list).
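Here is a rough sketch of how those two enrollment policies differ in code; the data structures and helper names are hypothetical, and a real testing tool would persist this state per visitor.

```typescript
// Sketch of isolated vs. overlapping enrollment (illustrative only).

type Variant = "control" | "variant";
interface Enrollment { testId: string; variant: Variant; }

// 50/50 coin flip for brevity; in practice you'd use a deterministic,
// hash-based assignment so a visitor's bucket stays stable.
const pickVariant = (): Variant => (Math.random() < 0.5 ? "control" : "variant");

// Overlapping: every active test enrolls the visitor independently.
function enrollOverlapping(activeTests: string[]): Enrollment[] {
  return activeTests.map((testId) => ({ testId, variant: pickVariant() }));
}

// Isolated: once a visitor belongs to one test, they are excluded from all others.
function enrollIsolated(activeTests: string[], existing: Enrollment | null): Enrollment | null {
  if (existing) return existing;             // already enrolled elsewhere
  if (activeTests.length === 0) return null;
  const testId = activeTests[Math.floor(Math.random() * activeTests.length)];
  return { testId, variant: pickVariant() };
}
```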

Each approach comes with its own set of pros and cons. 

The advantage of isolated tests is that they produce more accurate results, as you won’t need to worry about how the effects of one test might interfere with the results of another. Each visitor is in their own silo — outside of the one test they’re assigned to, it’s like none of the other tests exist at all.

The disadvantage, though, is that isolated tests require more visitors than overlapping tests, which means it will take more time for you to collect enough data to conclude your experiment. In fact, running multiple isolated tests simultaneously requires just as much time as running the same number of tests individually, one after the other. 

To demonstrate, let’s say you want to run two A/B tests on a page that gets 5,000 visits per month.

Running isolated tests would require you to split that audience into two separate buckets, which would each receive 2,500 visits per month. If it also takes about 2,500 visits to get statistically significant results, you would then need to wait one month to complete both tests — which is the same amount of time it would take to avoid interference by just running the tests sequentially rather than simultaneously (in that case, it would take two weeks to complete your first test and then another two weeks to complete your second test, adding up to one month total).
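If it helps to see the arithmetic, here is the same back-of-the-envelope math in code. The 2,500-visit threshold is this article’s illustrative figure, not a universal rule for statistical significance.

```typescript
// Back-of-the-envelope duration math from the example above.

const monthlyVisits = 5000;
const visitsNeededPerTest = 2500; // illustrative significance threshold
const numberOfTests = 2;

// Isolated, simultaneous: traffic is split across tests, so each test
// only receives monthlyVisits / numberOfTests visits per month.
const isolatedMonths = visitsNeededPerTest / (monthlyVisits / numberOfTests); // 1.0

// Sequential: each test gets the full traffic, but they run one after another.
const sequentialMonths = numberOfTests * (visitsNeededPerTest / monthlyVisits); // 1.0

// Overlapping: every visitor can be in every test, so the tests finish in parallel.
const overlappingMonths = visitsNeededPerTest / monthlyVisits; // 0.5

console.log({ isolatedMonths, sequentialMonths, overlappingMonths });
```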

So, if you’re not saving any time, what’s the point of running multiple isolated tests simultaneously? 

There are some situations where this approach makes the most sense. If you’re testing out a big, important change to your store, like a full homepage redesign, then the extra accuracy offered by isolated tests is probably worth the additional time it takes compared to overlapping tests.

And if you’re also concerned about seasonal effects, then simultaneous isolated tests may very well be a better option than sequential tests (e.g., if your tests are going to run from early November into late November, then the second of two sequential tests may get a misleading bump in performance due to Black Friday/Cyber Monday sales, while two isolated tests that run over the same period of time would get the same amount of exposure to this effect).

As for overlapping tests, the main advantage here is that you can run three, four, five, or as many experiments as you want over the same amount of time that it would take to complete a single pair of isolated tests.

This is highly beneficial for your store. The more you’re able to test, the more insights you’ll be able to gain about what it takes to reduce your bounce rate, increase your conversion rate, make more sales, and reach whatever other goals that you may have for your store.

A/B tests can help you reach a variety of different ecommerce goals.

But overlapping tests do come with a drawback: the experiments can interact with each other, adding uncertainty to your results.

These interactions are often so minor that they’re not going to have any real impact on your results (e.g., visitors who see certain versions of a collection page and an FAQ page may be influenced in some esoteric way compared to visitors who saw other versions of the same pages, but it likely doesn’t matter much). In other words, the benefits of running overlapping tests usually outweigh the risk of potential interference. 

This Is What It’s Like When Tests Collide

While the effects of interference typically aren’t serious enough to bother accounting for in A/B tests, it’s important to note that this isn’t always the case. 

For example, imagine you’re testing some changes to your homepage while also testing some changes to your product page template, and for both of these tests you’re looking at your store’s overall conversion rate as the main indicator of success. 

So, you run these tests and see that your conversion rate goes up. But should you attribute this improvement to the changes you made to your homepage or the changes you made to your product pages? There’s no way to pull the results apart and figure out the true cause, which means these tests didn’t produce any actionable insights for you. It was all just one big waste of time. This can happen when you run overlapping tests on different stages of the same sales funnel.

Here’s another hypothetical to consider — let’s say that, as part of your content marketing strategy, you’re offering visitors a free ebook download at the end of each of your blog posts. In Test X, you want to experiment with moving this download button from the body of these posts to your main menu instead. But then, in Test Y, you also want to try hiding your main menu on most pages in order to make your site more streamlined.

See the problem? When users are exposed to both of these experiments, some won’t be able to see any mention of the free ebook download at all (that is, if a visitor was assigned to both the download-button-in-menu variant in Test X and the hidden-menu variant in Test Y).

If the metric you’re tracking in Test X is ebook downloads, then the performance of the download-button-in-menu variant may look much worse than it would have been if there were no such interference, leading you to come away with highly inaccurate conclusions from your experiment. 

Situations like these demonstrate why you must think carefully about how you set up and measure your tests.

Multiple A/B Tests vs. A/B/C Testing

In addition to isolated A/B tests and overlapping A/B tests, there’s one other option in your optimization toolkit that you should consider using: the A/B/C test. 

Instead of testing just one new variant like in an A/B test, an A/B/C test involves testing multiple new variants against the control in your experiment. For example, if the control has a black CTA button, you might test it against one new variant with a red CTA button as well as another new variant with a blue CTA button. 

This is usually easier than running multiple A/B tests, as you only need to manage a single experiment. And visitors assigned to a variant in an A/B/C test will only be exposed to that one variant, so interference is less of a concern. 

A couple of notes, though. Like isolated tests, A/B/C tests take longer to complete than overlapping tests: splitting your visitors into more buckets means each variant receives a smaller share of your traffic, so it takes longer for every variant to reach statistical significance. And an A/B/C test only works as a substitute for multiple A/B tests if those tests would have shared the same goal, duration, and page/template/theme anyway; otherwise, you’ll need to stick with multiple tests.
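As a quick illustration of why extra variants stretch out a test, here is the same kind of back-of-the-envelope math applied to an A/B/C split, reusing the earlier illustrative traffic numbers.

```typescript
// Why A/B/C tests take longer: traffic is split across more buckets.
// 5,000 visits/month and 2,500 visits per variant are illustrative figures.

function monthsToComplete(monthlyVisits: number, visitsPerVariant: number, variants: number): number {
  const visitsPerVariantPerMonth = monthlyVisits / variants;
  return visitsPerVariant / visitsPerVariantPerMonth;
}

console.log(monthsToComplete(5000, 2500, 2)); // A/B: 1 month
console.log(monthsToComplete(5000, 2500, 3)); // A/B/C: 1.5 months
```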

How to Streamline Your Testing

Now that we’ve established all the potential issues with running multiple A/B tests, let’s go over a few practical tips that will help you prevent these problems: 

  • Anticipate collisions by thinking a few steps ahead: You look both ways before crossing the street, right? It helps to be just as careful when running multiple A/B tests — instead of moving forward blindly, take some time to picture all the possible implications that one test might have on another. Any potential interaction effects that would disrupt user experience are usually fairly obvious, and once you’re aware of these connections you can make an informed decision about whether overlapping, isolated, or A/B/C testing would be best. A little forethought now can save you from a lot of confusion and wasted time down the road.
  • Encourage communication between any teams that can independently run tests: Large organizations may have multiple teams that have the capability to run their own A/B tests. In that case, protocols should be in place requiring these teams to make it known to all relevant parties when they start a test and which elements on the site will be affected. Without this communication, each team could be dealing with all kinds of interference that they’re not even aware of. 
  • Consider staggering your test launches: If you just can’t stomach the risk of interference that comes with overlapping tests, one way that you can still partially benefit from this strategy is by staggering the launch of each test. For example, you could launch Test X in Week 1 and Test Y in Week 2 while planning to conclude Test X in Week 3 and Test Y in Week 4. This will still save you some time compared to isolated tests while also providing a couple of baselines that you can use to detect potential interference (e.g., if you notice a sudden change in Test X’s performance right when Test Y launches, you’ll know there’s a problem). A rough sketch of this kind of check appears after this list.
  • Make your goals as specific as they can be: We previously went through an example of how, if you run one test on your homepage and another on your product pages and only look at the overall sales conversion rate for both, you run into attribution issues. But what if, for the homepage test, you instead measured how many people clicked through to the next page? Now each test has its own metric, so you can judge the success of the homepage test and the product page test separately. Whenever you can make your goals more targeted like this, you should.
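For the staggered-launch tip above, here is a rough sketch of the kind of before-and-after check you might run. The data shape and the 20% threshold are arbitrary assumptions for illustration, not a substitute for a proper statistical test.

```typescript
// Rough interference check for staggered launches (illustrative only).
// Compare Test X's conversion rate before and after Test Y goes live;
// a large, sudden shift suggests the two tests are interacting.

interface DailyResult {
  date: string;       // ISO date, e.g. "2025-07-01"
  visitors: number;
  conversions: number;
}

function conversionRate(days: DailyResult[]): number {
  const visitors = days.reduce((sum, d) => sum + d.visitors, 0);
  const conversions = days.reduce((sum, d) => sum + d.conversions, 0);
  return visitors > 0 ? conversions / visitors : 0;
}

function flagPossibleInterference(
  testXDaily: DailyResult[],
  testYLaunchDate: string,
  threshold = 0.2 // flag a relative shift of more than 20% (arbitrary cutoff)
): boolean {
  const before = testXDaily.filter((d) => d.date < testYLaunchDate);
  const after = testXDaily.filter((d) => d.date >= testYLaunchDate);
  const beforeRate = conversionRate(before);
  const afterRate = conversionRate(after);
  if (beforeRate === 0) return false;
  return Math.abs(afterRate - beforeRate) / beforeRate > threshold;
}
```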

Overall, the effectiveness of your store’s A/B testing strategy is determined by the product of three factors: how many tests you run, the percentage of tests where the new variant wins, and the individual impact of each successful experiment. 
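For example (with purely illustrative numbers), running 20 tests a year with a 25% win rate and an average 3% lift per winning test compounds to roughly a 16% improvement (five winning tests, and 1.03^5 ≈ 1.16), while running only eight tests under the same assumptions yields closer to 6% (two wins, 1.03^2 ≈ 1.06).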

Overlapping tests allow you to address the first part of that equation. While there are some circumstances that call for isolated tests or an A/B/C test, trying to prevent every possible instance of interference is just going to slow you down. 

You should always try to keep your rate of testing high, as this is the only way you’ll be able to make a substantial dent in the stack of ideas that you surely have about how to improve your store — otherwise, many of these ideas may never even see the light of day. And the faster you uncover new insights, the faster your store will be able to benefit from them.
