This post covers: why most A/B tests fail, what to test first (priority order), how to calculate sample size, setting up tests on Shopify, reading results without fooling yourself, common testing mistakes, when to stop a test, and building a testing roadmap.
1. Why Most A/B Tests Fail
A/B testing sounds straightforward. Make two versions, split traffic, see which wins. But in practice, most ecommerce A/B tests produce inconclusive results or misleading data. And the reason is usually one of three things.
First, not enough traffic. If your landing page gets 500 visits per week and you're testing a change that might improve conversion by 10%, you need about 6-8 weeks to reach statistical significance. Most people check after 3 days, see a difference, and declare a winner. That's not testing. That's guessing.
Second, testing the wrong things. Button color tests, font changes, and minor copy tweaks almost never produce meaningful results because the impact is too small to detect at normal traffic levels. You need to test things that could actually change behavior, like headlines, offers, or page layout.
Third, changing too many things at once. If you test a new headline, a new hero image, and a new CTA button in the same variant, and it wins, you have no idea which change drove the improvement. Test one variable at a time unless you're running a multivariate test with very high traffic.
2. What to Test First (Priority Order)
Not all tests are equal. Some elements have a much higher potential impact than others. Here's a priority list based on what we've seen move the needle most for ecommerce landing pages.
- Headline: This is the first thing every visitor reads. A headline test can swing conversion rates by 20-40%. Start here.
- Hero image or video: Lifestyle imagery vs product shots, or static image vs video. Visual changes tend to produce large, measurable differences.
- Offer structure: "30% off" vs "Buy 2 get 1 free" vs "Free shipping + 15% off." Different offers tap into different buying psychology.
- Social proof type and placement: Reviews above the fold vs below, star ratings vs full testimonials, UGC vs professional photos.
- CTA copy and placement: This is lower impact than the items above, but still worth testing once you've handled the bigger variables.
- Page length: Short (above-the-fold focus) vs long (detailed benefits, FAQ, multiple sections). This varies dramatically by product type and price point.
The general rule: test things that visitors see and interact with first. The higher up the page and the more visible the element, the bigger the potential impact.
3. How to Calculate Sample Size
Before you start a test, you need to know how much traffic you need. Running a test without enough traffic is like flipping a coin 5 times, getting 4 heads, and concluding the coin is biased. You need enough data to be confident in the result.
The quick math for ecommerce: if your current conversion rate is 3% and you want to detect a 20% improvement (lifting it to 3.6%), you need roughly 10,000 visitors per variation. So 20,000 total visitors split between control and variant.
If your page gets 1,000 visitors per week, that's a 20-week test. That's too long. Either increase traffic (run more ads temporarily), or test bigger changes that could produce larger lifts. A change that might improve conversion by 50% (like a completely different headline approach) needs far fewer visitors to reach significance.
Use a sample size calculator (VWO, Optimizely, and Evan Miller all have free ones) before starting any test. If the required sample size means running the test for longer than 4-6 weeks, reconsider what you're testing.
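If you want to sanity-check what those calculators are doing, here's a minimal sketch of the standard two-proportion sample-size formula in Python. The confidence and power settings are assumptions (95% confidence, 80% power, two-sided test); the "roughly 10,000" figure above corresponds to slightly less conservative settings, so expect this version to come out a bit higher.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_cr, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift in conversion rate."""
    p1 = baseline_cr                        # control conversion rate
    p2 = baseline_cr * (1 + relative_lift)  # variant rate you want to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for a two-sided 95% test
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# 3% baseline, detecting a 20% relative lift (3.0% -> 3.6%):
print(sample_size_per_variant(0.03, 0.20))  # ~13,900 per variant at these settings
```

Lowering the power requirement or running a one-sided test shrinks the number, which is why different calculators can give different answers for the same inputs.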
4. Setting Up Tests on Shopify
There are several ways to run A/B tests on Shopify landing pages, depending on your page builder and budget.
Shogun (built-in): Shogun's higher-tier plans include A/B testing. You duplicate a page, make changes to the variant, and Shogun splits traffic automatically. This is the easiest option if you're already using Shogun as your page builder.
Google Optimize replacement: Google Optimize shut down in 2023. The closest free alternative is using Google Ads experiments, which let you split traffic between two different landing page URLs. This works but requires setting up both pages as separate URLs.
VWO or Convert: Dedicated A/B testing tools that work with any Shopify page. VWO starts around $200/month but gives you visual editing, statistical analysis, and heatmaps. Worth it if you're spending $10K+/month on ads and testing regularly.
Manual URL split in ad platforms: The simplest (free) method. Create two landing pages with different URLs. In your Google Ads or Meta campaign, split traffic 50/50 between them. Track conversions for each URL separately. Not as precise as dedicated tools, but works for initial tests.
5. Reading Results Without Fooling Yourself
Here's where most people go wrong. They run a test for a few days, see that variant B has a 4.2% conversion rate vs control's 3.8%, and conclude that B wins. But with only 500 visitors per variant, that difference is well within random noise.
What to look for:
- Statistical significance of 95% or higher. Strictly speaking, this means that if there were no real difference between the variants, you'd see a gap this large less than 5% of the time. Most A/B testing tools calculate this automatically.
- Consistent direction over time. If variant B was winning on day 3 but losing on day 7, the test isn't conclusive yet. Look for a stable trend.
- Practical significance. A 0.1% improvement that's statistically significant isn't worth implementing. Consider whether the improvement is large enough to matter to your business.
Also watch for the "peeking problem." Every time you check results mid-test, you increase the chance of seeing a false positive. Decide your sample size upfront, wait until you hit it, then look at results. Checking every day and stopping early when things look good inflates your false positive rate.
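To make the 4.2% vs 3.8% example above concrete, here's a minimal two-proportion z-test sketch in Python, a simplified version of the calculation most testing tools run for you:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # combined conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 3.8% vs 4.2% with 500 visitors per variant (19 and 21 conversions):
print(two_proportion_p_value(19, 500, 21, 500))  # ~0.75, nowhere near significance
```

A p-value around 0.75 means a gap this size would show up most of the time even if the two variants were identical, which is exactly why 500 visitors per variant proves nothing.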
6. Common Testing Mistakes
- Testing too many things at once: Unless you're running a multivariate test with very high traffic, stick to one change per test.
- Not segmenting results: A test might show no overall winner, but when you segment by device, variant B wins on mobile and loses on desktop. Check device-level results.
- Testing during unusual periods: Don't start tests during Black Friday, product launches, or other anomalous traffic periods. The results won't generalize to normal conditions.
- Declaring winners too quickly: The minimum is usually 1-2 full business cycles (7-14 days) even if you hit sample size sooner, because day-of-week effects are real.
- Not documenting results: Keep a testing log. What you tested, the hypothesis, the result, and what you learned. Patterns emerge over time that inform future tests.
7. When to Stop a Test
Knowing when to stop is just as important as knowing when to start. There are three valid reasons to end a test.
You've reached significance. The test hit your pre-determined sample size and one variant is clearly winning at 95%+ confidence. Implement the winner.
It's been too long. If you've been running a test for 6+ weeks and still can't reach significance, the difference between variants is probably too small to matter. Call it inconclusive and move to a bigger test.
One variant is clearly losing. If variant B is converting at 50% of control after 2,000 visitors, you don't need to wait for statistical significance. A dramatic loser is obvious early and you should stop it to protect revenue.
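For intuition, reuse the two_proportion_p_value sketch from section 5 with hypothetical numbers (the 3% control rate here is an assumption, not a benchmark):

```python
# 1,000 visitors per variant: control at 3% (30 conversions), variant at half that (15).
print(two_proportion_p_value(30, 1000, 15, 1000))  # ~0.02
```

A true 50% drop crosses the significance bar after a fraction of the sample a subtle effect would need, which is why a dramatic loser is safe to kill early.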
8. Building a Testing Roadmap
Random testing is better than no testing, but a structured approach produces faster results. Here's a simple quarterly testing roadmap for ecommerce landing pages.
Month 1: Test 2-3 headline variations on your highest-traffic landing page. This gives you the biggest sample size and the highest-impact element.
Month 2: Take the winning headline and test hero image/video variations. Then test offer structure if applicable.
Month 3: Test page layout, social proof placement, and CTA positioning. These are smaller changes but still meaningful once the big elements are dialed in.
Repeat each quarter. Industry benchmarks show that top-performing pages convert at 2.5-3x the median. That gap doesn't come from one test. It comes from compounding small improvements over 6-12 months of disciplined testing.
For more on where to focus your conversion efforts, see our CRO services page.
Frequently Asked Questions
How long should I run an A/B test?
At minimum, 1-2 full weeks to account for day-of-week patterns. The actual duration depends on traffic volume and the size of the difference you are trying to detect. Use a sample size calculator before starting, and do not end the test early just because one variant looks like it is winning.
How many visitors do I need for a reliable test?
For a test that can detect a 20% improvement in conversion rate, you typically need 5,000-10,000 visitors per variant. If your page gets fewer than 1,000 visitors per week, focus on testing bigger changes that produce larger, more detectable differences.
Should I test my ads or my landing pages?
Both, but separately. Test ad copy in Google Ads using ad variations or experiments. Test landing page elements using a dedicated tool or URL split. Testing both at the same time makes it impossible to isolate which change caused the result.
What tools can I use to A/B test on Shopify?
Shogun has built-in A/B testing on higher plans. VWO and Convert work as third-party tools on any Shopify page. For a free option, you can split traffic manually in Google Ads or Meta between two different landing page URLs and compare conversion rates.
Find Out What's Costing You
COREPPC's free audit checks your Google Ads and Meta accounts in 60 seconds. Get your performance score, spot wasted spend, and see exactly where to improve.
Start Free Audit


