Get A/B testing right
1. Calculate your test sample size upfront using a sample size calculator, and run a preliminary test to check for significance.
Ignore your test results until you have at least 350 conversions per variation; the exact threshold depends on how big your sample size needs to be. Don’t stop your test until you reach your conversion threshold. To analyze your test results by segment, you need the same number of conversions in every segment you compare.
2. Run a statistical significance test using an online A/B test calculator, and keep testing even after you reach 95% statistical significance if you have not yet reached your planned sample size.
The p-value in your significance test does not tell you the probability that B is better than A. Use Bayesian statistics to estimate the likelihood that variation B is better than A.
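One way to estimate the likelihood that B is better than A is a Beta-Binomial model. The sketch below is a minimal illustration, not the method of any particular A/B test calculator. It assumes uniform Beta(1, 1) priors, and the function name prob_b_beats_a and the example counts are made up for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each conversion rate:
        # Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Illustrative counts only: 350 vs. 400 conversions from 10,000 visitors each.
print(prob_b_beats_a(350, 10_000, 400, 10_000))
```

The result is a probability between 0 and 1 that reads directly as "how likely is it that B beats A", which a p-value does not give you.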
3. Use a calculator like Evan Miller's sample size calculator to determine the sample size your test needs to detect a significant difference.
To achieve this: Enter the conversion rate of your control page as the baseline conversion rate. Enter the minimum uplift you want to detect and count as a success. Select relative difference to see the sample size required for each test variation. Stopping the test before it reaches that sample size increases the risk of a false-positive result. Ignore case studies that are based on small sample sizes.
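For a rough cross-check of a calculator's output, the standard normal-approximation formula for comparing two proportions can be sketched as follows. This is an approximation under stated assumptions (two-sided test, 5% significance level, 80% power by default); online calculators may use a different formula, so their numbers can differ. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_uplift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-sided test."""
    p1 = baseline                          # control conversion rate
    p2 = baseline * (1 + relative_uplift)  # rate if the uplift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Example: 10% baseline conversion rate, 10% relative uplift to detect.
print(sample_size_per_variation(0.10, 0.10))
```

Note how quickly the requirement grows as the uplift shrinks: halving the uplift you want to detect roughly quadruples the sample size.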
4. Confirm the required sample size before you start your experiment so that you collect enough data to represent your website's overall traffic.
Don’t: Send atypical traffic to the experiment, as it is not representative of your usual visitors. Draw conclusions from insufficient data.
5. Perform a second test on the same variation to check the validity of your results and prevent false positives.
To rule out a false positive or an uplift that fades over time: Run the same test again to confirm the uplift. Check whether the uplift is caused by the novelty effect, for example a new landing page or a new form.
6. Measure performance indicators that represent the final goal of your tests - not just micro-conversion indicators.
For example, if you are testing cart adds on a product page, an increase in add-to-cart events won’t automatically translate into an increase in product purchases. Keep track of your business objectives during the test, and don’t focus on the number of engagements. Whenever possible, measure the revenue increase rather than the number of conversions.
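The gap between a micro-conversion and the final goal can be made concrete with a small sketch. The VariationStats class, the relative_change helper, and all numbers below are invented for illustration; they are not real test results.

```python
from dataclasses import dataclass


@dataclass
class VariationStats:
    visitors: int
    cart_adds: int
    purchases: int
    revenue: float

    @property
    def cart_add_rate(self):
        return self.cart_adds / self.visitors

    @property
    def revenue_per_visitor(self):
        return self.revenue / self.visitors


def relative_change(control_value, variant_value):
    """Relative change of the variant versus the control."""
    return (variant_value - control_value) / control_value


# Made-up numbers: the variant wins on cart adds but loses on revenue.
control = VariationStats(visitors=10_000, cart_adds=1_200, purchases=300, revenue=15_000.0)
variant = VariationStats(visitors=10_000, cart_adds=1_500, purchases=295, revenue=14_600.0)

print(relative_change(control.cart_add_rate, variant.cart_add_rate))
print(relative_change(control.revenue_per_visitor, variant.revenue_per_visitor))
```

In this invented case the add-to-cart rate rises while revenue per visitor falls, which is exactly the situation a micro-conversion metric alone would hide.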
7. Run your tests in full weeks to understand the weekly cycle and how the test variations perform.
For example, if the test started on a Monday, end it on the following Monday. Create a weekly report with day-by-day conversions to identify fluctuations. If you have not reached the required confidence within a week, continue the test for another full week so that the seven-day cycle stays intact.
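The full-week rule can be sketched as a small planning helper that rounds the test duration up to whole weeks. It assumes traffic is split evenly across variations and that daily traffic is roughly constant; the function name and the example figures are hypothetical.

```python
from datetime import date, timedelta
from math import ceil


def planned_end_date(start, required_per_variation, variations, daily_visitors):
    """Return an end date that keeps the test running in full weeks."""
    total_needed = required_per_variation * variations
    days_needed = ceil(total_needed / daily_visitors)
    # Round up to whole weeks, with a minimum of one week.
    weeks = max(1, ceil(days_needed / 7))
    return start + timedelta(weeks=weeks)


# Example: start on a Monday, two variations, 3,000 visitors per day.
end = planned_end_date(date(2024, 1, 1), 14_751, 2, 3_000)
print(end, end.strftime("%A"))
```

Because the duration is always a multiple of seven days, the end date falls on the same weekday as the start date.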
8. Weed out low-performing variations by disabling the variations that perform worse for your other KPIs, and restart the test.
Do not stop a variation while the test is running; it will alter the test composition.
9. Send test data into Google Analytics to create a segment of your test results, and cross-check between testing platform data and GA data.
Compare data from both platforms to check that it is accurate and to spot any reporting issues. Name each test individually in Google Analytics so you can tell them apart quickly. Create an advanced segment for each variation, defining each segment by the event label.
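Once you have exported conversion counts per variation from both tools, the cross-check itself can be sketched as below. This example does not call any Google Analytics or testing platform API; the dictionaries, the variation names, and the 5% tolerance are all assumptions for illustration.

```python
def reporting_discrepancies(platform_counts, ga_counts, tolerance=0.05):
    """Flag variations whose counts differ by more than the tolerance."""
    flagged = {}
    for variation, platform_value in platform_counts.items():
        ga_value = ga_counts.get(variation, 0)
        larger = max(platform_value, ga_value)
        # Gap relative to the larger of the two counts.
        gap = abs(platform_value - ga_value) / larger if larger else 0.0
        if gap > tolerance:
            flagged[variation] = round(gap, 3)
    return flagged


# Made-up counts exported by hand from each tool.
platform = {"control": 400, "variant_b": 420}
analytics = {"control": 396, "variant_b": 350}
print(reporting_discrepancies(platform, analytics))
```

Small gaps between tools are common, so pick a tolerance that matches what you normally see; a variation that exceeds it is worth investigating before you trust the result.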