Overview
A/B testing is a controlled experiment in which two variants, A (control) and B (treatment), are shown to randomly assigned user groups. Statistical analysis then indicates whether the observed difference between the variants is larger than chance alone would plausibly produce.
When to Use
Use it when you have enough traffic to reach the required sample size and want to optimise a specific metric in an existing flow.
How to Apply It
- Define the hypothesis: 'Changing X will improve metric Y by at least Z%'
- Define the primary metric and the minimum detectable effect
- Calculate the required sample size for the chosen significance level and statistical power
- Randomly assign users to Control (A) and Treatment (B)
- Run until the required sample size is reached; do not stop early on the basis of interim results
- Analyse: is the difference statistically significant (conventionally p < 0.05)? Then decide.
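The sample-size, assignment, and analysis steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes a binary conversion metric, a two-sided two-proportion z-test, and hash-based bucketing, and the function names (required_sample_size, assign_variant, two_proportion_p_value) are hypothetical.

```python
import hashlib
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per variant (normal approximation).

    baseline_rate: current conversion rate of the control, e.g. 0.10
    relative_mde:  minimum detectable effect as a relative lift, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def assign_variant(user_id, experiment):
    """Deterministic 50/50 split: the same user always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rates (pooled z-test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Illustrative inputs only: 10% baseline, +10% relative lift to detect.
    print("users per variant:", required_sample_size(0.10, 0.10))
    print("variant for user-123:", assign_variant("user-123", "label-test"))
    print("p-value:", two_proportion_p_value(1000, 10000, 1150, 10000))
```

With a 10% baseline and a 10% relative lift to detect, the calculation asks for roughly 15,000 users per variant, which is why small effects on low-traffic flows are hard to test. Hashing the user ID together with the experiment name keeps assignment stable across sessions without storing state, at the cost of needing a fresh experiment name for each new test.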
Examples in Practice
🎵 Spotify
Hypothesis: showing a context label under each recommended track will increase track completion rate by 10%. Control: recommendation without label. Treatment: recommendation with context label. Run for 2 weeks across 2M users. Result: +14% completion rate in Treatment, statistically significant. Ship globally.
📊 Trade Surveillance
Please contact the author for more information on these examples at linkedin.com/in/kshitijrege
Common Pitfalls
- Stopping the test early when results look promising: repeatedly checking and stopping at the first significant result inflates the false-positive rate
- Running multiple tests simultaneously without controlling for interaction effects
- Not having enough traffic — an A/B test with 100 users tells you almost nothing
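The early-stopping pitfall can be illustrated with a small simulation. The sketch below runs A/A tests, where both groups share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant interim check against a single analysis at the planned sample size. All parameters (number of runs, users per arm, number of peeks, the 10% rate) are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rates(runs=400, users_per_arm=2000, peeks=10,
                                  rate=0.10, alpha=0.05, seed=42):
    """Return (rate with peeking, rate with one fixed-horizon analysis)."""
    rng = random.Random(seed)
    step = users_per_arm // peeks  # users added to each arm between peeks
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at_any_peek = False
        for peek in range(1, peeks + 1):
            # Both arms draw from the same true rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n = peek * step
            if p_value(conv_a, n, conv_b, n) < alpha:
                significant_at_any_peek = True  # a peeker would stop and ship here
        peeking_hits += significant_at_any_peek
        # Fixed horizon: look only once, at the full planned sample size.
        final_n = peeks * step
        fixed_hits += p_value(conv_a, final_n, conv_b, final_n) < alpha
    return peeking_hits / runs, fixed_hits / runs


if __name__ == "__main__":
    peeking, fixed = simulate_false_positive_rates()
    print(f"false-positive rate with peeking:  {peeking:.1%}")
    print(f"false-positive rate fixed horizon: {fixed:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate comes out several times higher, even though nothing differs between the groups.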
Origin
Statistics / direct mail tradition, popularised digitally by Google
1900s / 2000s
Further Reading
- Trustworthy Online Controlled Experiments — Kohavi et al.
Related Frameworks