A/B Testing — Explain Like I'm 5
The Two Flyers
A restaurant wants to know which of two flyers brings in more customers. Flyer A has a photo of food. Flyer B has a big discount coupon.
The naive approach: use flyer A this month, flyer B next month, and compare. But what if December is naturally busier than November? Or it snowed during the B month? You’d be comparing apples to oranges.
The right approach: randomly hand out flyer A to some households and flyer B to others, at the same time. Now both flyers face the same conditions: same weather, same time of year, same local events. Any difference in results is then caused either by the flyer itself or by random chance, and statistics can tell you how big a difference chance alone could plausibly produce.
That’s A/B testing.
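To see why the month-by-month comparison misleads, here is a small Python simulation. It is a sketch under made-up assumptions: both flyers are equally effective (5% of households respond), December lifts everyone's response by 3 percentage points, and the helper names `responds`, `naive_test`, and `randomized_test` are hypothetical. The naive design credits December's bonus to flyer B; the coin-flip design, run at the same time, does not.

```python
import random

# Hypothetical numbers for illustration: both flyers are equally good
# (5% of households respond), but December adds 3 percentage points
# of response for everyone, regardless of which flyer they got.
BASE_RATE = {"A": 0.05, "B": 0.05}
DECEMBER_BONUS = 0.03


def responds(flyer: str, in_december: bool, rng: random.Random) -> bool:
    """Simulate whether one household responds to the flyer it received."""
    p = BASE_RATE[flyer] + (DECEMBER_BONUS if in_december else 0.0)
    return rng.random() < p


def naive_test(n: int, rng: random.Random) -> dict[str, float]:
    """Flyer A goes out in November, flyer B in December."""
    a = sum(responds("A", False, rng) for _ in range(n))
    b = sum(responds("B", True, rng) for _ in range(n))
    return {"A": a / n, "B": b / n}


def randomized_test(n: int, rng: random.Random) -> dict[str, float]:
    """Everyone gets a flyer in December; a coin flip decides which one."""
    hits = {"A": 0, "B": 0}
    sent = {"A": 0, "B": 0}
    for _ in range(2 * n):
        flyer = rng.choice(["A", "B"])
        sent[flyer] += 1
        hits[flyer] += responds(flyer, True, rng)
    return {f: hits[f] / sent[f] for f in hits}


rng = random.Random(0)
naive = naive_test(10_000, rng)
fair = randomized_test(10_000, rng)
print("naive:     ", naive)  # B looks better, but only because of December
print("randomized:", fair)   # A and B come out close, as they should
```

Changing the seed moves the numbers a little, but the pattern holds: the naive gap is roughly the size of the December bonus, while the randomized gap stays near zero.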
How Tech Companies Use It
Every major tech company — Google, Facebook, Netflix, Amazon — runs hundreds of A/B tests simultaneously. When you visit a website, you might unknowingly be in dozens of experiments:
- 50% of users see a green “Buy” button, 50% see blue
- 50% get the old recommendation algorithm, 50% get the new one
- 50% see product reviews at the top of the page, 50% at the bottom
The company measures which version causes more people to buy, click, watch, or subscribe — and deploys the winner.
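One common way to get a stable 50/50 split, sketched here in Python, is to hash the user ID together with the experiment name and use the hash to pick a variant. This is an illustrative assumption, not a description of how Google, Facebook, Netflix, or Amazon actually do it; `assign_variant` and the experiment names are hypothetical. Because each experiment name produces its own hash, the same user lands in independent buckets across many experiments at once, which is how one visitor can be in dozens of tests.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("A", "B")) -> str:
    """Deterministically map a user to one variant of one experiment.

    Hashing (experiment, user) means the same user always sees the same
    variant of a given experiment, while different experiments split
    users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a big integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# One (hypothetical) user, enrolled in several experiments at the same time.
user = "user-42"
for experiment in ["buy_button_color", "recommendation_algo", "review_position"]:
    print(experiment, assign_variant(user, experiment))
```

Hashing instead of drawing a fresh random number on each visit means a returning user keeps seeing the same button color without the site having to store the assignment.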
Netflix estimates it runs about 250 A/B tests per year. Google runs thousands. Amazon famously tests changes to its checkout flow constantly, and its checkout process has been refined through thousands of small experiments over two decades.
What Can Go Wrong
A/B testing sounds simple, but there are many ways to get wrong answers:
- Not enough users: A small difference might just be random noise, not a real effect
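- Stopping early: If you keep checking and stop the test the moment results look good, you’re likely fooling yourself, because random ups and downs give even a useless change many chances to look like a winner
- Testing too many things: If you test 20 changes that make no real difference, at least one will probably “win” by pure chance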
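The last two pitfalls are easy to reproduce. The Python sketch below runs many A/A tests, where both groups get the identical experience (a hypothetical 5% purchase rate), so every “win” is a false alarm. It compares checking once at the end with peeking after every batch of 100 users per group and declaring victory at the first p-value below 0.05. The batch size, the 20 looks, and the helper names `p_value` and `aa_experiment` are assumptions chosen for illustration. The last line is the arithmetic behind testing 20 do-nothing changes.

```python
import math
import random


def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # nobody (or everybody) converted: no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_experiment(rng, rate=0.05, batch=100, looks=20, alpha=0.05):
    """Run one A/A test (no real difference) and report two decisions:
    did ANY interim look show p < alpha (equivalent to stopping at the
    first good-looking peek), and did the single final look?"""
    conv_a = conv_b = n = 0
    significant_at_any_look = False
    for _ in range(looks):
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        if p_value(conv_a, n, conv_b, n) < alpha:
            significant_at_any_look = True
    significant_at_end = p_value(conv_a, n, conv_b, n) < alpha
    return significant_at_any_look, significant_at_end


rng = random.Random(1)
runs = [aa_experiment(rng) for _ in range(400)]
peeking_fp = sum(any_look for any_look, _ in runs) / len(runs)
final_fp = sum(end for _, end in runs) / len(runs)
print(f"False 'wins' when checking once at the end: {final_fp:.1%}")
print(f"False 'wins' when stopping at the first good-looking peek: {peeking_fp:.1%}")

# Testing 20 changes that truly do nothing, each with a 5% false-alarm rate:
print(f"Chance at least one of 20 'wins': {1 - 0.95 ** 20:.1%}")  # → 64.2%
```

Checking once keeps false wins near the intended 5%; peeking after each of the 20 batches pushes them far above that, even though nothing changed.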
Scientists have developed statistical tools to handle these problems — calculating how many users you need, how long to run the test, and whether the difference is real or just noise.
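Two of those tools can be sketched in a few lines of Python using only the standard library. `required_users_per_group` applies one common normal-approximation formula for comparing two conversion rates, and `two_proportion_z_test` checks whether an observed difference is larger than noise would usually produce. Both function names, the 5% and 6% conversion rates, and the 80% power target are illustrative assumptions; real experimentation platforms often use more refined methods.

```python
import math
from statistics import NormalDist


def required_users_per_group(p_base: float, p_target: float,
                             alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in EACH group to detect a change from
    p_base to p_target with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_power) ** 2 * variance / (p_target - p_base) ** 2
    return math.ceil(n)


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for 'B converts differently from A'."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all-or-nothing results in both groups: no evidence
    z = (conv_b / n_b - conv_a / n_a) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical plan: 5% of visitors buy today; we want to detect a lift to 6%.
n = required_users_per_group(0.05, 0.06)
print(n)  # a little over 8,000 users per group

# Hypothetical results after running the test with 10,000 users per group.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=600, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers, detecting a one-point lift takes a little over 8,000 users per group, and 500 versus 600 purchases out of 10,000 each gives a p-value well below 0.05. Smaller effects need far more users, because the required sample grows with the inverse square of the difference you want to detect.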
One thing to remember: A/B testing is powerful because it separates cause from coincidence — by randomly assigning people to different experiences at the same time, you can reliably measure whether a change made things better or worse.
See Also
- Causal Inference: Why correlation isn't causation, and the statistical methods scientists use to actually prove that one thing causes another without running a controlled experiment.
- Time Series Forecasting: How AI predicts the future from patterns in the past. The technology behind weather forecasts, stock predictions, electricity demand, and your iPhone's battery charge estimate.
- Feature Engineering: Why the way you describe your data to a machine learning model matters more than which model you choose. The art of turning raw data into something AI can actually learn from.