A/B Testing — Explain Like I'm 5

The Two Flyers

A restaurant wants to know which of two flyers brings in more customers. Flyer A has a photo of food. Flyer B has a big discount coupon.

The naive approach: use flyer A this month, flyer B next month, and compare. But what if December is naturally busier than November? Or it snowed during the B month? You’d be comparing apples to oranges.

The right approach: randomly hand out flyer A to some households and flyer B to others — at the same time. Now both flyers face the same conditions: same weather, same time of year, same local events. Any difference in results is more likely to be caused by the flyer itself.

That’s A/B testing.
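The random hand-out can be sketched in a few lines of Python. This is only an illustration: the household IDs, the `assign_flyers` helper, and the fixed seed are made-up assumptions, not part of any real campaign tool.

```python
import random

def assign_flyers(households, seed=42):
    """Randomly split households between flyer A and flyer B (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    shuffled = list(households)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The first half of the shuffled list gets flyer A, the rest get flyer B.
    assignment = {h: "A" for h in shuffled[:half]}
    assignment.update({h: "B" for h in shuffled[half:]})
    return assignment

households = [f"house_{i}" for i in range(10)]  # made-up household IDs
groups = assign_flyers(households)
```

Because the shuffle decides who gets which flyer, nothing about a household (its street, its income, its habits) can influence the group it lands in.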

How Tech Companies Use It

Large tech companies such as Google, Facebook, Netflix, and Amazon run hundreds of A/B tests simultaneously. When you visit a website, you might unknowingly be in dozens of experiments:

  • 50% of users see a green “Buy” button, 50% see blue
  • 50% get the old recommendation algorithm, 50% get the new one
  • 50% see product reviews at the top of the page, 50% at the bottom

The company measures which version causes more people to buy, click, watch, or subscribe — and deploys the winner.
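One common way to compare two versions is a two-proportion z-test. The sketch below is an assumption about how such a comparison might look, not any particular company's method; the function name and the visitor and purchase counts are invented for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the conversion rate if A and B were really the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: 200 of 5,000 users bought with version A, 250 of 5,000 with B.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
```

With these invented counts the p-value comes out below 0.05, which by the usual convention would count as evidence that version B really converts better.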

Netflix estimates it runs about 250 A/B tests per year. Google runs thousands. Amazon famously tests changes to its checkout flow constantly, refining it through thousands of small experiments over two decades.

What Can Go Wrong

A/B testing sounds simple, but there are many ways to get wrong answers:

  • Not enough users: A small difference might just be random noise, not a real effect
  • Stopping early: Results bounce around while a test runs, so if you stop the moment they look good, you’re likely fooling yourself
  • Testing too many things: If you test 20 changes that do nothing, about one will still “win” by pure chance at the usual 5% significance threshold
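The "too many things" problem is simple arithmetic. The sketch below assumes every test is independent, none of the changes has any real effect, and each test uses a 5% false-positive threshold; the function names are hypothetical.

```python
def chance_of_false_win(n_tests, alpha=0.05):
    """Probability that at least one no-effect test looks like a winner."""
    return 1 - (1 - alpha) ** n_tests

def expected_false_wins(n_tests, alpha=0.05):
    """Average number of no-effect tests that look like winners."""
    return n_tests * alpha

chance = chance_of_false_win(20)     # roughly 0.64
expected = expected_false_wins(20)   # about 1
```

So with 20 useless changes, one fake winner is the average outcome, and there is close to a two-in-three chance of seeing at least one.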

Scientists have developed statistical tools to handle these problems — calculating how many users you need, how long to run the test, and whether the difference is real or just noise.
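As an example of such a tool, here is a rough sketch of a textbook sample-size formula for comparing two conversion rates. The baseline and target rates are invented, the helper name is hypothetical, and the 5% significance level and 80% power are conventional defaults rather than requirements.

```python
import math
from statistics import NormalDist

def users_per_group(base_rate, target_rate, alpha=0.05, power=0.80):
    """Approximate users needed in each group to detect the given lift."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    # Sum of the per-user variances of the two conversion rates.
    variance = base_rate * (1 - base_rate) + target_rate * (1 - target_rate)
    effect = target_rate - base_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Made-up goal: detect an improvement from a 4% to a 5% conversion rate.
n = users_per_group(0.04, 0.05)
```

Under these assumptions the answer is several thousand users per group, and halving the size of the improvement you want to detect roughly quadruples that number.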

One thing to remember: A/B testing is powerful because it separates cause from coincidence — by randomly assigning people to different experiences at the same time, you can reliably measure whether a change made things better or worse.

See Also

  • Causal Inference: Why correlation isn't causation, and the statistical methods scientists use to actually prove that one thing causes another without running a controlled experiment.
  • Time Series Forecasting: How AI predicts the future from patterns in the past. It is the technology behind weather forecasts, stock predictions, electricity demand, and your iPhone's battery charge estimate.
  • Feature Engineering: Why the way you describe your data to a machine learning model matters more than which model you choose. It is the art of turning raw data into something AI can actually learn from.