Causal Inference — Explain Like I'm 5
The Ice Cream Problem
Ice cream sales and drowning deaths both increase in summer. Strong correlation. Does this mean ice cream causes drowning? Should we ban ice cream to save lives?
Obviously not. Both are caused by a third factor: hot weather. More heat → more swimming AND more ice cream. But ice cream and drowning have no causal relationship.
This is the fundamental problem of correlation vs. causation. And it matters enormously when you’re trying to understand the world.
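Here is a minimal Python simulation of the ice cream story. Every number in it (the temperature range, the sales and drowning formulas, the noise levels) is invented for illustration. The model is built so that temperature drives both series and nothing links ice cream to drowning directly. A strong correlation appears anyway, and it mostly disappears once you compare only days with nearly the same temperature.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so every run gives the same numbers

# Hypothetical daily data. Temperature (deg C) drives BOTH series;
# nothing in this model lets ice cream affect drownings.
temps = [rng.uniform(0, 35) for _ in range(5000)]
ice_cream = [100 + 10 * t + rng.gauss(0, 30) for t in temps]  # cones sold
drownings = [2 + 0.2 * t + rng.gauss(0, 0.5) for t in temps]  # made-up rate

overall = corr(ice_cream, drownings)

# Crude way to "hold the confounder fixed": keep only days in a narrow
# temperature band, where temperature barely varies.
band = [i for i, t in enumerate(temps) if 20 <= t < 21]
within = corr([ice_cream[i] for i in band], [drownings[i] for i in band])

print(f"correlation, all days:          {overall:.2f}")
print(f"correlation, 20-21 degree days: {within:.2f}")
```

Looking inside a narrow temperature band is the crudest way to “control for” a confounder. Real analyses usually use regression or matching, but the idea is the same: compare like with like.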
Why Scientists Don’t Just Run Experiments
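The gold standard for establishing causation is a randomized controlled trial (RCT): randomly split people into two groups, give the treatment to only one group, and compare outcomes. Because chance alone decides who is treated, the two groups are alike on average in everything except the treatment. That’s how new drugs are tested before they are approved.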
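The Python sketch below contrasts a randomized trial with plain observational data. The patients, the severity score, and the drug’s +5 effect are all hypothetical. Each simulated patient carries two potential outcomes (one with the drug, one without), something real data never gives us. When doctors choose to treat the sickest patients, the drug looks harmful; when a coin flip decides, the comparison lands close to the true effect.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed for reproducibility
TRUE_EFFECT = 5.0       # hypothetical: the drug adds 5 points for everyone

# Each hypothetical patient has TWO potential outcomes: one without the
# drug and one with it. In real life we only ever observe one of them.
patients = []
for _ in range(10_000):
    severity = rng.uniform(0, 10)
    without_drug = 50 - 3 * severity + rng.gauss(0, 5)  # sicker -> worse
    with_drug = without_drug + TRUE_EFFECT
    patients.append((severity, without_drug, with_drug))

# Observational data: doctors give the drug to the sickest patients.
obs_treated = [w for s, wo, w in patients if s > 5]
obs_control = [wo for s, wo, w in patients if s <= 5]
observational = mean(obs_treated) - mean(obs_control)

# RCT: a coin flip decides who gets the drug, so severity is balanced
# between the two groups on average.
rct_treated, rct_control = [], []
for s, wo, w in patients:
    if rng.random() < 0.5:
        rct_treated.append(w)
    else:
        rct_control.append(wo)
rct = mean(rct_treated) - mean(rct_control)

print(f"observational estimate: {observational:+.1f}")
print(f"RCT estimate:           {rct:+.1f}   (true effect {TRUE_EFFECT:+.1f})")
```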
But often you can’t randomize:
- You can’t randomly assign people to smoke cigarettes for 30 years to study lung cancer
- You can’t randomly send students to different quality schools to study education outcomes
- You can’t randomly give some cities a minimum wage increase and not others
So researchers developed clever ways to find the “hidden experiment” in naturally occurring data.
The Clever Tricks
Natural experiments: Sometimes life creates randomization-like situations on its own. The Vietnam War draft used a lottery: every birth date was assigned a random number, and men whose birthdays drew low numbers were the first to be called up. Because the numbers were random, researchers could use them to study the causal effect of military service on later earnings.
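A small Python sketch of why a lottery behaves like an experiment. It assumes a hypothetical call-up cutoff of 195, a made-up sample of 20,000 men, and an invented background trait (family income). Because lottery numbers are shuffled at random, the men who were called up and those who were not end up with nearly identical family incomes, even though nobody balanced them on purpose.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed for reproducibility

# The lottery: each of the 366 possible birthdays (Feb 29 included) is
# matched with a distinct random number from 1 to 366.
numbers = list(range(1, 367))
rng.shuffle(numbers)
lottery_number = dict(enumerate(numbers))  # day-of-year 0..365 -> number

CUTOFF = 195  # hypothetical: numbers at or below this are called up

# Hypothetical men: (birthday, family income in $1000s). Family income
# existed before the lottery and could also affect later earnings.
men = [(rng.randrange(366), rng.gauss(50, 15)) for _ in range(20_000)]

called_up = [inc for day, inc in men if lottery_number[day] <= CUTOFF]
not_called = [inc for day, inc in men if lottery_number[day] > CUTOFF]
gap = mean(called_up) - mean(not_called)

print(f"called up:     n={len(called_up):5d}, mean income {mean(called_up):.1f}")
print(f"not called up: n={len(not_called):5d}, mean income {mean(not_called):.1f}")
```

Since the two groups match on traits like this one, a later gap in earnings between them can be attributed to being called up rather than to differences that existed beforehand.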
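Difference-in-differences: You study two cities. One raises its minimum wage; the other doesn’t. You measure how employment changed in each city from before to after the wage increase. The city without the increase is your “control group”: its change shows what would likely have happened anyway, so subtracting it from the first city’s change isolates the effect of the wage increase.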
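The arithmetic behind difference-in-differences fits in a few lines of Python. The employment figures below (in thousands of jobs) are invented purely for illustration, and “City A” and “City B” are placeholders.

```python
# Hypothetical average employment (thousands of jobs) in two cities.
employment = {
    ("City A", "before"): 100.0,  # City A raises its minimum wage
    ("City A", "after"): 103.0,
    ("City B", "before"): 80.0,   # City B does not (control group)
    ("City B", "after"): 84.0,
}

# First difference: how much each city changed over the same period.
change_treated = employment[("City A", "after")] - employment[("City A", "before")]
change_control = employment[("City B", "after")] - employment[("City B", "before")]

# Second difference: City B's change stands in for what City A would
# have done without the wage increase.
did = change_treated - change_control

print(f"City A change: {change_treated:+.1f}")
print(f"City B change: {change_control:+.1f}")
print(f"difference-in-differences estimate: {did:+.1f}")
```

City A gained 3 thousand jobs, but City B gained 4 thousand with no wage change, so the estimate is -1 thousand jobs. That reading holds only under the “parallel trends” assumption: without the wage increase, City A’s employment would have moved the same way City B’s did.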
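Instrumental variables: Find a third variable (the “instrument”) that affects whether someone gets the treatment but affects the outcome only through that treatment. The draft lottery is an example: a random number changed who served in the military, but on its own it shouldn’t change anyone’s earnings. Comparing how much the outcome and the treatment each shift with the instrument lets you “tease out” the causal effect from the correlation.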
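To make the instrumental-variables idea concrete, here is a Python simulation with a hidden confounder `u` that pushes people into treatment and also raises their outcome. The coin-flip instrument `z`, the effect size of -2, and all coefficients are hypothetical. The estimator shown is the simple Wald ratio for a yes/no instrument: the change in the average outcome across instrument groups divided by the change in treatment rates.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed for reproducibility
TRUE_EFFECT = -2.0      # hypothetical causal effect of the treatment

rows = []
for _ in range(50_000):
    z = int(rng.random() < 0.5)  # instrument: a coin flip, like a lottery
    u = rng.gauss(0, 1)          # hidden confounder we never get to observe
    # Who gets treated depends on the instrument AND the confounder.
    d = int(0.8 * z + 0.5 * u + rng.gauss(0, 0.5) > 0.5)
    # The outcome depends on treatment and the confounder, never directly on z.
    y = TRUE_EFFECT * d + 3.0 * u + rng.gauss(0, 1)
    rows.append((z, d, y))

# Naive comparison of treated vs. untreated: biased, because u pushes
# people into treatment and also raises their outcome.
naive = (mean(y for z, d, y in rows if d == 1)
         - mean(y for z, d, y in rows if d == 0))

# Wald (instrumental-variables) estimate: how much the outcome moves
# with the instrument, divided by how much the treatment moves with it.
y_gap = (mean(y for z, d, y in rows if z == 1)
         - mean(y for z, d, y in rows if z == 0))
d_gap = (mean(d for z, d, y in rows if z == 1)
         - mean(d for z, d, y in rows if z == 0))
iv = y_gap / d_gap

print(f"naive estimate: {naive:+.2f}")
print(f"IV estimate:    {iv:+.2f}   (true effect {TRUE_EFFECT:+.2f})")
```

In this setup the naive treated-versus-untreated gap gets even the sign wrong, while the Wald ratio lands near -2. It works only because `z` affects `y` solely through `d`; if the instrument had its own path to the outcome, the ratio would be biased as well.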
These methods power a huge amount of social science, economics, and medical research where randomized experiments aren’t possible.
One thing to remember: Causal inference is about asking “what would have happened if X had been different?” — and the tools are cleverly designed to answer that counterfactual question from observational data.
See Also
- A/B Testing: How tech companies run thousands of experiments at once to improve their products, applying the scientific method to everything from button colors to recommendation algorithms.
- Time Series Forecasting: How AI predicts the future from patterns in the past. It’s the technology behind weather forecasts, stock predictions, electricity demand, and your iPhone’s battery charge estimate.
- Feature Engineering: Why the way you describe your data to a machine learning model often matters more than which model you choose. It’s the art of turning raw data into something AI can actually learn from.