Model Pruning — Explain Like I'm 5
The Garden That Got Too Big
Imagine you planted a garden with 10,000 seeds. Some grew into useful plants — herbs, vegetables, flowers. But many grew into scrubby weeds or just didn’t contribute much.
A skilled gardener would look at the garden and say: “80% of the harvest comes from 20% of these plants. Let me cut the rest — the garden will be easier to tend and the useful plants will get more sunlight.”
Neural networks have a similar problem. When you train them, they develop millions or even billions of “connections” (called weights). But after training, a lot of these connections are very small — close to zero — and aren’t contributing meaningfully to the model’s performance.
Pruning cuts those unnecessary connections.
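To make "cutting the small connections" concrete, here is a minimal sketch that assumes the simplest possible rule, usually called magnitude pruning: set the weights with the smallest absolute values to zero. The function name `magnitude_prune`, the `sparsity` parameter, and the example weights are made up for illustration; real libraries work on large tensors rather than a short Python list.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to remove (0.5 means half of them).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered from the smallest to the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


# Hypothetical weights: a few large ones and many close to zero.
weights = [0.91, -0.02, 0.003, -0.75, 0.01, 0.44, -0.001, 0.05, -0.6, 0.002]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

The large weights survive untouched, and the five closest to zero are removed. In practice a pruned network is usually retrained briefly afterwards so the remaining weights can compensate.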
What Happens When You Prune
A typical neural network for image recognition might have 25 million parameters. With careful pruning, usually followed by some retraining, it is often possible to remove 80–90% of them and still get nearly the same accuracy.
Why? Neural networks are usually heavily over-parameterized. The extra parameters help the network learn during training, but the learned knowledge ends up concentrated in a much smaller set of important connections. The rest add very little.
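The arithmetic behind those percentages is quick to check. The sketch below only uses the numbers from the paragraph above; the helper name `remaining_parameters` is made up for illustration.

```python
def remaining_parameters(total, percent_removed):
    """Number of parameters left after removing `percent_removed` percent."""
    return total * (100 - percent_removed) // 100


total = 25_000_000  # the example network size used in the text
for percent in (80, 90):
    left = remaining_parameters(total, percent)
    print(f"remove {percent}% -> {left:,} parameters left")
```

So the pruned network keeps somewhere between 2.5 and 5 million of its original 25 million parameters.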
Think of it like a large company that needs a big team to brainstorm and research — but once the strategy is decided, you only need a small focused team to execute it.
The Lottery Ticket Hypothesis
In 2019, researchers at MIT proposed something fascinating: a large, randomly initialized neural network appears to contain a tiny “winning ticket”, a small subnetwork that, if you’d trained it in isolation from the same starting weights, could have learned about as well as the full network.
You can’t find this small network easily before training. You have to train the big network, then identify which connections matter, then remove the rest.
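The train, identify, remove loop can be sketched on a toy problem. This is not the researchers' code: it assumes a tiny linear model in place of a real neural network, made-up data where only two of six inputs matter, and a single pruning round. The helper names (`train`, `mse`, `predict`) are hypothetical. One detail follows the lottery ticket procedure: after pruning, the surviving weights are reset to their original starting values before retraining.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of weights `w` on (inputs, target) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(w_init, data, mask, steps=1000, lr=0.1):
    """Full-batch gradient descent; weights where mask is 0 stay at zero."""
    w = [wi * m for wi, m in zip(w_init, mask)]
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / n
        w = [(wj - lr * gj) * m for wj, gj, m in zip(w, grad, mask)]
    return w


random.seed(0)
n_features = 6

# Made-up data: the target depends only on inputs 0 and 2.
data = []
for _ in range(50):
    x = [random.uniform(-1, 1) for _ in range(n_features)]
    data.append((x, 3 * x[0] - 2 * x[2]))

w_init = [random.uniform(-0.5, 0.5) for _ in range(n_features)]

# 1. Train the big (dense) model.
dense = train(w_init, data, mask=[1] * n_features)

# 2. Identify which connections matter: keep the 2 largest by magnitude.
keep = sorted(range(n_features), key=lambda j: abs(dense[j]), reverse=True)[:2]
mask = [1 if j in keep else 0 for j in range(n_features)]

# 3. Remove the rest, reset survivors to their starting values, and retrain.
ticket = train(w_init, data, mask)

print("mask:", mask)
print("dense loss:", mse(dense, data))
print("ticket loss:", mse(ticket, data))
```

On this toy problem the small masked model fits the data as well as the full one, which is the flavor of the hypothesis. Real experiments repeat the prune and retrain cycle several times and compare test accuracy on actual networks.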
This suggests neural network pruning may be more than a compression trick: it can uncover an efficient structure that was inside the network all along.
One thing to remember: Pruning removes the connections in a neural network that contribute little to its performance, the same way a gardener removes plants that aren’t producing, leaving a smaller system that is nearly as productive.
See Also
- Knowledge Distillation: How AI companies shrink massive models down to phone-sized ones without losing much intelligence, using the teacher-student trick that powers on-device AI.
- Model Quantization: How AI models get shrunk to run on your phone, using the precision-tradeoff trick that makes 70-billion-parameter models fit on consumer hardware.
- Speculative Decoding: The clever trick that makes large AI models generate text 2-4x faster, using a small 'draft' model to guess tokens that a big model then quickly verifies.
- Activation Functions: Why neural networks need these tiny mathematical functions, and how ReLU's simplicity accidentally made deep learning possible.
- AI Agents Architecture: How AI systems go from answering questions to actually doing things, with the design patterns that turn language models into autonomous agents that browse, code, and plan.