Python Load Shedding — ELI5

Imagine a lifeboat that can safely hold 20 people. There are 50 people in the water. If all 50 try to climb in, the boat sinks and nobody is saved. But if you take 20 people and tell the others “wait for the next boat,” those 20 are safe.

It sounds harsh, but saving 20 is better than losing everyone.

Load shedding is the same idea for your Python app.

Your app can handle, say, 1,000 requests per second. Normally it gets about 500 — no problem. But one day something happens: a viral tweet, a sale event, a bot attack. Suddenly 5,000 requests per second slam in.

If the app tries to handle all 5,000, everything slows down. Pages take 30 seconds to load. Timeouts pile up. The database chokes. Eventually the whole thing crashes and nobody gets served.
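To see why things get so slow, here is a rough back-of-the-envelope sketch using the numbers above. It assumes a simple first-come, first-served queue, constant arrival and service rates, and no client giving up along the way. The function names are made up for illustration; real traffic is messier than this.

```python
def backlog_after(seconds, arrival_rate=5000, capacity=1000):
    """Requests still waiting after `seconds` of sustained overload.

    Assumes constant rates: every second, `arrival_rate` requests arrive
    and `capacity` of them are served, so the queue grows by the difference.
    """
    return max(0, (arrival_rate - capacity) * seconds)


def wait_for_new_request(seconds, arrival_rate=5000, capacity=1000):
    """Seconds a request arriving at time `seconds` waits before it is served."""
    return backlog_after(seconds, arrival_rate, capacity) / capacity


for t in (1, 5, 10):
    print(t, backlog_after(t), wait_for_new_request(t))
```

Under these assumptions, after just 10 seconds of overload there are 40,000 requests waiting, and a newly arriving request sits in line for about 40 seconds. At the normal 500 requests per second the backlog stays at zero.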

Load shedding says: “I can handle 1,000 requests per second. Each second, I’ll accept the first 1,000 and politely tell the rest: ‘Sorry, we’re full right now, please try again in a moment.’” Those 1,000 users get fast, normal responses. The rejected users get a quick “try again” message instead of waiting forever.
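A minimal sketch of that idea in Python might look like the following. It is an illustration under assumptions, not a production implementation: `LoadShedder`, `try_admit`, and `handle` are hypothetical names, the counter resets once per whole second (a simple fixed window), and the code is not thread-safe. Rejected requests get HTTP status 503 (Service Unavailable) with a `Retry-After` header, which is one common way to say "try again in a moment."

```python
import time


class LoadShedder:
    """Admit up to `limit_per_second` requests in each one-second window."""

    def __init__(self, limit_per_second=1000, clock=time.monotonic):
        self.limit = limit_per_second
        self.clock = clock      # injectable so the behaviour can be tested
        self.window = None      # the whole second currently being counted
        self.count = 0          # requests admitted in that second

    def try_admit(self):
        now = int(self.clock())
        if now != self.window:  # a new second has started: reset the counter
            self.window = now
            self.count = 0
        if self.count >= self.limit:
            return False        # full: shed this request
        self.count += 1
        return True


def handle(shedder, work):
    """Run `work` if admitted; otherwise reject right away without queueing."""
    if not shedder.try_admit():
        message = "Sorry, we're full right now. Please try again in a moment."
        return 503, {"Retry-After": "1"}, message
    return 200, {}, work()
```

The important design choice is that a rejected request returns immediately instead of waiting in a queue, so shedding costs almost nothing and the admitted requests keep their normal speed.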

It’s like a restaurant with no empty tables. They don’t let people stand in the kitchen — they tell you the wait time and let you decide. Meanwhile, the seated diners still get great service.

The key insight: it’s better to serve some users well than to serve all users badly.

One thing to remember: Load shedding means deliberately rejecting some requests so the rest get served properly. It’s not a failure — it’s a strategy to prevent total failure.
