Python Load Shedding — Core Concepts

Understand when and how to reject excess traffic in Python applications — with priority-based shedding, adaptive thresholds, and the difference from rate limiting.

What Is Load Shedding?

Load shedding is the deliberate rejection of incoming requests when a system is at or near capacity. Instead of accepting more work than it can handle (which degrades performance for everyone), the system explicitly refuses excess requests and returns a fast error.

The term comes from electrical engineering — power grids shed load (cut power to some areas) to prevent a total blackout.

Load Shedding vs. Rate Limiting

These are often confused, but they solve different problems:

Aspect	Rate Limiting	Load Shedding
Purpose	Enforce per-client fairness	Protect system capacity
Trigger	Client exceeds quota	System reaches capacity
Scope	Per client/API key	System-wide
Rejection	429 Too Many Requests	503 Service Unavailable
When	Always active	Only under stress

Rate limiting says “you’re sending too many requests.” Load shedding says “we’re too busy for anyone right now.” You typically need both.
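To make the distinction concrete, here is a minimal sketch of an admission check that applies both rules. The function name, its parameters, and the choice to test system capacity before the client quota are assumptions made for illustration, not a prescribed design.

```python
def admission_status(client_requests_this_window, client_quota,
                     in_flight, capacity):
    """Return 503 or 429 if the request should be rejected, else None."""
    # Load shedding: a system-wide check that ignores who the client is.
    if in_flight >= capacity:
        return 503
    # Rate limiting: a per-client check that ignores how busy the system is.
    if client_requests_this_window >= client_quota:
        return 429
    return None  # admitted
```

A client under its quota can still receive a 503 when the system is saturated, and a client over its quota receives a 429 even when the system is idle.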

Shedding Strategies

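1. Reject-Newest (Tail-Drop) Shedding

Keep a bounded queue, serve the requests that are already waiting in arrival order, and reject new arrivals once the queue is full. Users who arrived first get served first; new arrivals during overload get fast rejections. (This is sometimes loosely called LIFO shedding, but strict LIFO would serve the newest request first, which is a different policy.)

Best for: Interactive applications where the users already in the queue are still waiting for their responses.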
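The reject-the-newest behavior described above can be sketched with a bounded queue from the standard library. The capacity value and the `try_enqueue` helper are hypothetical names chosen for this example.

```python
import queue

# Hypothetical capacity; size it to what workers can drain within the latency budget.
MAX_QUEUE_DEPTH = 100

pending = queue.Queue(maxsize=MAX_QUEUE_DEPTH)


def try_enqueue(request, q=pending):
    """Admit the request if there is room, otherwise reject it immediately.

    Returns True when queued, False when the caller should answer with a 503.
    """
    try:
        q.put_nowait(request)  # never blocks; raises queue.Full at capacity
        return True
    except queue.Full:
        return False
```

Requests already in the queue keep their place, and a rejected arrival costs only one failed `put_nowait` call.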

2. Priority-Based Shedding

Not all requests are equal. An authenticated user’s checkout request is more important than an anonymous user’s browse request. Assign priorities and shed low-priority traffic first.

Priority example (most important first):

Health checks and admin endpoints (never shed)
Payment and checkout (shed last)
Authenticated user requests
Anonymous user requests (shed early)
Bot/crawler traffic (shed first)
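One way to express these tiers in code is a per-tier load threshold. This is a rough sketch: the `Priority` enum, the `SHED_AT_LOAD` table, and every threshold value are assumptions for illustration and would need tuning against real traffic.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Lower value means more important. Names mirror the tiers listed above."""
    CRITICAL = 0       # health checks and admin endpoints: never shed
    CHECKOUT = 1       # payment and checkout: shed last
    AUTHENTICATED = 2
    ANONYMOUS = 3
    BOT = 4            # crawlers: shed first


# Hypothetical cutoffs: the load (0.0 to 1.0) at which each tier starts being rejected.
SHED_AT_LOAD = {
    Priority.BOT: 0.70,
    Priority.ANONYMOUS: 0.80,
    Priority.AUTHENTICATED: 0.90,
    Priority.CHECKOUT: 0.97,
}


def should_shed(priority, load):
    """Return True if a request of this priority should be rejected at this load."""
    threshold = SHED_AT_LOAD.get(priority)
    if threshold is None:  # CRITICAL has no entry, so it is never shed
        return False
    return load >= threshold
```

At a load of 0.75 this rejects crawler traffic while still admitting anonymous and authenticated users.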

3. Deadline-Based Shedding

If a request has been in the queue longer than its useful lifetime, drop it. A user who sent a request 30 seconds ago has probably refreshed or left. Processing their stale request wastes resources.
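A minimal sketch of the idea, assuming requests are stamped when they enter a `collections.deque`. The `QueuedRequest` class, the `pop_fresh` helper, and the 5-second budget are hypothetical. Expired items are returned rather than discarded so the caller can still send each one an explicit rejection.

```python
import time
from collections import deque
from dataclasses import dataclass, field

# Hypothetical budget; something close to the client's own timeout is a reasonable start.
MAX_QUEUE_AGE_SECONDS = 5.0


@dataclass
class QueuedRequest:
    payload: object
    # time.monotonic() is unaffected by wall-clock adjustments
    enqueued_at: float = field(default_factory=time.monotonic)


def pop_fresh(pending, now=None, max_age=MAX_QUEUE_AGE_SECONDS):
    """Pop from the left of `pending` until a request within its deadline is found.

    Returns (fresh_request_or_None, list_of_expired_requests).
    """
    now = time.monotonic() if now is None else now
    expired = []
    while pending:
        item = pending.popleft()
        if now - item.enqueued_at <= max_age:
            return item, expired
        expired.append(item)  # too old to be worth processing
    return None, expired
```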

4. Probabilistic Shedding

As load increases, randomly reject a growing percentage of requests. At 80% capacity, reject 10%. At 90%, reject 30%. At 100%, reject 50%. This creates a smooth degradation curve instead of a sharp cliff.
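The curve above can be sketched as linear interpolation between the three stated points. Two behaviors here are assumptions the text does not specify: nothing is shed below 80% load, and the rejection rate stays at 50% above 100%. The function names are illustrative.

```python
import random

# (load, rejection probability) points taken from the text above.
CURVE = [(0.80, 0.10), (0.90, 0.30), (1.00, 0.50)]


def rejection_probability(load):
    """Interpolate linearly between the curve points."""
    if load < CURVE[0][0]:
        return 0.0            # assumption: no shedding below the first point
    if load >= CURVE[-1][0]:
        return CURVE[-1][1]   # assumption: hold at the last value beyond 100%
    for (x0, y0), (x1, y1) in zip(CURVE, CURVE[1:]):
        if x0 <= load <= x1:
            return y0 + (y1 - y0) * (load - x0) / (x1 - x0)
    return CURVE[-1][1]


def shed_randomly(load, rng=random.random):
    """Return True with probability rejection_probability(load)."""
    return rng() < rejection_probability(load)
```

Passing `rng` as a parameter keeps the decision testable with a fixed value instead of real randomness.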

Detecting Overload

How does your application know it’s overloaded?

Request queue depth — if the queue is longer than N, start shedding
Response latency — if P99 latency exceeds a threshold, the system is struggling
CPU/memory usage — approaching resource limits
Active connections — concurrent connections exceeding a safe threshold
Event loop lag (asyncio) — if the event loop is falling behind, I/O is backing up

The best systems combine multiple signals. A single metric can be misleading — high CPU might be from a batch job, not from overload.
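As a rough illustration, event loop lag can be estimated by sleeping for a known interval and measuring how late the wake-up arrives, and several signals can then be combined with a simple vote. The function names, the thresholds, and the "at least two signals must agree" rule are assumptions for this sketch.

```python
import asyncio
import time


async def measure_loop_lag(interval=0.05):
    """Return how many seconds late the event loop woke us after `interval`."""
    start = time.perf_counter()
    await asyncio.sleep(interval)
    return max(0.0, time.perf_counter() - start - interval)


def is_overloaded(queue_depth, p99_latency_s, loop_lag_s,
                  max_depth=100, max_p99_s=1.0, max_lag_s=0.1):
    """Declare overload only when at least two signals exceed their thresholds."""
    signals = [
        queue_depth > max_depth,
        p99_latency_s > max_p99_s,
        loop_lag_s > max_lag_s,
    ]
    return sum(signals) >= 2
```

Requiring agreement between signals keeps a single noisy metric, such as a brief latency spike, from triggering shedding on its own.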

The 503 Response

When shedding load, return HTTP 503 with useful headers:

HTTP/1.1 503 Service Unavailable
Retry-After: 5
X-Shed-Reason: capacity

Retry-After tells well-behaved clients when to try again. Without it, many clients retry immediately and make the overload worse.
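A response like the one above could be produced by a small WSGI middleware. This is a sketch under stated assumptions: the class name, the in-flight limit, and the use of a semaphore as the capacity signal are illustrative choices, and `X-Shed-Reason` is simply the custom header from the example.

```python
import threading


class LoadSheddingMiddleware:
    """WSGI middleware that rejects requests once too many are in flight."""

    def __init__(self, app, max_in_flight=100, retry_after_s=5):
        self.app = app
        self.retry_after_s = retry_after_s
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def __call__(self, environ, start_response):
        # Non-blocking acquire: either a slot is free right now or we shed.
        if not self._slots.acquire(blocking=False):
            body = b"Service overloaded, please retry later.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(self.retry_after_s)),
                ("X-Shed-Reason", "capacity"),
            ])
            return [body]
        try:
            # The slot is released when the wrapped app returns; a streaming
            # response would need the release moved to the iterable's close().
            return self.app(environ, start_response)
        finally:
            self._slots.release()
```

Wrapping an existing application is then a single line, for example `app = LoadSheddingMiddleware(app, max_in_flight=200)`.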

Common Misconception

“Load shedding means dropping requests silently.” Never do this. Always return an explicit rejection (503, or a queue-full error). Silent drops cause client-side timeouts — which are slow, confusing, and consume resources on both sides. A fast 503 is vastly better than a 30-second timeout.

When Load Shedding Is Critical

Viral traffic events — your product is on the front page of a news site
Black Friday / sales events — predictable but massive spikes
DDoS mitigation — shedding bot traffic to protect real users
Dependency slowdowns — when a slow database causes request queue growth
Cascade prevention — shedding early prevents your system from becoming the slow dependency for others

The Counter-Intuitive Truth

Adding more capacity doesn’t eliminate the need for load shedding. No matter how big your system is, there’s always a load level that exceeds it. Load shedding is the safety valve that defines behavior at the boundary — the plan for what happens when demand exceeds supply.

Google, Amazon, and Netflix all use load shedding in production. It’s not a sign of under-provisioning — it’s a sign of thoughtful design.

One thing to remember: Load shedding trades a small number of fast failures for system-wide stability. Rejecting 5% of requests with a quick 503 is vastly better than making 100% of requests slow and unreliable.
