Python Load Shedding — Core Concepts

What Is Load Shedding?

Load shedding is the deliberate rejection of incoming requests when a system is at or near capacity. Instead of accepting more work than it can handle (which degrades performance for everyone), the system explicitly refuses excess requests and returns a fast error.

The term comes from electrical engineering — power grids shed load (cut power to some areas) to prevent a total blackout.

Load Shedding vs. Rate Limiting

These are often confused, but they solve different problems:

Aspect    | Rate Limiting               | Load Shedding
Purpose   | Enforce per-client fairness | Protect system capacity
Trigger   | Client exceeds quota        | System reaches capacity
Scope     | Per client/API key          | System-wide
Rejection | 429 Too Many Requests       | 503 Service Unavailable
When      | Always active               | Only under stress

Rate limiting says “you’re sending too many requests.” Load shedding says “we’re too busy for anyone right now.” You typically need both.
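To make the distinction concrete, here is a minimal sketch of one admission check that applies both. The function name, its parameters, and the 0.9 threshold are illustrative assumptions, not values from any particular framework.

```python
from http import HTTPStatus


def admission_status(
    client_requests_in_window: int,
    client_quota: int,
    system_load: float,
    shed_threshold: float = 0.9,
) -> HTTPStatus:
    """Pick a status code for one incoming request.

    All inputs are hypothetical: `system_load` is a 0.0-1.0 utilization
    estimate and `shed_threshold` is an assumed cutoff, not a recommendation.
    """
    # Load shedding: system-wide, independent of who the client is.
    if system_load >= shed_threshold:
        return HTTPStatus.SERVICE_UNAVAILABLE  # 503
    # Rate limiting: per client, applies even when the system is idle.
    if client_requests_in_window >= client_quota:
        return HTTPStatus.TOO_MANY_REQUESTS  # 429
    return HTTPStatus.OK
```

Checking system load first is a design choice in this sketch: under overload every client gets the same 503, whether or not it is within quota.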

Shedding Strategies

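1. Tail-Drop (Reject Newest) Shedding

When the queue is full, reject new arrivals and keep serving the requests that are already waiting, in arrival order. Users who arrived first get served first, and new arrivals during overload get fast rejections. This is sometimes loosely called LIFO shedding, but strict LIFO (last in, first out) queueing serves the newest request first, which is a different strategy.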

Best for: Interactive applications where users are still waiting for their request.
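Rejecting new arrivals once the waiting line is full can be sketched with a bounded queue from the standard library. The capacity of 2 and the helper name `enqueue_or_shed` are assumptions chosen to keep the example small.

```python
import queue

# Hypothetical capacity; a real service would tune this to its throughput.
MAX_WAITING = 2
waiting = queue.Queue(maxsize=MAX_WAITING)


def enqueue_or_shed(request_id: str) -> bool:
    """Return True if the request was queued, False if it was shed."""
    try:
        # put_nowait never blocks: it raises queue.Full when the queue is at capacity.
        waiting.put_nowait(request_id)
        return True
    except queue.Full:
        return False


# The first two arrivals are queued; the third is rejected immediately.
outcomes = [enqueue_or_shed(name) for name in ("a", "b", "c")]
```

Requests already in the queue are still served in arrival order, so `waiting.get_nowait()` returns `"a"` first.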

2. Priority-Based Shedding

Not all requests are equal. An authenticated user’s checkout request is more important than an anonymous user’s browse request. Assign priorities and shed low-priority traffic first.

Priority example:

  1. Health checks and admin endpoints (never shed)
  2. Payment and checkout (shed last)
  3. Authenticated user requests
  4. Anonymous user requests (shed second)
  5. Bot/crawler traffic (shed first)
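One way to turn a priority list like this into code is to give each tier a load level at which it starts being shed. The tier names mirror the list above; the threshold values are assumptions chosen only for illustration.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 1       # health checks and admin endpoints
    PAYMENT = 2        # payment and checkout
    AUTHENTICATED = 3  # authenticated user requests
    ANONYMOUS = 4      # anonymous user requests
    BOT = 5            # bot/crawler traffic


# Hypothetical thresholds: a tier is shed once load (0.0-1.0) reaches its value.
# CRITICAL has no entry, so it is never shed.
SHED_AT_LOAD = {
    Priority.BOT: 0.70,
    Priority.ANONYMOUS: 0.80,
    Priority.AUTHENTICATED: 0.90,
    Priority.PAYMENT: 0.97,
}


def should_shed_by_priority(priority: Priority, load: float) -> bool:
    """Return True if a request of this priority should be rejected at this load."""
    threshold = SHED_AT_LOAD.get(priority)
    if threshold is None:
        return False
    return load >= threshold
```

At a load of 0.75, for example, bot traffic is rejected while anonymous and authenticated requests are still admitted.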

3. Deadline-Based Shedding

If a request has been in the queue longer than its useful lifetime, drop it. A user who sent a request 30 seconds ago has probably refreshed or left. Processing their stale request wastes resources.
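A minimal sketch of this idea stamps each request when it is queued and checks its age before doing any work. The `QueuedRequest` class and its `max_age_seconds` field are hypothetical names introduced for this example.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QueuedRequest:
    request_id: str
    max_age_seconds: float  # assumed useful lifetime of this request
    # Monotonic clock: unaffected by wall-clock adjustments.
    enqueued_at: float = field(default_factory=time.monotonic)

    def is_stale(self, now: Optional[float] = None) -> bool:
        """Return True if the request has waited longer than its useful lifetime."""
        current = time.monotonic() if now is None else now
        return current - self.enqueued_at > self.max_age_seconds


def split_fresh_and_stale(
    requests: List[QueuedRequest], now: Optional[float] = None
) -> Tuple[List[QueuedRequest], List[QueuedRequest]]:
    """Partition queued requests into those worth processing and those to drop."""
    fresh: List[QueuedRequest] = []
    stale: List[QueuedRequest] = []
    for request in requests:
        (stale if request.is_stale(now) else fresh).append(request)
    return fresh, stale
```

A worker would typically call `is_stale()` right after dequeuing and answer stale requests with a fast rejection instead of processing them.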

4. Probabilistic Shedding

As load increases, randomly reject a growing percentage of requests. At 80% capacity, reject 10%. At 90%, reject 30%. At 100%, reject 50%. This creates a smooth degradation curve instead of a sharp cliff.
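The curve above can be sketched as a piecewise-linear function through the three stated points. Two details are assumptions not given in the text: nothing is rejected below 80% load, and the rejection rate is capped at 50% above 100%.

```python
import random
from typing import Callable

# (load, rejection probability) points taken from the text above.
CURVE = [(0.80, 0.10), (0.90, 0.30), (1.00, 0.50)]


def rejection_probability(load: float) -> float:
    """Interpolate the rejection probability for a load in the range 0.0-1.0+."""
    if load < CURVE[0][0]:
        return 0.0  # assumption: no shedding below the first point
    if load >= CURVE[-1][0]:
        return CURVE[-1][1]  # assumption: cap at the last point
    for (x0, y0), (x1, y1) in zip(CURVE, CURVE[1:]):
        if x0 <= load < x1:
            return y0 + (y1 - y0) * (load - x0) / (x1 - x0)
    return CURVE[-1][1]


def should_shed_randomly(load: float, rng: Callable[[], float] = random.random) -> bool:
    """Reject with the probability given by the curve; `rng` is injectable for tests."""
    return rng() < rejection_probability(load)
```

Interpolating between the points avoids sudden jumps in the rejection rate as load crosses 80% or 90%.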

Detecting Overload

How does your application know it’s overloaded?

  • Request queue depth — if the queue is longer than N, start shedding
  • Response latency — if P99 latency exceeds a threshold, the system is struggling
  • CPU/memory usage — approaching resource limits
  • Active connections — concurrent connections exceeding a safe threshold
  • Event loop lag (asyncio) — if the event loop is falling behind, I/O is backing up
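Of these signals, event loop lag is the least obvious to measure. One common approach, sketched here with a hypothetical helper name, is to sleep for a known interval and see how much later than expected the coroutine wakes up.

```python
import asyncio
import time


async def measure_loop_lag(interval: float = 0.05) -> float:
    """Return how many seconds late the event loop resumed this coroutine.

    If the loop is busy (for example, blocked by CPU-bound work), the wake-up
    after `asyncio.sleep(interval)` is delayed, and that delay is the lag.
    """
    start = time.perf_counter()
    await asyncio.sleep(interval)
    elapsed = time.perf_counter() - start
    # Clamp at zero in case the timer fires marginally early.
    return max(0.0, elapsed - interval)
```

A background task could call this in a loop and start shedding once the lag stays above a chosen threshold; that threshold is workload-specific and not prescribed here.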

The best systems combine multiple signals. A single metric can be misleading — high CPU might be from a batch job, not from overload.
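A simple way to combine signals is to require that at least two of them cross their thresholds before declaring overload. The class names, threshold values, and the "two of four" rule below are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signals:
    queue_depth: int
    p99_latency_ms: float
    cpu_fraction: float  # 0.0-1.0
    active_connections: int


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values depend on the service.
    queue_depth: int = 100
    p99_latency_ms: float = 500.0
    cpu_fraction: float = 0.85
    active_connections: int = 1000


def tripped_signals(s: Signals, t: Thresholds = Thresholds()) -> List[str]:
    """Return the names of the signals that exceed their thresholds."""
    checks = {
        "queue_depth": s.queue_depth > t.queue_depth,
        "p99_latency_ms": s.p99_latency_ms > t.p99_latency_ms,
        "cpu_fraction": s.cpu_fraction > t.cpu_fraction,
        "active_connections": s.active_connections > t.active_connections,
    }
    return [name for name, over in checks.items() if over]


def is_overloaded(s: Signals, t: Thresholds = Thresholds(), min_tripped: int = 2) -> bool:
    """Declare overload only when several independent signals agree."""
    return len(tripped_signals(s, t)) >= min_tripped
```

With this rule, high CPU from a batch job alone does not trigger shedding, but high CPU together with a growing queue does.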

The 503 Response

When shedding load, return HTTP 503 with useful headers:

HTTP/1.1 503 Service Unavailable
Retry-After: 5
X-Shed-Reason: capacity

Retry-After tells well-behaved clients when to try again. Without it, they’ll retry immediately and make the overload worse.
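As a rough sketch, the response above could be produced by a small WSGI middleware. The factory name `make_shedding_middleware` and the `is_overloaded` callable are hypothetical; how overload is detected is left to the caller.

```python
from typing import Callable, Iterable


def make_shedding_middleware(
    app,
    is_overloaded: Callable[[], bool],
    retry_after_seconds: int = 5,
):
    """Wrap a WSGI app so it answers 503 whenever `is_overloaded()` returns True."""

    def middleware(environ, start_response) -> Iterable[bytes]:
        if not is_overloaded():
            # Normal path: hand the request to the wrapped application.
            return app(environ, start_response)
        body = b"Service temporarily overloaded. Please retry shortly.\n"
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(retry_after_seconds)),
                ("X-Shed-Reason", "capacity"),  # custom header from the example above
            ],
        )
        return [body]

    return middleware
```

Putting the check in middleware keeps the rejection path cheap: a shed request never reaches the application code.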

Common Misconception

“Load shedding means dropping requests silently.” Never do this. Always return an explicit rejection (503, or a queue-full error). Silent drops cause client-side timeouts — which are slow, confusing, and consume resources on both sides. A fast 503 is vastly better than a 30-second timeout.

When Load Shedding Is Critical

  • Viral traffic events — your product is on the front page of a news site
  • Black Friday / sales events — predictable but massive spikes
  • DDoS mitigation — shedding bot traffic to protect real users
  • Dependency slowdowns — when a slow database causes request queue growth
  • Cascade prevention — shedding early prevents your system from becoming the slow dependency for others

The Counter-Intuitive Truth

Adding more capacity doesn’t eliminate the need for load shedding. No matter how big your system is, there’s always a load level that exceeds it. Load shedding is the safety valve that defines behavior at the boundary — the plan for what happens when demand exceeds supply.

Google, Amazon, and Netflix all use load shedding in production. It’s not a sign of under-provisioning — it’s a sign of thoughtful design.

One thing to remember: Load shedding trades a small number of fast failures for system-wide stability. Rejecting 5% of requests with a quick 503 is vastly better than making 100% of requests slow and unreliable.
