Python Connection Draining — Core Concepts
What Is Connection Draining?
Connection draining (often loosely called “graceful shutdown”) is the process of allowing in-flight requests to complete before shutting down a server instance. It’s essential for zero-downtime deployments, rolling updates, and scheduled maintenance.
Without draining, every deployment causes a brief period of errors for users whose requests are interrupted mid-processing.
The Shutdown Sequence
A properly draining server follows this sequence:
- Receive shutdown signal — typically SIGTERM from the orchestrator (Kubernetes, systemd, Docker)
- Stop accepting new connections — remove from load balancer, close listening socket
- Complete in-flight requests — let active request handlers finish
- Close idle connections — release keep-alive connections with no active requests
- Wait for grace period — allow up to N seconds for everything to finish
- Force shutdown — terminate any remaining connections after the grace period
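The sequence above can be sketched with nothing but `asyncio`. This is a minimal illustration, not how any particular server implements it: `DrainTracker`, `handle`, and `demo` are hypothetical names, and the simulated handler stands in for real request processing.

```python
import asyncio


class DrainTracker:
    """Counts in-flight requests and lets shutdown wait for them to finish."""

    def __init__(self):
        self.draining = False
        self._in_flight = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing is running yet, so we start idle

    def start_request(self):
        """Return False once draining has begun: new work must be refused."""
        if self.draining:
            return False
        self._in_flight += 1
        self._idle.clear()
        return True

    def finish_request(self):
        self._in_flight -= 1
        if self._in_flight == 0:
            self._idle.set()

    async def drain(self, grace_seconds):
        """Stop accepting work, then wait up to grace_seconds.

        Returns True if every request finished, False if the grace period
        expired and the caller should force shutdown.
        """
        self.draining = True
        try:
            await asyncio.wait_for(self._idle.wait(), timeout=grace_seconds)
            return True
        except asyncio.TimeoutError:
            return False


async def handle(tracker, seconds):
    """Hypothetical request handler that simulates `seconds` of work."""
    if not tracker.start_request():
        return "rejected"
    try:
        await asyncio.sleep(seconds)
        return "done"
    finally:
        tracker.finish_request()


async def demo(grace_seconds):
    tracker = DrainTracker()
    task = asyncio.create_task(handle(tracker, 0.05))
    await asyncio.sleep(0)  # let the handler register itself as in flight
    finished = await tracker.drain(grace_seconds)
    late = await handle(tracker, 0)  # arrives after draining began
    task.cancel()  # force shutdown of leftovers (no-op if already done)
    return finished, late
```

In a real service, `drain` would be triggered by the shutdown signal, for example through `loop.add_signal_handler(signal.SIGTERM, ...)` on Unix. Calling `asyncio.run(demo(1.0))` lets the in-flight request finish, while a very small grace period makes `drain` report that a forced shutdown is needed.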
Where Draining Happens
Load Balancer Level
The load balancer removes the instance from its pool. New requests go to other instances. Existing connections on the draining instance continue until complete.
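AWS ALB and HAProxy support connection draining natively, and NGINX Plus offers a comparable drain setting for upstream servers. AWS calls its setting the “deregistration delay”; it defaults to 300 seconds, and values of 30-300 seconds are typical.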
Application Level
The application catches the shutdown signal and stops accepting new work while finishing existing work. Python servers like Gunicorn, Uvicorn, and Hypercorn have built-in graceful shutdown support.
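As an illustration, Gunicorn reads its settings from a Python file, so the drain window can be expressed directly in code. The values below are assumptions chosen for the example, not recommendations; `graceful_timeout`, `timeout`, and `keepalive` are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical values; tune them for your service)
bind = "0.0.0.0:8000"
workers = 4

# Seconds workers get to finish in-flight requests after a shutdown signal.
graceful_timeout = 25

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 30

# Seconds to hold an idle keep-alive connection open.
keepalive = 5
```

Keeping `graceful_timeout` below the orchestrator's kill deadline leaves the process a few seconds to exit on its own terms.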
Container Orchestrator Level
Kubernetes sends SIGTERM, then waits for terminationGracePeriodSeconds (default 30s) before sending SIGKILL. Your app must finish draining within that window.
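One way to respect that window is to derive the application's drain timeout from it rather than hard-coding both numbers separately. The helper below is a hypothetical sketch; `cleanup_seconds` is an assumed reserve for work that happens after draining, such as closing database connections.

```python
def app_drain_budget(termination_grace_period_seconds=30.0, cleanup_seconds=5.0):
    """Return how many seconds the app may spend draining.

    The budget is the orchestrator's grace period minus the time reserved
    for post-drain cleanup, so the process exits before SIGKILL arrives.
    """
    budget = termination_grace_period_seconds - cleanup_seconds
    if budget <= 0:
        raise ValueError("cleanup reserve leaves no time for draining")
    return budget
```

With the Kubernetes default of 30 seconds and a 5-second reserve, the application would stop waiting for requests after 25 seconds.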
The Grace Period Dilemma
| Too Short | Too Long |
|---|---|
| Requests get killed mid-processing | Deployments take forever |
| Users see errors | Old instances linger |
| Data corruption risk for writes | Resources are wasted |
The sweet spot: Set the grace period to your P99 request duration plus a comfortable margin. If 99% of requests complete within 10 seconds, a 30-second grace period is generous. If you have long-running requests (file uploads, report generation), either increase the period or handle those requests separately.
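The rule of thumb can be turned into a small calculation over observed request durations. This is a sketch under stated assumptions: the function names are hypothetical, the percentile uses the nearest-rank method, and the multiplier of 3 simply mirrors the 10-second to 30-second example above.

```python
def p99(durations):
    """Nearest-rank 99th percentile of a non-empty list of durations."""
    ordered = sorted(durations)
    # Integer form of ceil(0.99 * n), giving a 1-based rank.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def suggest_grace_period(durations, margin_factor=3.0, floor_seconds=5.0):
    """Suggest a grace period: P99 times a margin, never below a floor."""
    return max(floor_seconds, p99(durations) * margin_factor)


# Hypothetical sample: most requests take 1 s, one takes 10 s, one takes 60 s.
sample = [1.0] * 98 + [10.0, 60.0]
suggested = suggest_grace_period(sample)
```

The single 60-second outlier does not move the suggestion, which is the point of sizing for P99: rare long-running requests are better handled separately than by stretching the grace period for everyone.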
Connection Draining vs. Graceful Shutdown
These terms are often used interchangeably, but there’s a nuance:
- Connection draining focuses on the network layer — letting TCP connections finish their current request/response cycles
- Graceful shutdown is broader — it includes draining connections, flushing buffers, closing database connections, saving state, and cleaning up resources
Connection draining is a component of graceful shutdown.
What About WebSockets and Long-Lived Connections?
WebSocket connections, server-sent events (SSE), and long-polling connections present a challenge — they can stay open for hours. Strategies:
- Send a “reconnect” frame before shutting down, so clients reconnect to a healthy instance
- Set a maximum connection lifetime that’s shorter than the typical interval between deployments
- Exclude long-lived connections from the drain timeout and close them after a warning
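The first and third strategies can be combined in a few lines. The sketch below assumes a connection object with async `send` and `close` methods; `FakeConnection` is a stand-in for a real WebSocket library, and the `{"type": "reconnect"}` message format is hypothetical and would need matching client-side handling.

```python
import asyncio
import json


class FakeConnection:
    """Stand-in for a WebSocket connection; records what was sent."""

    def __init__(self):
        self.sent = []
        self.closed = False

    async def send(self, message):
        self.sent.append(message)

    async def close(self):
        self.closed = True


async def drain_long_lived(connections, warning_seconds):
    """Warn every client, give them time to move, then close what is left."""
    notice = json.dumps({"type": "reconnect"})  # hypothetical message format
    await asyncio.gather(*(conn.send(notice) for conn in connections))
    await asyncio.sleep(warning_seconds)
    await asyncio.gather(
        *(conn.close() for conn in connections if not conn.closed)
    )
```

Clients that honor the notice reconnect to a healthy instance during the warning window, and anything still attached afterwards is closed by the server.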
Common Misconception
“If I use a load balancer, I don’t need application-level draining.” The load balancer stops routing new requests, but the application still needs to handle SIGTERM properly. If your app receives SIGTERM and exits immediately, all in-flight requests die — regardless of what the load balancer does.
Both layers need to cooperate: the load balancer stops sending new traffic, and the application finishes existing work.
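A common way to make the two layers cooperate is a readiness check that starts failing as soon as the application receives SIGTERM, so the load balancer stops routing to it while existing requests finish. The sketch below shows only the flag and the status decision; the function names are hypothetical, and wiring `readiness_status` into an actual health endpoint depends on your framework.

```python
import signal
import threading

# Set once shutdown has been requested; safe to read from any thread.
draining = threading.Event()


def on_sigterm(signum, frame):
    """Signal handler: mark the instance as draining instead of exiting."""
    draining.set()


def readiness_status():
    """HTTP status for the readiness check: 503 tells the balancer to stop."""
    return 503 if draining.is_set() else 200


def install_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, on_sigterm)
```

The handler deliberately does nothing but set the flag. The process keeps serving in-flight requests and exits only once they complete or the grace period runs out.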
When Connection Draining Matters Most
- Rolling deployments — each instance shuts down in turn
- Autoscaling down — instances removed when demand drops
- Maintenance windows — planned restarts for OS updates
- Database migrations — draining before running schema changes
- Any write operation — incomplete writes can corrupt data
One thing to remember: Connection draining is what makes deployments invisible to users. Without it, every deploy is a mini-outage. Configure it at both the load balancer and application level, and always test it with realistic traffic.