Python Connection Draining — Core Concepts

Understand connection draining for zero-downtime Python deployments — how load balancers, signal handlers, and grace periods work together.

What Is Connection Draining?

Connection draining (often used interchangeably with “graceful shutdown”) is the process of letting in-flight requests complete before a server instance shuts down. It’s essential for zero-downtime deployments, rolling updates, and scheduled maintenance.

Without draining, every deployment causes a brief period of errors for users whose requests are interrupted mid-processing.

The Shutdown Sequence

A properly draining server follows this sequence:

1. Receive shutdown signal: typically SIGTERM from the orchestrator (Kubernetes, systemd, Docker)
2. Stop accepting new connections: remove from load balancer, close listening socket
3. Complete in-flight requests: let active request handlers finish
4. Close idle connections: release keep-alive connections with no active requests
5. Wait for grace period: allow up to N seconds for everything to finish
6. Force shutdown: terminate any remaining connections after the grace period
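The sequence above can be sketched with the standard library alone. This is a minimal illustration, not how any particular server implements it: the DrainController class and its method names are hypothetical, and a real server would also close its listening socket and idle keep-alive connections.

```python
import signal
import threading
import time


class DrainController:
    """Hypothetical helper: counts in-flight requests and bounds the drain."""

    def __init__(self, grace_period: float) -> None:
        self.grace_period = grace_period
        self.draining = False  # plain flag, safe to set from a signal handler
        self._idle = threading.Condition()
        self._in_flight = 0

    def request_shutdown(self, signum=None, frame=None) -> None:
        """Signal receipt: only flip a flag, do no real work in the handler."""
        self.draining = True

    def begin_request(self) -> bool:
        """Return False once draining has started, so new work is refused."""
        with self._idle:
            if self.draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        """Mark one request as finished and wake the waiter when idle."""
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def wait_for_drain(self) -> bool:
        """Wait up to grace_period; True means every request finished in time."""
        deadline = time.monotonic() + self.grace_period
        with self._idle:
            while self._in_flight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # caller should now force the shutdown
                self._idle.wait(timeout=remaining)
        return True


def install_sigterm_handler(controller: DrainController) -> None:
    """Route SIGTERM to the controller (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, controller.request_shutdown)
```

A request handler would call begin_request() before doing work and end_request() in a finally block. After SIGTERM arrives, the main thread calls wait_for_drain() and exits cleanly if it returns True, or forcibly if it returns False.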

Where Draining Happens

Load Balancer Level

The load balancer removes the instance from its pool. New requests go to other instances. Existing connections on the draining instance continue until complete.

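AWS ALB, HAProxy, and NGINX Plus can all drain connections natively, though each names and configures it differently. On an ALB it is the target group's “deregistration delay” (300 seconds by default, configurable from 0 to 3600). HAProxy lets you put a server into a drain state, and NGINX Plus accepts a drain parameter on an upstream server. Open-source nginx has no per-upstream drain setting, but its worker processes finish in-flight requests on a reload or graceful stop.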

Application Level

The application catches the shutdown signal and stops accepting new work while finishing existing work. Python servers like Gunicorn, Uvicorn, and Hypercorn have built-in graceful shutdown support.
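For Gunicorn, the graceful shutdown window is set in its configuration file, which is plain Python. The fragment below is a sketch with illustrative values, not a recommendation; graceful_timeout, timeout, and keepalive are real Gunicorn settings, but the numbers should come from your own request durations.

```python
# gunicorn.conf.py (illustrative values only)

# Seconds a worker gets to finish in-flight requests after a restart or
# shutdown signal before it is force-killed.
graceful_timeout = 30

# Seconds a worker may stay silent before the master kills and restarts it.
timeout = 30

# Seconds to wait for the next request on an idle keep-alive connection.
keepalive = 5
```

Keeping graceful_timeout below the orchestrator's own kill deadline leaves Gunicorn room to finish before SIGKILL arrives.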

Container Orchestrator Level

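Kubernetes sends SIGTERM, then waits for terminationGracePeriodSeconds (default 30s) before sending SIGKILL. Any preStop hook runs inside that same window, so the time it takes is subtracted from what the app has left. Your app must finish draining within the remainder.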
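The arithmetic behind that window is worth making explicit. The helper below is a hypothetical sketch: app_drain_budget and its safety_margin parameter are illustrative names, and it assumes a preStop hook, if any, consumes part of the same grace period.

```python
def app_drain_budget(
    termination_grace_period: float = 30.0,
    pre_stop_delay: float = 0.0,
    safety_margin: float = 2.0,
) -> float:
    """Seconds the application itself can spend draining before SIGKILL.

    pre_stop_delay is time spent in a preStop hook (for example a short
    sleep), which counts against the same grace period. safety_margin is an
    assumed buffer for process exit and cleanup.
    """
    budget = termination_grace_period - pre_stop_delay - safety_margin
    if budget <= 0:
        raise ValueError(
            "grace period leaves no time for draining; "
            "raise terminationGracePeriodSeconds or shorten the preStop hook"
        )
    return budget
```

With the default 30-second grace period, a 5-second preStop sleep, and a 2-second margin, the application's own drain timeout should be at most 23 seconds.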

The Grace Period Dilemma

Too Short	Too Long
Requests get killed mid-processing	Deployments take forever
Users see errors	Old instances linger
Data corruption risk for writes	Resources are wasted

The sweet spot: Set the grace period to your P99 request duration plus a comfortable margin. If 99% of requests complete within 10 seconds, a 30-second grace period is generous. If you have long-running requests (file uploads, report generation), either increase the period or handle those requests separately.
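One way to turn that rule of thumb into a number is to compute P99 from recorded request durations and scale it. This is a sketch under assumptions: the function names are hypothetical, the percentile uses the simple nearest-rank method, and expressing the "comfortable margin" as a multiplier of 3 is a choice made here to match the 10-second to 30-second example, not a standard.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggested_grace_period(
    durations: Sequence[float], margin_factor: float = 3.0
) -> float:
    """P99 request duration scaled by an assumed safety multiplier."""
    return percentile_nearest_rank(durations, 99) * margin_factor
```

Feed it durations from access logs or metrics covering a representative period. Requests far above the result, such as uploads or report generation, are the ones to route or handle separately.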

Connection Draining vs. Graceful Shutdown

These terms are often used interchangeably, but there’s a nuance:

Connection draining focuses on the network layer — letting TCP connections finish their current request/response cycles
Graceful shutdown is broader — it includes draining connections, flushing buffers, closing database connections, saving state, and cleaning up resources

Connection draining is a component of graceful shutdown.
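The relationship can be sketched as an ordered list of cleanup steps in which draining is only the first. The graceful_shutdown function below is hypothetical, and the step names in the usage note are placeholders for whatever your application actually needs to release.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("shutdown")


def graceful_shutdown(steps: Iterable[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run cleanup steps in order and return the names of any that failed.

    A failing step is logged and skipped so that later steps still run;
    otherwise one error could leave resources such as a database pool open.
    """
    failed = []
    for name, step in steps:
        try:
            step()
        except Exception:
            logger.exception("shutdown step %r failed", name)
            failed.append(name)
    return failed
```

A caller might pass steps such as ("drain connections", ...), ("flush buffers", ...), and ("close database", ...) in that order, so that request handlers finish before the resources they depend on are closed.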

What About WebSockets and Long-Lived Connections?

WebSocket connections, server-sent events (SSE), and long-polling connections present a challenge — they can stay open for hours. Strategies:

Send a “reconnect” frame before shutting down, so clients reconnect to a healthy instance
Set a maximum connection lifetime that’s shorter than the typical interval between your deployments
Exclude long-lived connections from the drain timeout and close them after a warning
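The first and third strategies can be combined in a small asyncio routine. This is a sketch under assumptions: each connection object is assumed to expose async send() and close() methods, and the {"type": "reconnect"} message is a made-up application-level convention that clients would have to understand.

```python
import asyncio
import json
from typing import Iterable


async def drain_long_lived(connections: Iterable, warning_seconds: float) -> None:
    """Warn long-lived clients, wait briefly, then close what remains."""
    conns = list(connections)
    notice = json.dumps({"type": "reconnect"})  # hypothetical message format

    # Ask every client to reconnect; a failed send must not block the others.
    await asyncio.gather(*(c.send(notice) for c in conns), return_exceptions=True)

    # Give clients time to move to a healthy instance.
    await asyncio.sleep(warning_seconds)

    # Close any connection that is still open on this instance.
    await asyncio.gather(*(c.close() for c in conns), return_exceptions=True)
```

The warning period should fit inside the overall grace period, and clients need reconnect logic with some jitter so they do not all reconnect at the same instant.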

Common Misconception

“If I use a load balancer, I don’t need application-level draining.” The load balancer stops routing new requests, but the application still needs to handle SIGTERM properly. If your app receives SIGTERM and exits immediately, all in-flight requests die — regardless of what the load balancer does.

Both layers need to cooperate: the load balancer stops sending new traffic, and the application finishes existing work.
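One common way to make the two layers cooperate is a health endpoint that starts failing as soon as draining begins, so the load balancer's health check takes the instance out of rotation while regular requests are still served. The WSGI wrapper below is a minimal sketch: DrainAwareApp and the /healthz path are hypothetical names, and it assumes the load balancer polls that path.

```python
class DrainAwareApp:
    """WSGI wrapper whose health endpoint returns 503 once draining starts."""

    def __init__(self, app, health_path="/healthz"):
        self.app = app
        self.health_path = health_path
        self.draining = False

    def start_draining(self, signum=None, frame=None):
        """Usable as a SIGTERM handler: it only flips a flag."""
        self.draining = True

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == self.health_path:
            if self.draining:
                status, body = "503 Service Unavailable", b"draining\n"
            else:
                status, body = "200 OK", b"ok\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response(status, headers)
            return [body]
        # Requests already routed to this instance are still handled normally.
        return self.app(environ, start_response)
```

Registering start_draining with signal.signal(signal.SIGTERM, ...) in the main thread ties it to the shutdown signal. The server's own graceful shutdown still has to wait for in-flight requests; this wrapper only tells the load balancer to stop sending new ones.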

When Connection Draining Matters Most

Rolling deployments — each instance shuts down in turn
Autoscaling down — instances removed when demand drops
Maintenance windows — planned restarts for OS updates
Database migrations — draining before running schema changes
Any write operation — incomplete writes can corrupt data

One thing to remember: Connection draining is what makes deployments invisible to users. Without it, every deploy is a mini-outage. Configure it at both the load balancer and application level, and always test it with realistic traffic.
