Python Connection Draining — Core Concepts

What Is Connection Draining?

Connection draining (closely related to, and often used interchangeably with, “graceful shutdown”) is the process of allowing in-flight requests to complete before shutting down a server instance. It’s essential for zero-downtime deployments, rolling updates, and scheduled maintenance.

Without draining, every deployment causes a brief period of errors for users whose requests are interrupted mid-processing.

The Shutdown Sequence

A properly draining server follows this sequence:

  1. Receive shutdown signal — typically SIGTERM from the orchestrator (Kubernetes, systemd, Docker)
  2. Stop accepting new connections — remove from load balancer, close listening socket
  3. Complete in-flight requests — let active request handlers finish
  4. Close idle connections — release keep-alive connections with no active requests
  5. Wait for grace period — allow up to N seconds for everything to finish
  6. Force shutdown — terminate any remaining connections after the grace period
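The six steps above can be sketched with asyncio. This is a minimal illustration, not how Gunicorn or Uvicorn implement it. `DrainingServer` and its methods are hypothetical names. The protocol is a toy that reads one line and echoes it back after a simulated delay. Because each connection carries exactly one request, there are no idle keep-alive connections left for step 4. `loop.add_signal_handler` is Unix-only, so the sketch also accepts an `asyncio.Event` that another component can set instead of sending SIGTERM.

```python
import asyncio
import signal


class DrainingServer:
    """Toy asyncio TCP server that walks through the drain sequence."""

    def __init__(self, grace_period: float = 30.0) -> None:
        self.grace_period = grace_period
        self.in_flight: set = set()  # handler tasks still serving a request
        self.forced = 0              # handlers cancelled at the deadline
        self.port = None             # filled in once the socket is bound

    async def handle(self, reader, writer) -> None:
        # Hypothetical protocol: read one line, simulate work, echo it back.
        line = await reader.readline()
        await asyncio.sleep(0.2)  # stand-in for real request processing
        writer.write(line)
        await writer.drain()

    async def _tracked(self, reader, writer) -> None:
        task = asyncio.current_task()
        self.in_flight.add(task)
        try:
            await self.handle(reader, writer)
        finally:
            self.in_flight.discard(task)
            # One request per connection, so nothing is left idle (step 4).
            writer.close()

    async def run(self, host="127.0.0.1", port=8080, stop=None) -> None:
        stop = stop if stop is not None else asyncio.Event()
        server = await asyncio.start_server(self._tracked, host, port)
        self.port = server.sockets[0].getsockname()[1]
        try:
            # 1. Receive the shutdown signal (add_signal_handler is Unix-only).
            asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, stop.set)
        except (NotImplementedError, RuntimeError):
            pass  # e.g. Windows, or not running in the main thread
        await stop.wait()

        # 2. Stop accepting new connections by closing the listening socket.
        server.close()
        if self.in_flight:
            # 3 and 5. Let in-flight handlers finish, but only within the grace period.
            _, pending = await asyncio.wait(set(self.in_flight), timeout=self.grace_period)
            # 6. Force shutdown: cancel anything still running.
            for task in pending:
                task.cancel()
            self.forced = len(pending)
            await asyncio.gather(*pending, return_exceptions=True)
        await server.wait_closed()
```

Running `asyncio.run(DrainingServer().run())` and then sending SIGTERM to the process walks through the sequence. The `forced` attribute records how many handlers were cut off at the deadline.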

Where Draining Happens

Load Balancer Level

The load balancer removes the instance from its pool. New requests go to other instances. Existing connections on the draining instance continue until complete.

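AWS ALB, nginx, and HAProxy all support connection draining, though the mechanism differs. On an ALB, draining is controlled by the target group’s “deregistration delay”, which defaults to 300 seconds; values between 30 and 300 seconds are typical. HAProxy can put a server into a “drain” state. nginx lets old worker processes finish their current requests during a graceful reload.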
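On AWS, the deregistration delay is an attribute of the ALB target group. The following is a minimal sketch of setting it from Python with boto3. It assumes you supply the target group ARN and have AWS credentials configured. The two helper function names are hypothetical. `modify_target_group_attributes` and the `deregistration_delay.timeout_seconds` key belong to the ELBv2 API, which accepts 0 to 3600 seconds.

```python
from __future__ import annotations


def deregistration_delay_attributes(seconds: int) -> list[dict[str, str]]:
    """Build the target-group attribute list for an ALB deregistration delay."""
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be between 0 and 3600 seconds")
    # The API expects attribute values as strings.
    return [{"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)}]


def set_deregistration_delay(target_group_arn: str, seconds: int) -> None:
    """Apply the delay to a target group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("elbv2")
    client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=deregistration_delay_attributes(seconds),
    )
```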

Application Level

The application catches the shutdown signal and stops accepting new work while finishing existing work. Python servers like Gunicorn, Uvicorn, and Hypercorn have built-in graceful shutdown support.
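In Gunicorn, the graceful shutdown window is the `graceful_timeout` setting, which defaults to 30 seconds. Gunicorn's configuration file is itself Python. Below is a sketch of a `gunicorn.conf.py`. The values are illustrative assumptions, not recommendations.

```python
# gunicorn.conf.py: Gunicorn reads this file as plain Python.
# All values are illustrative; tune them to your own traffic.

bind = "0.0.0.0:8000"
workers = 4

# On SIGTERM, Gunicorn stops accepting new connections and gives workers this
# many seconds to finish their current requests before killing them. Kept
# below Kubernetes' default 30-second window so Gunicorn finishes before SIGKILL.
graceful_timeout = 25

# Seconds to wait for the next request on a keep-alive connection (ignored by
# the default sync worker, which does not keep connections open).
keepalive = 5

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 30
```

Recent Uvicorn releases offer a comparable `--timeout-graceful-shutdown` option. Check the documentation for the version you run.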

Container Orchestrator Level

Kubernetes sends SIGTERM, then waits for terminationGracePeriodSeconds (default 30s) before sending SIGKILL. Your app must finish draining within that window.
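Because the SIGKILL deadline is fixed, it helps to compute how much of the window is actually left for draining. This is a minimal sketch under a few assumptions. The grace period reaches the process through a hypothetical `GRACE_PERIOD_SECONDS` environment variable that you set in the pod spec; the sketch does not assume Kubernetes exposes the value on its own. Any `preStop` hook delay is known in advance; time spent in a `preStop` hook counts against `terminationGracePeriodSeconds`. The 2-second safety margin is an arbitrary choice.

```python
import os


def grace_from_env(default: float = 30.0) -> float:
    """Read the grace period the orchestrator will allow.

    GRACE_PERIOD_SECONDS is a hypothetical variable: you would set it in the
    pod spec yourself to mirror terminationGracePeriodSeconds.
    """
    return float(os.environ.get("GRACE_PERIOD_SECONDS", default))


def drain_budget(grace_period: float, prestop_delay: float = 0.0,
                 safety_margin: float = 2.0) -> float:
    """Seconds the app may spend draining after SIGTERM arrives.

    Time spent in a preStop hook counts against the same grace period, and
    the margin leaves room for final cleanup before SIGKILL.
    """
    return max(0.0, grace_period - prestop_delay - safety_margin)
```

Take the default 30-second window and a 5-second `preStop` sleep: `drain_budget(30, prestop_delay=5)` leaves 23 seconds. A SIGTERM handler could then record `time.monotonic()` plus that budget as its drain deadline.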

The Grace Period Dilemma

Too short:
  • Requests get killed mid-processing
  • Users see errors
  • Data corruption risk for writes

Too long:
  • Deployments take forever
  • Old instances linger
  • Resources are wasted

The sweet spot: Set the grace period to your P99 request duration plus a comfortable margin. If 99% of requests complete within 10 seconds, a 30-second grace period is generous. If you have long-running requests (file uploads, report generation), either increase the period or handle those requests separately.
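You can check this rule of thumb against real latency data. The sketch below assumes request durations in seconds, for example parsed from access logs. `p99` uses the nearest-rank definition, which is one of several common percentile definitions. The 10-second default margin is an assumption, not a recommendation.

```python
import math


def p99(durations):
    """Nearest-rank 99th percentile of request durations in seconds."""
    if not durations:
        raise ValueError("need at least one duration")
    ordered = sorted(durations)
    rank = (len(ordered) * 99 + 99) // 100  # ceil(0.99 * n) without float error
    return ordered[rank - 1]


def suggest_grace_period(durations, margin=10.0):
    """P99 plus a fixed margin, rounded up to whole seconds."""
    return math.ceil(p99(durations) + margin)
```

With a P99 of 10 seconds, the default margin suggests 20 seconds. The 30-second figure above adds a larger cushion.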

Connection Draining vs. Graceful Shutdown

These terms are often used interchangeably, but there’s a nuance:

  • Connection draining focuses on the network layer — letting TCP connections finish their current request/response cycles
  • Graceful shutdown is broader — it includes draining connections, flushing buffers, closing database connections, saving state, and cleaning up resources

Connection draining is a component of graceful shutdown.
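The distinction becomes concrete in code: draining runs first, and the rest of graceful shutdown then releases resources. This minimal sketch uses `contextlib.AsyncExitStack`, which runs registered callbacks in last-in, first-out order even if draining raises. `graceful_shutdown`, `demo`, and the three steps are hypothetical placeholders for real work such as closing a database pool.

```python
import asyncio
from contextlib import AsyncExitStack


async def graceful_shutdown(drain, cleanups):
    """Drain connections first, then run cleanup steps in reverse order.

    `drain` and each item in `cleanups` are async callables; `cleanups` is
    listed in acquisition order, so the last resource acquired is released first.
    """
    async with AsyncExitStack() as stack:
        for cleanup in cleanups:
            stack.push_async_callback(cleanup)
        # Draining is only the first step; the stack unwinds after it,
        # even if draining raises.
        await drain()


async def demo():
    """Records the order in which hypothetical shutdown steps run."""
    order = []

    async def drain_connections():
        order.append("drain connections")

    async def close_db_pool():       # acquired first, so released last
        order.append("close database pool")

    async def flush_write_buffer():  # writes through the pool, so runs first
        order.append("flush write buffer")

    await graceful_shutdown(drain_connections, [close_db_pool, flush_write_buffer])
    return order
```

`asyncio.run(demo())` returns `['drain connections', 'flush write buffer', 'close database pool']`. The buffer that writes through the pool is flushed before the pool closes.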

What About WebSockets and Long-Lived Connections?

WebSocket connections, server-sent events (SSE), and long-polling connections present a challenge — they can stay open for hours. Strategies:

  • Send a “reconnect” frame before shutting down, so clients reconnect to a healthy instance
  • Set a maximum connection lifetime that’s shorter than the typical interval between deployments
  • Exclude long-lived connections from the drain timeout and close them after a warning
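The three strategies can be combined in one small connection registry. This sketch rests on several assumptions. Each connection object exposes async `send` and `close` methods; real WebSocket libraries differ. Clients understand a hypothetical `{"type": "reconnect"}` message. `LongLivedConnections`, `serve`, and `drain` are illustrative names. Random jitter on the lifetime keeps clients from all reconnecting at the same moment.

```python
import asyncio
import json
import random

# Hypothetical application-level message telling clients to reconnect elsewhere.
RECONNECT = json.dumps({"type": "reconnect"})


class LongLivedConnections:
    """Tracks WebSocket/SSE-style connections so they can be shed cleanly.

    Each `conn` is assumed to expose `async send(text)` and `async close()`.
    """

    def __init__(self, max_lifetime: float = 3600.0, jitter: float = 0.1) -> None:
        self.max_lifetime = max_lifetime
        self.jitter = jitter
        self.conns = set()

    def _lifetime(self) -> float:
        # Spread expiry times so clients do not all reconnect at once.
        spread = self.max_lifetime * self.jitter
        return self.max_lifetime + random.uniform(-spread, spread)

    async def serve(self, conn, handler) -> None:
        """Run handler(conn), asking the client to reconnect at max lifetime."""
        self.conns.add(conn)
        try:
            await asyncio.wait_for(handler(conn), timeout=self._lifetime())
        except asyncio.TimeoutError:
            await conn.send(RECONNECT)
            await conn.close()
        finally:
            self.conns.discard(conn)

    async def drain(self, warning_period: float = 5.0) -> None:
        """On shutdown: warn every client, wait briefly, then close stragglers."""
        for conn in list(self.conns):
            await conn.send(RECONNECT)
        loop = asyncio.get_running_loop()
        deadline = loop.time() + warning_period
        while self.conns and loop.time() < deadline:
            await asyncio.sleep(0.1)
        for conn in list(self.conns):
            await conn.close()
```

During shutdown, you would await `drain()` separately from the normal request drain, with its own warning period. That way, long-lived connections do not consume the main timeout.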

Common Misconception

“If I use a load balancer, I don’t need application-level draining.” The load balancer stops routing new requests, but the application still needs to handle SIGTERM properly. If your app receives SIGTERM and exits immediately, all in-flight requests die — regardless of what the load balancer does.

Both layers need to cooperate: the load balancer stops sending new traffic, and the application finishes existing work.
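In Python this matters concretely. Unless a handler is installed, SIGTERM ends the process immediately, so neither in-flight requests nor `finally` blocks get to run. Below is a minimal sketch of the application's half of the cooperation, assuming a hand-rolled server. `DrainState` and `install` are hypothetical names. A failing health check is one common way, though not the only one, to ask a load balancer to stop routing new requests.

```python
import signal
import threading
from http import HTTPStatus


class DrainState:
    """Links SIGTERM handling to the health check the load balancer polls."""

    def __init__(self) -> None:
        self.draining = threading.Event()

    def on_sigterm(self, signum, frame) -> None:
        # Do not exit here: just mark the process as draining so in-flight
        # requests can finish and the health check starts failing.
        self.draining.set()

    def health_status(self) -> HTTPStatus:
        # 503 while draining tells a health-checking load balancer to stop
        # sending new requests; 200 otherwise.
        if self.draining.is_set():
            return HTTPStatus.SERVICE_UNAVAILABLE
        return HTTPStatus.OK


def install(state: DrainState) -> None:
    # Without a handler, Python's default SIGTERM action ends the process at
    # once, skipping finally blocks and atexit hooks. Must run in the main thread.
    signal.signal(signal.SIGTERM, state.on_sigterm)
```

At startup, call `install(state)` from the main thread and serve `state.health_status()` from the health-check route. The main loop then waits on `state.draining` before finishing in-flight work. Gunicorn and Uvicorn install their own SIGTERM handlers, so a handler like this is meant for servers that do not already manage signals.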

When Connection Draining Matters Most

  • Rolling deployments — each instance shuts down in turn
  • Autoscaling down — instances removed when demand drops
  • Maintenance windows — planned restarts for OS updates
  • Database migrations — draining before running schema changes
  • Any write operation — incomplete writes can corrupt data

One thing to remember: Connection draining is what makes deployments invisible to users. Without it, every deploy is a mini-outage. Configure it at both the load balancer and application level, and always test it with realistic traffic.

