Python WebSocket Scaling — Core Concepts
Why WebSockets Are Hard to Scale
HTTP requests are stateless: a client connects, gets a response, and disconnects. WebSockets are the opposite — a connection opens and stays open for minutes, hours, or days. Each open connection consumes memory and a file descriptor on the server. Scaling means managing thousands of these long-lived connections per server while coordinating messages across multiple servers.
Single-Server Foundations
Python’s asyncio ecosystem makes single-server scaling surprisingly effective. The websockets library and frameworks such as FastAPI (which is built on Starlette) use non-blocking I/O to handle thousands of connections on a single event loop.
Each WebSocket connection costs roughly 10–50 KB of memory depending on buffering. A server with 4 GB available for connections can theoretically hold 80,000–400,000 simultaneous WebSocket clients. The real bottleneck is usually message fan-out — broadcasting a message to 50,000 connected clients takes time proportional to the number of recipients.
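The capacity range above is simple division, and a minimal sketch makes the arithmetic explicit. The function name is illustrative, and decimal units (1 KB = 1,000 bytes) are assumed because they reproduce the figures in the text. Memory is treated as the only constraint, which real servers will not match.

```python
def max_connections(available_bytes: int, bytes_per_connection: int) -> int:
    """Upper bound on connections if memory were the only limit."""
    return available_bytes // bytes_per_connection


KB = 1_000          # decimal units, an assumption that matches the text
GB = 1_000 ** 3

# 50 KB per connection is the pessimistic end, 10 KB the optimistic end.
low = max_connections(4 * GB, 50 * KB)
high = max_connections(4 * GB, 10 * KB)

print(f"{low:,} to {high:,} connections")
```

Interpreter overhead, application state per user, and fan-out cost all push the practical number below this bound.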
Connection Lifecycle
A well-designed WebSocket server manages four phases:
- Handshake — the HTTP upgrade request. Authentication happens here (JWT tokens, cookies, or API keys).
- Active communication — bidirectional message passing.
- Health monitoring — ping/pong frames detect dead connections. Clients that miss pongs get disconnected.
- Graceful shutdown — on server restart, close connections with a close frame and let clients reconnect.
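The health-monitoring and shutdown phases can be sketched with the standard library alone. FakeConnection below is a hypothetical stand-in for a real WebSocket connection object, not the API of any library. Close code 1001 is the "going away" code from RFC 6455, and 1011 is one reasonable choice for a ping timeout.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a real WebSocket connection."""

    def __init__(self, name: str, responsive: bool = True):
        self.name = name
        self.responsive = responsive
        self.closed = False
        self.close_code = None

    async def ping(self) -> None:
        # A live peer answers almost immediately; a dead peer never answers.
        await asyncio.sleep(0 if self.responsive else 3600)

    async def close(self, code: int = 1000) -> None:
        self.closed = True
        self.close_code = code


async def check_health(conn: FakeConnection, timeout: float = 0.05) -> bool:
    """Send one ping and close the connection if no pong arrives in time."""
    try:
        await asyncio.wait_for(conn.ping(), timeout)
        return True
    except asyncio.TimeoutError:
        await conn.close(code=1011)
        return False


async def shutdown(connections) -> None:
    """Graceful shutdown: send a close frame to every connection still open."""
    await asyncio.gather(
        *(c.close(code=1001) for c in connections if not c.closed)
    )


async def main():
    alive = FakeConnection("alive")
    dead = FakeConnection("dead", responsive=False)
    results = [await check_health(c) for c in (alive, dead)]
    await shutdown([alive, dead])
    return results, alive.close_code, dead.close_code


results, alive_code, dead_code = asyncio.run(main())
print(results, alive_code, dead_code)
```

In a real server the health check would run on a timer for every connection, and clients would be expected to reconnect after receiving the 1001 close frame.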
Multi-Server Architecture
When one server is not enough, you deploy multiple WebSocket servers behind a load balancer. The challenge: if user A connects to server 1 and user B connects to server 2, how does a message from A reach B?
The standard solution is a pub/sub backplane. Every server subscribes to a shared channel (typically Redis Pub/Sub or a message broker). When server 1 receives a message for a room, it publishes to the backplane. Server 2 picks it up and delivers to its local connections.
This pattern adds a small latency cost (typically 1–5 ms for Redis on the same network) but makes horizontal scaling possible.
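The backplane pattern can be illustrated without a running Redis instance. In this sketch InMemoryBackplane is a hypothetical substitute for Redis Pub/Sub, and the Server class, room name, and message text are all invented for illustration. The point is only the flow: publish once, and every server delivers to its own local connections.

```python
import asyncio


class InMemoryBackplane:
    """Hypothetical stand-in for Redis Pub/Sub: every subscriber sees every message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self) -> asyncio.Queue:
        queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    async def publish(self, room: str, text: str) -> None:
        for queue in self.subscribers:
            await queue.put((room, text))


class Server:
    def __init__(self, name: str, backplane: InMemoryBackplane):
        self.name = name
        self.backplane = backplane
        self.inbox = backplane.subscribe()
        self.rooms = {}  # room name -> list of per-client delivery logs

    def connect(self, room: str) -> list:
        delivered = []  # records what this simulated client received
        self.rooms.setdefault(room, []).append(delivered)
        return delivered

    async def receive_from_client(self, room: str, text: str) -> None:
        # No local delivery here: the backplane echoes the message back
        # to this server as well, so each client gets it exactly once.
        await self.backplane.publish(room, text)

    async def pump_once(self) -> None:
        room, text = await self.inbox.get()
        for delivered in self.rooms.get(room, []):
            delivered.append(text)


async def main():
    backplane = InMemoryBackplane()
    server_1 = Server("server-1", backplane)
    server_2 = Server("server-2", backplane)
    user_a = server_1.connect("lobby")
    user_b = server_2.connect("lobby")
    await server_1.receive_from_client("lobby", "hello from A")
    await server_1.pump_once()
    await server_2.pump_once()
    return user_a, user_b


user_a, user_b = asyncio.run(main())
print(user_a, user_b)
```

With a real broker, pump_once would become a long-running task that reads from the subscription for the lifetime of the process.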
Load Balancing Considerations
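Regular HTTP load balancers spread individual requests across servers. A WebSocket is one long-lived TCP connection, so once the upgrade succeeds it stays on a single backend, and the balancer must support the HTTP upgrade and tolerate long-lived connections. Sticky sessions or connection-aware routing matter whenever a client makes more than one request that has to reach the same server, such as a handshake that spans several HTTP requests or a reconnect that should return to the server holding the session state.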
Options:
- IP hash — simple but breaks when clients share IPs (corporate NAT).
- Cookie-based sticking — works but requires the initial HTTP handshake to set a routing cookie.
- Layer 4 (TCP) balancing — routes the entire TCP connection to one backend. Nginx and HAProxy both support this.
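The IP hash option and its NAT weakness can be shown in a few lines. This is a sketch of the general idea rather than the algorithm any particular load balancer uses. The backend names and IP addresses are hypothetical (the addresses come from the documentation ranges).

```python
import hashlib
from collections import Counter

BACKENDS = ["ws-1", "ws-2", "ws-3"]  # hypothetical backend names


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Map a client IP to a backend; the same IP always gets the same backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# The same address is routed consistently.
first = pick_backend("203.0.113.7")
second = pick_backend("203.0.113.7")

# 500 users behind one corporate NAT share one public IP,
# so all of them land on a single backend.
nat_load = Counter(pick_backend("198.51.100.1") for _ in range(500))

print(first == second, dict(nat_load))
```

Note that this modulo scheme also remaps most clients when the backend list changes, which is one reason cookie-based routing or consistent hashing is often preferred.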
Memory and File Descriptor Budgets
Two system limits gate how many connections a single process handles:
- File descriptors: each WebSocket is a socket, which is a file descriptor. Linux defaults to 1,024 per process. Production servers set this to 65,536 or higher via ulimit -n.
- Memory: track per-connection memory including send/receive buffers. Set explicit buffer limits to prevent one slow client from consuming unbounded memory.
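Both budgets can be handled from inside the process. The sketch below uses the standard resource module (Unix only) to raise the soft file descriptor limit toward a target. It also shows a bounded per-client queue. raise_fd_limit and ClientBuffer are hypothetical helper names, and what to do with a full buffer (drop the message or disconnect the client) is left to the application.

```python
import resource
from collections import deque


def raise_fd_limit(target: int) -> int:
    """Raise the soft RLIMIT_NOFILE toward target, never past the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited:
        return soft
    wanted = target if hard == unlimited else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # the platform refused; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


class ClientBuffer:
    """Bounded outgoing queue for one client (hypothetical helper)."""

    def __init__(self, max_messages: int):
        self.max_messages = max_messages
        self.pending = deque()

    def offer(self, message: str) -> bool:
        """Queue a message, or return False if the client is too far behind."""
        if len(self.pending) >= self.max_messages:
            return False
        self.pending.append(message)
        return True


soft_after = raise_fd_limit(65_536)

buffer = ClientBuffer(max_messages=2)
accepted = [buffer.offer(m) for m in ("a", "b", "c")]

print(soft_after, accepted)
```

An unprivileged process can only raise the soft limit up to the hard limit, so the hard limit still has to be set by the service manager or the shell that launches the server.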
Common Misconception
Adding more WebSocket servers does not automatically increase message throughput. If every message must be broadcast to every connected client, each server still processes every message via the backplane. The scaling gain is in connection capacity, not in reducing per-message work. True throughput scaling requires sharding: dividing clients into rooms or channels so that each message only reaches the servers hosting clients who need it.
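A rough model shows the difference between the two cases. It assumes traffic is spread evenly across rooms and rooms are spread evenly across servers, which real workloads rarely match exactly. The function name and the numbers are illustrative only.

```python
from typing import Optional


def per_server_messages(
    total_messages: int,
    servers: int,
    servers_per_room: Optional[int] = None,
) -> float:
    """Messages each server must process via the backplane.

    servers_per_room=None models global broadcast, where every server
    receives every message. Otherwise each message reaches only the
    servers that host its room.
    """
    if servers_per_room is None:
        return float(total_messages)
    return total_messages * min(servers_per_room, servers) / servers


broadcast_2 = per_server_messages(10_000, servers=2)
broadcast_8 = per_server_messages(10_000, servers=8)
sharded_8 = per_server_messages(10_000, servers=8, servers_per_room=1)

print(broadcast_2, broadcast_8, sharded_8)
```

Under broadcast, going from 2 to 8 servers leaves per-server work unchanged. With each room confined to one server, the same 8 servers each handle one eighth of the traffic.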
The one thing to remember: Scaling Python WebSockets is a two-layer problem — async handles thousands of connections per server, and a pub/sub backplane coordinates messages across servers.