Python Select and Polling — Core Concepts
The Problem: Watching Many Sockets
A chat server with 1,000 connected clients needs to know which sockets have data to read. Three naive approaches and why they fail:
- Blocking read on each socket — stuck waiting on Socket 1 while Socket 500 has data.
- One thread per socket — 1,000 threads consume memory and cause context-switching overhead.
- Non-blocking polling in a loop — wastes CPU spinning through sockets that have nothing.
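To make the third failure mode concrete, here is a minimal sketch that counts how many `recv()` calls do no useful work when a loop spins over idle non-blocking sockets. The function name `count_wasted_polls` and its parameters are hypothetical, and `socket.socketpair()` stands in for real client connections.

```python
import socket
import time


def count_wasted_polls(num_sockets=20, duration=0.05):
    """Spin over idle non-blocking sockets and count the fruitless recv() calls."""
    # Each socketpair() is a stand-in for one connected client.
    pairs = [socket.socketpair() for _ in range(num_sockets)]
    for receiver, _sender in pairs:
        receiver.setblocking(False)

    wasted = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            for receiver, _sender in pairs:
                try:
                    receiver.recv(4096)
                except BlockingIOError:
                    # Nothing to read: this call only burned CPU time.
                    wasted += 1
    finally:
        for receiver, sender in pairs:
            receiver.close()
            sender.close()
    return wasted


if __name__ == "__main__":
    print("recv() calls that found nothing:", count_wasted_polls())
```

The exact count depends on the machine, but it grows with both the number of sockets and the time spent looping, even though no data ever arrives.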
I/O multiplexing solves this: ask the OS to watch all sockets at once and report which are ready.
The select Module
The oldest and most portable approach. select.select() takes three lists of file descriptors and returns which ones are ready:
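    import select
    import socket

    def process(data):
        # Placeholder for application logic; replace with real handling.
        print(data)

    server = socket.socket()
    server.bind(("0.0.0.0", 8080))
    server.listen()
    server.setblocking(False)

    sockets = [server]
    while True:
        # Wait up to 1 second for any watched socket to become readable.
        readable, writable, exceptional = select.select(sockets, [], [], 1.0)
        for sock in readable:
            if sock is server:
                # A readable listening socket means a connection is waiting.
                client, addr = server.accept()
                client.setblocking(False)
                sockets.append(client)
            else:
                try:
                    data = sock.recv(4096)
                except ConnectionError:
                    data = b""  # treat a reset like a disconnect
                if data:
                    process(data)
                else:
                    # Empty bytes means the peer closed the connection.
                    sockets.remove(sock)
                    sock.close()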
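The first three arguments are lists of sockets to watch for readability, writability, and exceptional conditions (such as out-of-band data, not ordinary errors). The fourth is an optional timeout in seconds. select.select() blocks until at least one socket is ready or the timeout expires; on timeout it returns three empty lists.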
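The behaviour of the argument lists and the timeout can be checked with a small experiment. This is a sketch under simple assumptions: the function name `probe_select` is hypothetical, and a `socket.socketpair()` replaces a real network connection.

```python
import select
import socket


def probe_select(timeout=0.05):
    """Observe select.select() on one socket before and after data arrives."""
    a, b = socket.socketpair()
    try:
        # Nothing sent yet: `a` has room in its send buffer but nothing to read.
        r1, w1, _ = select.select([a], [a], [a], timeout)

        # After the peer sends, `a` shows up in the readable list.
        b.sendall(b"ping")
        r2, _, _ = select.select([a], [], [], timeout)

        # Drain the data; the next call waits for the timeout and finds nothing.
        a.recv(4096)
        r3, w3, x3 = select.select([a], [], [], timeout)

        return {
            "readable_before_send": a in r1,
            "writable_before_send": a in w1,
            "readable_after_send": a in r2,
            "empty_on_timeout": (r3, w3, x3) == ([], [], []),
        }
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    for name, value in probe_select().items():
        print(name, value)
```

A connected socket is usually writable straight away, which is why servers typically put a socket in the write list only while they have data queued for it.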
Limitation: the kernel scans every file descriptor passed in, so each call costs O(n). On many systems select also cannot handle file descriptors numbered at or above the FD_SETSIZE constant, which is commonly 1,024.
The poll Alternative
select.poll() removes the FD_SETSIZE limit and uses a cleaner registration API:
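    import select

    # Continues from the previous example: `server` is the listening socket.
    poller = select.poll()
    poller.register(server, select.POLLIN)
    # poll() reports file descriptors, so keep a map back to socket objects.
    fd_to_socket = {server.fileno(): server}

    while True:
        events = poller.poll(1000)  # timeout in milliseconds
        for fd, event in events:
            sock = fd_to_socket[fd]
            if sock is server:
                client, addr = server.accept()
                client.setblocking(False)
                poller.register(client, select.POLLIN)
                fd_to_socket[client.fileno()] = client
            elif event & (select.POLLIN | select.POLLHUP | select.POLLERR):
                # POLLHUP and POLLERR are reported even when not requested.
                try:
                    data = sock.recv(4096)
                except OSError:
                    data = b""  # treat a socket error like a disconnect
                if not data:
                    poller.unregister(fd)
                    del fd_to_socket[fd]
                    sock.close()
                # Otherwise, hand `data` to the application logic here.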
poll is still O(n) per call (the kernel scans all registered FDs), but it handles more connections and has a cleaner interface than select.
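The event masks that poll() returns are bit flags, so one registration can ask about reading and writing at once. The sketch below shows how the mask for a single socket changes once data arrives. The function name `poll_masks` is hypothetical, and because select.poll() does not exist on Windows the function returns None there.

```python
import select
import socket


def poll_masks(timeout_ms=50):
    """Return the read/write flags poll() reports before and after a send."""
    if not hasattr(select, "poll"):
        return None  # select.poll() is unavailable on this platform

    a, b = socket.socketpair()
    poller = select.poll()
    try:
        # Ask about both readability and writability with one bit mask.
        poller.register(a, select.POLLIN | select.POLLOUT)

        before = dict(poller.poll(timeout_ms)).get(a.fileno(), 0)
        b.sendall(b"x")
        after = dict(poller.poll(timeout_ms)).get(a.fileno(), 0)

        return {
            "readable_before": bool(before & select.POLLIN),
            "writable_before": bool(before & select.POLLOUT),
            "readable_after": bool(after & select.POLLIN),
            "writable_after": bool(after & select.POLLOUT),
        }
    finally:
        poller.unregister(a)
        a.close()
        b.close()


if __name__ == "__main__":
    print(poll_masks())
```

Testing flags with `&` rather than `==` matters because a single event can carry several flags at the same time.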
The selectors Module (Recommended)
Python 3.4+ includes selectors, which automatically uses the best available mechanism for your OS:
    import selectors
    import socket

    sel = selectors.DefaultSelector()
    # DefaultSelector picks:
    #   - epoll on Linux
    #   - kqueue on macOS/BSD
    #   - select on Windows

    server = socket.socket()
    server.bind(("0.0.0.0", 8080))
    server.listen()
    server.setblocking(False)

    def accept_connection(server_sock, mask):
        client, addr = server_sock.accept()
        client.setblocking(False)
        sel.register(client, selectors.EVENT_READ, data=handle_client)

    def handle_client(client_sock, mask):
        data = client_sock.recv(4096)
        if data:
            # Echo. send() may write only part of the data on a non-blocking
            # socket; a robust server would buffer the rest and wait for
            # EVENT_WRITE.
            client_sock.send(data)
        else:
            sel.unregister(client_sock)
            client_sock.close()

    sel.register(server, selectors.EVENT_READ, data=accept_connection)

    while True:
        events = sel.select(timeout=1)
        for key, mask in events:
            callback = key.data
            callback(key.fileobj, mask)
The data parameter on register() lets you attach a callback or any context to each socket. DefaultSelector abstracts away the differences between epoll, kqueue, and select.
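The data argument does not have to be a callback. As a small illustration, the sketch below attaches a dictionary of per-connection context instead and reads it back from the returned key. The function name `ready_labels` and the "alice"/"bob" labels are made up for the example.

```python
import selectors
import socket


def ready_labels(timeout=0.1):
    """Return the labels attached to whichever sockets are ready to read."""
    sel = selectors.DefaultSelector()
    a_recv, a_send = socket.socketpair()
    b_recv, b_send = socket.socketpair()
    try:
        # Attach arbitrary context to each registration via `data`.
        sel.register(a_recv, selectors.EVENT_READ, data={"name": "alice"})
        sel.register(b_recv, selectors.EVENT_READ, data={"name": "bob"})

        # Only the second connection receives anything.
        b_send.sendall(b"hi")

        # Each event is a (SelectorKey, mask) pair; key.data is what we attached.
        return [key.data["name"] for key, _mask in sel.select(timeout)]
    finally:
        sel.close()
        for s in (a_recv, a_send, b_recv, b_send):
            s.close()


if __name__ == "__main__":
    print("selector in use:", type(selectors.DefaultSelector()).__name__)
    print("ready:", ready_labels())
```

Storing state in `data` avoids a separate lookup table like the `fd_to_socket` dictionary needed with poll().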
Comparing the Mechanisms
| Mechanism | Complexity | Max FDs | Platform |
|---|---|---|---|
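| select | O(n) | ~1,024 (FD_SETSIZE) | All |
| poll | O(n) | No fixed limit (OS file descriptor limits apply) | Unix |
| epoll | O(ready) | No fixed limit (OS file descriptor limits apply) | Linux |
| kqueue | O(ready) | No fixed limit (OS file descriptor limits apply) | macOS/BSD |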
The critical difference: epoll and kqueue are O(number of ready FDs), not O(total FDs). With 10,000 connections where 5 have data, select/poll scan all 10,000 while epoll reports just the 5.
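From Python, the visible effect is that the selector hands back only the ready sockets, however many are registered. The sketch below registers a batch of idle connections and makes a few of them active. The function name `count_ready` and its parameters are hypothetical, and the example shows only what the API returns, not the kernel-level cost, which depends on which mechanism DefaultSelector chose.

```python
import selectors
import socket


def count_ready(total=50, active=5, timeout=0.1):
    """Register `total` sockets, send on `active` of them, count the events."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(total)]
    try:
        for receiver, _sender in pairs:
            sel.register(receiver, selectors.EVENT_READ)

        # Only the first `active` connections have data waiting.
        for _receiver, sender in pairs[:active]:
            sender.sendall(b"data")

        events = sel.select(timeout)
        return len(sel.get_map()), len(events)
    finally:
        sel.close()
        for receiver, sender in pairs:
            receiver.close()
            sender.close()


if __name__ == "__main__":
    registered, ready = count_ready()
    print(f"{registered} registered, {ready} reported ready")
```

The loop that follows `sel.select()` therefore does work proportional to the ready sockets in every case; the difference between the mechanisms lies in how much work the kernel does to build that list.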
How This Relates to asyncio
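On Unix, asyncio’s default event loop is built on selectors.DefaultSelector. (On Windows, the default loop since Python 3.8 is a proactor loop based on I/O completion ports instead.) When you write: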
data = await reader.read(4096)
Under the hood, asyncio registered the socket with the selector. When you await, the coroutine suspends, and the event loop goes back to selector.select(). When the socket has data, the selector reports it, and the event loop resumes your coroutine.
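The suspend-and-resume cycle can be observed without a network. The sketch below uses the lower-level `loop.sock_recv()` rather than a stream reader so that it stays self-contained. The function name `recv_when_ready` is hypothetical, and the delayed send simulates a client that takes a moment to speak.

```python
import asyncio
import socket


async def recv_when_ready(delay=0.05):
    """Await data on a socket that has nothing to read yet."""
    loop = asyncio.get_running_loop()
    receiver, sender = socket.socketpair()
    # The event loop's socket methods require non-blocking sockets.
    receiver.setblocking(False)
    sender.setblocking(False)
    try:
        # Schedule the send for later, so the recv below has to wait.
        loop.call_later(delay, sender.send, b"ready")

        # No data yet: this coroutine suspends here, the loop goes back to
        # waiting for I/O, and resumes us once the socket becomes readable.
        return await loop.sock_recv(receiver, 4096)
    finally:
        receiver.close()
        sender.close()


if __name__ == "__main__":
    print(asyncio.run(recv_when_ready()))
```

While this coroutine is suspended, the same event loop is free to run other tasks, which is the single-threaded concurrency that the manual select loops earlier in this article implement by hand.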
Understanding select/poll is understanding what asyncio does for you automatically.
Common Misconception
“You need to choose between select, poll, and epoll.” Not anymore. Use selectors.DefaultSelector() and let Python pick the best option for your platform. You only need the low-level select module for legacy code or very specific requirements.
One thing to remember: I/O multiplexing lets one thread watch thousands of sockets efficiently by asking the OS “who’s ready?” — use the selectors module for the right abstraction, or let asyncio handle it entirely.