Python uvloop Performance — Deep Dive
Architecture Overview
uvloop sits between your asyncio code and the operating system:
Your async code
↓
asyncio API (unchanged)
↓
uvloop (Cython event loop implementation)
↓
libuv (C library for async I/O)
↓
OS kernel (epoll/kqueue/IOCP)
The key insight: asyncio was designed with a pluggable event loop. The AbstractEventLoop interface defines what an event loop must do (schedule callbacks, handle I/O, manage timers). uvloop provides a concrete implementation of this interface in Cython, delegating to libuv for the heavy lifting.
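The pluggable design can be seen without uvloop at all. The following is a minimal sketch using only the standard library: `CountingLoop` is a hypothetical subclass (not part of asyncio or uvloop) that overrides one method of the loop interface and still runs ordinary coroutines unchanged.

```python
import asyncio


class CountingLoop(asyncio.SelectorEventLoop):
    """Hypothetical loop that counts how many callbacks get scheduled."""

    def __init__(self):
        self.scheduled = 0  # set first, in case the base class schedules during init
        super().__init__()

    def call_soon(self, callback, *args, context=None):
        # Same contract as the stock method, plus a counter.
        self.scheduled += 1
        return super().call_soon(callback, *args, context=context)


async def add(a, b):
    await asyncio.sleep(0)  # yield to the loop once
    return a + b


def run_on_counting_loop():
    """Run a plain coroutine on the custom loop; return (result, callbacks scheduled)."""
    loop = CountingLoop()
    try:
        result = loop.run_until_complete(add(2, 3))
        return result, loop.scheduled
    finally:
        loop.close()
```

The coroutine has no idea which loop runs it. uvloop relies on the same property, except that its replacement loop is implemented in Cython rather than as a Python subclass.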
How libuv Works
libuv provides a single-threaded event loop with these components:
I/O Polling
On Linux, libuv uses epoll_wait(). On macOS, kqueue. On Windows, IOCP. These are the fastest kernel-level mechanisms for monitoring many file descriptors simultaneously.
On Unix, the default asyncio loop uses Python’s selectors module, which wraps these same system calls but adds Python-level overhead for each event.
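To make the Python-level work per event concrete, here is a rough sketch of one polling step written directly against the selectors module. It is an illustration, not asyncio's actual code; `poll_once` is a made-up helper name.

```python
import selectors
import socket


def poll_once(payload: bytes = b"ping") -> list:
    """Register one socket, wait for readiness, and dispatch its callback."""
    sel = selectors.DefaultSelector()  # epoll, kqueue, or select depending on the OS
    a, b = socket.socketpair()
    received = []
    try:
        b.setblocking(False)
        # The callback is stored as opaque data on the selector key.
        sel.register(b, selectors.EVENT_READ, data=received.append)
        a.sendall(payload)
        # Each ready event comes back as a Python tuple and is dispatched
        # through Python attribute lookups and a Python-level call.
        for key, _mask in sel.select(timeout=1.0):
            callback = key.data
            callback(key.fileobj.recv(1024))
    finally:
        sel.close()
        a.close()
        b.close()
    return received
```

Every ready file descriptor costs a key object, a tuple, and a Python call here. uvloop performs the equivalent steps in compiled code.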
Timer Heap
libuv maintains timers in a min-heap data structure (in C). When asyncio uses call_later() or call_at(), uvloop delegates to libuv’s timer system, so heap operations run in C rather than through Python’s heapq on a list of TimerHandle objects, which is what the default loop does.
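For readers unfamiliar with timer heaps, the idea can be sketched in a few lines of Python. This is a conceptual model only; `TimerHeap` is a hypothetical class and does not mirror libuv's C implementation.

```python
import heapq
import itertools


class TimerHeap:
    """Minimal min-heap of (deadline, sequence, callback) entries."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order stable

    def add(self, when, callback):
        heapq.heappush(self._heap, (when, next(self._seq), callback))

    def pop_due(self, now):
        """Return callbacks whose deadline has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _when, _seq, callback = heapq.heappop(self._heap)
            due.append(callback)
        return due

    def next_deadline(self):
        """Deadline of the earliest pending timer, or None if empty."""
        return self._heap[0][0] if self._heap else None
```

The earliest deadline is always at the root, so the loop can tell in constant time how long it may block in the poll call before the next timer fires.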
Handle and Request Architecture
libuv models I/O as handles (long-lived objects like TCP servers, pipes, signals) and requests (short-lived operations like DNS lookups, file writes). This maps cleanly to asyncio’s transport/protocol architecture.
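On the asyncio side, the transport/protocol pair looks like the sketch below: the transport is the long-lived connection object (comparable to a handle), and each write is a short-lived operation on it. The names `EchoProtocol` and `roundtrip` are illustrative, and the code runs on whichever loop is active, default or uvloop.

```python
import asyncio


class EchoProtocol(asyncio.Protocol):
    """Echo every received chunk back through the transport."""

    def connection_made(self, transport):
        self.transport = transport  # lives for the duration of the connection

    def data_received(self, data):
        self.transport.write(data)  # one short-lived write per chunk


async def roundtrip(message: bytes) -> bytes:
    loop = asyncio.get_running_loop()
    # Port 0 asks the OS for any free port.
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readexactly(len(message))
        writer.close()
        await writer.wait_closed()
    return reply
```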
The Cython Layer
uvloop is written in Cython, which compiles to C extension modules. Key optimizations:
Reduced Python Object Creation
The default asyncio loop creates many temporary Python objects per iteration — tuples for callback arguments, wrappers for file descriptors. uvloop minimizes this by keeping hot data in C-level structures.
Direct C Function Calls
Where asyncio dispatches through Python method lookups (loop.call_soon, loop._process_events), uvloop’s Cython code calls C functions directly:
# Simplified uvloop internal
cdef class Loop:
    cdef _run_once(self):
        # Direct call to libuv, no Python method dispatch
        uv_run(self._loop, UV_RUN_ONCE)
        self._process_callbacks()
Inlined Callback Processing
uvloop inlines the most common callback patterns, reducing the per-callback overhead from ~microseconds (pure Python) to ~nanoseconds (C).
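Per-callback overhead is easy to measure on your own machine. The sketch below chains callbacks through call_soon and reports the average cost of one hop; `measure_call_soon` is a made-up helper, and the numbers it produces depend entirely on hardware, Python version, and which loop is running.

```python
import asyncio
import time


async def measure_call_soon(n: int = 10_000):
    """Schedule n chained callbacks; return (count, average seconds per callback)."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    count = 0

    def tick():
        nonlocal count
        count += 1
        if count == n:
            done.set_result(None)
        else:
            loop.call_soon(tick)  # schedule the next hop

    start = time.perf_counter()
    loop.call_soon(tick)
    await done
    elapsed = time.perf_counter() - start
    return count, elapsed / n
```

Running it once with `asyncio.run(measure_call_soon())` and once under uvloop gives a direct, if crude, comparison of scheduling overhead.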
Profiling uvloop vs asyncio
Micro-benchmark: TCP Echo Server
import asyncio

async def echo_handler(reader, writer):
    # Echo each chunk back until the client closes the connection
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def bench_server():
    server = await asyncio.start_server(echo_handler, '127.0.0.1', 8888)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(bench_server())
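Results with wrk (10 threads, 1000 connections, 30 seconds). wrk is an HTTP load generator, so read these as figures for an HTTP endpoint served on each loop rather than for the raw echo server above; absolute numbers vary with hardware and Python version: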
| Loop | Requests/sec | Latency p99 | CPU Usage |
|---|---|---|---|
| asyncio | ~45,000 | 12ms | 85% |
| uvloop | ~120,000 | 4ms | 72% |
The lower CPU usage at higher throughput demonstrates uvloop’s efficiency.
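If you would rather reproduce a comparison locally, the following sketch times sequential round trips against an in-process echo server on each available loop. It is a rough single-connection measurement, not a replacement for a real load test; `echo_roundtrips`, `loop_factories`, and `compare` are illustrative names, and uvloop is treated as optional.

```python
import asyncio
import time


async def echo_handler(reader, writer):
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def echo_roundtrips(n, payload=b"x" * 64):
    """Start an echo server on a free port and time n sequential round trips."""
    server = await asyncio.start_server(echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        start = time.perf_counter()
        for _ in range(n):
            writer.write(payload)
            await writer.drain()
            await reader.readexactly(len(payload))
        elapsed = time.perf_counter() - start
        writer.close()
        await writer.wait_closed()
    return n / elapsed  # round trips per second


def loop_factories():
    """Map loop names to factories; uvloop is included only if importable."""
    factories = {"asyncio": asyncio.new_event_loop}
    try:
        import uvloop
    except ImportError:
        pass
    else:
        factories["uvloop"] = uvloop.new_event_loop
    return factories


def compare(n=2000):
    results = {}
    for name, factory in loop_factories().items():
        loop = factory()
        try:
            results[name] = loop.run_until_complete(echo_roundtrips(n))
        finally:
            loop.close()
    return results
```

Calling `compare()` returns a dict of round trips per second keyed by loop name. A single sequential connection mostly measures latency per hop, so expect a smaller gap than a many-connection throughput test would show.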
Profiling Tools
Use py-spy to profile uvloop applications:
py-spy record -o profile.svg --pid <PID>
With asyncio, you’ll see significant time in _selector.select() and _run_once. With uvloop, those paths run in C, so they drop out of the Python-level flame graph and the profile shows mostly your own code. Pass --native to py-spy if you also want the C frames.
For libuv-level profiling, use perf:
perf record -g -p <PID> -- sleep 30
perf report
Production Deployment
With Uvicorn
pip install "uvicorn[standard]"  # Includes uvloop and httptools; quotes keep shells like zsh from expanding the brackets
uvicorn app:app --loop uvloop --http httptools
Uvicorn’s --loop uvloop flag sets the event loop policy before starting your ASGI app.
With Gunicorn + Uvicorn Workers
gunicorn app:app -w 4 -k uvicorn.workers.UvicornWorker
Each Gunicorn worker process runs its own event loop, and UvicornWorker picks uvloop automatically when it is installed. Combined with the multi-process architecture, this scales across CPU cores.
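The same setup can live in a config file instead of command-line flags. The following gunicorn.conf.py is a sketch: the bind address is an assumption, and the worker count uses the common rule of thumb of two workers per core plus one, which you should tune for your workload.

```python
# gunicorn.conf.py (sketch; values are starting points, not recommendations)
import multiprocessing

bind = "0.0.0.0:8000"  # assumed address; adjust for your deployment
workers = multiprocessing.cpu_count() * 2 + 1  # one event loop per worker process
worker_class = "uvicorn.workers.UvicornWorker"
```

Start it with `gunicorn -c gunicorn.conf.py app:app`.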
Programmatic Setup
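import asyncio
import uvloop

async def app_main():
    # Placeholder for your application's entry point
    await asyncio.sleep(0)

def main():
    # Must be called before any event loop is created
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(app_main())
    finally:
        loop.close()

if __name__ == "__main__":
    main()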
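Newer uvloop releases also provide a uvloop.run() helper that mirrors asyncio.run() and avoids touching the global policy. The sketch below assumes such a release and falls back to the default loop when uvloop is missing or too old; `run_app` and this `app_main` are placeholder names.

```python
import asyncio


async def app_main() -> str:
    # Placeholder entry point: report which module provides the running loop.
    await asyncio.sleep(0)
    return type(asyncio.get_running_loop()).__module__


def run_app(coro_fn):
    """Run coro_fn() on uvloop when available, otherwise on the default loop."""
    try:
        import uvloop
    except ImportError:
        uvloop = None
    if uvloop is not None and hasattr(uvloop, "run"):
        return uvloop.run(coro_fn())
    return asyncio.run(coro_fn())
```

Because the choice is made per call, nothing else in the process is affected, which tends to be friendlier to test suites than setting a policy.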
Docker Considerations
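uvloop publishes pre-compiled wheels for common Linux and macOS targets, so pip install usually needs no compiler. A source build happens only when no wheel matches your platform or Python version. In Docker, pick one of the two options:
FROM python:3.12-slim
# Option 1: install build dependencies, build uvloop, then remove the toolchain
RUN apt-get update && apt-get install -y gcc libc-dev && \
    pip install uvloop && \
    apt-get purge -y gcc libc-dev && apt-get autoremove -y
# Option 2: use a pre-compiled wheel and fail the build if none is available
RUN pip install uvloop --only-binary=uvloop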
Compatibility Gotchas
Signal Handling
uvloop handles signals through libuv, which has slightly different semantics:
# This works the same on both, but edge cases differ
loop.add_signal_handler(signal.SIGTERM, shutdown)
libuv signal handlers are global (per-process), while asyncio’s default loop uses signal.signal(). In multi-threaded applications, this can cause subtle differences.
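A typical graceful-shutdown pattern that behaves the same on both loops is sketched below. It is Unix-only and must run in the main thread; `wait_for_sigterm` is an illustrative name, and the self-sent SIGTERM exists only so the example can run on its own.

```python
import asyncio
import os
import signal


async def wait_for_sigterm(timeout: float = 2.0) -> bool:
    """Return True if SIGTERM arrives before the timeout."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    try:
        # Demo only: deliver SIGTERM to this process shortly after starting.
        loop.call_later(0.05, os.kill, os.getpid(), signal.SIGTERM)
        await asyncio.wait_for(stop.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
```

In a real service, the code after `stop.wait()` would close servers and cancel outstanding tasks instead of returning a flag.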
Windows Unsupported
uvloop explicitly does not support Windows. If you need cross-platform compatibility:
import asyncio
import sys

if sys.platform != "win32":
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
# On Windows, just use the default asyncio loop
Debugging Mode
asyncio’s debug mode (PYTHONASYNCIODEBUG=1) works with uvloop, but some debug features are less detailed since the loop internals are in C. The “slow callback” warnings still fire, but stack traces may be less informative.
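Debug mode can also be switched on from code, which makes it easy to compare what each loop reports. The sketch below deliberately blocks the loop so that a slow-callback warning is logged; the 50 ms threshold and the name `slow_step` are arbitrary choices for illustration.

```python
import asyncio
import time


async def slow_step() -> bool:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # warn about steps longer than 50 ms
    time.sleep(0.1)  # deliberately block the loop to trigger the warning
    return loop.get_debug()


def run_in_debug_mode() -> bool:
    # debug=True is equivalent to setting PYTHONASYNCIODEBUG=1 for this run
    return asyncio.run(slow_step(), debug=True)
```

The warning goes to the `asyncio` logger, so it shows up on stderr unless logging is configured otherwise.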
Alternatives and Comparisons
| Event Loop | Language | Backend | Status |
|---|---|---|---|
| asyncio (default) | Python | selectors | Standard library |
| uvloop | Cython/C | libuv | Mature, widely used |
| tokio (via pyo3-asyncio) | Rust | tokio | Experimental |
| winloop | Cython/C | libuv (Windows) | Early stage |
When Not to Use uvloop
- Windows deployments — not supported
- PyPy users — incompatible (PyPy has its own event loop optimizations)
- Debugging async issues — the default loop has better introspection tools
- Libraries with custom loop requirements — some testing frameworks or special-purpose async libraries expect the default loop
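The first three conditions in the list above can be folded into a small guard. This is one possible sketch using only the standard library; `should_use_uvloop` is a hypothetical helper, and the `debugging` flag is something your application would have to supply.

```python
import importlib.util
import platform
import sys


def should_use_uvloop(debugging: bool = False) -> bool:
    """Decide whether enabling uvloop makes sense in this process."""
    if sys.platform == "win32":
        return False  # uvloop does not support Windows
    if platform.python_implementation() != "CPython":
        return False  # e.g. PyPy
    if debugging:
        return False  # prefer the default loop's introspection
    # Finally, only say yes if the package is actually installed.
    return importlib.util.find_spec("uvloop") is not None
```

Library-specific loop requirements cannot be detected generically, so that last item in the list still needs a manual check.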
Future: Python 3.12+ and Beyond
Python’s asyncio has gotten faster in recent versions. The performance gap between uvloop and the default loop has narrowed:
- Python 3.11 added TaskGroup and optimized internal scheduling
- Python 3.12 improved the selector-based loop performance
- Future versions may adopt libuv or similar approaches into the standard library
uvloop remains faster, but the margin is shrinking. For many applications, the default loop in Python 3.12+ is “fast enough.”
One thing to remember: uvloop delivers its speed through compiled Cython code wrapping libuv’s C-level I/O primitives, eliminating Python-level overhead in the event loop’s hot path. It’s a production-proven optimization used by Uvicorn, Sanic, and EdgeDB — but profile your specific workload to verify the gains matter for your use case, and be aware of the platform and compatibility constraints.