Python uvloop Performance — Deep Dive

Architecture Overview

uvloop sits between your asyncio code and the operating system:

Your async code
    ↓
asyncio API (unchanged)
    ↓
uvloop (Cython event loop implementation)
    ↓
libuv (C library for async I/O)
    ↓
OS kernel (epoll/kqueue/IOCP)

The key insight: asyncio was designed with a pluggable event loop. The AbstractEventLoop interface defines what an event loop must do (schedule callbacks, handle I/O, manage timers). uvloop provides a concrete implementation of this interface in Cython, delegating to libuv for the heavy lifting.
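To make the pluggability concrete, here is a minimal sketch that runs the same coroutine on whichever loop implementations are available. The helper name run_on and the toy coroutine add are illustrative, not part of asyncio or uvloop. The uvloop import is optional, so the sketch still runs where uvloop is not installed.

```python
import asyncio

try:
    import uvloop  # third-party; not available on Windows
except ImportError:
    uvloop = None


async def add(a, b):
    # Ordinary asyncio code: nothing here knows which loop runs it
    await asyncio.sleep(0)
    return a + b


def run_on(loop_factory, coro):
    """Run `coro` to completion on a loop built by `loop_factory` (illustrative helper)."""
    loop = loop_factory()
    try:
        # Any pluggable loop is expected to implement AbstractEventLoop
        assert isinstance(loop, asyncio.AbstractEventLoop)
        return loop.run_until_complete(coro)
    finally:
        loop.close()


factories = {"asyncio": asyncio.new_event_loop}
if uvloop is not None:
    factories["uvloop"] = uvloop.new_event_loop

# The coroutine is identical for every loop; only the factory changes
results = {name: run_on(factory, add(2, 3)) for name, factory in factories.items()}
print(results)
```

The application code never changes; only the factory that builds the loop does.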

How libuv Works

libuv provides a single-threaded event loop with these components:

I/O Polling

On Linux, libuv uses epoll_wait(). On macOS, kqueue. On Windows, IOCP. These are the fastest kernel-level mechanisms for monitoring many file descriptors simultaneously.

On Unix, the default asyncio loop uses Python’s selectors module, which wraps these same system calls but adds Python-level overhead for each event. (On Windows, the default is a proactor loop built on IOCP.)
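To see where that Python-level overhead comes from, here is a small sketch of a selectors-based poll cycle using only the standard library. It is not asyncio's actual code; the callback name on_readable and the socketpair setup are illustrative. Each ready event passes through Python objects (a key, a mask, a callback lookup) before any real work happens.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll, kqueue, or select, depending on the OS
sender, receiver = socket.socketpair()
sender.setblocking(False)
receiver.setblocking(False)

received = []


def on_readable(sock):
    # Per-event Python work: a method call plus a new bytes object
    received.append(sock.recv(1024))


# Attach the callback as the selector key's data
sel.register(receiver, selectors.EVENT_READ, data=on_readable)
sender.send(b"ping")

# One poll cycle: ask the kernel what is ready, then dispatch in Python
for key, mask in sel.select(timeout=1.0):
    callback = key.data
    callback(key.fileobj)

sel.unregister(receiver)
sel.close()
sender.close()
receiver.close()
print(received)
```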

Timer Heap

libuv maintains timers in a min-heap data structure (in C). When asyncio code calls call_later() or call_at(), uvloop delegates to libuv’s timer system, avoiding the Python-level heap operations (heapq on a list of TimerHandle objects) that the default loop performs.
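For readers unfamiliar with the data structure, a min-heap timer queue can be sketched in a few lines of Python. This is an illustration of the idea only, not libuv's or asyncio's implementation; the class name TimerHeap and its methods are made up for the example.

```python
import heapq
import itertools


class TimerHeap:
    """Illustrative min-heap of timers keyed by deadline."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so timers with equal deadlines never compare callbacks
        self._counter = itertools.count()

    def call_at(self, when, callback):
        heapq.heappush(self._heap, (when, next(self._counter), callback))

    def run_due(self, now):
        """Pop and run every timer whose deadline is <= now, earliest first."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            fired.append(callback())
        return fired


timers = TimerHeap()
timers.call_at(3.0, lambda: "c")
timers.call_at(1.0, lambda: "a")
timers.call_at(2.0, lambda: "b")

print(timers.run_due(now=2.0))  # ['a', 'b']
```

The earliest deadline is always at the top of the heap, so the loop only needs to look at one element to decide how long it can sleep in the poll call.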

Handle and Request Architecture

libuv models I/O as handles (long-lived objects like TCP servers, pipes, signals) and requests (short-lived operations like DNS lookups, file writes). This maps cleanly to asyncio’s transport/protocol architecture.
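On the asyncio side of that mapping, a protocol class receives a transport per connection while the listening server lives on as a long-lived object. The following sketch uses only the standard asyncio API. The class name EchoProtocol and the helper roundtrip are illustrative, and port 0 asks the OS for any free port.

```python
import asyncio


class EchoProtocol(asyncio.Protocol):
    """One instance per connection; the transport wraps the underlying socket."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Echo the bytes straight back through the transport
        self.transport.write(data)


async def roundtrip(payload):
    loop = asyncio.get_running_loop()
    # The server is the long-lived object; each accepted connection gets its own protocol
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(payload)
        await writer.drain()
        echoed = await reader.readexactly(len(payload))
        writer.close()
        await writer.wait_closed()
    return echoed


result = asyncio.run(roundtrip(b"hello"))
print(result)
```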

The Cython Layer

uvloop is written in Cython, which compiles to C extension modules. Key optimizations:

Reduced Python Object Creation

The default asyncio loop creates many temporary Python objects per iteration — tuples for callback arguments, wrappers for file descriptors. uvloop minimizes this by keeping hot data in C-level structures.

Direct C Function Calls

Where asyncio dispatches through Python method lookups (loop.call_soon, loop._process_events), uvloop’s Cython code calls C functions directly:

# Simplified uvloop internal
cdef class Loop:
    cdef _run_once(self):
        # Direct call to libuv, no Python method dispatch
        uv_run(self._loop, UV_RUN_ONCE)
        self._process_callbacks()

Inlined Callback Processing

uvloop inlines the most common callback patterns, reducing the per-callback overhead from ~microseconds (pure Python) to ~nanoseconds (C).
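If you want a rough feel for per-callback cost on your own machine, the sketch below chains call_soon callbacks and divides the elapsed time by the count. The function name measure_callback_overhead is illustrative, and the number it produces includes one full loop iteration per callback, so treat it as an upper bound rather than a precise figure. Pass uvloop.new_event_loop as the factory to compare, assuming uvloop is installed.

```python
import asyncio
import time


def measure_callback_overhead(loop_factory, n=10_000):
    """Return (callbacks_run, average_seconds_per_callback) for a given loop factory."""
    loop = loop_factory()
    count = 0

    def tick():
        nonlocal count
        count += 1
        if count < n:
            loop.call_soon(tick)  # schedule the next callback for the next iteration
        else:
            loop.stop()

    try:
        start = time.perf_counter()
        loop.call_soon(tick)
        loop.run_forever()
        elapsed = time.perf_counter() - start
    finally:
        loop.close()
    return count, elapsed / n


callbacks_run, seconds_each = measure_callback_overhead(asyncio.new_event_loop)
print(f"{callbacks_run} callbacks, {seconds_each * 1e6:.2f} microseconds each")
```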

Profiling uvloop vs asyncio

Micro-benchmark: TCP Echo Server

import asyncio


async def echo_handler(reader, writer):
    # Echo each chunk back until the client closes the connection
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def bench_server():
    server = await asyncio.start_server(echo_handler, '127.0.0.1', 8888)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(bench_server())

Results with wrk (10 threads, 1000 connections, 30 seconds):

| Loop | Requests/sec | Latency p99 | CPU Usage |
| --- | --- | --- | --- |
| asyncio | ~45,000 | 12 ms | 85% |
| uvloop | ~120,000 | 4 ms | 72% |

The lower CPU usage at higher throughput demonstrates uvloop’s efficiency.
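Numbers like these vary a lot with hardware and workload, so it is worth measuring on your own machine. The sketch below is a self-contained, in-process harness that starts an echo server on a free port and drives it with concurrent clients. The function names client and run_benchmark and the default sizes are illustrative choices, and an in-process client will understate what an external load generator can push. To compare loops, run it once under asyncio and once with uvloop installed as the loop.

```python
import asyncio
import time

MESSAGE = b"x" * 64


async def echo_handler(reader, writer):
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, n_messages):
    """Send n_messages and wait for each echo before sending the next."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    for _ in range(n_messages):
        writer.write(MESSAGE)
        await writer.drain()
        await reader.readexactly(len(MESSAGE))
    writer.close()
    await writer.wait_closed()
    return n_messages


async def run_benchmark(n_clients=10, n_messages=200):
    server = await asyncio.start_server(echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        start = time.perf_counter()
        done = await asyncio.gather(
            *(client(port, n_messages) for _ in range(n_clients))
        )
        elapsed = time.perf_counter() - start
    total = sum(done)
    return total, total / elapsed


total, rate = asyncio.run(run_benchmark())
print(f"{total} round trips, {rate:,.0f} per second")
```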

Profiling Tools

Use py-spy to profile uvloop applications:

py-spy record -o profile.svg --pid <PID>

With asyncio, you’ll see significant time in _selector.select() and _run_once. With uvloop, those paths run in C, so they do not appear as Python frames; pass --native to py-spy if you want native frames included in the output. Either way, the Python-level profile is cleaner.

For libuv-level profiling, use perf:

perf record -g -p <PID> -- sleep 30
perf report

Production Deployment

With Uvicorn

pip install 'uvicorn[standard]'  # Includes uvloop and httptools; quotes keep shells like zsh from globbing the brackets
uvicorn app:app --loop uvloop --http httptools

Uvicorn’s --loop uvloop flag selects uvloop as the event loop implementation before starting your ASGI app.
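The app:app target in the command above means "the object named app in the module app". As a minimal sketch, assuming a hypothetical file app.py, a bare ASGI callable is enough to try the command out. The response text is arbitrary.

```python
# app.py (hypothetical module name matching the `app:app` target)


async def app(scope, receive, send):
    # Only plain HTTP requests are handled in this sketch
    assert scope["type"] == "http"

    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        }
    )
    await send(
        {
            "type": "http.response.body",
            "body": b"Hello from uvloop\n",
        }
    )
```

Nothing in the app refers to uvloop. The loop choice is made entirely by the server process.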

With Gunicorn + Uvicorn Workers

gunicorn app:app -w 4 -k uvicorn.workers.UvicornWorker

Each Gunicorn worker process runs its own event loop, which is uvloop when uvloop is installed. Combined with the multi-process architecture, this scales across CPU cores.
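The same settings can live in a Gunicorn configuration file, which is itself a Python module. The sketch below mirrors the command above; the bind address and timeout are assumed values for illustration, not recommendations.

```python
# gunicorn.conf.py: load with `gunicorn -c gunicorn.conf.py app:app`

bind = "127.0.0.1:8000"  # assumed address for this sketch
workers = 4  # mirrors `-w 4`
worker_class = "uvicorn.workers.UvicornWorker"  # mirrors `-k`
graceful_timeout = 30  # seconds a worker gets to finish in-flight requests
```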

Programmatic Setup

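import asyncio

import uvloop


async def app_main():
    # Placeholder for your application's entry-point coroutine
    await asyncio.sleep(0)


def main():
    # Must be called before any event loop is created
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(app_main())
    finally:
        loop.close()


if __name__ == "__main__":
    main()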
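Recent uvloop releases also provide uvloop.run(), which mirrors asyncio.run() and avoids touching the global policy. A hedged sketch of an entry point that prefers it and falls back to asyncio.run() when uvloop is missing or too old follows. The helper name run and the placeholder coroutine app_main are illustrative.

```python
import asyncio

try:
    import uvloop  # third-party; not available on Windows
except ImportError:
    uvloop = None


async def app_main():
    # Placeholder application: report which package provides the running loop
    loop = asyncio.get_running_loop()
    return type(loop).__module__


def run(coro):
    """Run `coro` on uvloop when available, otherwise on the default loop."""
    if uvloop is not None and hasattr(uvloop, "run"):
        return uvloop.run(coro)
    return asyncio.run(coro)


loop_module = run(app_main())
print(loop_module)
```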

Docker Considerations

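uvloop publishes pre-built wheels for common Linux and macOS platforms, so pip usually does not need a compiler. When no wheel matches your platform, pip falls back to building from source, which requires a C toolchain. In Docker: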

FROM python:3.12-slim

# Option A: install build dependencies, build uvloop, then remove them
RUN apt-get update && apt-get install -y gcc libc-dev && \
    pip install uvloop && \
    apt-get purge -y gcc libc-dev && apt-get autoremove -y

# Option B (use instead of A): require a pre-compiled wheel
# RUN pip install uvloop --only-binary=uvloop

Compatibility Gotchas

Signal Handling

uvloop handles signals through libuv, which has slightly different semantics:

# This works the same on both, but edge cases differ
loop.add_signal_handler(signal.SIGTERM, shutdown)

libuv signal handlers are global (per-process), while asyncio’s default loop uses signal.signal(). In multi-threaded applications, this can cause subtle differences.
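A common use of add_signal_handler is graceful shutdown. The sketch below assumes a Unix platform and code running in the main thread, which asyncio's signal support requires. It simulates an external kill -TERM by sending the signal to its own process. The function name serve_until_signalled is illustrative. The same code should run unchanged on uvloop.

```python
import asyncio
import os
import signal


async def serve_until_signalled():
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    # The handler runs inside the loop, so it can safely touch asyncio objects
    loop.add_signal_handler(signal.SIGTERM, stop.set)

    # Stand-in for an external `kill -TERM <pid>` arriving a moment later
    loop.call_later(0.05, os.kill, os.getpid(), signal.SIGTERM)

    try:
        await asyncio.wait_for(stop.wait(), timeout=5)
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
    return stop.is_set()


received_sigterm = asyncio.run(serve_until_signalled())
print(received_sigterm)
```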

Windows Unsupported

uvloop explicitly does not support Windows. If you need cross-platform compatibility:

import asyncio
import sys

if sys.platform != "win32":
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
# On Windows, just use the default asyncio loop

Debugging Mode

asyncio’s debug mode (PYTHONASYNCIODEBUG=1) works with uvloop, but some debug features are less detailed since the loop internals are in C. The “slow callback” warnings still fire, but stack traces may be less informative.
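To see the slow-callback warning in action, the sketch below enables debug mode, lowers the threshold, and deliberately blocks the loop. It runs on the default loop and captures messages from the "asyncio" logger with a small custom handler. The handler class ListHandler and the 0.05 second threshold are illustrative choices.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocking_step():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # warn about steps slower than 50 ms
    time.sleep(0.1)  # deliberately blocks the event loop
    await asyncio.sleep(0)


handler = ListHandler()
logger = logging.getLogger("asyncio")
logger.addHandler(handler)
try:
    asyncio.run(blocking_step(), debug=True)
finally:
    logger.removeHandler(handler)

slow_warnings = [m for m in handler.messages if "took" in m]
print(slow_warnings)
```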

Alternatives and Comparisons

| Event Loop | Language | Backend | Status |
| --- | --- | --- | --- |
| asyncio (default) | Python | selectors | Standard library |
| uvloop | Cython/C | libuv | Mature, widely used |
| tokio (via pyo3-asyncio) | Rust | tokio | Experimental |
| winloop | Cython/C | libuv (Windows) | Early stage |

When Not to Use uvloop

  1. Windows deployments — not supported
  2. PyPy users — incompatible (PyPy has its own event loop optimizations)
  3. Debugging async issues — the default loop has better introspection tools
  4. Libraries with custom loop requirements — some testing frameworks or special-purpose async libraries expect the default loop

Future: Python 3.12+ and Beyond

Python’s asyncio has gotten faster in recent versions. The performance gap between uvloop and the default loop has narrowed:

  • Python 3.11 added TaskGroup and optimized internal scheduling
  • Python 3.12 improved the selector-based loop performance
  • Future versions may adopt libuv or similar approaches into the standard library

uvloop remains faster, but the margin is shrinking. For many applications, the default loop in Python 3.12+ is “fast enough.”

One thing to remember: uvloop delivers its speed through compiled Cython code wrapping libuv’s C-level I/O primitives, eliminating Python-level overhead in the event loop’s hot path. It’s a production-proven optimization used by Uvicorn, Sanic, and EdgeDB — but profile your specific workload to verify the gains matter for your use case, and be aware of the platform and compatibility constraints.

