Python Async Performance Tuning — Core Concepts

The async performance paradox

Async Python can be extremely fast for I/O-heavy work — or surprisingly slow if misused. The event loop is single-threaded, so one mistake can bottleneck your entire application.

Problem 1: Unbounded concurrency

Launching thousands of coroutines simultaneously is easy but dangerous:

# BAD: 10,000 simultaneous connections
tasks = [fetch(url) for url in ten_thousand_urls]
results = await asyncio.gather(*tasks)
# Exhausts file descriptors, overwhelms servers, spikes memory

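The fix is a semaphore that caps how many fetches are in flight at once. All the coroutines are still created up front, but only 50 of them hold a connection at any moment: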

# GOOD: max 50 concurrent connections
sem = asyncio.Semaphore(50)

async def limited_fetch(url):
    async with sem:
        return await fetch(url)

tasks = [limited_fetch(url) for url in ten_thousand_urls]
results = await asyncio.gather(*tasks)

The right concurrency limit depends on the resource you are protecting. As rough starting points: 20-100 for HTTP calls to external APIs, 5-20 for database connections, and a few hundred for local network services. Measure and adjust from there.
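
To see the limit actually holding, here is a minimal, self-contained sketch. It assumes a hypothetical `fake_fetch` that simulates latency with `asyncio.sleep` instead of making a real HTTP call, and it records the peak number of fetches in flight so the cap can be checked rather than taken on faith.

```python
import asyncio


async def fake_fetch(url: str, stats: dict) -> str:
    """Hypothetical stand-in for a real HTTP call."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulated network latency
    stats["active"] -= 1
    return f"body of {url}"


async def fetch_all(urls, limit: int):
    sem = asyncio.Semaphore(limit)  # created inside the running loop
    stats = {"active": 0, "peak": 0}

    async def limited(url: str) -> str:
        async with sem:  # waits here while `limit` fetches are in flight
            return await fake_fetch(url, stats)

    results = await asyncio.gather(*(limited(u) for u in urls))
    return results, stats["peak"]


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(200)]
    results, peak = asyncio.run(fetch_all(urls, limit=20))
    print(len(results), "responses, peak concurrency:", peak)
```

The peak never exceeds the limit passed in, no matter how many URLs are queued. Swapping `fake_fetch` for a real client call keeps the same structure.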

Problem 2: Blocking the event loop

The event loop processes everything on a single thread. Any synchronous operation that takes more than a few milliseconds freezes all other tasks:

# BAD: blocks the event loop for seconds
async def process_file(path):
    with open(path) as f:          # blocking file I/O
        data = f.read()            # everything stalls
    result = heavy_computation(data)  # more blocking
    return result

Common blockers: file I/O, synchronous DNS lookups (such as socket.getaddrinfo()), CPU-heavy computation, time.sleep(), and synchronous database drivers.

The fix: offload blocking work to a thread or process pool:

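import asyncio
from concurrent.futures import ProcessPoolExecutor

cpu_pool = ProcessPoolExecutor()  # create once and reuse across calls

def read_file(path):
    # Plain blocking read; acceptable here because it runs in a worker thread
    with open(path) as f:
        return f.read()

async def process_file(path):
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside coroutines

    # File I/O → thread pool (None selects the loop's default executor)
    data = await loop.run_in_executor(None, read_file, path)

    # CPU work → process pool (the function and its data must be picklable)
    result = await loop.run_in_executor(cpu_pool, heavy_computation, data)

    return result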
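
On Python 3.9 and later, `asyncio.to_thread` is a shorter way to push a blocking call onto the default thread pool. The sketch below is one way to observe the difference; `blocking_read` is a hypothetical stand-in for file I/O or a synchronous driver, and a small heartbeat task counts how often the loop gets a turn while the work runs.

```python
import asyncio
import time


def blocking_read(delay: float) -> str:
    """Hypothetical blocking call (file I/O, sync driver, ...)."""
    time.sleep(delay)
    return "data"


async def ticks_while_working(offload: bool) -> int:
    """Return how many heartbeat ticks happened during the blocking call."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1  # only advances when the event loop is free

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # give the heartbeat task a chance to start

    if offload:
        await asyncio.to_thread(blocking_read, 0.2)  # runs in a worker thread
    else:
        blocking_read(0.2)  # BAD: runs on the event loop thread

    observed = ticks  # read before the loop gets another turn
    beat.cancel()
    return observed


if __name__ == "__main__":
    print("ticks when blocking: ", asyncio.run(ticks_while_working(False)))
    print("ticks when offloaded:", asyncio.run(ticks_while_working(True)))
```

When the call runs directly in the coroutine, the heartbeat cannot tick at all until it returns. When it is offloaded, the heartbeat keeps running alongside it.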

Problem 3: Sequential awaits that could run concurrently

# SLOW: each await waits for the previous to finish
user = await get_user(user_id)
orders = await get_orders(user_id)
preferences = await get_preferences(user_id)
# Total: time_user + time_orders + time_preferences

# FAST: independent awaits run concurrently
user, orders, preferences = await asyncio.gather(
    get_user(user_id),
    get_orders(user_id),
    get_preferences(user_id),
)
# Total: max(time_user, time_orders, time_preferences)

This single change often produces the biggest performance improvement. Look for consecutive await statements where the results don’t depend on each other.
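
The sum-versus-max difference is easy to measure. This sketch assumes a hypothetical `fake_query` that sleeps for a fixed delay in place of a real database or API call, then times both versions.

```python
import asyncio
import time


async def fake_query(name: str, delay: float) -> str:
    """Hypothetical stand-in for an independent I/O call."""
    await asyncio.sleep(delay)
    return name


async def load_sequential():
    start = time.perf_counter()
    user = await fake_query("user", 0.1)
    orders = await fake_query("orders", 0.1)
    preferences = await fake_query("preferences", 0.1)
    return [user, orders, preferences], time.perf_counter() - start


async def load_concurrent():
    start = time.perf_counter()
    # gather returns results in the order the awaitables were passed in
    results = await asyncio.gather(
        fake_query("user", 0.1),
        fake_query("orders", 0.1),
        fake_query("preferences", 0.1),
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = asyncio.run(load_sequential())
    _, con_time = asyncio.run(load_concurrent())
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With three 0.1-second calls, the sequential version takes roughly the sum of the delays and the gathered version roughly the longest one. Exact timings vary by machine.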

Problem 4: Connection pool exhaustion

Every async HTTP or database call needs a connection. Without pooling, you create and destroy connections per request:

# BAD: new connection per request
async def fetch(url):
    async with aiohttp.ClientSession() as session:  # new pool each time
        async with session.get(url) as resp:
            return await resp.text()

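# GOOD: shared session with connection pooling
async def main():
    # Create the session inside a coroutine so it binds to the running loop
    connector = aiohttp.TCPConnector(limit=100)  # at most 100 simultaneous connections
    async with aiohttp.ClientSession(connector=connector) as session:
        ...  # pass `session` to every request; it is closed on exit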

Database pools have the same pattern. Create the pool once at startup, share it across handlers.
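
The create-once, share-everywhere pattern can be sketched without any particular driver. Everything here is hypothetical: `FakeConnection` and `SimplePool` are toy classes built on `asyncio.Queue` purely for illustration. Real async drivers ship their own pools, and those should be preferred in practice.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    created = 0  # counts how many connections were ever opened

    def __init__(self) -> None:
        FakeConnection.created += 1
        self.id = FakeConnection.created

    async def query(self, sql: str) -> str:
        await asyncio.sleep(0.001)  # simulated round trip
        return f"conn {self.id}: {sql}"


class SimplePool:
    """Toy fixed-size pool, for illustration only."""

    def __init__(self, size: int) -> None:
        self._idle: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            self._idle.put_nowait(FakeConnection())

    @asynccontextmanager
    async def acquire(self):
        conn = await self._idle.get()  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # hand the connection back


async def handler(pool: SimplePool, n: int) -> str:
    async with pool.acquire() as conn:
        return await conn.query(f"SELECT {n}")


async def main(requests: int = 50, pool_size: int = 5):
    pool = SimplePool(pool_size)  # created once at startup
    return await asyncio.gather(*(handler(pool, i) for i in range(requests)))


if __name__ == "__main__":
    results = asyncio.run(main())
    print(len(results), "queries served by", FakeConnection.created, "connections")
```

However many requests arrive, the number of connections opened stays at the pool size. Handlers that find the pool empty simply wait for one to be returned.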

Measuring async performance

Standard profiling tools don’t work well with async code because they measure wall-clock time on the event loop thread, mixing I/O wait with actual work.

Use these instead:

  • asyncio.get_running_loop().slow_callback_duration: the threshold, in seconds, above which a slow callback is logged (default: 0.1, i.e. 100 ms). The warning is only emitted when debug mode is on
  • aiomonitor: attach to a running async app and inspect task states
  • py-spy: a sampling profiler that generates flame graphs showing where the event loop is stuck

Enable debug mode during development:

asyncio.run(main(), debug=True)
# Logs never-awaited coroutines and slow callbacks; run with `python -X dev` to also see ResourceWarning for unclosed resources
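
A small sketch of the slow-callback warning in action. It assumes a hypothetical `blocks_the_loop` coroutine that deliberately calls `time.sleep`, lowers the threshold to 50 ms, and attaches a list-backed handler to the `asyncio` logger so the message can be inspected instead of only printed.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages in a list so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def blocks_the_loop() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # seconds; the default is 0.1
    time.sleep(0.1)  # deliberate blocking call on the loop thread
    await asyncio.sleep(0)


def run_with_debug() -> list:
    handler = ListHandler()
    logger = logging.getLogger("asyncio")  # asyncio logs its warnings here
    logger.addHandler(handler)
    try:
        asyncio.run(blocks_the_loop(), debug=True)
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    for message in run_with_debug():
        print(message)
```

In debug mode the loop times each step it runs, so the step containing the `time.sleep` call is reported along with how long it took. Without `debug=True` the same code logs nothing.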

Common misconception: async makes everything faster

Async only helps when your program spends significant time waiting. A CPU-bound application gains nothing from async — it just adds complexity. If your profiling shows the event loop is busy (not waiting), async is the wrong tool.
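
One way to see why is to log the order in which work actually happens. In this sketch (the task names and the arithmetic are arbitrary placeholders), two CPU-bound coroutines passed to gather still run strictly one after the other, because neither ever yields to the loop. Two I/O-style coroutines overlap.

```python
import asyncio


async def cpu_task(name: str, log: list) -> int:
    log.append(f"{name} start")
    total = sum(i * i for i in range(200_000))  # pure CPU, never awaits
    log.append(f"{name} end")
    return total


async def io_task(name: str, log: list) -> None:
    log.append(f"{name} start")
    await asyncio.sleep(0.01)  # yields to the loop while "waiting"
    log.append(f"{name} end")


async def main():
    cpu_log: list = []
    io_log: list = []
    await asyncio.gather(cpu_task("a", cpu_log), cpu_task("b", cpu_log))
    await asyncio.gather(io_task("a", io_log), io_task("b", io_log))
    return cpu_log, io_log


if __name__ == "__main__":
    cpu_log, io_log = asyncio.run(main())
    print("CPU-bound:", cpu_log)
    print("I/O-bound:", io_log)
```

In the CPU-bound log, task a finishes before task b even starts, so gather buys nothing. In the I/O-bound log, both tasks start before either finishes, which is where the speedup comes from.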

The one thing to remember: async performance comes down to three habits. Limit concurrency with semaphores, never block the event loop, and run independent I/O operations concurrently with gather.
