Python Async Performance Tuning — Core Concepts
The async performance paradox
Async Python can be extremely fast for I/O-heavy work — or surprisingly slow if misused. The event loop is single-threaded, so one mistake can bottleneck your entire application.
Problem 1: Unbounded concurrency
Launching thousands of coroutines simultaneously is easy but dangerous:
# BAD: 10,000 simultaneous connections
tasks = [fetch(url) for url in ten_thousand_urls]
results = await asyncio.gather(*tasks)
# Exhausts file descriptors, overwhelms servers, spikes memory
The fix is a semaphore that limits how many tasks run at once:
# GOOD: max 50 concurrent connections
sem = asyncio.Semaphore(50)
async def limited_fetch(url):
    async with sem:
        return await fetch(url)
tasks = [limited_fetch(url) for url in ten_thousand_urls]
results = await asyncio.gather(*tasks)
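The right concurrency limit depends on the resource behind it. As rough starting points: 20-100 for HTTP calls to external APIs, 5-20 for database connections, and a few hundred for services on the local network. Measure under realistic load and adjust from there.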
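To check that a limit is actually respected, it can help to count how many calls are in flight at once. The following is a minimal sketch under stated assumptions: `fake_fetch` is a hypothetical stand-in for a real HTTP call, and the limit of 20 is an arbitrary value chosen for the demo.

```python
import asyncio

async def fake_fetch(url, state):
    # Hypothetical stand-in for a real HTTP call; tracks how many run at once.
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # simulated network wait
    state["active"] -= 1
    return f"body of {url}"

async def fetch_all(urls, limit):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}

    async def limited(url):
        # At most `limit` coroutines get past this line at the same time.
        async with sem:
            return await fake_fetch(url, state)

    results = await asyncio.gather(*(limited(u) for u in urls))
    return results, state["peak"]

urls = [f"https://example.com/{i}" for i in range(200)]
results, peak = asyncio.run(fetch_all(urls, limit=20))
print(len(results), peak)
```

All 200 coroutines are created up front, but the recorded peak never exceeds the semaphore's limit.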
Problem 2: Blocking the event loop
The event loop processes everything on a single thread. Any synchronous operation that takes more than a few milliseconds freezes all other tasks:
# BAD: blocks the event loop for seconds
async def process_file(path):
    with open(path) as f:  # blocking file I/O
        data = f.read()  # everything stalls
    result = heavy_computation(data)  # more blocking
    return result
Common blockers: file I/O, synchronous DNS lookups (such as socket.gethostbyname()), CPU-heavy computation, time.sleep(), synchronous database drivers.
The fix: offload blocking work to a thread or process pool:
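import asyncio
from concurrent.futures import ProcessPoolExecutor

cpu_pool = ProcessPoolExecutor()  # create once, reuse across calls

def read_file(path):
    with open(path) as f:
        return f.read()

async def process_file(path):
    loop = asyncio.get_running_loop()
    # File I/O → thread pool (None selects the loop's default executor)
    data = await loop.run_in_executor(None, read_file, path)
    # CPU work → process pool (heavy_computation must be a picklable, module-level function)
    result = await loop.run_in_executor(cpu_pool, heavy_computation, data)
    return result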
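The effect of offloading can be observed directly. The sketch below is an illustration, not a benchmark: `blocking_read` is a hypothetical stand-in for slow synchronous work, and a background `ticker` task counts how often the event loop gets to run. It uses `asyncio.to_thread` (Python 3.9+), a shorthand for running a function in the default thread pool.

```python
import asyncio
import time

def blocking_read():
    # Hypothetical stand-in for slow synchronous work such as file I/O.
    time.sleep(0.2)
    return "data"

async def ticker(counter):
    # Counts how often the event loop gets a chance to run this task.
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(0.01)

async def measure(offload):
    counter = {"ticks": 0}
    tick_task = asyncio.create_task(ticker(counter))
    await asyncio.sleep(0)  # let the ticker start
    if offload:
        data = await asyncio.to_thread(blocking_read)  # loop stays free
    else:
        data = blocking_read()  # loop is frozen for the whole call
    tick_task.cancel()
    return data, counter["ticks"]

async def main():
    _, blocked = await measure(offload=False)
    _, offloaded = await measure(offload=True)
    return blocked, offloaded

blocked_ticks, offloaded_ticks = asyncio.run(main())
print(blocked_ticks, offloaded_ticks)
```

When the call runs directly on the loop, the ticker cannot advance until it returns. When it runs in a thread, the ticker keeps counting.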
Problem 3: Sequential awaits that could be parallel
# SLOW: each await waits for the previous to finish
user = await get_user(user_id)
orders = await get_orders(user_id)
preferences = await get_preferences(user_id)
# Total: time_user + time_orders + time_preferences
# FAST: independent awaits run concurrently
user, orders, preferences = await asyncio.gather(
    get_user(user_id),
    get_orders(user_id),
    get_preferences(user_id),
)
# Total: max(time_user, time_orders, time_preferences)
This single change often produces the biggest performance improvement. Look for consecutive await statements where the results don’t depend on each other.
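The difference is easy to measure with simulated calls. In this sketch, `fake_query` is a hypothetical stand-in for an independent I/O call, and the 0.1-second delays are arbitrary values chosen for the demo.

```python
import asyncio
import time

async def fake_query(name, delay):
    # Hypothetical stand-in for an independent I/O call.
    await asyncio.sleep(delay)
    return name

async def sequential():
    start = time.perf_counter()
    a = await fake_query("user", 0.1)
    b = await fake_query("orders", 0.1)
    c = await fake_query("preferences", 0.1)
    return [a, b, c], time.perf_counter() - start

async def concurrent():
    start = time.perf_counter()
    # gather returns results in the order the awaitables were passed in.
    results = await asyncio.gather(
        fake_query("user", 0.1),
        fake_query("orders", 0.1),
        fake_query("preferences", 0.1),
    )
    return results, time.perf_counter() - start

seq_results, seq_time = asyncio.run(sequential())
con_results, con_time = asyncio.run(concurrent())
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same values in the same order. Only the elapsed time differs: roughly the sum of the delays in the first case, roughly the longest single delay in the second.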
Problem 4: Connection pool exhaustion
Every async HTTP or database call needs a connection. Without pooling, you create and destroy connections per request:
# BAD: new connection per request
async def fetch(url):
    async with aiohttp.ClientSession() as session:  # new pool each time
        async with session.get(url) as resp:
            return await resp.text()
# GOOD: shared session with connection pooling
session = aiohttp.ClientSession(
    connector=aiohttp.TCPConnector(limit=100)
)
# Create the session inside a coroutine at startup, reuse it across requests,
# and close it at shutdown with: await session.close()
Database pools have the same pattern. Create the pool once at startup, share it across handlers.
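The create-once, share-everywhere pattern can be sketched with the standard library alone. `FakeConnection` and `SimplePool` are hypothetical names invented for this illustration. Real database drivers ship their own pool implementations, which should be preferred in practice.

```python
import asyncio

class FakeConnection:
    # Hypothetical connection; counts how many were ever opened.
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    async def query(self, sql):
        await asyncio.sleep(0.001)  # simulated round trip
        return f"result of {sql}"

class SimplePool:
    # Illustrative pool: a fixed set of connections handed out via a queue.
    def __init__(self, size):
        self._free = asyncio.Queue()
        for _ in range(size):
            self._free.put_nowait(FakeConnection())

    async def run(self, sql):
        conn = await self._free.get()  # wait until a connection is free
        try:
            return await conn.query(sql)
        finally:
            self._free.put_nowait(conn)  # always hand the connection back

async def main():
    pool = SimplePool(size=5)  # created once at startup
    return await asyncio.gather(*(pool.run(f"SELECT {i}") for i in range(100)))

rows = asyncio.run(main())
print(len(rows), FakeConnection.opened)
```

A hundred queries are served by the five connections opened at startup. The queue also acts as a concurrency limit, since callers wait when no connection is free.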
Measuring async performance
Standard profilers such as cProfile are awkward with async code: time spent waiting on I/O shows up inside the event loop's own polling call rather than against the coroutine that was waiting, so it is hard to separate waiting from actual work.
Use these instead:
- loop.slow_callback_duration: the threshold, in seconds (default: 0.1), above which asyncio logs a warning about a slow callback or task step; it only takes effect in debug mode
- aiomonitor: attach to a running async app and inspect task states
- py-spy: generates flame graphs that show where the event loop is stuck
Enable debug mode during development:
asyncio.run(main(), debug=True)
# Warns about: unawaited coroutines, slow callbacks, resource leaks
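The slow-callback warning can be seen with a deliberately blocking call. This sketch assumes the warnings arrive on the standard `asyncio` logger. `ListHandler` is a hypothetical helper that only exists to collect the messages, and the 0.05-second threshold is an arbitrary value for the demo.

```python
import asyncio
import logging
import time

class ListHandler(logging.Handler):
    # Collects log messages so they can be inspected after the run.
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

handler = ListHandler()
logging.getLogger("asyncio").addHandler(handler)

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # seconds; the default is 0.1
    time.sleep(0.2)  # deliberately blocks the event loop
    await asyncio.sleep(0)

asyncio.run(main(), debug=True)
slow = [m for m in handler.messages if "took" in m]
print(slow)
```

Each collected message names the task that held the loop and how long the step took, which is usually enough to locate the blocking call.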
Common misconception: async makes everything faster
Async only helps when your program spends significant time waiting. A CPU-bound application gains nothing from async — it just adds complexity. If your profiling shows the event loop is busy (not waiting), async is the wrong tool.
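A small experiment makes this concrete. In the sketch below, `crunch` is a hypothetical CPU-bound coroutine with no await inside it, so it never hands control back to the loop, and gather cannot overlap the two calls.

```python
import asyncio

events = []

async def crunch(name, n):
    # CPU-bound coroutine: no await inside, so it never yields to the loop.
    events.append(f"start {name}")
    total = sum(i * i for i in range(n))
    events.append(f"end {name}")
    return total

async def main():
    return await asyncio.gather(crunch("a", 200_000), crunch("b", 200_000))

totals = asyncio.run(main())
print(events)
```

The recorded events show the first call finishing before the second one starts: the work runs strictly one after the other, exactly as it would without async.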
The one thing to remember: async performance comes from three things — limiting concurrency with semaphores, never blocking the event loop, and running independent I/O operations in parallel with gather.