Python Thread Pool Sizing — Core Concepts

Why pool size matters

An undersized pool leaves CPU cores idle while tasks queue up. An oversized pool wastes memory (each thread reserves its own stack, typically 8MB of virtual address space by default on Linux), increases context-switching overhead, and can exhaust shared resources: file descriptor limits, database connection caps, or API rate limits.

Getting this right is one of the highest-leverage performance decisions in concurrent Python.

The two categories

I/O-bound tasks spend most of their time waiting — HTTP requests, database queries, file reads. The CPU is idle during waits, so more threads than cores makes sense.

CPU-bound tasks keep the processor busy the entire time — data transformation, image processing, mathematical computation. Due to Python’s GIL, adding more threads for CPU-bound work doesn’t help (and can hurt). Use processes instead.

The classic formulas

For I/O-bound work (Brian Goetz’s formula):

Optimal threads = Number of cores × (1 + Wait time / Service time)

If your task spends 200ms waiting for a database and 10ms processing the result, and you have 4 cores:

4 × (1 + 200/10) = 4 × 21 = 84 threads
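
The arithmetic above can be sketched as a small helper. This is only an illustration: the function name io_bound_pool_size and its parameters are hypothetical, and the result is a starting estimate to validate by measurement, not a guaranteed optimum. In CPython the GIL serialises the processing portion of each task, so the real sweet spot may be lower.

```python
def io_bound_pool_size(cores: int, wait_ms: float, service_ms: float) -> int:
    """Estimate threads as cores * (1 + wait / service), rounded down.

    Hypothetical helper: the name and signature are assumptions made
    for this example, not part of any library.
    """
    if cores < 1 or wait_ms < 0 or service_ms <= 0:
        raise ValueError("cores must be >= 1, wait_ms >= 0, service_ms > 0")
    return max(1, int(cores * (1 + wait_ms / service_ms)))


# The worked example from the text: 200ms waiting, 10ms processing, 4 cores.
estimate = io_bound_pool_size(cores=4, wait_ms=200, service_ms=10)
print(estimate)  # 84
```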

For CPU-bound work:

Optimal threads = Number of cores + 1

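The extra thread keeps the CPU busy when one thread is briefly paused (garbage collection, page fault). This rule comes from runtimes whose threads execute in parallel. In CPython the GIL prevents that, so for CPU-bound Python code size a process pool at roughly one worker per core instead.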

Python’s defaults

concurrent.futures.ThreadPoolExecutor defaults to min(32, os.cpu_count() + 4) since Python 3.8. This is a conservative general-purpose default. For specific workloads, you should override it.
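
A minimal sketch of overriding the default follows. The max_workers and thread_name_prefix arguments are real ThreadPoolExecutor parameters; the fetch function and the pool size of 20 are placeholders chosen for illustration, not recommendations.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Reproduce the documented default by hand, for comparison only.
cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
default_workers = min(32, cpu_count + 4)


def fetch(item: int) -> int:
    """Hypothetical stand-in for an I/O-bound task (e.g. an HTTP call)."""
    return item * 2


# Pass max_workers explicitly when the workload is known to be I/O-heavy.
with ThreadPoolExecutor(max_workers=20, thread_name_prefix="io") as pool:
    results = list(pool.map(fetch, range(5)))

print(f"default would be {default_workers}, results: {results}")
```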

ProcessPoolExecutor defaults to os.cpu_count() — one process per core, which is correct for CPU-bound work.

Practical guidelines

Workload type                   Starting pool size               Adjust based on
API calls (100-500ms latency)   20-50 threads                    Target API rate limits
Database queries (5-50ms)       10-20 threads                    Connection pool max
File I/O                        4-8 threads                      Disk throughput
CPU computation                 os.cpu_count() processes         Memory per process
Mixed workload                  Separate pools for I/O and CPU   Profile each type

The resource ceiling trap

Your thread pool doesn’t exist in isolation. Consider:

  • Database connections: PostgreSQL defaults to 100 max connections. A 200-thread pool hammering the database will hit connection errors.
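  • File descriptors: the default soft limit on Linux is commonly 1024, and each open socket or file uses one. Check the limit with ulimit -n.
  • Memory: 100 threads × 8MB stack = 800MB reserved just for stacks, before any task data. This is virtual address space; resident memory grows only as each stack is actually used.
  • External API rate limits: 50 threads firing at an API with a 10 req/s limit means most of those threads spend their time being throttled and retrying.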
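
One way to apply these ceilings is to take the formula estimate and clamp it to the tightest external limit. The sketch below is an assumption-laden illustration: capped_pool_size and all of its parameters are hypothetical names, and the rate-limit ceiling uses Little's law (concurrency ≈ request rate × latency), which is a technique brought in for this example rather than something the text above prescribes.

```python
import math
from typing import Optional


def capped_pool_size(
    estimate: int,
    *,
    db_pool_max: Optional[int] = None,
    fd_budget: Optional[int] = None,
    rate_limit_per_s: Optional[float] = None,
    latency_s: Optional[float] = None,
) -> int:
    """Clamp a pool-size estimate to the tightest known resource ceiling.

    Hypothetical helper: every parameter name here is an assumption.
    """
    ceilings = [estimate]
    if db_pool_max is not None:
        ceilings.append(db_pool_max)  # never exceed available DB connections
    if fd_budget is not None:
        ceilings.append(fd_budget)  # descriptors this pool is allowed to use
    if rate_limit_per_s is not None and latency_s is not None:
        # Little's law: threads needed to sustain the allowed request rate.
        ceilings.append(math.ceil(rate_limit_per_s * latency_s))
    return max(1, min(ceilings))


# 84 threads by formula, but the API allows 10 req/s at ~500ms per call.
size = capped_pool_size(
    84, db_pool_max=20, fd_budget=512, rate_limit_per_s=10, latency_s=0.5
)
print(size)  # 5
```

With these illustrative numbers, five threads are enough to keep the API at its allowed rate; extra threads would only add retries.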

Common misconception

“More threads always means faster.” Beyond the optimal point, adding threads actually slows things down. Context switching (the OS swapping between threads) costs real CPU time — typically 1-10 microseconds per switch. With thousands of threads, you can spend more time switching than working.

How to measure

  1. Start with the formula estimate
  2. Run your workload and measure throughput (tasks/second)
  3. Gradually increase pool size and re-measure
  4. Stop when throughput plateaus or decreases
  5. Check resource utilization (CPU, memory, connections) at each level
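
The steps above can be sketched as a small sweep. This is a toy harness under stated assumptions: simulated_io_task uses time.sleep as a stand-in for real I/O, and measure_throughput, the task count, and the candidate pool sizes are all hypothetical choices. Replace the task with your real workload before drawing conclusions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_io_task(_: int) -> None:
    """Stand-in for an I/O-bound task: sleeps 10ms instead of doing real I/O."""
    time.sleep(0.01)


def measure_throughput(pool_size: int, n_tasks: int = 40) -> float:
    """Run n_tasks on a pool of the given size and return tasks per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(simulated_io_task, range(n_tasks)))  # wait for all tasks
    elapsed = time.perf_counter() - start
    return n_tasks / elapsed


# Sweep increasing pool sizes; stop growing once throughput stops improving.
results = {size: measure_throughput(size) for size in (1, 2, 4, 8)}
for size, tps in results.items():
    print(f"{size:>3} threads: {tps:8.1f} tasks/s")
```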

The one thing to remember: size your pool based on what your tasks actually do. I/O-heavy work benefits from many more threads than cores (often 10-100× the core count), while CPU-heavy work needs about one process per core. Start with the formula, then measure real throughput to fine-tune.

