Python Multiprocessing Pool — Core Concepts
Why Pools Exist
Python’s Global Interpreter Lock (GIL) prevents true parallelism in threads for CPU-bound work. multiprocessing.Pool solves this by spawning separate Python processes, each with its own GIL. A pool manages a fixed number of worker processes and distributes tasks across them.
This is ideal when you have a function you want to apply to many inputs — image processing, numerical computation, data transformation — and each input can be processed independently.
Creating and Using a Pool
The basic pattern creates a pool, maps work to it, and shuts it down:
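from multiprocessing import Pool

def square(n):
    return n * n

# The guard keeps workers from re-running this block when they import the module.
if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, range(1000))
    # results = [0, 1, 4, 9, 16, ...]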
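Pool(processes=4) starts 4 worker processes. pool.map() splits the input into chunks, hands the chunks to the workers, collects the results, and returns them as a list in input order. The context manager (with) tears the pool down on exit by calling terminate(), so collect every result before leaving the block. On platforms that start workers with spawn (Windows and macOS by default), create the pool under if __name__ == "__main__": so that workers can import the module without re-running the pool setup.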
Key Methods
map(func, iterable) — Ordered Results
Distributes items across workers and returns results in the same order as the input. Blocks until all items are processed.
imap(func, iterable) — Lazy Iterator
Returns an iterator that yields results in input order as they become available, so you can start handling early results while later items are still running:
with Pool(4) as pool:
    for result in pool.imap(process_image, image_paths):
        save_result(result)
imap_unordered(func, iterable) — Fastest First
Same as imap, but results come back in whatever order workers finish. Better throughput when order doesn’t matter.
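To make the ordering difference concrete, here is a small sketch. The slow_square function and its sleep times are invented for illustration: smaller inputs are made deliberately slower so that later inputs tend to finish first.

```python
import time
from multiprocessing import Pool


def slow_square(n):
    # Hypothetical task: smaller inputs sleep longer, so they finish later.
    time.sleep(0.05 * (4 - n))
    return n * n


if __name__ == "__main__":
    inputs = [0, 1, 2, 3]
    with Pool(4) as pool:
        ordered = list(pool.imap(slow_square, inputs))
        unordered = list(pool.imap_unordered(slow_square, inputs))

    print(ordered)            # always input order: [0, 1, 4, 9]
    print(unordered)          # arrival order; can differ from run to run
    print(sorted(unordered))  # [0, 1, 4, 9] once sorted
```

If each result has to be matched back to its input, one option is to have the worker return the input alongside the result, since imap_unordered does not preserve position.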
starmap(func, iterable_of_tuples) — Multiple Arguments
When your function takes more than one argument:
def add(a, b):
    return a + b

with Pool(4) as pool:
    results = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])
# results = [3, 7, 11]
apply_async(func, args) — Single Task, Non-Blocking
Submits one task and returns immediately with an AsyncResult:
with Pool(4) as pool:
    future = pool.apply_async(slow_function, (arg1,))
    # do other work...
    # get() must run while the pool is alive; it waits up to 30 s,
    # then raises multiprocessing.TimeoutError
    result = future.get(timeout=30)
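A common way to use apply_async for several tasks is to submit them all first and collect afterwards. The sketch below assumes a trivial cube function as a stand-in for real work; run_tasks is a hypothetical helper name, not part of the library.

```python
from multiprocessing import Pool


def cube(n):
    # Stand-in for a slow function; hypothetical, for illustration only.
    return n ** 3


def run_tasks(values, workers=2):
    with Pool(workers) as pool:
        # Submit everything first so the tasks can run concurrently...
        pending = [pool.apply_async(cube, (v,)) for v in values]
        # ...then collect. get() is called while the pool is still alive.
        return [p.get(timeout=30) for p in pending]


if __name__ == "__main__":
    print(run_tasks([1, 2, 3]))  # [1, 8, 27]
```

Calling get() right after each apply_async would serialize the work, because every submission would wait for its own result before the next one starts.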
How Many Processes?
A common starting point is os.cpu_count(), which returns the number of logical CPUs (or None if it cannot be determined):
import os
from multiprocessing import Pool

with Pool(os.cpu_count()) as pool:
    ...
For CPU-bound tasks, matching the CPU count is usually optimal. For I/O-mixed work, you might use more. For memory-heavy tasks, you might use fewer to avoid exhausting RAM (each process gets its own memory space).
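These rules of thumb can be written down as a small helper. This is only a sketch: pick_worker_count is a hypothetical function, and the 2x multiplier and the halving are assumed starting values to be tuned by measurement, not recommendations from the library.

```python
import os


def pick_worker_count(kind, cpus=None):
    """Return a starting worker count for a workload kind (rule of thumb only)."""
    # os.cpu_count() can return None when the count is undetermined.
    cpus = cpus or os.cpu_count() or 1
    if kind == "cpu":
        return cpus                 # one worker per logical CPU
    if kind == "io_mixed":
        return cpus * 2             # assumed multiplier; tune by measuring
    if kind == "memory_heavy":
        return max(1, cpus // 2)    # assumed divisor; leaves RAM headroom
    raise ValueError(f"unknown workload kind: {kind!r}")


print(pick_worker_count("cpu", cpus=8))           # 8
print(pick_worker_count("io_mixed", cpus=8))      # 16
print(pick_worker_count("memory_heavy", cpus=8))  # 4
```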
Data Transfer Costs
Every argument and return value crosses a process boundary via pickling — Python serializes the object, sends it through a pipe, and deserializes it on the other side.
This means:
- Large inputs or outputs (megabytes of data per call) slow things down, because serializing and copying them can cost more than the computation itself
- Objects must be picklable (no lambdas, no database connections, no open files)
- Sending a 1GB DataFrame to each worker is wasteful — consider chunking or reading data within each worker
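The points above can be checked without starting a pool, since pickle is the step that matters. In this sketch, is_picklable and chunked are hypothetical helpers written for illustration.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, the step Pool relies on."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


print(is_picklable([1, 2, 3]))         # True: plain data pickles fine
print(is_picklable(lambda x: x * 2))   # False: lambdas cannot be pickled

# Size in bytes of what would cross the pipe for one argument.
print(len(pickle.dumps(list(range(100_000)))))

# Smaller chunks mean smaller individual transfers.
print(list(chunked([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Measuring the pickled size of a typical argument is a cheap way to estimate whether transfer cost will dominate before committing to a pool-based design.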
Initializers
If workers need setup (database connections, loading models), use initializer:
model = None

def init_worker():
    global model
    model = load_heavy_model()

def predict(data):
    return model.predict(data)

with Pool(4, initializer=init_worker) as pool:
    results = pool.map(predict, dataset)
The initializer runs once per worker process at startup, not once per task.
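One way to observe the once-per-worker behavior is to have the initializer stamp each worker with its process ID. In this sketch, _resource, init_worker, and work are illustrative names, and the string built from the label stands in for an expensive resource such as a loaded model. The initargs parameter passes arguments to the initializer.

```python
import os
from multiprocessing import Pool

_resource = None  # each worker process gets its own copy


def init_worker(label):
    # Runs once in each worker; `label` arrives through initargs.
    global _resource
    _resource = f"{label}-{os.getpid()}"


def work(n):
    # Every task handled by the same worker sees the same _resource.
    return _resource, n * n


if __name__ == "__main__":
    with Pool(2, initializer=init_worker, initargs=("model",)) as pool:
        results = pool.map(work, range(20))

    resources = {r for r, _ in results}
    print(len(resources) <= 2)            # True: at most one setup per worker
    print([sq for _, sq in results][:5])  # [0, 1, 4, 9, 16]
```

Twenty tasks share at most two distinct resources, which is the saving an initializer is meant to provide when setup is expensive.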
Common Misconception
“More processes always means faster.” Nope. Beyond your CPU count, extra processes just compete for the same cores and add overhead. And if your task is I/O-bound (network calls, disk reads), threads or asyncio are usually better choices than multiprocessing — processes have much higher startup and memory costs.
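For the I/O-bound case, a thread-based sketch looks like this. It uses multiprocessing.pool.ThreadPool, which offers the same map interface backed by threads instead of processes. fake_download and the example.com URLs are placeholders, with time.sleep standing in for a network wait.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_download(url):
    # Stand-in for a network call; sleeping releases the GIL like real I/O does.
    time.sleep(0.1)
    return f"fetched {url}"


urls = [f"https://example.com/page/{i}" for i in range(8)]  # placeholder URLs

start = time.perf_counter()
with ThreadPool(8) as pool:
    pages = pool.map(fake_download, urls)
elapsed = time.perf_counter() - start

print(pages[0])           # fetched https://example.com/page/0
print(f"{elapsed:.2f}s")  # close to one sleep rather than eight, since the waits overlap
```

Because the threads share one process, nothing is pickled and there is no per-worker startup cost, which is why this shape tends to suit waiting-heavy work better than a process pool.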
One thing to remember: multiprocessing.Pool is the simplest way to parallelize CPU-bound Python work across multiple cores — use map for batch processing and apply_async for individual tasks.