Python Multiprocessing Pool — Core Concepts

Using multiprocessing.Pool to parallelize CPU-bound work across multiple processes with map, starmap, and apply_async.

Why Pools Exist

In standard CPython builds, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads cannot run CPU-bound work in parallel. multiprocessing.Pool works around this by starting separate Python processes, each with its own interpreter and its own GIL. A pool manages a fixed number of worker processes and distributes tasks across them.

This is ideal when you have a function you want to apply to many inputs — image processing, numerical computation, data transformation — and each input can be processed independently.

Creating and Using a Pool

The basic pattern creates a pool, maps work to it, and shuts it down:

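```python
from multiprocessing import Pool


def square(n):
    return n * n


# The guard stops worker processes from re-running this block when they import
# the main module (required with the "spawn" start method, the default on Windows and macOS).
if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, range(1000))
    # results == [0, 1, 4, 9, 16, ...]
```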

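Pool(processes=4) starts four worker processes. pool.map() splits the input into chunks, hands them to the workers, and returns the results as a list in input order. Leaving the with block calls pool.terminate(). That is safe here because map() has already returned every result. Collect the results of any non-blocking calls before the block ends, or their tasks are killed.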

Key Methods

`map(func, iterable)` — Ordered Results

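Distributes items across workers and returns the results as a list in the same order as the input. Blocks until all items are processed. map() converts an input without a length into a list and builds the full result list in memory, so very long iterables can use a lot of memory. For those, the documentation suggests imap() or imap_unordered() with an explicit chunksize.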
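map() also takes an optional chunksize argument. The input is cut into batches of that size, and each batch goes to a worker in a single message, which reduces per-item overhead when individual calls are cheap. The sketch below assumes this batching model. squares_in_chunks is a hypothetical helper, and the chunk size of 100 is an arbitrary illustrative value, not a tuned recommendation:

```python
from multiprocessing import Pool


def square(n):
    return n * n


def squares_in_chunks(numbers, chunksize=100):
    """Square every number in a pool, sending `chunksize` inputs per task message."""
    with Pool(processes=4) as pool:
        # Results still come back as one list in input order, whatever the chunk size.
        return pool.map(square, numbers, chunksize=chunksize)


if __name__ == "__main__":
    print(squares_in_chunks(range(1000))[:5])  # [0, 1, 4, 9, 16]
```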

`imap(func, iterable)` — Lazy Iterator

Returns an iterator that yields results in input order as each becomes available. You can start handling early results before the whole batch is done, although a slow item holds back the results queued behind it:

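```python
# process_image, image_paths and save_result are placeholders for your own code.
with Pool(4) as pool:
    for result in pool.imap(process_image, image_paths):
        save_result(result)  # handle each result as soon as it arrives
```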

`imap_unordered(func, iterable)` — Fastest First

Works like imap(), but yields results in whatever order the workers finish them. When order doesn't matter, this avoids waiting on a slow item, which can improve throughput.
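imap_unordered() gives no ordering guarantee, so one common pattern sends each item's index along with it and puts results back in place as they arrive. Here is a minimal sketch of that pattern; square_with_index and squares_unordered are hypothetical names:

```python
from multiprocessing import Pool


def square_with_index(item):
    """Square a number and return it together with its original position."""
    index, n = item
    return index, n * n


def squares_unordered(numbers):
    results = [None] * len(numbers)
    with Pool(processes=4) as pool:
        # Pairs arrive in completion order; the index says where each result belongs.
        for index, value in pool.imap_unordered(square_with_index, enumerate(numbers)):
            results[index] = value
    return results


if __name__ == "__main__":
    print(squares_unordered([3, 1, 2]))  # [9, 1, 4]
```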

`starmap(func, iterable_of_tuples)` — Multiple Arguments

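When your function takes more than one argument, pass an iterable of argument tuples. Each tuple is unpacked into the call, so (1, 2) becomes add(1, 2):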

```python
from multiprocessing import Pool


def add(a, b):
    return a + b


if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])
    # results == [3, 7, 11]
```
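Two helpers cover the usual ways of feeding multi-argument functions:

- When the arguments live in separate lists, zip() builds the tuples that starmap() expects.
- When only one argument varies, functools.partial can pin the others so plain map() works.

The sketch below shows both. scale and combine are hypothetical helpers for illustration:

```python
from functools import partial
from multiprocessing import Pool


def add(a, b):
    return a + b


def scale(value, factor):
    return value * factor


def combine(xs, ys):
    with Pool(processes=4) as pool:
        # zip() builds the (a, b) tuples that starmap unpacks into add(a, b).
        sums = pool.starmap(add, zip(xs, ys))
        # partial() pins factor=10, leaving a one-argument callable for plain map().
        scaled = pool.map(partial(scale, factor=10), xs)
    return sums, scaled


if __name__ == "__main__":
    print(combine([1, 3, 5], [2, 4, 6]))  # ([3, 7, 11], [10, 30, 50])
```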

`apply_async(func, args)` — Single Task, Non-Blocking

Submits one task and returns immediately with an AsyncResult:

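```python
# slow_function and arg1 are placeholders for your own function and argument.
with Pool(4) as pool:
    future = pool.apply_async(slow_function, (arg1,))
    # do other work...
    # Waits up to 30 s; raises multiprocessing.TimeoutError if the result isn't ready,
    # and re-raises any exception the task itself raised.
    result = future.get(timeout=30)
```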
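In practice, apply_async() is often called in a loop. A callback collects results, and get(timeout=...) bounds the wait. If the timeout expires, get() raises multiprocessing.TimeoutError while the task keeps running. The sketch below works under those assumptions. slow_double, submit_all and times_out are hypothetical names, and the sleeps stand in for real work:

```python
import multiprocessing
import time
from multiprocessing import Pool


def slow_double(n, delay=0.1):
    """Hypothetical slow task: sleeps, then doubles its input."""
    time.sleep(delay)
    return 2 * n


def submit_all(numbers):
    collected = []
    with Pool(processes=4) as pool:
        # Each call returns an AsyncResult at once; the callback runs in the parent
        # process (on the pool's result-handling thread) as each task succeeds.
        handles = [
            pool.apply_async(slow_double, (n,), callback=collected.append)
            for n in numbers
        ]
        # get() returns values in submission order and re-raises worker exceptions.
        values = [h.get(timeout=30) for h in handles]
    return values, sorted(collected)


def times_out(delay):
    """Return True if a task taking `delay` seconds misses a 0.01 s deadline."""
    with Pool(processes=1) as pool:
        handle = pool.apply_async(slow_double, (1, delay))
        try:
            handle.get(timeout=0.01)
            return False
        except multiprocessing.TimeoutError:
            # The task keeps running; leaving the with block terminates the worker.
            return True


if __name__ == "__main__":
    print(submit_all([1, 2, 3]))  # ([2, 4, 6], [2, 4, 6])
    print(times_out(2.0))  # True
```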

How Many Processes?

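A common starting point is os.cpu_count(), which returns the number of logical CPUs. Pool() with no processes argument already uses that number (os.process_cpu_count() on Python 3.13 and later), so passing it explicitly mostly documents intent: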

```python
import os
from multiprocessing import Pool

with Pool(os.cpu_count()) as pool:
    ...
```

For CPU-bound tasks, matching the CPU count is usually a good default. For work that mixes computation with I/O waits, a few extra processes can keep the cores busy while others are blocked. For memory-heavy tasks, use fewer to avoid exhausting RAM, since each process has its own memory space.
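os.cpu_count() reports every logical CPU on the machine, which can be more than the current process is allowed to use (for example, when restricted with taskset on Linux). The sketch below is a helper that prefers the CPU affinity mask where the platform exposes it. usable_cpu_count is a hypothetical name, and on Python 3.13+ os.process_cpu_count() serves the same purpose:

```python
import os
from multiprocessing import Pool


def usable_cpu_count():
    """Number of CPUs this process may run on, with a fallback of 1."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux: honours CPU affinity restrictions such as taskset.
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


def cube(n):
    return n ** 3


if __name__ == "__main__":
    with Pool(usable_cpu_count()) as pool:
        print(pool.map(cube, range(5)))  # [0, 1, 8, 27, 64]
```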

Data Transfer Costs

Every argument and return value crosses a process boundary via pickling — Python serializes the object, sends it through a pipe, and deserializes it on the other side.

This means:

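- Large inputs and outputs slow things down, because every call pays to serialize and copy them, and megabytes of data per call add up quickly.
- Functions and arguments must be picklable: lambdas, nested functions, database connections, and open files cannot be sent to workers.
- Sending a 1GB DataFrame to each worker is wasteful; consider splitting it into chunks or having each worker read its own slice of the data.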
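You can check both costs before a pool is involved by pickling a candidate argument directly, which is roughly the serialization step the pool performs. The sketch below does this. pickled_size is a hypothetical helper, and exact byte counts vary with the pickle protocol and Python version:

```python
import pickle


def pickled_size(obj):
    """Bytes needed to pickle obj, or None if obj cannot be pickled."""
    try:
        return len(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas, local functions, open files, locks and the like fail here.
        return None


if __name__ == "__main__":
    print(pickled_size([1, 2, 3]))               # tiny
    print(pickled_size(list(range(1_000_000))))  # several megabytes
    print(pickled_size(lambda x: x))             # None
```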

Initializers

If workers need setup (database connections, loading models), use initializer:

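```python
from multiprocessing import Pool

model = None  # set separately in each worker process by init_worker


def init_worker():
    global model
    model = load_heavy_model()  # placeholder for your own loading code


def predict(data):
    return model.predict(data)


if __name__ == "__main__":
    # dataset is a placeholder for your own list of inputs
    with Pool(4, initializer=init_worker) as pool:
        results = pool.map(predict, dataset)
```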

The initializer runs once per worker process at startup, not once per task. If maxtasksperchild is set, each replacement worker runs it again when it starts.
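initializer has a companion initargs parameter, so per-worker state can be built from data handed over once per process rather than with every task. In the minimal sketch below, a small dictionary stands in for an expensive resource such as a model. init_worker, translate and translate_all are hypothetical names:

```python
from multiprocessing import Pool

# Per-process state; each worker fills in its own copy via init_worker.
_table = None


def init_worker(table):
    global _table
    # Runs once in every worker process as it starts, not once per task.
    _table = dict(table)


def translate(word):
    return _table.get(word, word)


def translate_all(words, table):
    # initargs is handed to init_worker once per worker, not sent with every task.
    with Pool(processes=2, initializer=init_worker, initargs=(table,)) as pool:
        return pool.map(translate, words)


if __name__ == "__main__":
    print(translate_all(["cat", "dog", "emu"], {"cat": "chat", "dog": "chien"}))
    # ['chat', 'chien', 'emu']
```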

Common Misconception

“More processes always means faster.” Nope. Beyond your CPU count, extra processes just compete for the same cores and add overhead. And if your task is I/O-bound (network calls, disk reads), threads or asyncio are usually better choices than multiprocessing — processes have much higher startup and memory costs.
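For the I/O-bound case, the standard library's concurrent.futures.ThreadPoolExecutor offers a map() that reads much like Pool.map, without the process startup or pickling costs. In the minimal sketch below, time.sleep stands in for a network call; like waiting on a socket, sleeping releases the GIL. fake_download and fetch_all are hypothetical names, and no real requests are made:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical I/O-bound task: waits 0.2 s instead of fetching url."""
    time.sleep(0.2)
    return len(url)


def fetch_all(urls, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Like Pool.map, executor.map yields results in input order.
        return list(executor.map(fake_download, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    start = time.perf_counter()
    sizes = fetch_all(urls)
    # With 8 threads the 8 waits overlap, so this takes roughly 0.2 s rather than 1.6 s.
    print(sizes, round(time.perf_counter() - start, 1))
```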

One thing to remember: multiprocessing.Pool is the simplest way to parallelize CPU-bound Python work across multiple cores — use map for batch processing and apply_async for individual tasks.
