Python Multiprocessing — Core Concepts

Learn when and how Python multiprocessing delivers true parallel CPU performance, plus safe data sharing patterns between worker processes.

Multiprocessing runs multiple Python processes so work can execute on multiple CPU cores at the same time. For CPU-bound workloads, it is often the most direct way to scale in CPython.

Why Multiprocessing Exists

CPython’s global interpreter lock (GIL) lets only one thread at a time execute Python bytecode inside a process. With separate processes, each process has its own interpreter and its own GIL, which enables real parallelism across cores.
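To make "its own interpreter" concrete, here is a minimal sketch showing that a child process has a different PID and its own copy of module-level state. The names `counter`, `bump_and_report`, and `demo_isolation` are illustrative and not part of any API.

```python
import os
from multiprocessing import Process, Queue

counter = 0  # module-level state; every process gets its own copy


def bump_and_report(out):
    global counter
    counter += 1  # changes only the child's copy
    out.put((os.getpid(), counter))


def demo_isolation():
    out = Queue()
    p = Process(target=bump_and_report, args=(out,))
    p.start()
    child_pid, child_counter = out.get()  # read the child's report
    p.join()
    # The parent's counter is untouched by the child's increment.
    return child_pid, child_counter, counter


if __name__ == "__main__":
    print(demo_isolation())
```

The child's increment never reaches the parent, which is why results have to be sent back explicitly.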

Great fits:

image/video transforms
scientific simulations
large numeric loops in pure Python
batch feature engineering

Basic API: `Process`

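from multiprocessing import Process


def worker(n):
    # Runs in the child process.
    print(n * n)


# The guard keeps child processes from re-running this block when they import the module.
if __name__ == "__main__":
    p = Process(target=worker, args=(8,))
    p.start()  # launch the child process
    p.join()   # block until the child exits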

start() launches the child process; join() blocks until it exits.
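As a slightly larger sketch of the same lifecycle, the example below starts several children, joins them all, and then reads each one's `exitcode`. The helper names `check_positive` and `run_all`, and the choice of exit status 2, are assumptions made for illustration.

```python
import sys
from multiprocessing import Process


def check_positive(n):
    # Exit with a nonzero status to signal failure to the parent.
    if n <= 0:
        sys.exit(2)


def run_all(numbers):
    procs = [Process(target=check_positive, args=(n,)) for n in numbers]
    for p in procs:
        p.start()  # all children run concurrently
    for p in procs:
        p.join()   # wait for each child to exit
    # exitcode is 0 for a clean exit, or the status passed to sys.exit().
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run_all([5, -1, 3]))
```

Checking `exitcode` after `join()` is one simple way to notice that a child failed, since a `Process` target's return value is discarded.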

Process Pools

For many small or medium tasks, use a pool: it reuses a fixed set of worker processes instead of starting one process per task.

from multiprocessing import Pool


def heavy(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        out = pool.map(heavy, range(10))
    print(out)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

map distributes chunks across workers and returns results in input order.
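To contrast the ordering guarantee, the sketch below runs the same function through `map` and through `imap_unordered`, which yields results as workers finish. The wrapper names `squares_ordered` and `squares_unordered` are illustrative.

```python
from multiprocessing import Pool


def heavy(x):
    return x * x


def squares_ordered(values, workers=2):
    with Pool(processes=workers) as pool:
        # map blocks until every task is done and keeps input order.
        return pool.map(heavy, values)


def squares_unordered(values, workers=2):
    with Pool(processes=workers) as pool:
        # imap_unordered is lazy, so consume it before the pool is torn down.
        return list(pool.imap_unordered(heavy, values))


if __name__ == "__main__":
    print(squares_ordered(range(5)))
    print(sorted(squares_unordered(range(5))))
```

When order does not matter, the unordered variant lets you start processing early results while slower tasks are still running.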

Start Methods Matter

Python supports different child-process start strategies (platform-dependent):

spawn: starts a fresh interpreter process (safe, explicit, slower startup)
fork: child inherits a copy of the parent's memory (fast, Unix-only, unsafe when the parent has threads)
forkserver: children are forked from a clean, single-threaded server process (Unix-only)

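spawn is the default on Windows and on macOS (since Python 3.8). Linux historically defaulted to fork, and Python 3.14 moves that default to forkserver. For cross-platform code, assume spawn-like behavior: keep task functions and arguments picklable, and put process-creating code under an if __name__ == "__main__" guard.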
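One way to test code against spawn on any platform is to request it explicitly through a context object. This is a minimal sketch; `cube` and `cubes_with_spawn` are hypothetical names.

```python
import multiprocessing as mp


def cube(x):
    return x ** 3


def cubes_with_spawn(values):
    # A context selects a start method locally, without changing the global default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(cube, values)


if __name__ == "__main__":
    print(mp.get_all_start_methods())  # start methods available on this platform
    print(cubes_with_spawn([1, 2, 3]))
```

If code works under spawn, it generally works under the other start methods too, which makes spawn a reasonable target for portability testing.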

Pickling Rules

Arguments and results typically cross process boundaries via serialization (pickle). That implies:

task functions must be importable by name, which in practice means defined at module top level
lambdas and nested functions cannot be pickled by the standard pickle module
huge objects increase serialization overhead

If you pass a 500 MB dataframe to every task, IPC cost can erase compute gains.
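These rules can be checked without starting any process. The sketch below tests whether an object survives `pickle.dumps` and measures the serialized size as a rough proxy for IPC cost. The helper names `is_picklable` and `pickled_size` are illustrative.

```python
import pickle


def top_level_task(x):
    return x + 1


def make_nested_task():
    def nested(x):  # defined inside another function, so not importable by name
        return x + 1
    return nested


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def pickled_size(obj):
    # Bytes that would cross the process boundary for this object.
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    print(is_picklable(top_level_task))
    print(is_picklable(lambda x: x + 1))
    print(is_picklable(make_nested_task()))
    print(pickled_size(list(range(10_000))))
```

Running a check like this on task arguments before scaling up can reveal oversized payloads early.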

To move data between processes, prefer message passing first:

Queue
Pipe
pool return values
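A minimal sketch of message passing with two queues follows: one carries tasks to the workers, the other carries results back. The `STOP` sentinel and the names `square_worker` and `square_via_queues` are assumptions made for illustration.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel telling a worker to exit


def square_worker(tasks, results):
    # iter(callable, sentinel) calls tasks.get() until it returns STOP.
    for n in iter(tasks.get, STOP):
        results.put((n, n * n))


def square_via_queues(numbers, workers=2):
    numbers = list(numbers)
    tasks, results = Queue(), Queue()
    procs = [Process(target=square_worker, args=(tasks, results))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(STOP)  # one sentinel per worker
    # Drain results before joining so no worker blocks on a full queue.
    collected = sorted(results.get() for _ in numbers)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(square_via_queues([3, 1, 2]))
```

This sketch assumes workers never fail; production code would typically add timeouts on `get()` so a crashed worker cannot stall the parent.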

Shared state options exist (Value, Array, Manager) but add complexity and can become bottlenecks.
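As an illustration of the extra care shared state needs, here is a sketch of a shared counter built on `Value`. The function names are hypothetical. The lock is required because `+=` on the shared value is a read-modify-write, and that same lock is what can turn into a bottleneck under contention.

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    for _ in range(times):
        # Without the lock, concurrent increments can be lost.
        with counter.get_lock():
            counter.value += 1


def count_in_parallel(workers=4, times=1000):
    counter = Value("i", 0)  # "i" is a signed C int held in shared memory
    procs = [Process(target=add_many, args=(counter, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_parallel())
```

Every increment serializes on one lock here, so having each worker return a local count and summing in the parent would usually be faster.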

Chunk Size and Throughput

Pool methods such as map batch tasks into chunks before dispatching them to workers. A well-chosen chunksize reduces per-task scheduling overhead.

pool.map(heavy, items, chunksize=100)

Small chunks improve load balancing for uneven task durations; larger chunks reduce overhead for uniform tasks.
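One possible way to pick a chunk size is to aim for a few chunks per worker, as sketched below. The `chunks_per_worker` parameter and the formula are illustrative assumptions, not a recommendation from the library, and the right value depends on how uneven your task durations are.

```python
from multiprocessing import Pool


def heavy(x):
    return x * x


def map_with_chunks(items, workers=2, chunks_per_worker=4):
    items = list(items)
    # Hypothetical heuristic: several chunks per worker balances
    # dispatch overhead against load balancing.
    chunksize = max(1, len(items) // (workers * chunks_per_worker))
    with Pool(processes=workers) as pool:
        return pool.map(heavy, items, chunksize=chunksize)


if __name__ == "__main__":
    print(map_with_chunks(range(20)))
```

The chunk size changes only how work is dispatched; the results and their order stay the same.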

Error Handling

Worker exceptions are re-raised in the parent when you fetch results. With the asynchronous and lazy pool APIs (apply_async, imap_unordered), that happens at get() or during iteration, so handle failures explicitly there and log the offending inputs for reproducibility.
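A minimal sketch of that pattern with `apply_async` follows: each input is kept next to its result handle, so a failure can be recorded together with the input that caused it. The names `risky` and `run_collecting_failures` and the 30-second timeout are illustrative assumptions.

```python
from multiprocessing import Pool


def risky(x):
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x ** 0.5


def run_collecting_failures(inputs, workers=2):
    results, failures = {}, []
    with Pool(processes=workers) as pool:
        # Keep each input paired with its AsyncResult handle.
        pending = [(x, pool.apply_async(risky, (x,))) for x in inputs]
        for x, handle in pending:
            try:
                # get() re-raises the worker's exception in the parent.
                results[x] = handle.get(timeout=30)
            except Exception as exc:
                failures.append((x, repr(exc)))
    return results, failures


if __name__ == "__main__":
    print(run_collecting_failures([4, -1, 9]))
```

One bad input no longer aborts the whole batch, and the failures list gives you exactly what you need to reproduce the problem.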

Common Misconception

Misconception: multiprocessing always speeds everything up.

Reality: process startup + serialization + inter-process communication can dominate runtime for tiny tasks. It shines when each unit of work has enough CPU cost to amortize overhead.
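You can measure this overhead on your own machine with a small timing harness like the sketch below. The function names are illustrative, and the numbers vary by platform and start method, so no particular output is claimed.

```python
import time
from multiprocessing import Pool


def tiny(x):
    return x + 1  # far too cheap to be worth shipping to another process


def time_serial(items):
    start = time.perf_counter()
    out = [tiny(x) for x in items]
    return out, time.perf_counter() - start


def time_pool(items, workers=2):
    start = time.perf_counter()
    # Pool startup and teardown are deliberately included in the measurement.
    with Pool(processes=workers) as pool:
        out = pool.map(tiny, items)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(10_000))
    _, serial_s = time_serial(items)
    _, pool_s = time_pool(items)
    print(f"serial: {serial_s:.4f}s  pool: {pool_s:.4f}s")
```

For work this small, the pool version will often be the slower one; replacing `tiny` with a function that does real CPU work per item is what shifts the balance.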

Choosing Threads vs Processes

I/O-bound workload: threads or async often better
CPU-bound pure Python: multiprocessing usually best
mixed workload: hybrid architecture can win
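The choice can be sketched with two pools that share the same interface: `multiprocessing.pool.ThreadPool` for waiting-heavy work and `Pool` for compute-heavy work. The task functions here are stand-ins, with `time.sleep` simulating an I/O wait.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def fake_io(x):
    time.sleep(0.01)  # stand-in for a network or disk wait
    return x


def cpu_bound(n):
    return sum(i * i for i in range(n))  # pure-Python arithmetic


def run_io_tasks(items, workers=8):
    # Threads are enough here: a sleeping or waiting thread releases the GIL.
    with ThreadPool(workers) as pool:
        return pool.map(fake_io, items)


def run_cpu_tasks(sizes, workers=2):
    # Processes sidestep the GIL for CPU-bound Python code.
    with Pool(processes=workers) as pool:
        return pool.map(cpu_bound, sizes)


if __name__ == "__main__":
    print(run_io_tasks(range(5)))
    print(run_cpu_tasks([10, 100]))
```

A hybrid design could combine the two, for example by fetching inputs with threads and handing the fetched data to a process pool for the compute step.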

This topic complements Python Multithreading and Python Async/Await. Each solves concurrency differently; multiprocessing is the heavy-duty option for compute parallelism.

One Thing to Remember

Multiprocessing buys real parallel CPU execution, but you must budget for process and serialization overhead to get net speed improvements.
