Python Multiprocessing — Core Concepts
Multiprocessing runs multiple Python processes so work can execute on multiple CPU cores at the same time. For CPU-bound workloads, it is often the most direct way to scale in CPython.
Why Multiprocessing Exists
CPython’s GIL limits concurrent execution of Python bytecode inside one process. With separate processes, each process has its own interpreter and GIL, enabling real parallelism.
Great fits:
- image/video transforms
- scientific simulations
- large numeric loops in pure Python
- batch feature engineering
Basic API: Process
from multiprocessing import Process
def worker(n):
print(n * n)
if __name__ == "__main__":
p = Process(target=worker, args=(8,))
p.start()
p.join()
start() launches a child process, join() waits for completion.
Process Pools
For many small/medium tasks, use a pool.
from multiprocessing import Pool
def heavy(x):
return x * x
if __name__ == "__main__":
with Pool(processes=4) as pool:
out = pool.map(heavy, range(10))
map distributes chunks across workers and returns results in input order.
Start Methods Matter
Python supports different child-process start strategies (platform-dependent):
spawn: fresh interpreter process (safe, explicit, slower startup)fork: child inherits parent memory snapshot (fast, Unix-specific caveats)forkserver: spawns from a server process
On modern cross-platform code, expect spawn behavior and design for picklable task functions and arguments.
Pickling Rules
Arguments and results typically cross process boundaries via serialization (pickle). That implies:
- task functions must be top-level importable functions
- lambdas / nested functions are often not picklable
- huge objects increase serialization overhead
If you pass a 500 MB dataframe to every task, IPC cost can erase compute gains.
Data Sharing Patterns
Use message passing first:
QueuePipe- pool return values
Shared state options exist (Value, Array, Manager) but add complexity and can become bottlenecks.
Chunk Size and Throughput
Pool methods batch tasks internally. Right chunk sizing reduces scheduling overhead.
pool.map(heavy, items, chunksize=100)
Small chunks improve load balancing for uneven task durations; larger chunks reduce overhead for uniform tasks.
Error Handling
Worker exceptions are propagated when fetching results. For async pool APIs (apply_async, imap_unordered), handle failures explicitly and log offending inputs for reproducibility.
Common Misconception
Misconception: multiprocessing always speeds everything up.
Reality: process startup + serialization + inter-process communication can dominate runtime for tiny tasks. It shines when each unit of work has enough CPU cost to amortize overhead.
Choosing Threads vs Processes
- I/O-bound workload: threads or async often better
- CPU-bound pure Python: multiprocessing usually best
- mixed workload: hybrid architecture can win
Related Topics
This topic complements Python Multithreading and Python Async/Await. Each solves concurrency differently; multiprocessing is the heavy-duty option for compute parallelism.
One Thing to Remember
Multiprocessing buys real parallel CPU execution, but you must budget for process and serialization overhead to get net speed improvements.
See Also
- Python Async Await Async/await helps one Python program juggle many waiting jobs at once, like a chef who keeps multiple pots moving without standing still.
- Python Basics Python is the programming language that reads like plain English — here's why millions of beginners (and experts) choose it first.
- Python Booleans Make Booleans click with one clear analogy you can reuse whenever Python feels confusing.
- Python Break Continue Make Break Continue click with one clear analogy you can reuse whenever Python feels confusing.
- Python Closures See how Python functions can remember private information, even after the outer function has already finished.