Python Shared Memory Multiprocessing — Core Concepts
The Problem: Inter-Process Data Transfer
Python’s multiprocessing module spawns separate OS processes to bypass the GIL and achieve true parallelism. But separate processes mean separate memory spaces. When you pass data between processes using Queue, Pipe, or function arguments, Python serializes (pickles) the data, copies it through the kernel, and deserializes it on the other side.
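For a 1 GB NumPy array, the cost can look roughly like this (illustrative figures; actual numbers depend on hardware, pickle protocol, and transport):
- ~2 seconds to pickle
- ~1 second to copy through a pipe
- ~2 seconds to unpickle
- At least 2 GB of peak memory (original + copy), and more while the pickled bytes are also in flight
That’s about 5 seconds of overhead before any actual computation happens.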
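Since these costs vary so much between machines, it can help to measure them directly. The sketch below times a pickle round trip for a plain bytearray standing in for array data. The helper name `measure_pickle_roundtrip` and the 50 MB payload size are assumptions made for illustration, and it measures only serialization, not the pipe copy.

```python
import pickle
import time


def measure_pickle_roundtrip(n_bytes):
    """Time pickling and unpickling of an n_bytes payload (hypothetical helper)."""
    payload = bytearray(n_bytes)  # stand-in for a large array's raw data

    t0 = time.perf_counter()
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    t1 = time.perf_counter()
    restored = pickle.loads(blob)
    t2 = time.perf_counter()

    return {
        "pickle_s": t1 - t0,            # time spent serializing
        "unpickle_s": t2 - t1,          # time spent deserializing
        "pickled_bytes": len(blob),     # size of the intermediate copy
        "roundtrip_ok": restored == payload,
    }


if __name__ == "__main__":
    # 50 MB is an arbitrary size chosen to keep the demo quick
    print(measure_pickle_roundtrip(50 * 1024 * 1024))
```

Note that the pickled blob is a second full copy of the payload, and the unpickled object is a third, which is where the peak-memory cost comes from.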
The Solution: multiprocessing.shared_memory
Python 3.8 introduced multiprocessing.shared_memory, which creates a block of memory accessible by multiple processes without copying:
from multiprocessing import shared_memory
import numpy as np
# Parent process: create shared memory and put data in it
data = np.random.randn(1_000_000).astype(np.float64)
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared_array = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared_array[:] = data # Copy data into shared memory (one-time cost)
print(f"Shared memory name: {shm.name}")
# Pass shm.name to child processes
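# Child process: attach to existing shared memory
from multiprocessing import shared_memory
import numpy as np
# Replace the placeholder with the shm.name value received from the parent
shm = shared_memory.SharedMemory(name="the_name_from_parent")
# shm.size may be rounded up to a page multiple, so pass the shape explicitly
shared_array = np.ndarray((1_000_000,), dtype=np.float64, buffer=shm.buf)
# Read/write directly: no copy, no deserialization
result = shared_array.mean()
del shared_array  # drop the view before closing the mapping
shm.close()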
The data exists once in memory. All processes reference the same physical pages.
How It Works
Under the hood, SharedMemory uses POSIX shared memory on Linux/macOS (shm_open) and named file mappings on Windows (CreateFileMapping). The OS creates a memory region that exists independently of any single process. Each process that attaches to it maps the region into its own virtual address space.
Key characteristics:
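- Named: each shared memory block has a name (auto-generated or specified). Processes connect by name.
- Persistent: on POSIX systems the block persists until it is unlinked, even if the creating process exits (although Python's resource tracker may still remove blocks that were never unlinked once the processes that registered them exit). On Windows the block is released once every handle to it has been closed.
- Unstructured: it’s raw bytes. You impose structure (like NumPy array layout) yourself.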
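To make the "named" and "unstructured" points concrete, here is a minimal sketch that uses only the standard library. It imposes a record layout on the raw bytes with `struct` and reads it back through a second handle attached by name. The layout (`RECORD`) is an assumption for illustration, and the reader is attached in the same process only to keep the example short; normally it would live in another process.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: one little-endian int32 followed by one float64
RECORD = struct.Struct("<id")

shm = shared_memory.SharedMemory(create=True, size=RECORD.size)
try:
    # Writer side: pack the values into the raw bytes at offset 0
    RECORD.pack_into(shm.buf, 0, 42, 3.5)

    # Reader side: attach by name (normally done in another process)
    reader = shared_memory.SharedMemory(name=shm.name)
    try:
        count, value = RECORD.unpack_from(reader.buf, 0)
    finally:
        reader.close()
finally:
    shm.close()
    shm.unlink()

print(count, value)  # 42 3.5
```

Both sides must agree on the layout out of band; nothing in the block itself records that it holds an int32 and a float64.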
SharedMemory Lifecycle
from multiprocessing import shared_memory
# Create
shm = shared_memory.SharedMemory(create=True, size=1024, name="my_data")
# Attach from another process
shm2 = shared_memory.SharedMemory(name="my_data")
# Use via shm.buf (a memoryview object)
# When done in each process: close the local mapping
shm2.close()
# When done everywhere: destroy the block (call from ONE process)
shm.close()
shm.unlink() # Removes the shared memory block from the OS
Forgetting to call unlink() leaves orphaned shared memory blocks. On Linux, you can find them in /dev/shm/:
ls /dev/shm/
# Shows shared memory blocks; delete orphans manually if needed
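One way to make it harder to forget `unlink()` is to tie the block's lifetime to a `with` statement in the process that owns it. The sketch below is one possible approach, not part of the standard library: `owned_block` is a hypothetical helper name, and the final existence check relies on POSIX behavior, where attaching to an unlinked name raises `FileNotFoundError`.

```python
from contextlib import contextmanager
from multiprocessing import shared_memory


@contextmanager
def owned_block(size, name=None):
    """Create a block, yield it, then close and unlink it (hypothetical helper)."""
    shm = shared_memory.SharedMemory(create=True, size=size, name=name)
    try:
        yield shm
    finally:
        shm.close()
        shm.unlink()  # runs even if the body of the with-block raised


with owned_block(64) as block:
    block.buf[:5] = b"hello"
    block_name = block.name
    first_bytes = bytes(block.buf[:5])  # copy out before the block goes away

# After the with-block the name should no longer resolve
try:
    leftover = shared_memory.SharedMemory(name=block_name)
    leftover.close()
    still_exists = True
except FileNotFoundError:
    still_exists = False

print(first_bytes, still_exists)
```

Only the creating process should use a helper like this; processes that merely attach should call `close()` and leave `unlink()` to the owner.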
SharedMemoryManager for Automatic Cleanup
For more robust lifecycle management, use SharedMemoryManager:
from multiprocessing.managers import SharedMemoryManager
with SharedMemoryManager() as smm:
    shm = smm.SharedMemory(size=1024)
    # Use shm...
# Automatically cleaned up when context exits
The manager handles cleanup even if child processes crash.
Practical Pattern: Parallel NumPy Processing
import numpy as np
from multiprocessing import shared_memory, Process
def worker(shm_name, shape, dtype, start, end):
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # Process a slice in-place
    arr[start:end] = np.sqrt(np.abs(arr[start:end]))
    del arr  # drop the view before closing the mapping
    shm.close()
# The guard is required where child processes are started with "spawn"
# (the default on Windows and macOS), because they re-import this module
if __name__ == "__main__":
    # Setup
    data = np.random.randn(10_000_000).astype(np.float64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data
    # Launch workers
    n_workers = 4
    chunk_size = len(data) // n_workers
    processes = []
    for i in range(n_workers):
        start = i * chunk_size
        # The last worker also takes any remainder
        end = start + chunk_size if i < n_workers - 1 else len(data)
        p = Process(target=worker,
                    args=(shm.name, data.shape, data.dtype, start, end))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    # shared array now contains results from all workers
    print(shared[:5])
    del shared  # drop the view before closing the mapping
    shm.close()
    shm.unlink()
Each worker processes its chunk in-place on the shared array. No data is copied between processes.
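The chunking arithmetic in the launch loop is easy to get wrong at the boundaries, so it can be worth pulling into a small function and checking on its own. The sketch below mirrors the same logic; `chunk_bounds` is a hypothetical helper name, not something from the code above.

```python
def chunk_bounds(n_items, n_workers):
    """Return (start, end) pairs covering range(n_items) with no gaps or overlap."""
    chunk_size = n_items // n_workers
    bounds = []
    for i in range(n_workers):
        start = i * chunk_size
        # The last worker absorbs the remainder when n_items is not divisible
        end = start + chunk_size if i < n_workers - 1 else n_items
        bounds.append((start, end))
    return bounds


print(chunk_bounds(10, 4))  # [(0, 2), (2, 4), (4, 6), (6, 10)]
```

With this scheme the last worker can receive up to `n_workers - 1` extra items. For large arrays that imbalance is negligible, but for small inputs you may prefer to spread the remainder across workers.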
When to Use Shared Memory vs Other IPC
| Method | Best For | Overhead |
|---|---|---|
| Queue / Pipe | Small messages, task distribution | Pickle + kernel copy |
| shared_memory | Large arrays, parallel computation | One-time setup only |
| Manager (proxy objects) | Shared dicts/lists with synchronization | Very high (per-access RPC) |
| Memory-mapped file | Persistence + sharing | File I/O overhead |
Use shared memory when data is large (>1 MB) and multiple processes need to read or write it.
Common Misconception
Shared memory doesn’t provide synchronization. Two processes writing to the same bytes simultaneously can corrupt data, just like threads writing to shared variables. Use multiprocessing.Lock to serialize writes to a shared region, or multiprocessing.Barrier to separate write phases from read phases. Writes to disjoint regions, as in the parallel pattern above, need no lock, and reads are safe as long as no process is writing to the same region simultaneously.
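As a rough illustration of coordinating writes, the sketch below protects a read-modify-write on a shared counter with a `multiprocessing.Lock`. The counter layout (`COUNTER`) and the function names `add_to_counter` and `run_workers` are assumptions made for this example; it uses `struct` rather than NumPy to stay within the standard library.

```python
import struct
from multiprocessing import Lock, Process, shared_memory

COUNTER = struct.Struct("<q")  # hypothetical layout: one signed 64-bit counter


def add_to_counter(shm_name, lock, times):
    """Increment the shared counter `times` times, one locked step at a time."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        for _ in range(times):
            with lock:  # read-modify-write must not interleave with other writers
                (value,) = COUNTER.unpack_from(shm.buf, 0)
                COUNTER.pack_into(shm.buf, 0, value + 1)
    finally:
        shm.close()


def run_workers(n_workers=4, times=1000):
    """Start n_workers processes that all increment one counter; return the total."""
    lock = Lock()
    shm = shared_memory.SharedMemory(create=True, size=COUNTER.size)
    try:
        COUNTER.pack_into(shm.buf, 0, 0)  # start from zero
        procs = [Process(target=add_to_counter, args=(shm.name, lock, times))
                 for _ in range(n_workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        (total,) = COUNTER.unpack_from(shm.buf, 0)
        return total
    finally:
        shm.close()
        shm.unlink()
```

To try it across real processes, call `run_workers()` from under an `if __name__ == "__main__":` guard. With the lock in place the returned total should equal `n_workers * times`; without it, increments from different processes can overwrite each other and the total may come up short.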
The one thing to remember: multiprocessing.shared_memory eliminates the serialization and copying overhead of inter-process data transfer by letting all processes access the same physical memory — but you must manage the lifecycle (create, close, unlink) and synchronize concurrent writes yourself.