Python Shared Memory Multiprocessing — Core Concepts

Using multiprocessing.shared_memory to pass large data between Python processes without copying, with practical patterns for NumPy arrays and structured data.

The Problem: Inter-Process Data Transfer

Python’s multiprocessing module spawns separate OS processes to bypass the GIL and achieve true parallelism. But separate processes mean separate memory spaces. When you pass data between processes using Queue, Pipe, or function arguments, Python serializes (pickles) the data, copies it through the kernel, and deserializes it on the other side.

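For a 1 GB NumPy array, this can mean roughly the following (exact figures depend on hardware and pickle protocol):

~2 seconds to pickle
~1 second to copy through a pipe
~2 seconds to unpickle
At least 2 GB of peak memory (original + copy, plus the pickled bytes while they are in flight)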

That’s 5 seconds of overhead before any actual computation happens.
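The figures above are rough, so it is worth measuring on your own machine. The sketch below times a pickle round trip using only the standard library. The helper name pickle_round_trip is hypothetical, and a bytes object stands in for the array; a real NumPy array will show different absolute numbers.

```python
import pickle
import time


def pickle_round_trip(payload):
    """Serialize and deserialize payload; return (seconds, serialized size)."""
    start = time.perf_counter()
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    elapsed = time.perf_counter() - start
    # At this point the original, the pickled bytes, and the restored
    # copy all exist in memory at the same time.
    assert restored == payload
    return elapsed, len(blob)


if __name__ == "__main__":
    payload = bytes(64 * 1024 * 1024)  # 64 MiB stand-in for a large array
    elapsed, blob_size = pickle_round_trip(payload)
    print(f"round trip: {elapsed:.3f} s, serialized size: {blob_size:,} bytes")
```

This only covers the pickle and unpickle steps; the kernel copy through a pipe or queue comes on top of it.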

The Solution: multiprocessing.shared_memory

Python 3.8 introduced multiprocessing.shared_memory, which creates a block of memory accessible by multiple processes without copying:

from multiprocessing import shared_memory
import numpy as np

# Parent process: create shared memory and put data in it
data = np.random.randn(1_000_000).astype(np.float64)

shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared_array = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared_array[:] = data  # Copy data into shared memory (one-time cost)

print(f"Shared memory name: {shm.name}")
# Pass shm.name to child processes

# Child process: attach to existing shared memory
from multiprocessing import shared_memory
import numpy as np

shm = shared_memory.SharedMemory(name="the_name_from_parent")
shared_array = np.ndarray((1_000_000,), dtype=np.float64, buffer=shm.buf)

# Read/write directly — no copy, no deserialization
result = shared_array.mean()

The data exists once in memory. All processes reference the same physical pages.
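To see that attached handles really view the same bytes, here is a minimal sketch that uses only the standard library. It attaches a second handle by name inside the same process as a stand-in for a child process; the function name is hypothetical.

```python
from multiprocessing import shared_memory


def write_then_read_via_second_handle(message):
    """Write through one handle, read the same bytes through another."""
    owner = shared_memory.SharedMemory(create=True, size=len(message))
    try:
        # Attach by name, exactly as a child process would
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            owner.buf[:len(message)] = message
            # Slice by length: the block may be larger than requested
            return bytes(peer.buf[:len(message)])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()


if __name__ == "__main__":
    print(write_then_read_via_second_handle(b"hello from the owner"))
```

Nothing is sent between the two handles: the write through owner is visible through peer because both map the same block.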

How It Works

Under the hood, SharedMemory uses POSIX shared memory on Linux/macOS (shm_open) and named file mappings on Windows (CreateFileMapping). The OS creates a memory region that exists independently of any single process. Each process that attaches to it maps the region into its own virtual address space.

Key characteristics:

Named: Each shared memory block has a name (auto-generated or specified). Processes connect by name.
Persistent: On POSIX systems the block persists until it is explicitly unlinked, even if the creating process exits. On Windows it is released once the last process holding a handle closes it.
Unstructured: It’s raw bytes. You impose structure (like NumPy array layout) yourself.
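NumPy is one way to impose structure on the raw bytes; for small fixed-size records the standard struct module is another. The sketch below assumes a hypothetical record layout (a 32-bit id, a 64-bit float, and a one-byte flag), and the helper names are illustrative only.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: little-endian uint32 id, float64 value, uint8 flag
RECORD = struct.Struct("<IdB")


def write_record(buf, index, rec_id, value, flag):
    """Pack one record into slot `index` of the buffer."""
    RECORD.pack_into(buf, index * RECORD.size, rec_id, value, flag)


def read_record(buf, index):
    """Unpack the record stored in slot `index`."""
    return RECORD.unpack_from(buf, index * RECORD.size)


def record_round_trip():
    """Store a record in a fresh block and read it back."""
    shm = shared_memory.SharedMemory(create=True, size=RECORD.size * 4)
    try:
        write_record(shm.buf, 2, 7, 3.5, 1)
        return read_record(shm.buf, 2)
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(record_round_trip())
```

Every process must agree on the layout, so the format string plays the same role as the shape and dtype passed alongside a NumPy array.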

SharedMemory Lifecycle

from multiprocessing import shared_memory

# Create
shm = shared_memory.SharedMemory(create=True, size=1024, name="my_data")

# Attach from another process
shm2 = shared_memory.SharedMemory(name="my_data")

# Use via shm.buf (a memoryview object)

# When done in each process: close the local mapping
shm2.close()

# When done everywhere: destroy the block (call from ONE process)
shm.close()
shm.unlink()  # Removes the shared memory block from the OS

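Forgetting to call unlink() can leave orphaned shared memory blocks on POSIX systems, for example when a process is killed before it can clean up. Python’s resource tracker removes leaked blocks at shutdown in many cases, but it emits a warning and should not be relied on. On Linux, you can find orphans in /dev/shm/: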

ls /dev/shm/
# Shows shared memory blocks; delete orphans manually if needed
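One way to make orphans less likely is to tie close() and unlink() to a finally block. The sketch below wraps that in a context manager; owned_block and block_exists are hypothetical helper names, not part of the standard library.

```python
from contextlib import contextmanager
from multiprocessing import shared_memory


@contextmanager
def owned_block(size):
    """Create a block and guarantee close() + unlink() when the caller is done."""
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        yield shm
    finally:
        shm.close()   # drop this process's mapping
        shm.unlink()  # ask the OS to remove the block


def block_exists(name):
    """Return True if a block with this name can still be attached."""
    try:
        probe = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False
    probe.close()
    return True


if __name__ == "__main__":
    try:
        with owned_block(1024) as shm:
            name = shm.name
            raise RuntimeError("simulated failure while using the block")
    except RuntimeError:
        pass
    print("block still exists:", block_exists(name))
```

This covers exceptions, not hard kills: a process terminated with SIGKILL never reaches the finally block.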

SharedMemoryManager for Automatic Cleanup

For more robust lifecycle management, use SharedMemoryManager:

from multiprocessing.managers import SharedMemoryManager

with SharedMemoryManager() as smm:
    shm = smm.SharedMemory(size=1024)
    # Use shm...
# Automatically cleaned up when context exits

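The manager tracks every block it allocates and unlinks them all when the with block exits, including when an exception is raised or a child process has crashed.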

Practical Pattern: Parallel NumPy Processing

import numpy as np
from multiprocessing import shared_memory, Process

def worker(shm_name, shape, dtype, start, end):
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)

    # Process a slice in-place
    arr[start:end] = np.sqrt(np.abs(arr[start:end]))

    shm.close()

# Setup. The __main__ guard keeps child processes from re-running this
# code on platforms that spawn rather than fork (Windows, macOS).
if __name__ == "__main__":
    data = np.random.randn(10_000_000).astype(np.float64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data

    # Launch workers
    n_workers = 4
    chunk_size = len(data) // n_workers
    processes = []
    for i in range(n_workers):
        start = i * chunk_size
        end = start + chunk_size if i < n_workers - 1 else len(data)
        p = Process(target=worker,
                    args=(shm.name, data.shape, data.dtype, start, end))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

    # shared array now contains results from all workers
    print(shared[:5])

    del shared  # drop the view before releasing the mapping
    shm.close()
    shm.unlink()

Each worker processes its chunk in-place on the shared array. No data is copied between processes.
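The loop above hands any remainder to the last worker. When the array length does not divide evenly, a slightly more balanced split spreads the extra items across the first few workers. The helper below is a sketch with a hypothetical name, chunk_bounds, and is independent of shared memory itself.

```python
def chunk_bounds(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, end) pairs.

    The first `n_items % n_workers` chunks get one extra item, so chunk
    sizes differ by at most one.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for i in range(n_workers):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    for start, end in chunk_bounds(10, 4):
        print(start, end)
```

Each (start, end) pair can be passed to a worker in place of the start and end values computed in the loop. The chunks are disjoint, so workers never write to the same bytes.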

When to Use Shared Memory vs Other IPC

Method	Best For	Overhead
`Queue` / `Pipe`	Small messages, task distribution	Pickle + kernel copy
`shared_memory`	Large arrays, parallel computation	One-time setup only
`Manager` (proxy objects)	Shared dicts/lists with synchronization	Very high (per-access RPC)
Memory-mapped file	Persistence + sharing	File I/O overhead

Use shared memory when data is large (>1 MB) and multiple processes need to read or write it.

Common Misconception

Shared memory doesn’t provide synchronization. Two processes writing to the same bytes at the same time can corrupt data, just like threads writing to shared variables. Use multiprocessing.Lock to serialize writes to a shared region, or multiprocessing.Barrier to separate write phases from read phases. Reads are safe as long as no process is writing to the same region at the same time, and workers that write to disjoint slices need no lock.
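As an illustration of coordinating writes, the sketch below has several processes increment one 64-bit counter stored in shared memory. The read-modify-write is wrapped in a multiprocessing.Lock so increments are not lost. The counter layout and function names are assumptions made for this example.

```python
import struct
from multiprocessing import Lock, Process, shared_memory

COUNTER = struct.Struct("<q")  # one signed 64-bit integer at offset 0


def add_to_counter(shm_name, lock, n_increments):
    """Attach to the block and increment the counter under the lock."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        for _ in range(n_increments):
            with lock:  # the read-modify-write below must not interleave
                (value,) = COUNTER.unpack_from(shm.buf, 0)
                COUNTER.pack_into(shm.buf, 0, value + 1)
    finally:
        shm.close()


def run_counter_demo(n_workers=4, n_increments=1000):
    """Start workers that share one counter; return the final value."""
    lock = Lock()
    shm = shared_memory.SharedMemory(create=True, size=COUNTER.size)
    try:
        COUNTER.pack_into(shm.buf, 0, 0)
        workers = [
            Process(target=add_to_counter, args=(shm.name, lock, n_increments))
            for _ in range(n_workers)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        (total,) = COUNTER.unpack_from(shm.buf, 0)
        return total
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    # With the lock held around each update, the total should equal
    # n_workers * n_increments.
    print(run_counter_demo())
```

Taking the lock on every increment is slow. In practice it is usually better to have each worker accumulate locally and take the lock once, or to partition the data so no lock is needed.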

The one thing to remember: multiprocessing.shared_memory eliminates the serialization and copying overhead of inter-process data transfer by letting all processes access the same physical memory — but you must manage the lifecycle (create, close, unlink) and synchronize concurrent writes yourself.
