Python Memory-Mapped Files — Deep Dive
Virtual Memory Mechanics
Memory mapping works through the CPU’s virtual memory hardware. When mmap() is called at the OS level:
- The kernel creates a virtual memory area (VMA) in the process’s page table, marking a range of virtual addresses as backed by the file.
- These pages are initially marked as not present — no physical RAM is allocated.
- When the process accesses a mapped address, a page fault occurs.
- The kernel’s page fault handler reads the corresponding 4 KB page from disk into a physical frame and updates the page table.
- The CPU retries the instruction — this time it succeeds, transparently.
Subsequent accesses to the same page hit physical RAM directly. The OS tracks which pages are “hot” and evicts cold pages under memory pressure, using the LRU (or CLOCK) algorithm.
Python’s mmap Module Internals
Python’s mmap.mmap object wraps the OS-level mmap() system call and exposes it with Python buffer protocol support:
import mmap
with open("data.bin", "r+b") as f:
mm = mmap.mmap(
f.fileno(), # File descriptor
0, # Length (0 = entire file)
access=mmap.ACCESS_WRITE, # Read-write
offset=0 # Starting offset (must be page-aligned)
)
The offset parameter must be a multiple of mmap.ALLOCATIONGRANULARITY (typically 4096 on Linux, 65536 on Windows). This alignment requirement comes from the hardware page table.
Access Modes
| Mode | Description | OS flag |
|---|---|---|
ACCESS_READ | Read-only, writes raise TypeError | PROT_READ |
ACCESS_WRITE | Read-write, changes go to file | PROT_READ | PROT_WRITE, MAP_SHARED |
ACCESS_COPY | Read-write, changes are private (copy-on-write) | PROT_READ | PROT_WRITE, MAP_PRIVATE |
ACCESS_COPY is particularly useful for analysis: you can modify the mapped data (e.g., patching bytes for testing) without affecting the original file. The OS uses copy-on-write semantics — modified pages get their own physical memory while unmodified pages continue sharing the file’s pages.
NumPy Integration
NumPy arrays can be backed directly by memory-mapped files, combining vectorized operations with mmap efficiency:
import numpy as np
# Create a memory-mapped array
arr = np.memmap("features.dat", dtype=np.float32,
mode='r+', shape=(1_000_000, 128))
# Operate on slices without loading everything
batch = arr[5000:5100] # Only pages for rows 5000-5100 are loaded
norms = np.linalg.norm(batch, axis=1)
# Write results back
arr[5000:5100] /= norms[:, np.newaxis]
arr.flush()
This is how scikit-learn handles datasets larger than RAM in sklearn.datasets.load_svmlight_file with memory mapping. The pattern lets you train models on datasets that exceed available memory — the OS pages data in and out as the training algorithm accesses different portions.
Shared Memory IPC via mmap
Two processes can communicate through a shared memory-mapped file:
Writer Process
import mmap
import struct
import time
with open("/tmp/shared_data.bin", "r+b") as f:
mm = mmap.mmap(f.fileno(), 1024)
for i in range(1000):
# Write a counter and timestamp
data = struct.pack('Qd', i, time.time())
mm[:16] = data
mm.flush()
time.sleep(0.01)
mm.close()
Reader Process
import mmap
import struct
with open("/tmp/shared_data.bin", "r+b") as f:
mm = mmap.mmap(f.fileno(), 1024, access=mmap.ACCESS_READ)
last_counter = -1
while True:
data = mm[:16]
counter, timestamp = struct.unpack('Qd', data)
if counter != last_counter:
print(f"Counter: {counter}, Time: {timestamp}")
last_counter = counter
mm.close()
This approach is faster than sockets or pipes for large data transfers because no copying occurs — both processes access the same physical pages through their respective page tables.
For Python 3.8+, multiprocessing.shared_memory provides a higher-level API for anonymous (non-file-backed) shared memory. But file-backed mmap remains useful when you need persistence or cross-language compatibility.
Anonymous Memory Mapping
You can create memory-mapped regions without a backing file using mmap.mmap(-1, size):
import mmap
# Create 10 MB anonymous mapping
mm = mmap.mmap(-1, 10 * 1024 * 1024)
# Use as a fast, resizable byte buffer
mm[:4] = b'\x01\x02\x03\x04'
data = mm[:4]
mm.close()
Anonymous mappings are backed by swap space instead of a file. They’re useful for allocating large buffers that the OS can page out under memory pressure, unlike bytearray which is always in RAM (barring full system swap).
Page Cache Behavior and Tuning
Understanding the OS page cache is crucial for mmap performance:
import mmap
import os
# Advise the kernel about access patterns
with open("sequential_data.bin", "rb") as f:
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
# Tell OS we'll read sequentially — enables readahead
mm.madvise(mmap.MADV_SEQUENTIAL)
# Or for random access — disables readahead
# mm.madvise(mmap.MADV_RANDOM)
# Process data...
mm.close()
madvise hints (available in Python 3.8+):
| Hint | Effect |
|---|---|
MADV_SEQUENTIAL | Aggressive readahead, free pages after reading |
MADV_RANDOM | Disable readahead, keep pages longer |
MADV_WILLNEED | Pre-fault specified pages (like prefetch) |
MADV_DONTNEED | Free specified pages immediately |
For sequential scans through large files, MADV_SEQUENTIAL can improve throughput by 2-3x because the kernel reads ahead while you process the current page.
Error Handling: SIGBUS
On Unix, if a memory-mapped file is truncated by another process while you’re reading it, accessing pages beyond the new end triggers a SIGBUS signal, which kills the process by default. Python cannot catch this with a try/except — it’s a signal, not an exception.
Defensive strategies:
import os
import mmap
import signal
def handle_bus_error(signum, frame):
raise IOError("Memory-mapped file was truncated")
signal.signal(signal.SIGBUS, handle_bus_error)
# Now SIGBUS raises an IOError instead of killing the process
Better yet: ensure exclusive access to the file using fcntl.flock() or use ACCESS_COPY mode which creates private copies of modified pages.
Production Pattern: Memory-Mapped Ring Buffer
A high-performance logging system can use mmap as a ring buffer:
import mmap
import struct
import os
class MmapRingBuffer:
HEADER_SIZE = 16 # write_pos (8 bytes) + count (8 bytes)
def __init__(self, path, capacity=1_000_000, record_size=256):
self.record_size = record_size
self.capacity = capacity
total_size = self.HEADER_SIZE + capacity * record_size
if not os.path.exists(path):
with open(path, 'wb') as f:
f.write(b'\x00' * total_size)
self._f = open(path, 'r+b')
self._mm = mmap.mmap(self._f.fileno(), total_size)
def write(self, data: bytes):
assert len(data) <= self.record_size
padded = data.ljust(self.record_size, b'\x00')
write_pos, count = struct.unpack('QQ', self._mm[:16])
offset = self.HEADER_SIZE + (write_pos % self.capacity) * self.record_size
self._mm[offset:offset + self.record_size] = padded
write_pos += 1
count = min(count + 1, self.capacity)
self._mm[:16] = struct.pack('QQ', write_pos, count)
def close(self):
self._mm.flush()
self._mm.close()
self._f.close()
This pattern provides:
- Crash recovery — Data persists on disk through the mapping.
- Zero-copy writes — Data goes directly to the page cache.
- Bounded memory — Fixed size ring buffer, OS manages page residency.
Benchmarks: mmap vs Alternatives
Reading 100,000 random 4 KB blocks from a 4 GB file:
| Method | Time | Peak RSS |
|---|---|---|
f.seek() + f.read() | 4.2s | 12 MB |
mmap random access | 1.8s | 180 MB (OS-managed) |
mmap + MADV_RANDOM | 1.6s | 120 MB |
Full f.read() then index | 14s (load) + 0.01s (access) | 4 GB |
mmap is fastest for random access because the OS page cache is optimized for exactly this pattern. The higher RSS reflects cached pages that the OS will reclaim under pressure.
The one thing to remember: Memory-mapped files leverage the OS’s virtual memory system for zero-copy file access with automatic page management — use madvise to match your access pattern, NumPy memmap for numerical data, and file-backed mappings for inter-process communication where copying overhead is unacceptable.
See Also
- Python Algorithmic Complexity Understand Algorithmic Complexity through a practical analogy so your Python decisions become faster and clearer.
- Python Async Performance Tuning Making your async Python faster is like organizing a busy restaurant kitchen — it's all about flow.
- Python Benchmark Methodology Why timing Python code once means nothing, and how fair testing works like a science experiment.
- Python C Extension Performance How Python borrows C's speed for the hard parts — like hiring a specialist for the toughest job on the worksite.
- Python Caching Strategies Understand Python caching strategies with a shortcut-road analogy so your app gets faster without taking wrong turns.