Python Memory Layout Optimization — Core Concepts
Why Python objects are expensive
Every Python object carries overhead beyond its actual data (sizes below are for 64-bit CPython):
- Reference count: 8 bytes, used to track when the object can be freed
- Type pointer: 8 bytes pointing to the class
- Dictionary: ~100+ bytes for the __dict__ attribute dictionary (regular classes)
A single Python int uses 28 bytes. A C int uses 4 bytes. When you have millions of objects, this overhead dominates.
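To check these figures on your own interpreter, here is a minimal sketch using sys.getsizeof. It assumes CPython; the exact numbers depend on the Python version and on whether the build is 32-bit or 64-bit, so the output is not shown.

```python
import sys

# Sizes as reported by the running interpreter. These vary across
# CPython versions and platforms, so treat them as indicative only.
sizes = {
    "int": sys.getsizeof(1),
    "float": sys.getsizeof(1.0),
    "bare object": sys.getsizeof(object()),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Comparing the reported int size with the 4 bytes of a C int shows how much of each object is bookkeeping rather than data.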
Strategy 1: __slots__ for fixed-attribute classes
By default, Python stores instance attributes in a dictionary. __slots__ replaces this with a fixed-size array:
# Regular class: ~152 bytes per instance (approximate; varies by CPython version)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Slotted class: ~56 bytes per instance (approximate; varies by CPython version)
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y
That’s nearly 3× less memory per instance. For a million points, that’s ~96MB saved.
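Per-instance figures like these differ between CPython versions, so it is worth measuring rather than trusting a fixed number. The sketch below is one way to do that with the standard tracemalloc module; the class names DictPoint and SlotPoint and the helper bytes_per_instance are illustrative names, not part of any library.

```python
import tracemalloc


class DictPoint:
    """Regular class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlotPoint:
    """Slotted class: attributes live in fixed slots, with no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def bytes_per_instance(cls, count=100_000):
    """Approximate bytes allocated per instance.

    The result includes the 8-byte list pointer that keeps each
    instance alive, which is the same for both classes.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # 0.0 is one shared float object, so the attribute values
    # themselves add no per-instance allocation.
    instances = [cls(0.0, 0.0) for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / count


dict_cost = bytes_per_instance(DictPoint)
slot_cost = bytes_per_instance(SlotPoint)
print(f"regular: {dict_cost:.0f} bytes, slotted: {slot_cost:.0f} bytes")
```

Newer CPython releases store instance attributes more compactly than older ones, so the gap may be smaller than the figures quoted here, but the slotted class should still come out ahead.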
The tradeoff: you can’t add arbitrary attributes at runtime, and a subclass keeps the savings only if it declares its own __slots__ (an empty tuple is enough when it adds no attributes); otherwise its instances get a __dict__ again.
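Both sides of that tradeoff can be seen in a short sketch. The subclass names LoosePoint and TightPoint are hypothetical, chosen only for this illustration.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class LoosePoint(Point):
    # No __slots__ here, so instances get a __dict__ again.
    pass


class TightPoint(Point):
    # Declares only the extra slot; x and y are inherited from Point.
    __slots__ = ("z",)


p = Point(1, 2)
try:
    p.label = "origin"  # not listed in __slots__
    added = True
except AttributeError:
    added = False

loose_has_dict = hasattr(LoosePoint(1, 2), "__dict__")
tight_has_dict = hasattr(TightPoint(1, 2), "__dict__")
print(added, loose_has_dict, tight_has_dict)
```

A subclass should list only its new attributes in __slots__; repeating the parent's names wastes space.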
Strategy 2: arrays instead of lists for homogeneous data
A Python list stores pointers to objects. The array module stores raw values:
import array
import sys
# List of million ints: ~8MB (pointers) + 28MB (int objects) ≈ 36MB
numbers_list = list(range(1_000_000))
# Array of million ints: ~4MB (raw 4-byte values)
numbers_array = array.array('i', range(1_000_000))
print(sys.getsizeof(numbers_list))   # ~8 MB: the pointer array only, not the int objects
print(sys.getsizeof(numbers_array))  # ~4 MB of raw values, plus any over-allocated spare capacity
Arrays store values packed contiguously, making them cache-friendly and memory-efficient. The limitation: all elements must be the same type.
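Because sys.getsizeof on a list counts only the pointer array, a fairer comparison adds up the int objects as well. The following sketch does that on a smaller sample and also shows the same-type restriction in action. The totals slightly overcount the list, since CPython shares a few small int objects.

```python
import array
import sys

count = 100_000
numbers_list = list(range(count))
numbers_array = array.array("i", numbers_list)

# The list's own size covers only its pointers, so add each int object.
list_total = sys.getsizeof(numbers_list) + sum(
    sys.getsizeof(n) for n in numbers_list
)
array_total = sys.getsizeof(numbers_array)

print(f"list + int objects: {list_total:,} bytes")
print(f"array('i'):         {array_total:,} bytes")

# The typecode fixes the element type, so a float is rejected.
try:
    numbers_array.append(1.5)
    accepted_float = True
except TypeError:
    accepted_float = False
```

The 'i' typecode is a C signed int, so values outside its range are rejected too; a wider typecode such as 'q' trades memory for range.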
Strategy 3: NumPy for numerical data
NumPy takes the array concept further with multi-dimensional arrays and vectorized operations:
import numpy as np
# 1 million 64-bit floats: ~8MB
data = np.zeros(1_000_000, dtype=np.float64)
# Choose smaller types when possible
# 1 million 32-bit floats: ~4MB
data = np.zeros(1_000_000, dtype=np.float32)
# 1 million 8-bit integers: ~1MB
labels = np.zeros(1_000_000, dtype=np.uint8)
Besides saving memory, contiguous layout enables CPU vector instructions (SIMD) that process 4-8 values simultaneously.
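The memory figures for each dtype can be confirmed with the array's itemsize and nbytes attributes. This is a small sketch assuming NumPy is installed; the variable names are illustrative.

```python
import numpy as np

count = 1_000_000
as_float64 = np.zeros(count, dtype=np.float64)
as_float32 = np.zeros(count, dtype=np.float32)
as_uint8 = np.zeros(count, dtype=np.uint8)

# nbytes is the size of the raw data buffer: itemsize * number of elements.
for arr in (as_float64, as_float32, as_uint8):
    print(f"{arr.dtype}: {arr.itemsize} bytes/item, {arr.nbytes:,} bytes total")
```

Smaller types are not free: float32 keeps roughly 7 significant decimal digits, and uint8 holds only 0 to 255, so pick the narrowest dtype that still fits the data.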
Strategy 4: dataclasses with slots
Python 3.10+ supports slots=True in dataclasses:
from dataclasses import dataclass
@dataclass(slots=True)
class Particle:
    x: float
    y: float
    z: float
    mass: float
This combines the ergonomics of dataclasses with the memory efficiency of __slots__.
Strategy 5: struct packing for binary data
When storing or transmitting many fixed-format records:
import struct
# Define a format: 2 floats + 1 unsigned int
record_format = struct.Struct('ffI') # 12 bytes per record
# Pack 1 million records: 12MB total
buffer = bytearray(record_format.size * 1_000_000)
for i in range(1_000_000):
    record_format.pack_into(buffer, i * record_format.size, 1.0, 2.0, i)
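Reading a record back is the mirror image of packing it. The sketch below uses a smaller buffer and a hypothetical helper, read_record, to unpack a single record by index without copying the rest.

```python
import struct

record_format = struct.Struct("ffI")  # 2 floats + 1 unsigned int
record_count = 1_000

buffer = bytearray(record_format.size * record_count)
for i in range(record_count):
    record_format.pack_into(buffer, i * record_format.size, 1.0, 2.0, i)


def read_record(index):
    """Unpack the record at the given index straight from the buffer."""
    return record_format.unpack_from(buffer, index * record_format.size)


x, y, ident = read_record(42)
print(x, y, ident)
```

The format 'ffI' uses the platform's native byte order and alignment. For data that crosses machines or goes over the network, an explicit prefix such as '<ffI' (little-endian, no padding) keeps the layout fixed.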
Cache friendliness
Most modern CPUs move memory in 64-byte cache lines. When you access one element, the rest of its cache line is loaded along with it at no extra cost. This means:
- Iterating over a NumPy array — each cache line gives you 8 float64 values. Sequential access is fast.
- Iterating over a Python list of objects — each cache line gives you 8 pointers, but the actual objects are scattered elsewhere. Every access potentially triggers a cache miss.
This “pointer chasing” penalty is one reason numerical loops over Python lists are often 10-50× slower than the equivalent NumPy operations; interpreter overhead accounts for much of the rest.
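The values-per-cache-line arithmetic is simple enough to write down. This sketch assumes a 64-byte line, which is typical but not universal, and uses an illustrative helper named values_per_cache_line.

```python
import array

CACHE_LINE_BYTES = 64  # assumption: typical for current x86-64 and many ARM cores


def values_per_cache_line(itemsize):
    """How many contiguous items of the given size fit in one cache line."""
    return CACHE_LINE_BYTES // itemsize


float64_size = array.array("d").itemsize  # C double, 8 bytes on common platforms

for label, itemsize in [
    ("float64", float64_size),
    ("float32", 4),
    ("uint8", 1),
    ("pointer (64-bit)", 8),
]:
    print(f"{label}: {values_per_cache_line(itemsize)} per cache line")
```

For a list, the count on the pointer row describes only the pointers. Each pointed-to object lives elsewhere and may cost a further cache line to reach.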
Common misconception: memory optimization is premature
For application code with dozens or hundreds of objects, memory layout is irrelevant. But data-intensive applications — data processing, scientific computing, game engines, ML pipelines — routinely create millions of objects. At that scale, choosing the right container is not premature optimization; it’s basic engineering.
Quick decision guide
| Data type | Best container |
|---|---|
| Millions of numbers | NumPy array |
| Millions of records with fixed fields | @dataclass(slots=True) or namedtuple |
| Binary protocol data | struct module |
| Homogeneous typed values | array.array |
| Small collections, mixed types | Regular Python list/dict (overhead doesn’t matter) |
The one thing to remember: Python’s default objects carry 50-100+ bytes of overhead each — for large datasets, switch to contiguous containers like NumPy arrays or slotted classes to cut memory usage and improve cache performance.