Python Memory Layout Optimization — Core Concepts

Why Python objects are expensive

Every Python object carries overhead beyond its actual data:

  • Reference count — 8 bytes for garbage collection tracking
  • Type pointer — 8 bytes pointing to the class
  • Dictionary — ~100+ bytes for the __dict__ attribute dictionary (regular classes)

A small Python int takes 28 bytes on 64-bit CPython, while a C int typically takes 4. When you have millions of objects, this overhead dominates.
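These figures can be checked with sys.getsizeof. The following is a minimal sketch, assuming 64-bit CPython; the exact numbers vary by version and platform, and the Empty class is a hypothetical name used only for illustration. Note that getsizeof reports the object itself, not anything it references.

```python
import sys

class Empty:
    """A regular class with no declared attributes (illustrative only)."""

obj = Empty()
obj.value = 1  # setting an attribute ensures the per-instance __dict__ exists

int_size = sys.getsizeof(1)               # a small int object
float_size = sys.getsizeof(1.0)           # a float object wrapping an 8-byte double
dict_size = sys.getsizeof(obj.__dict__)   # the attribute dictionary alone

print(f"int:               {int_size} bytes")
print(f"float:             {float_size} bytes")
print(f"instance __dict__: {dict_size} bytes")
```

The gap between the raw data size (4 or 8 bytes) and the reported size is the per-object overhead described above.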

Strategy 1: __slots__ for fixed-attribute classes

By default, Python stores instance attributes in a dictionary. __slots__ replaces this with a fixed-size array:

# Regular class: ~152 bytes per instance
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Slotted class: ~56 bytes per instance
# (named SlottedPoint so it does not overwrite the regular Point above)
class SlottedPoint:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

That’s nearly 3× less memory per instance. For a million points, that’s roughly 96 MB saved. Exact sizes depend on the CPython version and platform.

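The tradeoff: you can’t add arbitrary attributes at runtime, and the savings don’t carry over to subclasses automatically. A subclass that doesn’t declare its own __slots__ gets a __dict__ again, so each subclass must declare __slots__ too (listing only its new attributes, or an empty tuple if it adds none).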
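Both tradeoffs can be observed directly. This is a small sketch; the class names SlottedPoint, LoosePoint, and Point3D are hypothetical and chosen only for this illustration.

```python
class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

class LoosePoint(SlottedPoint):
    pass  # no __slots__ here, so instances get a __dict__ again

class Point3D(SlottedPoint):
    __slots__ = ('z',)  # declare only the new attribute; x and y come from the parent

p = SlottedPoint(1, 2)
try:
    p.label = "origin"  # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True

loose = LoosePoint(1, 2)
loose.label = "ok"  # works, stored in the instance __dict__

q = Point3D(1, 2)
q.z = 3

print(rejected)                    # True
print(hasattr(loose, "__dict__"))  # True
print(hasattr(q, "__dict__"))      # False
```

A subclass stays compact only when every class in its hierarchy declares __slots__.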

Strategy 2: arrays instead of lists for homogeneous data

A Python list stores pointers to objects. The array module stores raw values:

import array
import sys

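# List of a million ints: ~8MB (pointers) + ~28MB (int objects) ≈ 36MB
numbers_list = list(range(1_000_000))

# Array of a million ints: ~4MB (raw 4-byte values on most platforms)
numbers_array = array.array('i', range(1_000_000))

# sys.getsizeof reports the container only, not the int objects a list points to
print(sys.getsizeof(numbers_list))   # ~8,000,056 (pointer array only)
print(sys.getsizeof(numbers_array))  # roughly 4MB (may include some spare capacity)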

Arrays store values packed contiguously, making them cache-friendly and memory-efficient. The limitation: all elements must be the same type.
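Because sys.getsizeof on a list counts only the pointers, a fairer comparison adds up the int objects as well. The sketch below does that for a smaller collection and also shows the type restriction being enforced. The list total is an upper estimate, since CPython shares one object for each small int.

```python
import array
import sys

n = 100_000
numbers_list = list(range(n))
numbers_array = array.array('i', numbers_list)

# Pointer array plus every int object it refers to.
list_total = sys.getsizeof(numbers_list) + sum(sys.getsizeof(v) for v in numbers_list)

# The array holds the raw values itself; itemsize is the number of bytes per element.
array_total = sys.getsizeof(numbers_array)
payload = numbers_array.itemsize * len(numbers_array)

print(f"list + ints: {list_total:,} bytes")
print(f"array:       {array_total:,} bytes ({payload:,} bytes of raw values)")

# Every write is checked against the array's type code.
try:
    numbers_array.append(1.5)  # a float cannot go into an 'i' array
    type_enforced = False
except TypeError:
    type_enforced = True

print(type_enforced)  # True
```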

Strategy 3: NumPy for numerical data

NumPy takes the array concept further with multi-dimensional arrays and vectorized operations:

import numpy as np

# 1 million 64-bit floats: ~8MB
data = np.zeros(1_000_000, dtype=np.float64)

# Choose smaller types when possible
# 1 million 32-bit floats: ~4MB
data = np.zeros(1_000_000, dtype=np.float32)

# 1 million 8-bit integers: ~1MB
labels = np.zeros(1_000_000, dtype=np.uint8)

Besides saving memory, contiguous layout enables CPU vector instructions (SIMD) that process 4-8 values simultaneously.

Strategy 4: dataclasses with slots

Python 3.10+ supports slots=True in dataclasses:

from dataclasses import dataclass

@dataclass(slots=True)
class Particle:
    x: float
    y: float
    z: float
    mass: float

This combines the ergonomics of dataclasses with the memory efficiency of __slots__.

Strategy 5: struct packing for binary data

When storing or transmitting many fixed-format records, the struct module packs them into a compact byte buffer:

import struct

# Define a format: 2 floats + 1 unsigned int
record_format = struct.Struct('ffI')  # 12 bytes per record

# Pack 1 million records: 12MB total
buffer = bytearray(record_format.size * 1_000_000)
for i in range(1_000_000):
    record_format.pack_into(buffer, i * record_format.size, 1.0, 2.0, i)
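Reading the records back works the same way. The sketch below reuses the format from above with a smaller record count to keep it quick, and shows both random access with unpack_from and sequential access with iter_unpack.

```python
import struct

record_format = struct.Struct('ffI')  # same layout as above: 2 floats + 1 unsigned int
count = 1_000                         # small count to keep the sketch quick

buffer = bytearray(record_format.size * count)
for i in range(count):
    record_format.pack_into(buffer, i * record_format.size, 1.0, 2.0, i)

# Random access: read record 42 without touching the others.
x, y, ident = record_format.unpack_from(buffer, 42 * record_format.size)

# Sequential access: iter_unpack yields one tuple per record.
total = sum(rec[2] for rec in record_format.iter_unpack(buffer))

print(x, y, ident)  # 1.0 2.0 42
print(total)        # 499500
```

Each unpack call creates ordinary Python objects for the fields it returns, so the memory savings apply to data at rest in the buffer, not to values while they are being processed.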

Cache friendliness

Modern CPUs load memory in 64-byte cache lines. When you access one element, the rest of its cache line is loaded along with it at no extra cost. This means:

  • Iterating over a NumPy array — each cache line gives you 8 float64 values. Sequential access is fast.
  • Iterating over a Python list of objects — each cache line gives you 8 pointers, but the actual objects are scattered elsewhere. Every access potentially triggers a cache miss.

Together with interpreter overhead, this “pointer chasing” penalty can make looping over Python lists roughly 10-50× slower than the equivalent NumPy operation for numerical work; the exact factor depends on the workload.
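The arithmetic behind the cache-line figures is simple enough to sketch. The 64-byte line size is an assumption that holds for most current x86-64 and many ARM cores, the helper names are hypothetical, and the calculation assumes the data starts on a cache-line boundary.

```python
CACHE_LINE_BYTES = 64  # assumed line size; check your CPU if it matters

def values_per_line(item_bytes: int) -> int:
    """How many items of the given size fit in one cache line."""
    return CACHE_LINE_BYTES // item_bytes

def lines_touched(n_items: int, item_bytes: int) -> int:
    """Cache lines needed to scan n_items packed contiguously."""
    total_bytes = n_items * item_bytes
    return -(-total_bytes // CACHE_LINE_BYTES)  # ceiling division

print(values_per_line(8))           # 8  (float64 values, or 64-bit pointers)
print(values_per_line(4))           # 16 (float32 values)
print(lines_touched(1_000_000, 8))  # 125000
```

For a list of objects, the same 125,000 lines only cover the pointers; each object dereferenced may cost an additional line fetch from elsewhere in memory.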

Common misconception: memory optimization is premature

For application code with dozens or hundreds of objects, memory layout is irrelevant. But data-intensive applications — data processing, scientific computing, game engines, ML pipelines — routinely create millions of objects. At that scale, choosing the right container is not premature optimization; it’s basic engineering.

Quick decision guide

Data type                               Best container
Millions of numbers                     NumPy array
Millions of records with fixed fields   @dataclass(slots=True) or namedtuple
Binary protocol data                    struct module
Homogeneous typed values                array.array
Small collections, mixed types          Regular Python list/dict (overhead doesn’t matter)

The one thing to remember: Python’s default objects carry 50-100+ bytes of overhead each — for large datasets, switch to contiguous containers like NumPy arrays or slotted classes to cut memory usage and improve cache performance.
