Data Model Customization — Deep Dive
The C-Level Slot System
Python’s special methods aren’t just syntactic sugar. At the CPython level, each type object (PyTypeObject) has C-level function pointers called type slots. When a class defines __add__, CPython fills the type’s nb_add slot with a C wrapper function that calls that __add__. When Python evaluates +, it calls the slot directly, not through the normal attribute lookup chain.
This means:
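- Special methods are looked up on the class, never on the instance. Setting obj.__len__ = lambda: 42 has no effect on len(obj).
- Implicit special method lookup bypasses __getattribute__ and __getattr__.
- Metaclass special methods (like __instancecheck__) are looked up on the metaclass, not the class: isinstance(obj, cls) consults type(cls).__instancecheck__, so defining __instancecheck__ in the body of cls itself does not affect isinstance(obj, cls).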
Key Slot Families
| C Slot / Slot Group | Python Methods | Purpose |
|---|---|---|
| tp_as_number | __add__, __mul__, __int__, etc. | Numeric operations |
| tp_as_sequence | __len__, __getitem__, __contains__ | Sequence protocol |
| tp_as_mapping | __getitem__, __setitem__, __len__ | Mapping protocol |
| tp_richcompare | __eq__, __lt__, etc. | Rich comparison |
| tp_hash | __hash__ | Hash value |
| tp_call | __call__ | Callable |
| tp_iter / tp_iternext | __iter__ / __next__ | Iterator protocol |
| tp_repr / tp_str | __repr__ / __str__ | String conversion |
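Because the slots are filled from the class, they also follow later changes to the class. A short sketch with a hypothetical Late class: assigning __len__ to the class after it is created works, since CPython’s type.__setattr__ refreshes the matching slots. The slots of built-in types are visible from Python as "slot wrapper" descriptors:

```python
class Late:
    """Hypothetical class that starts without __len__."""


obj = Late()
try:
    len(obj)
except TypeError:
    print("no __len__ yet")

# Setting the attribute on the class (not the instance) updates the length slot
Late.__len__ = lambda self: 7
print(len(obj))  # 7

# Built-in types expose their C slots as slot-wrapper descriptors
print(type(int.__add__).__name__)  # wrapper_descriptor
print(int.__add__(2, 3))           # 5
```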
Building a Production-Quality Vector Type
Let’s build a Vector class that fully integrates with Python’s data model:
```python
import math
import functools
import reprlib
from array import array


class Vector:
    """An n-dimensional vector supporting arithmetic and iteration."""

    typecode = 'd'  # double-precision float

    def __init__(self, components):
        self._components = array(self.typecode, components)

    # === Representation ===
    def __repr__(self):
        # reprlib.repr abbreviates long vectors with '...'
        components = reprlib.repr(list(self._components))
        return f"Vector({components})"

    def __str__(self):
        return str(tuple(self._components))

    def __format__(self, fmt_spec):
        if fmt_spec.endswith('h'):  # Hyperspherical coordinates: <magnitude, angles...>
            fmt_spec = fmt_spec[:-1]
            coords = [abs(self), *self._angles()]
            outer_fmt = '<{}>'.format
        else:
            coords = self._components
            outer_fmt = '({})'.format
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt(', '.join(components))

    def _angles(self):
        """The n - 1 angular coordinates of an n-dimensional vector."""
        c = self._components
        angles = []
        for n in range(1, len(c)):
            a = math.atan2(math.hypot(*c[n:]), c[n - 1])
            if n == len(c) - 1 and c[-1] < 0:
                a = 2 * math.pi - a  # the last angle spans the full circle
            angles.append(a)
        return angles

    # === Equality and Hashing ===
    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return len(self) == len(other) and all(
            a == b for a, b in zip(self._components, other._components)
        )

    def __hash__(self):
        # XOR of component hashes: equal vectors always get equal hashes
        hashes = (hash(x) for x in self._components)
        return functools.reduce(lambda a, b: a ^ b, hashes, 0)

    # === Boolean ===
    def __bool__(self):
        return bool(abs(self))

    def __abs__(self):
        return math.sqrt(sum(x * x for x in self._components))

    # === Container Protocol ===
    def __len__(self):
        return len(self._components)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return Vector(self._components[index])  # slices stay Vectors
        return self._components[index]

    def __iter__(self):
        return iter(self._components)

    def __contains__(self, value):
        return value in self._components

    # === Arithmetic ===
    def __add__(self, other):
        if isinstance(other, Vector):
            if len(self) != len(other):
                raise ValueError("Vectors must have same dimension")
            return Vector(a + b for a, b in zip(self, other))
        return NotImplemented

    def __radd__(self, other):
        return self + other

    def __mul__(self, scalar):
        if isinstance(scalar, (int, float)):
            return Vector(x * scalar for x in self)
        return NotImplemented

    def __rmul__(self, scalar):
        return self * scalar

    def __neg__(self):
        return Vector(-x for x in self)

    def __pos__(self):
        return Vector(self)

    # === Comparison (by magnitude) ===
    # @functools.total_ordering is not used: magnitude order is not consistent with
    # __eq__ (Vector([3, 4]) and Vector([4, 3]) have equal magnitude but are unequal),
    # so its derived __gt__/__ge__ would give wrong answers. Without __gt__/__ge__,
    # Python falls back to b < a for a > b and to b <= a for a >= b.
    def __lt__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return abs(self) < abs(other)

    def __le__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return abs(self) <= abs(other)

    # === Matrix-multiplication operator (dot product) ===
    def __matmul__(self, other):
        """Vector @ Vector = dot product (PEP 465)."""
        if isinstance(other, Vector):
            if len(self) != len(other):
                raise ValueError("Vectors must have same dimension")
            return sum(a * b for a, b in zip(self, other))
        return NotImplemented

    def __rmatmul__(self, other):
        return self @ other
```
Usage
```python
v1 = Vector([3, 4, 5])
v2 = Vector([1, 2, 3])
len(v1)             # 3
abs(v1)             # 7.0710678... (the square root of 50)
v1 + v2             # Vector([4.0, 6.0, 8.0])
v1 * 2              # Vector([6.0, 8.0, 10.0])
2 * v1              # Vector([6.0, 8.0, 10.0]), via __rmul__
v1 @ v2             # 26.0 (dot product via @)
v1[1]               # 4.0
v1[0:2]             # Vector([3.0, 4.0])
bool(Vector([0]))   # False (zero vector)
hash(v1)            # Works, so vectors can serve as dict keys
```
The Reflected Method Protocol
When Python evaluates a + b:
```
1. If type(b) is a strict subclass of type(a) and provides its own __radd__:
       result = b.__radd__(a)
       if result is not NotImplemented: return result
2. result = a.__add__(b)
   if result is not NotImplemented: return result
3. If type(b) is not type(a) and b.__radd__ was not already tried in step 1:
       result = b.__radd__(a)
       if result is not NotImplemented: return result
4. raise TypeError
```
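The steps above can be modelled in plain Python. This is a simplified sketch, not CPython’s implementation: binary_add, Left, and Right are hypothetical names, and the "provides its own __radd__" test compares the method found on each type, which approximates how CPython decides whether the subclass overrides the reflected method.

```python
def binary_add(a, b):
    """Simplified model of the lookup order Python uses for a + b."""
    ta, tb = type(a), type(b)
    add = getattr(ta, "__add__", None)
    radd = getattr(tb, "__radd__", None)
    tried_reflected = False

    # Step 1: a strict subclass on the right with its own __radd__ goes first
    if (tb is not ta and issubclass(tb, ta) and radd is not None
            and radd is not getattr(ta, "__radd__", None)):
        result = radd(b, a)
        if result is not NotImplemented:
            return result
        tried_reflected = True

    # Step 2: the left operand's __add__
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result

    # Step 3: the right operand's __radd__, only for different types
    if tb is not ta and radd is not None and not tried_reflected:
        result = radd(b, a)
        if result is not NotImplemented:
            return result

    # Step 4: nobody handled it
    raise TypeError(f"unsupported operand type(s) for +: "
                    f"{ta.__name__!r} and {tb.__name__!r}")


class Left:
    def __add__(self, other):
        return "Left.__add__"


class Right:
    def __radd__(self, other):
        return "Right.__radd__"


print(binary_add(Left(), Right()))  # Left.__add__
print(binary_add(1, Right()))       # Right.__radd__ (int.__add__ returned NotImplemented)
print(binary_add(1, 2.5))           # 3.5
```

Comparing each result with the real + operator is an easy way to check the model against your interpreter.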
The subclass priority (step 1) is crucial. It allows subclasses to override operations:
```python
class Vector3D(Vector):
    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        if len(other) != 3:
            raise ValueError("Vector3D requires 3 dimensions")
        return Vector3D(a + b for a, b in zip(self, other))

    def __radd__(self, other):
        return self.__add__(other)  # addition is commutative here

# Vector3D is a subclass of Vector and defines its own __radd__,
# so v3d.__radd__ is tried before Vector.__add__
v = Vector([1, 2, 3])
v3d = Vector3D([4, 5, 6])
result = v + v3d  # Calls v3d.__radd__(v) first, which returns a Vector3D
```
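The priority rule has a condition that is easy to miss: the subclass must provide a different __radd__ than its parent. A sketch with hypothetical classes Base, Overrides, and Inherits, whose methods simply report which one ran:

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Overrides(Base):
    def __radd__(self, other):  # a different reflected method than Base's
        return "Overrides.__radd__"


class Inherits(Base):
    pass  # reuses Base.__radd__, so it gets no priority


print(Base() + Overrides())  # Overrides.__radd__
print(Base() + Inherits())   # Base.__add__
```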
__init_subclass__ for Data Model Hooks
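Since Python 3.6 (PEP 487), a base class can define __init_subclass__, which runs each time a subclass is created and receives any keyword arguments given in the subclass’s class statement: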
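```python
class Serializable:
    # Maps a format name (e.g. "json") to the subclass that handles it.
    # Note: dict[str, type] needs Python 3.9+; the / marker below needs 3.8+.
    _formats: dict[str, type] = {}

    # Runs for every subclass, e.g. class JSONDoc(Serializable, format="json")
    def __init_subclass__(cls, /, format: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if format:
            cls._format = format
            Serializable._formats[format] = cls

    def serialize(self):
        raise NotImplementedError

    def parse(self, data):
        # Subclasses implement this; deserialize() relies on it
        raise NotImplementedError

    @classmethod
    def deserialize(cls, data, format: str):
        return cls._formats[format]().parse(data)
```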
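The **kwargs forwarding in __init_subclass__ matters: object.__init_subclass__ accepts no keyword arguments, so a class keyword that nothing in the hierarchy consumes raises TypeError when the class statement runs. A minimal sketch with a hypothetical Plugin registry:

```python
class Plugin:
    registry: dict = {}

    def __init_subclass__(cls, /, name: str = "", **kwargs):
        # Runs when each subclass's class statement finishes executing
        super().__init_subclass__(**kwargs)  # object.__init_subclass__ rejects leftovers
        if name:
            Plugin.registry[name] = cls


class CSVPlugin(Plugin, name="csv"):
    pass


typo_rejected = False
try:
    class Broken(Plugin, nmae="typo"):  # misspelled keyword
        pass
except TypeError:
    typo_rejected = True

print({k: v.__name__ for k, v in Plugin.registry.items()})  # {'csv': 'CSVPlugin'}
print(typo_rejected)  # True
```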
Context Managers: __enter__ and __exit__
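The context manager protocol is what the with statement calls: __enter__ runs on entry (its return value is bound by as), and __exit__ runs on exit, whether or not the block raised an exception: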
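```python
class DatabaseTransaction:
    # Assumes a connection object exposing begin(), cursor(), commit(), rollback()
    def __init__(self, connection):
        self.conn = connection

    def __enter__(self):
        self.conn.begin()
        return self.conn.cursor()  # bound to the name after "as"

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.conn.commit()    # block finished normally
        else:
            self.conn.rollback()  # block raised an exception
        return False  # Don't suppress exceptions
```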
The return value of __exit__ controls exception suppression: a truthy value swallows the exception, while False or None lets it propagate. Suppress exceptions rarely and deliberately.
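To see suppression in action, here is a sketch of a hypothetical Suppress context manager (the standard library ships the same idea as contextlib.suppress) whose __exit__ returns True only for the exception types it was given:

```python
class Suppress:
    """Hypothetical context manager that swallows the given exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # A truthy return value tells the with statement the exception was handled
        return exc_type is not None and issubclass(exc_type, self.exc_types)


reached_after_with = False
with Suppress(KeyError):
    _ = {}["missing"]        # KeyError is swallowed
reached_after_with = True

propagated = False
try:
    with Suppress(KeyError):
        int("not a number")  # ValueError is not listed, so it propagates
except ValueError:
    propagated = True

print(reached_after_with, propagated)  # True True
```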
Async Data Model
Python’s data model extends to async operations:
| Method | Triggered By |
|---|---|
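| __aiter__ | async for item in obj (returns the async iterator) |
| __anext__ | Each step of async for (returns an awaitable; raises StopAsyncIteration to stop) |
| __aenter__ | async with obj as x: |
| __aexit__ | End of async with |
| __await__ | await obj (must return an iterator) |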
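A compact sketch that exercises all five methods, using hypothetical classes Countdown (async iterator), Resource (async context manager), and Ready (awaitable), driven by asyncio.run:

```python
import asyncio


class Countdown:
    """Async iterator: yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __aiter__(self):
        return self  # the object is its own async iterator

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration  # ends the async for loop
        await asyncio.sleep(0)        # give control back to the event loop
        value = self.current
        self.current -= 1
        return value


class Resource:
    """Async context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    async def __aenter__(self):
        self.events.append("open")
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        self.events.append("close")
        return False  # don't suppress exceptions


class Ready:
    """Awaitable whose __await__ delegates to a coroutine's iterator."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        return asyncio.sleep(0, result=self.value).__await__()


async def main():
    res = Resource()
    async with res:
        values = [n async for n in Countdown(3)]
    answer = await Ready(42)
    return values, res.events, answer


values, events, answer = asyncio.run(main())
print(values, events, answer)  # [3, 2, 1] ['open', 'close'] 42
```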
Performance Considerations
Slot-Based Dispatch Is Fast
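Special method dispatch via C-level slots skips part of the normal attribute lookup: the instance __dict__, __getattribute__, and __getattr__ are never consulted. Calling len(obj) is therefore typically faster than calling obj.get_length(): len() goes straight to the type’s length slot, which for a class written in Python finds __len__ on the type (using CPython’s per-type method cache) and calls it.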
Benchmark on CPython 3.12:
- len(obj) via __len__: ~30 ns
- obj.get_length() regular method call: ~50 ns
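A rough way to repeat the comparison on your own machine, using timeit and a hypothetical Sized class that exposes the same value both ways. Absolute numbers depend on hardware and CPython version and will not match the figures above exactly:

```python
import timeit


class Sized:
    """Hypothetical class offering the same value through __len__ and a method."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def get_length(self):
        return self._n


obj = Sized(10)
N = 200_000
t_len = timeit.timeit("len(obj)", globals={"obj": obj}, number=N)
t_method = timeit.timeit("obj.get_length()", globals={"obj": obj}, number=N)
print(f"len(obj):         {t_len / N * 1e9:.1f} ns per call")
print(f"obj.get_length(): {t_method / N * 1e9:.1f} ns per call")
```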
__slots__ for Memory
For data model objects created in large numbers, combine with __slots__:
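```python
class Point:
    __slots__ = ('x', 'y')  # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return Point(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Point({self.x}, {self.y})"
```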
A Point with __slots__ uses ~56 bytes vs. ~152 bytes with a regular __dict__ (exact figures vary with the CPython version and platform).
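To check the trade-off yourself, a sketch comparing a hypothetical slotted class with an ordinary one. sys.getsizeof reports shallow sizes, so the dict-based total adds the instance __dict__ separately; the printed byte counts depend on the CPython version and platform:

```python
import sys


class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y


class DictPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y


sp, dp = SlottedPoint(1, 2), DictPoint(1, 2)
print(hasattr(sp, '__dict__'), hasattr(dp, '__dict__'))  # False True

rejected_new_attr = False
try:
    sp.z = 3  # 'z' is not listed in __slots__
except AttributeError:
    rejected_new_attr = True

print("slotted:", sys.getsizeof(sp), "bytes")
print("dict-based:", sys.getsizeof(dp) + sys.getsizeof(dp.__dict__), "bytes")
```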
Common Pitfalls
Forgetting NotImplemented
```python
# WRONG: raising TypeError stops Python before it can try other.__radd__
def __add__(self, other):
    if not isinstance(other, Vector):
        raise TypeError(f"Can't add Vector and {type(other)}")
    ...  # component-wise addition

# RIGHT: returning NotImplemented lets Python try the reflected method
def __add__(self, other):
    if not isinstance(other, Vector):
        return NotImplemented
    ...  # component-wise addition
```
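The difference shows up as soon as the other operand knows how to handle the operation. A sketch with hypothetical classes: Other implements __radd__, Polite returns NotImplemented, and Strict raises:

```python
class Other:
    def __radd__(self, left):
        return f"Other.__radd__ handled {type(left).__name__}"


class Polite:
    def __add__(self, other):
        if not isinstance(other, Polite):
            return NotImplemented  # let Python try other.__radd__
        return "Polite + Polite"


class Strict:
    def __add__(self, other):
        if not isinstance(other, Strict):
            raise TypeError(f"Can't add Strict and {type(other).__name__}")
        return "Strict + Strict"


print(Polite() + Other())  # Other.__radd__ handled Polite

strict_failed = False
try:
    Strict() + Other()     # the TypeError escapes before Other.__radd__ runs
except TypeError:
    strict_failed = True
print(strict_failed)       # True
```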
Breaking the __eq__/__hash__ Contract
Objects that compare equal must have equal hashes:
```python
# If a == b, then hash(a) MUST equal hash(b)
# The reverse is NOT required (hash collisions are fine)
```
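In Python 3, a class that defines __eq__ without __hash__ gets __hash__ set to None, so its instances become unhashable rather than silently breaking the contract. A sketch with hypothetical classes EqOnly and EqAndHash, where the second hashes exactly the data its __eq__ compares:

```python
class EqOnly:
    """Defines __eq__ without __hash__, so Python sets __hash__ to None."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.key == other.key


class EqAndHash:
    """Hashes exactly the data __eq__ compares, so equal objects hash equally."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if not isinstance(other, EqAndHash):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        return hash(self.key)


print(EqOnly.__hash__)  # None

unhashable = False
try:
    hash(EqOnly("a"))
except TypeError:
    unhashable = True

lookup = {EqAndHash("a"): 1}
print(lookup[EqAndHash("a")])  # 1: a different but equal object finds the entry
```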
Implementing __del__ (Don’t)
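__del__ is a finalizer, not a destructor. CPython calls it when an object’s reference count drops to zero or, for objects caught in a reference cycle, whenever the cyclic garbage collector happens to run (since Python 3.4, PEP 442, cycles no longer block finalization). It is not guaranteed to run at all for objects still alive when the interpreter exits. Use context managers (__enter__/__exit__) for resource cleanup instead.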
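A sketch (CPython 3.4 or later) of the timing problem, using a hypothetical Node class: two objects that reference each other keep each other’s reference counts above zero after the last outside names are deleted, so their finalizers wait for the cyclic collector, forced here with gc.collect():

```python
import gc

log = []


class Node:
    """Hypothetical object with a finalizer."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        log.append(f"finalized {self.name}")


a, b = Node("a"), Node("b")
a.partner, b.partner = b, a  # reference cycle: refcounts cannot reach zero
del a, b                     # no finalizer is required to run at this point
gc.collect()                 # the cyclic collector finds the pair and runs __del__
print(sorted(log))           # ['finalized a', 'finalized b']
```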
One thing to remember: Python’s data model is a protocol system where special methods are hooks into the language runtime. They’re dispatched at the C level (not through normal attribute lookup), which makes them both fast and slightly different from regular methods. Master the data model, and you can make your objects indistinguishable from built-in types.
See Also
- Python Attribute Lookup Chain: How Python finds your variables and methods, like checking your pockets, then your bag, then your locker, in a specific order every time.
- Python Bytecode And Interpreter: How your .py file turns into tiny instructions the Python interpreter can execute step by step.
- Python Class Body Execution: Python runs the code inside your class definition immediately, like reading a recipe out loud before anyone starts cooking.
- Python Garbage Collection: See how Python cleans up unreachable objects, especially the tricky ones that point at each other.
- Python GIL: Why Python threads can feel stuck in traffic, and how the GIL explains the behavior.