Data Model Customization — Deep Dive

Build production-quality custom types using Python's data model — from operator overloading internals to the C-level slot system and performance implications.

The C-Level Slot System

Python’s special methods aren’t just syntactic sugar. In CPython, each type object (PyTypeObject) holds C function pointers called type slots. When a Python class defines __add__, CPython fills the nb_add slot with a wrapper that finds __add__ on the type and calls it. When the interpreter evaluates +, it goes through that slot rather than through the normal instance attribute lookup chain.

This means:

Special methods on the class are used, never on the instance. Setting obj.__len__ = lambda: 42 has no effect on len(obj).
Implicit special method lookup bypasses __getattribute__ and __getattr__.
Special methods that act on a class itself (like __instancecheck__) are looked up on its metaclass, not on the class.

Key Slot Families

C Slot Group	Python Methods	Purpose
`tp_as_number`	`__add__`, `__mul__`, `__int__`, etc.	Numeric operations
`tp_as_sequence`	`__len__`, `__getitem__`, `__contains__`	Sequence protocol
`tp_as_mapping`	`__getitem__`, `__setitem__`, `__len__`	Mapping protocol
`tp_richcompare`	`__eq__`, `__lt__`, etc.	Rich comparison
`tp_hash`	`__hash__`	Hash value
`tp_call`	`__call__`	Callable
`tp_iter` / `tp_iternext`	`__iter__`, `__next__`	Iterator protocol
`tp_repr` / `tp_str`	`__repr__`, `__str__`	String conversion

Building a Production-Quality Vector Type

Let’s build a Vector class that fully integrates with Python’s data model:

import math
import functools
import reprlib
from array import array


@functools.total_ordering
class Vector:
    """An n-dimensional vector supporting arithmetic and iteration."""

    typecode = 'd'  # double-precision float

    def __init__(self, components):
        self._components = array(self.typecode, components)

    # === Representation ===

    def __repr__(self):
        components = reprlib.repr(list(self._components))
        return f"Vector({components})"

    def __str__(self):
        return str(tuple(self._components))

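    def __format__(self, fmt_spec):
        if fmt_spec.endswith('h'):  # Hyperspherical coordinates
            fmt_spec = fmt_spec[:-1]
            # Magnitude first, then the angular coordinates
            coords = [abs(self), *self._angles()]
            outer_fmt = '<{}>'.format
        else:
            coords = self._components
            outer_fmt = '({})'.format
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt(', '.join(components))

    def _angle(self, n):
        """Return the n-th angular coordinate (helper assumed by the 'h' format)."""
        r = math.sqrt(sum(x * x for x in self._components[n:]))
        a = math.atan2(r, self._components[n - 1])
        if n == len(self) - 1 and self._components[-1] < 0:
            return math.pi * 2 - a
        return a

    def _angles(self):
        """Yield every angular coordinate (helper assumed by the 'h' format)."""
        return (self._angle(n) for n in range(1, len(self)))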

    # === Equality and Hashing ===

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return len(self) == len(other) and all(
            a == b for a, b in zip(self._components, other._components)
        )

    def __hash__(self):
        hashes = (hash(x) for x in self._components)
        return functools.reduce(lambda a, b: a ^ b, hashes, 0)

    # === Boolean ===

    def __bool__(self):
        return bool(abs(self))

    def __abs__(self):
        return math.sqrt(sum(x * x for x in self._components))

    # === Container Protocol ===

    def __len__(self):
        return len(self._components)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return Vector(self._components[index])
        return self._components[index]

    def __iter__(self):
        return iter(self._components)

    def __contains__(self, value):
        return value in self._components

    # === Arithmetic ===

    def __add__(self, other):
        if isinstance(other, Vector):
            if len(self) != len(other):
                raise ValueError("Vectors must have same dimension")
            return Vector(a + b for a, b in zip(self, other))
        return NotImplemented

    def __radd__(self, other):
        return self + other

    def __mul__(self, scalar):
        if isinstance(scalar, (int, float)):
            return Vector(x * scalar for x in self)
        return NotImplemented

    def __rmul__(self, scalar):
        return self * scalar

    def __neg__(self):
        return Vector(-x for x in self)

    def __pos__(self):
        return Vector(self)

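    # === Comparison ===

    # Caveat: ordering uses magnitude while __eq__ uses components, so the
    # comparisons derived by total_ordering are inconsistent for distinct
    # vectors of equal magnitude (both a > b and b > a are True).
    def __lt__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return abs(self) < abs(other)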

    # === Matrix multiplication operator @ (dot product) ===

    def __matmul__(self, other):
        """Vector @ Vector = dot product (PEP 465)."""
        if isinstance(other, Vector):
            if len(self) != len(other):
                raise ValueError("Vectors must have same dimension")
            return sum(a * b for a, b in zip(self, other))
        return NotImplemented

    def __rmatmul__(self, other):
        return self @ other

Usage

v1 = Vector([3, 4, 5])
v2 = Vector([1, 2, 3])

len(v1)           # 3
abs(v1)           # 7.071...
v1 + v2           # Vector([4.0, 6.0, 8.0])
v1 * 2            # Vector([6.0, 8.0, 10.0])
2 * v1            # Vector([6.0, 8.0, 10.0]), via __rmul__
v1 @ v2           # 26.0 (dot product via @)
v1[1]             # 4.0
v1[0:2]           # Vector([3.0, 4.0])
bool(Vector([0])) # False (zero vector)
hash(v1)          # Works, so a Vector can be used as a dict key

The Reflected Method Protocol

When Python evaluates a + b:

1. If type(b) is a proper subclass of type(a) and overrides __radd__:
     result = b.__radd__(a)
     if result is not NotImplemented: return result
2. result = a.__add__(b)
     if result is not NotImplemented: return result
3. If type(b) differs from type(a) and __radd__ was not already tried in step 1:
     result = b.__radd__(a)
     if result is not NotImplemented: return result
4. raise TypeError

The subclass priority (step 1) lets a subclass control the result even when its instance is the right-hand operand:

class Vector3D(Vector):
    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        if len(other) != 3:
            raise ValueError("Vector3D requires 3 dimensions")
        return Vector3D(a + b for a, b in zip(self, other))

    def __radd__(self, other):
        return self.__add__(other)

# Vector3D.__radd__ is tried first: Vector3D is a subclass of Vector and overrides __radd__
v = Vector([1, 2, 3])
v3d = Vector3D([4, 5, 6])
result = v + v3d  # Calls v3d.__radd__(v) first → returns Vector3D
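The dispatch order for a + b can be approximated in plain Python. This is a simplified sketch, not CPython's actual implementation: it ignores in-place operators and the sequence-concatenation fallback, and the names `binary_add`, `Base`, and `Child` are invented for the example.

```python
def binary_add(a, b):
    """Simplified emulation of how `a + b` is dispatched."""
    type_a, type_b = type(a), type(b)
    # Special methods are looked up on the types, not the instances.
    add = getattr(type_a, "__add__", None)
    radd = getattr(type_b, "__radd__", None)

    # The reflected method goes first only for a proper subclass
    # that provides its own __radd__.
    reflected_first = (
        type_a is not type_b
        and issubclass(type_b, type_a)
        and radd is not None
        and radd is not getattr(type_a, "__radd__", None)
    )

    attempts = []
    if reflected_first:
        attempts.append((radd, b, a))
    if add is not None:
        attempts.append((add, a, b))
    if not reflected_first and radd is not None and type_a is not type_b:
        attempts.append((radd, b, a))

    for method, first, second in attempts:
        result = method(first, second)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"{type_a.__name__!r} and {type_b.__name__!r}"
    )


class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Child(Base):
    def __radd__(self, other):
        return "Child.__radd__"


print(binary_add(Base(), Child()))  # Child.__radd__
print(binary_add(Child(), Base()))  # Base.__add__
print(binary_add(1, 2.5))           # 3.5
```

In the third call, int.__add__ returns NotImplemented for a float operand, so the result comes from float.__radd__.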

`__init_subclass__` for Data Model Hooks

Python 3.6+ lets base classes customize how subclasses are created:

class Serializable:
    _formats: dict[str, type] = {}

    def __init_subclass__(cls, /, format: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if format:
            cls._format = format
            Serializable._formats[format] = cls

    def serialize(self):
        raise NotImplementedError

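    @classmethod
    def deserialize(cls, data, format: str):
        # Assumes the registered class takes no constructor arguments and
        # defines parse(data); an unknown format raises KeyError.
        return cls._formats[format]().parse(data)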
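A short usage sketch of the registry pattern. The subclass `JsonPayload` and its `parse` method are hypothetical; the base class is repeated in trimmed form so the example runs on its own.

```python
import json


class Serializable:
    _formats: dict[str, type] = {}

    def __init_subclass__(cls, /, format: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if format:
            cls._format = format
            Serializable._formats[format] = cls

    @classmethod
    def deserialize(cls, data, format: str):
        return cls._formats[format]().parse(data)


class JsonPayload(Serializable, format="json"):
    """Hypothetical subclass; `parse` is an assumed method name."""

    def parse(self, data):
        return json.loads(data)


class Unregistered(Serializable):
    """No format keyword, so the hook leaves the registry untouched."""


registered = dict(Serializable._formats)
result = Serializable.deserialize('{"a": 1}', "json")

print(registered)  # {'json': <class '...JsonPayload'>}
print(result)      # {'a': 1}
```

The keyword in the class statement (`format="json"`) is passed straight to `__init_subclass__`, so registration happens at class creation with no decorator or metaclass.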

Context Managers: `__enter__` and `__exit__`

The context manager protocol is what the with statement calls into:

class DatabaseTransaction:
    def __init__(self, connection):
        self.conn = connection

    def __enter__(self):
        self.conn.begin()
        return self.conn.cursor()

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        return False  # Don't suppress exceptions

The return value of __exit__ controls exception suppression. Returning True swallows the exception — use this rarely and deliberately.
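To make the suppression rule concrete, here is a small sketch with a made-up class, `IgnoreError`, that swallows one exception type and lets everything else propagate. The standard library's contextlib.suppress offers the same behavior ready-made.

```python
class IgnoreError:
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_type):
            self.caught = exc_val
            return True   # truthy: the exception stops here
        return False      # falsy: any other exception propagates


with IgnoreError(ZeroDivisionError) as guard:
    1 / 0
print(type(guard.caught).__name__)  # ZeroDivisionError

propagated = False
try:
    with IgnoreError(ZeroDivisionError):
        int("not a number")
except ValueError:
    propagated = True
print(propagated)  # True
```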

Async Data Model

Python’s data model extends to async operations:

Method	Triggered By
`__aiter__`	`async for item in obj`
`__anext__`	Each iteration step of `async for`
`__aenter__`	`async with obj as x:`
`__aexit__`	End of `async with`
`__await__`	`await obj`

Performance Considerations

Slot-Based Dispatch Is Fast

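Special method dispatch through C-level slots skips the instance-level attribute lookup. For built-in types, len(obj) reads the sq_length slot directly, with no dictionary lookups. For classes written in Python, the slot holds a wrapper that still has to find and call __len__ on the type, so the gain over a regular method call such as obj.get_length() is smaller and depends on the interpreter version.

Approximate timings on CPython 3.12 (exact values depend on hardware and build):

len(obj) via __len__: ~30ns
obj.get_length() regular method call: ~50ns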
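Numbers like these are easy to re-measure. The sketch below uses timeit with a made-up `Box` class; both measurements include the overhead of the lambda wrapper, so only the difference between them is meaningful, and results will vary by machine and Python version.

```python
import timeit


class Box:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def get_length(self):
        return len(self._items)


obj = Box(range(10))
runs = 200_000

# Total seconds for `runs` calls of each form.
special = timeit.timeit(lambda: len(obj), number=runs)
regular = timeit.timeit(lambda: obj.get_length(), number=runs)

print(f"len(obj):         {special / runs * 1e9:.0f} ns per call")
print(f"obj.get_length(): {regular / runs * 1e9:.0f} ns per call")
```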

`__slots__` for Memory

For classes whose instances are created in large numbers, combine special methods with __slots__:

class Point:
    __slots__ = ('x', 'y')  # no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return Point(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

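On 64-bit CPython, a Point with __slots__ takes roughly 56 bytes, versus roughly 152 bytes for an equivalent instance with a regular __dict__. Exact figures vary between Python versions.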
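The sizes can be checked with sys.getsizeof. This is a rough sketch with two illustrative classes, `SlottedPoint` and `DictPoint`; getsizeof reports only the object itself, so the per-instance __dict__ is added by hand, and the printed numbers depend on the interpreter version.

```python
import sys


class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class DictPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


slotted = SlottedPoint(1, 2)
regular = DictPoint(1, 2)

slotted_size = sys.getsizeof(slotted)
# A regular instance also pays for its per-instance __dict__.
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)

print(slotted_size, regular_size)
print(hasattr(slotted, "__dict__"))  # False

# The trade-off: attributes outside __slots__ cannot be added.
try:
    slotted.z = 3
    rejected = False
except AttributeError:
    rejected = True
print(rejected)  # True
```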

Common Pitfalls

Forgetting `NotImplemented`

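# WRONG: raising TypeError here stops Python from trying the reflected method
def __add__(self, other):
    if not isinstance(other, Vector):
        raise TypeError(f"Can't add Vector and {type(other)}")
    return Vector(a + b for a, b in zip(self, other))

# RIGHT: returning NotImplemented lets Python try other.__radd__
def __add__(self, other):
    if not isinstance(other, Vector):
        return NotImplemented
    return Vector(a + b for a, b in zip(self, other))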
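A small sketch of why this matters, using two invented unit classes. Because `Meters.__add__` returns NotImplemented for a type it does not know, Python gets the chance to call `Feet.__radd__`; if neither side can handle the operation, Python raises the TypeError itself.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # let the other operand try
        return Meters(self.value + other.value)


class Feet:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Called for `Meters + Feet` after Meters.__add__ declines.
        if isinstance(other, Meters):
            return Meters(other.value + self.value * 0.3048)
        return NotImplemented


total = Meters(1) + Feet(10)
print(round(total.value, 3))  # 4.048

try:
    Meters(1) + "2"
except TypeError as exc:
    message = str(exc)
print(message)  # unsupported operand type(s) for +: 'Meters' and 'str'
```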

Breaking the `__eq__`/`__hash__` Contract

Objects that compare equal must have equal hashes:

# If a == b, then hash(a) MUST equal hash(b)
# The reverse is NOT required (hash collisions are fine)
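Python enforces part of this contract for you: a class that defines __eq__ without __hash__ has its __hash__ set to None and becomes unhashable. The sketch below uses two made-up classes to show that, along with a hash built from the same data that __eq__ compares.

```python
class OnlyEq:
    """Defines __eq__ but not __hash__, so instances are unhashable."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.key == other.key


class EqAndHash(OnlyEq):
    # Hash the same data that __eq__ compares.
    def __hash__(self):
        return hash(self.key)


print(OnlyEq.__hash__)  # None

unhashable = False
try:
    {OnlyEq(1)}
except TypeError:
    unhashable = True
print(unhashable)  # True

a, b = EqAndHash("k"), EqAndHash("k")
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```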

Implementing `__del__` (Don’t)

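__del__ is a finalizer, not a destructor. The interpreter calls it at an unpredictable time, and it is not guaranteed to run at all, for example for objects still alive when the interpreter exits. (Before Python 3.4, objects with __del__ in a reference cycle were never collected.) Use context managers (__enter__/__exit__) for resource cleanup instead.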

One thing to remember: Python’s data model is a protocol system where special methods are hooks into the language runtime. They’re dispatched at the C level (not through normal attribute lookup), which makes them both fast and slightly different from regular methods. Master the data model, and you can make your objects indistinguishable from built-in types.
