Python Mypyc Compilation — Deep Dive
Compilation architecture
Mypyc’s compilation pipeline has four stages:
Stage 1: Mypy type analysis
Mypyc runs mypy’s full type-checking pass on your code. This produces a type-annotated AST where every expression has a known (or inferred) type. If mypy finds type errors, compilation fails — your code must be type-correct.
Stage 2: IR generation
The typed AST is lowered into mypyc’s intermediate representation (IR). This IR uses:
- Registers — typed temporary values (like CPU registers but at a higher level).
- Operations — primitive ops like
int_add,list_get,call_native. - Blocks — basic blocks connected by branches, forming a control flow graph.
# Source
def sum_list(nums: list[int]) -> int:
total: int = 0
for n in nums:
total += n
return total
# Simplified IR
block_entry:
r0 = 0 # total: int
r1 = get_iter(nums) # iterator
goto block_loop
block_loop:
r2 = next(r1) # StopIteration check
if exhausted goto block_return
r0 = int_add(r0, r2) # native int addition
goto block_loop
block_return:
return box(r0) # convert native int back to PyObject
The int_add operation compiles to a single machine instruction plus an overflow check — no PyNumber_Add dispatch, no PyObject allocation for intermediate results.
Stage 3: C code generation
The IR is translated to C code that uses CPython’s C API for Python interop but native C operations where types are known:
static PyObject *sum_list_impl(PyObject *nums) {
CPyTagged total = 0; // Native tagged integer
PyObject *iter = PyObject_GetIter(nums);
while (1) {
PyObject *item = PyIter_Next(iter);
if (item == NULL) break;
CPyTagged n = CPyTagged_FromObject(item);
Py_DECREF(item);
total = CPyTagged_Add(total, n); // Fast path for small ints
}
Py_DECREF(iter);
return CPyTagged_AsObject(total);
}
Stage 4: C compilation
Standard C compiler (GCC/Clang/MSVC) compiles the generated C into a shared library.
Native classes
When mypyc compiles a class, it replaces the Python __dict__-based attribute storage with a C struct:
# Source
class Point:
def __init__(self, x: float, y: float) -> None:
self.x = x
self.y = y
def distance(self, other: 'Point') -> float:
dx = self.x - other.x
dy = self.y - other.y
return (dx * dx + dy * dy) ** 0.5
Compiled, Point instances use a fixed C struct:
typedef struct {
PyObject_HEAD
double x; // Direct double, not PyObject*
double y;
} PointObject;
Accessing self.x compiles to a direct struct field access — a single memory load — instead of a dictionary lookup. This makes attribute-heavy code significantly faster.
The tradeoff: you cannot dynamically add attributes to compiled class instances. point.z = 3 raises AttributeError. This is similar to __slots__ but enforced at the C level.
Tagged integers
Mypyc uses a clever representation called tagged integers for int values:
- Small integers (fits in 63 bits on 64-bit systems) are stored as C
longvalues with the low bit set to 0. - Large integers (arbitrary precision) are stored as regular
PyObject*pointers with the low bit set to 1.
// Tagged integer representation
typedef intptr_t CPyTagged;
// Small int: value << 1 (low bit = 0)
// Big int: PyObject* | 1 (low bit = 1)
static inline CPyTagged CPyTagged_Add(CPyTagged a, CPyTagged b) {
if (likely(CPyTagged_IsShort(a) && CPyTagged_IsShort(b))) {
// Fast path: native addition with overflow check
CPyTagged result;
if (likely(!__builtin_add_overflow(a, b, &result))) {
return result;
}
}
// Slow path: fall back to PyLong operations
return CPyTagged_Add_slow(a, b);
}
This gives near-C-speed arithmetic for typical integer values while correctly handling Python’s unlimited-precision integers when needed.
Interop boundaries
Compiled and interpreted code can call each other, but there is a cost at the boundary:
# compiled_module.py (compiled with mypyc)
def fast_compute(data: list[int]) -> int:
# Runs at native speed
return sum(x * x for x in data)
# main.py (regular Python)
from compiled_module import fast_compute
result = fast_compute([1, 2, 3, 4, 5]) # Boundary crossing
At the boundary, mypyc generates wrapper functions that:
- Convert Python objects to native representations (unboxing).
- Call the native compiled function.
- Convert the result back to Python objects (boxing).
For hot paths, keep the entire call chain within compiled modules to avoid repeated boxing/unboxing.
Incremental compilation
For large projects, mypyc supports incremental compilation through setuptools:
# setup.py
from mypyc.build import mypycify
# Group related modules for optimal cross-module optimization
setup(
ext_modules=mypycify(
[
# Group 1: core (compiled together, can cross-optimize)
["mypackage/core.py", "mypackage/types.py"],
# Group 2: utils (separate compilation unit)
["mypackage/utils.py"],
# Group 3: API (depends on core, but compiled independently)
["mypackage/api.py"],
],
opt_level="3", # Maximum optimization
multi_file=True, # Separate .c files per module (faster incremental)
),
)
Modules in the same group can call each other with direct C function calls (no wrapper overhead). Modules in different groups go through the Python-level wrapper.
Real-world case study: mypy itself
Mypy uses mypyc to compile itself — the type checker is compiled with its own compiler. Results from the mypy project:
- 4x faster overall type-checking speed.
- Critical path speedup: type analysis loops that were 10x slower in pure Python.
- ~40 modules compiled out of ~200 total (only performance-critical ones).
- No code changes beyond ensuring type annotations were complete and correct.
The mypy team’s strategy:
- Profile to find hot modules.
- Ensure those modules pass
mypy --strict. - Add them to the mypyc build list.
- Benchmark before/after.
- Iterate.
Optimization techniques
Technique 1: Final declarations
from typing import Final
MAX_RETRIES: Final = 3 # Compiled as a C constant
BUFFER_SIZE: Final = 8192
class Config:
MAX_WORKERS: Final = 4 # Class constant, not instance attribute
Final values are inlined at compile time — no dictionary lookup, no attribute access.
Technique 2: Precise container types
# Slower — mypyc treats as generic dict
data: dict = {"a": 1}
# Faster — mypyc knows key and value types
data: dict[str, int] = {"a": 1}
Technique 3: Avoid dynamic patterns in compiled modules
# Bad for mypyc — dynamic attribute access
value = getattr(obj, attr_name)
# Good for mypyc — direct attribute access
value = obj.known_attribute
# Bad for mypyc — **kwargs
def func(**kwargs: Any) -> None: ...
# Good for mypyc — explicit parameters
def func(x: int, y: int, z: int = 0) -> None: ...
Technique 4: Use native operations
# Slower — generic Python built-in
total = sum(numbers)
# Faster in mypyc — explicit loop with typed accumulator
total: int = 0
for n in numbers:
total += n
Mypyc can optimize the explicit loop with native integer operations, while sum() is a call to a Python built-in that returns a generic PyObject.
Testing compiled modules
Compiled modules should produce identical results to interpreted ones. Test both:
# conftest.py
import importlib
import pytest
@pytest.fixture(params=["compiled", "interpreted"])
def module_mode(request, monkeypatch):
if request.param == "interpreted":
# Force Python fallback by removing the .so
import mypackage.core as mod
monkeypatch.setattr(mod, '__file__', mod.__file__)
# Or: test against a pure-Python copy
return request.param
In CI, run tests both with and without compilation:
- name: Test interpreted
run: pytest tests/
- name: Build compiled
run: pip install -e . # triggers mypyc
- name: Test compiled
run: pytest tests/
Limitations in practice
- No
__init_subclass__or__set_name__in compiled classes. - No class decorators that modify the class significantly.
- Generator functions compile but with limited optimization.
- Closures work but capture variables by reference, not value (same as Python).
- Error messages in compiled code may show C-level details in tracebacks.
For most projects, the practical approach is selective compilation — compile the 20% of modules responsible for 80% of runtime, leave the rest as regular Python.
One thing to remember: Mypyc’s magic is turning Python’s type system from a static analysis tool into a compilation strategy — tagged integers, native structs, and direct C calls emerge naturally from the type annotations you already write, delivering real speedups where it matters most.
See Also
- Python Appimage Distribution An AppImage is like a portable app on a USB stick — download one file, double-click it, and your Python program runs on any Linux computer without installing anything.
- Python Briefcase Native Apps Imagine a travel agent who repacks your suitcase for each country's customs — Briefcase converts your Python app into proper native packages for every platform.
- Python Flatpak Packaging Flatpak wraps your Python app in a safe bubble that works on every Linux system — like a snow globe that keeps your program perfect inside.
- Python Nuitka Compilation What if your Python code could run as fast as a race car instead of a bicycle? Nuitka translates Python into C to make that happen.
- Python Pex Executables Imagine zipping your entire Python project into a single magic file that runs anywhere Python lives — that's what PEX does.