Python Extension Modules API — Deep Dive
Writing CPython extensions directly against the C API means working at the same level as CPython’s own built-in types. This gives unmatched control but demands precision with reference counting, type slots, and initialization protocols. This guide covers the patterns that separate correct extensions from subtly broken ones.
Custom Types via PyType_Spec
The modern approach (PEP 384 compatible) defines types using a slot array rather than filling a monolithic PyTypeObject:
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <math.h>     // sqrt
#include <stddef.h>   // offsetof

typedef struct {
    PyObject_HEAD
    double x, y, z;
} VectorObject;

static PyObject* Vector_new(PyTypeObject* type, PyObject* args, PyObject* kwds) {
    // PyType_GetSlot instead of type->tp_alloc: struct fields are hidden under the Limited API
    allocfunc alloc = (allocfunc)PyType_GetSlot(type, Py_tp_alloc);
    VectorObject* self = (VectorObject*)alloc(type, 0);
    if (self) { self->x = self->y = self->z = 0.0; }
    return (PyObject*)self;
}

static int Vector_init(VectorObject* self, PyObject* args, PyObject* kwds) {
    static char* kwlist[] = {"x", "y", "z", NULL};
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|ddd", kwlist,
                                     &self->x, &self->y, &self->z))
        return -1;
    return 0;
}

static PyObject* Vector_magnitude(VectorObject* self, PyObject* Py_UNUSED(ignored)) {
    double mag = sqrt(self->x*self->x + self->y*self->y + self->z*self->z);
    return PyFloat_FromDouble(mag);
}

static PyObject* Vector_repr(VectorObject* self) {
    // PyUnicode_FromFormat has no floating-point conversions, so format in C first
    char buf[1024];
    PyOS_snprintf(buf, sizeof(buf), "Vector(%.4f, %.4f, %.4f)",
                  self->x, self->y, self->z);
    return PyUnicode_FromString(buf);
}

static PyMethodDef Vector_methods[] = {
    {"magnitude", (PyCFunction)Vector_magnitude, METH_NOARGS, "Compute magnitude"},
    {NULL}
};

static PyMemberDef Vector_members[] = {
    {"x", Py_T_DOUBLE, offsetof(VectorObject, x), 0, "x component"},
    {"y", Py_T_DOUBLE, offsetof(VectorObject, y), 0, "y component"},
    {"z", Py_T_DOUBLE, offsetof(VectorObject, z), 0, "z component"},
    {NULL}
};

static PyType_Slot Vector_slots[] = {
    {Py_tp_new, Vector_new},
    {Py_tp_init, Vector_init},
    {Py_tp_repr, Vector_repr},
    {Py_tp_methods, Vector_methods},
    {Py_tp_members, Vector_members},
    {Py_tp_doc, "3D vector type"},
    {0, NULL}
};

static PyType_Spec Vector_spec = {
    .name = "vecmodule.Vector",
    .basicsize = sizeof(VectorObject),
    .flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .slots = Vector_slots,
};
Using PyType_Spec with slots is required for stable ABI compatibility and is the recommended approach for all new code.
Multi-Phase Initialization (PEP 489)
Single-phase initialization (PyModule_Create in PyInit_*) creates the module immediately. Multi-phase initialization defers creation, supporting sub-interpreters and per-interpreter module state:
static int vecmodule_exec(PyObject* module) {
    PyObject* VectorType = PyType_FromSpec(&Vector_spec);
    if (!VectorType) return -1;
    if (PyModule_AddObjectRef(module, "Vector", VectorType) < 0) {
        Py_DECREF(VectorType);
        return -1;
    }
    Py_DECREF(VectorType);
    return 0;
}

static PyModuleDef_Slot vecmodule_slots[] = {
    {Py_mod_exec, vecmodule_exec},
    {Py_mod_multiple_interpreters, Py_MOD_PER_INTERPRETER_GIL_SUPPORTED},
    {0, NULL}
};

static struct PyModuleDef vecmodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "vecmodule",
    .m_doc = "Vector math module",
    .m_size = 0,  // No per-module state
    .m_methods = NULL,
    .m_slots = vecmodule_slots,
};

PyMODINIT_FUNC PyInit_vecmodule(void) {
    return PyModuleDef_Init(&vecmodule);
}
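Multi-phase initialization is required for Python 3.12+’s per-interpreter GIL (PEP 684): the Py_mod_multiple_interpreters slot only exists in the multi-phase slot array. The free-threaded build (PEP 703, Python 3.13+) uses the same mechanism: a module declares that it is safe to run without the GIL through the Py_mod_gil slot, while single-phase modules have to call PyUnstable_Module_SetGIL() instead.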
Stable ABI and Limited API
What It Means
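The Limited API (Py_LIMITED_API) is a subset of the C API whose compiled form, the Stable ABI, stays binary-compatible across CPython 3.x releases. An extension built against it for a given minimum version loads unchanged on every later 3.x release, so users don’t need a rebuilt wheel when they upgrade Python.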
#define Py_LIMITED_API 0x030c0000 // Target Python 3.12+
#include <Python.h>
What You Give Up
- No access to PyObject struct internals (can’t read ob_refcnt directly)
- No PyTupleObject, PyListObject internal fields
- Must use accessor functions (PyTuple_GetItem, PyList_Size)
- No PyTypeObject struct fields; use PyType_Spec slots instead
Build Configuration
from setuptools import Extension, setup

setup(
    ext_modules=[
        Extension(
            "mymodule",
            sources=["mymodule.c"],
            define_macros=[("Py_LIMITED_API", "0x030c0000")],
            py_limited_api=True,
        ),
    ],
)
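With setuptools, the compiled file gets an .abi3 suffix (for example mymodule.abi3.so). To tag the wheel itself as cp312-abi3 rather than with a single-version tag, also pass py_limited_api=cp312 to bdist_wheel, for example under [bdist_wheel] in setup.cfg.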
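The value given to Py_LIMITED_API follows the PY_VERSION_HEX layout, with the major version in the top byte and the minor version in the next one. The sketch below shows how the macro value and the wheel's Python tag relate for a chosen minimum version; both helper names are hypothetical and not part of setuptools.

```python
def limited_api_macro(major: int, minor: int) -> str:
    """Return the Py_LIMITED_API value for the oldest supported Python version."""
    # Major version occupies bits 24-31, minor version bits 16-23;
    # the micro and release-level bytes are left at zero.
    return f"0x{(major << 24) | (minor << 16):08x}"


def abi3_python_tag(major: int, minor: int) -> str:
    """Return the Python tag that accompanies the abi3 ABI tag."""
    return f"cp{major}{minor}"


macro = limited_api_macro(3, 12)
tag = abi3_python_tag(3, 12)
print(macro, tag)  # 0x030c0000 cp312
```

Keeping both values derived from one (major, minor) pair avoids a wheel that claims a lower minimum version than the macro actually permits.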
Reference Counting Patterns
The Ownership Protocol
// PATTERN 1: New reference — caller owns it
PyObject* result = PyLong_FromLong(42);
// Caller must eventually Py_DECREF(result)
// PATTERN 2: Borrowed reference — caller does NOT own it
PyObject* item = PyList_GetItem(list, 0); // Borrowed!
// Do NOT Py_DECREF(item) unless you Py_INCREF'd it first
// PATTERN 3: Stealing a reference — function takes ownership
PyList_SetItem(list, 0, PyLong_FromLong(99));
// PyList_SetItem steals the new reference — do NOT Py_DECREF the value
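The effect of a container taking and releasing its own reference can be observed from Python with sys.getrefcount. This is a minimal sketch, assuming a standard CPython build; the function name is made up for illustration, and a fresh object() is used because small integers and other shared objects may be immortal on Python 3.12+.

```python
import sys


def refcount_delta_when_stored():
    """Return how an object's refcount changes while a list holds it."""
    obj = object()                  # fresh object with no other owners
    before = sys.getrefcount(obj)
    container = [obj]               # the list takes its own strong reference
    during = sys.getrefcount(obj)
    container.clear()               # the list releases that reference
    after = sys.getrefcount(obj)
    return during - before, after - before


stored, released = refcount_delta_when_stored()
print(stored, released)
```

Only the differences are meaningful: the absolute number reported by sys.getrefcount includes temporary references made by the call itself.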
Safe Cleanup with goto
static PyObject* complex_function(PyObject* self, PyObject* args) {
    PyObject* list = NULL;
    PyObject* item = NULL;
    PyObject* result = NULL;

    list = PyList_New(0);
    if (!list) goto error;
    item = PyLong_FromLong(42);
    if (!item) goto error;
    if (PyList_Append(list, item) < 0) goto error;
    Py_CLEAR(item);  // Append doesn't steal; we're done with item
    result = list;
    list = NULL;     // Transfer ownership to result
    goto done;

error:
    Py_XDECREF(list);
    result = NULL;
done:
    Py_XDECREF(item);
    return result;
}
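Py_CLEAR(ptr) sets the pointer to NULL before decrementing the old value, so code triggered by the object’s destructor never sees a dangling pointer, and a later Py_XDECREF on the same variable is a harmless no-op. It is not atomic in the threading sense. Py_XDECREF handles NULL safely.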
Implementing Protocols
Number Protocol (Operator Overloading)
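// VectorType is assumed to be a PyTypeObject* holding the result of
// PyType_FromSpec, e.g. cached in module state by vecmodule_exec.
static PyObject* Vector_add(PyObject* left, PyObject* right) {
    // nb_add also serves the reflected call, so either operand may be foreign.
    // PyObject_TypeCheck cannot fail, unlike PyObject_IsInstance, which may return -1.
    if (!PyObject_TypeCheck(left, VectorType) ||
        !PyObject_TypeCheck(right, VectorType)) {
        Py_RETURN_NOTIMPLEMENTED;
    }
    VectorObject* a = (VectorObject*)left;
    VectorObject* b = (VectorObject*)right;
    VectorObject* result = (VectorObject*)PyType_GenericAlloc(VectorType, 0);
    if (!result) return NULL;
    result->x = a->x + b->x;
    result->y = a->y + b->y;
    result->z = a->z + b->z;
    return (PyObject*)result;
}

// Only needed for a statically defined PyTypeObject:
static PyNumberMethods Vector_as_number = {
    .nb_add = Vector_add,
};
With PyType_Spec, no PyNumberMethods struct is needed: register the function directly via {Py_nb_add, Vector_add} in the slot array.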
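Returning NotImplemented, as Py_RETURN_NOTIMPLEMENTED does in the C function, tells the interpreter to try the other operand's reflected method before raising TypeError. The Python-level sketch below mirrors that dispatch; the Offset class is a hypothetical foreign type introduced only for illustration.

```python
class Vector:
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented      # same signal as Py_RETURN_NOTIMPLEMENTED
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)


class Offset:
    """Hypothetical foreign type that knows how to combine with Vector."""

    def __init__(self, amount):
        self.amount = amount

    def __radd__(self, left):
        # Reached only after Vector.__add__ returned NotImplemented.
        return Vector(left.x + self.amount, left.y + self.amount, left.z + self.amount)


shifted = Vector(1.0, 2.0, 3.0) + Offset(10.0)

try:
    Vector() + 1                       # neither operand handles this pair
    unsupported_raises = False
except TypeError:
    unsupported_raises = True
```

Raising TypeError directly inside the add function would cut this fallback short, which is why the C version returns NotImplemented instead.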
Sequence Protocol
// Register with {Py_sq_length, Vector_length} and {Py_sq_item, Vector_item}
static Py_ssize_t Vector_length(VectorObject* self) { return 3; }

static PyObject* Vector_item(VectorObject* self, Py_ssize_t i) {
    switch (i) {
        case 0: return PyFloat_FromDouble(self->x);
        case 1: return PyFloat_FromDouble(self->y);
        case 2: return PyFloat_FromDouble(self->z);
        default:
            PyErr_SetString(PyExc_IndexError, "index out of range");
            return NULL;
    }
}
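The IndexError in the default branch matters beyond direct indexing: when a type defines item access but no iterator, Python iterates by calling the item function with 0, 1, 2, ... until IndexError is raised. A pure-Python model of the same two slots, given here as an illustrative sketch rather than the extension's actual behaviour, shows what that buys:

```python
class Vector:
    def __init__(self, x, y, z):
        self._components = (x, y, z)

    def __len__(self):                 # counterpart of the length function
        return 3

    def __getitem__(self, i):          # counterpart of the item function
        if not 0 <= i < 3:
            raise IndexError("index out of range")   # this ends iteration
        return self._components[i]


v = Vector(1.0, 2.0, 3.0)
as_list = list(v)       # calls v[0], v[1], v[2], then stops at IndexError
has_two = 2.0 in v      # containment falls back to the same iteration
x, y, z = v             # unpacking works for the same reason
```

Raising any other exception type for an out-of-range index would surface as an error during iteration instead of ending it cleanly.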
Buffer Protocol
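// Static storage: shape and strides must stay valid after getbuffer returns
static Py_ssize_t Vector_shape[1] = {3};
static Py_ssize_t Vector_strides[1] = {sizeof(double)};

static int Vector_getbuffer(VectorObject* self, Py_buffer* view, int flags) {
    view->obj = (PyObject*)self;
    Py_INCREF(self);
    view->buf = &self->x;  // assumes x, y, z are laid out contiguously without padding
    view->len = 3 * sizeof(double);
    view->itemsize = sizeof(double);
    view->format = "d";
    view->ndim = 1;
    view->shape = Vector_shape;
    view->strides = Vector_strides;
    view->suboffsets = NULL;
    view->readonly = 0;
    view->internal = NULL;
    return 0;
}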
Registered via {Py_bf_getbuffer, Vector_getbuffer} in the slot array, this lets NumPy create zero-copy views of Vector objects.
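The consumer side of the buffer protocol can be tried without NumPy by using memoryview. Since the compiled Vector type is not available here, this sketch uses array.array("d", ...) as a stand-in exporter with the same layout: three C doubles in one contiguous block.

```python
from array import array

# Stand-in exporter: three contiguous C doubles, like x, y, z in the struct.
storage = array("d", [1.0, 2.0, 3.0])

view = memoryview(storage)      # asks the exporter to fill in a Py_buffer
layout = (view.format, view.itemsize, view.ndim, view.shape, view.strides)

view[0] = 10.0                  # writes straight into the exporter's memory
first_after_write = storage[0]

view.release()                  # ends the export
```

The fields read into layout correspond to the format, itemsize, ndim, shape, and strides values that the getbuffer function fills in, and the write-through shows that no copy was made.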
GIL and Threading
Releasing the GIL
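static PyObject* compute_primes(PyObject* self, PyObject* args) {
    long limit;
    long count = 0;
    int out_of_memory = 0;
    if (!PyArg_ParseTuple(args, "l", &limit)) return NULL;
    if (limit < 2) return PyLong_FromLong(0);  // also rejects negative sizes
    Py_BEGIN_ALLOW_THREADS
    // Pure C computation: no Python API calls allowed here
    int* sieve = malloc(((size_t)limit + 1) * sizeof(int));
    if (sieve == NULL) {
        out_of_memory = 1;  // cannot raise without the GIL; report it below
    } else {
        // ... sieve of Eratosthenes, storing the number of primes in count ...
        free(sieve);
    }
    Py_END_ALLOW_THREADS
    // GIL is held again: safe to raise exceptions and create Python objects
    if (out_of_memory) return PyErr_NoMemory();
    return PyLong_FromLong(count);
}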
Thread State for Callbacks
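void c_callback(void* userdata) {
    // Acquire the GIL: this may run on a thread Python did not create
    PyGILState_STATE gstate = PyGILState_Ensure();
    PyObject* callback = (PyObject*)userdata;
    PyObject* result = PyObject_CallNoArgs(callback);
    if (!result) {
        // No Python caller to propagate to: report the exception and clear it
        PyErr_WriteUnraisable(callback);
    }
    Py_XDECREF(result);
    PyGILState_Release(gstate);
}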
Debugging Extensions
- sys.getrefcount(obj): Check reference counts from Python.
- python -X dev: Enable development mode (extra checks, warnings).
- Valgrind: Detect memory leaks. Use --suppressions with CPython’s suppression file.
- Py_DEBUG build: Compile Python with debug mode for extra assertions, a debug memory allocator that flags use of freed memory, and total reference count tracking via sys.gettotalrefcount().
- Address Sanitizer: Compile your extension with -fsanitize=address to catch buffer overflows.
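The sys.getrefcount check can be turned into a small leak probe: call the function under test many times with a fresh object and see whether the object's count grew. This is a rough sketch; refcount_growth, leaky, and clean are hypothetical names, with leaky standing in for an extension function that forgets a Py_DECREF.

```python
import sys


def refcount_growth(func, probe, calls=1000):
    """Call func(probe) repeatedly and return how much probe's refcount grew."""
    before = sys.getrefcount(probe)
    for _ in range(calls):
        func(probe)
    return sys.getrefcount(probe) - before


_kept = []


def leaky(obj):
    _kept.append(obj)      # keeps one extra reference per call


def clean(obj):
    return [obj]           # temporary list is released once the result is dropped


probe = object()
leak = refcount_growth(leaky, probe)
no_leak = refcount_growth(clean, probe)
```

Growth proportional to the number of calls points at a missing decrement; a result of zero does not prove correctness, since over-decrementing shows up as a crash rather than as a count.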
Performance Benchmarking
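Approximate per-call overhead for the same function exposed through the C API, pybind11, ctypes, and cffi (order-of-magnitude figures; exact numbers depend on hardware, Python version, and argument types):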
| Approach | Call overhead (ns) | Suitable for |
|---|---|---|
| C API | ~30-50 | Hot paths called millions of times |
| pybind11 | ~80-150 | Most extensions |
| ctypes | ~500-1000 | Quick prototyping |
| cffi (API mode) | ~100-200 | Wrapping existing C libraries |
The C API wins on raw call overhead, but the difference only matters when the function body itself is very fast (nanosecond-scale).
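Per-call overhead of this kind can be measured with timeit. The sketch below times math.sqrt, a function CPython implements in C, both directly and through an extra Python-level wrapper; it does not reproduce the table above, and the per_call_ns helper is a hypothetical name. The figures include timeit's own loop overhead and vary between machines.

```python
import math
import timeit


def py_wrapper(x):
    # Adds one Python-level call frame around the same C function
    return math.sqrt(x)


def per_call_ns(stmt, namespace, number=200_000, repeat=5):
    """Best-of-repeat time per execution of stmt, in nanoseconds."""
    best = min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))
    return best / number * 1e9


env = {"sqrt": math.sqrt, "py_wrapper": py_wrapper}
direct_ns = per_call_ns("sqrt(2.0)", env)
wrapped_ns = per_call_ns("py_wrapper(2.0)", env)
print(f"direct C call: {direct_ns:.0f} ns, via Python wrapper: {wrapped_ns:.0f} ns")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the quantity being measured is only tens of nanoseconds.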
One Thing to Remember
The C Extension API is CPython’s contract with native code. It’s verbose, demands manual reference counting, and punishes mistakes with segfaults — but it gives you direct access to everything Python is, at the speed of C. Every abstraction layer above it trades some of that control for safety and convenience.