Python Extension Modules API — Deep Dive

Master CPython's C API: custom types with tp_slots, multi-phase initialization, stable ABI targeting, GIL-aware threading, and memory management patterns for production extensions.

Writing CPython extensions directly against the C API means working at the same level as CPython’s own built-in types. This gives unmatched control but demands precision with reference counting, type slots, and initialization protocols. This guide covers the patterns that separate correct extensions from subtly broken ones.

Custom Types via `PyType_Spec`

The modern approach (PEP 384 compatible) defines types using a slot array rather than filling a monolithic PyTypeObject:

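#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <math.h>    // sqrt
#include <stddef.h>  // offsetof

typedef struct {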
    PyObject_HEAD
    double x, y, z;
} VectorObject;

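static PyObject* Vector_new(PyTypeObject* type, PyObject* args, PyObject* kwds) {
    // PyTypeObject is opaque under the Limited API, so fetch tp_alloc via PyType_GetSlot.
    allocfunc alloc = (allocfunc)PyType_GetSlot(type, Py_tp_alloc);
    VectorObject* self = (VectorObject*)alloc(type, 0);
    if (self) { self->x = self->y = self->z = 0.0; }
    return (PyObject*)self;
}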

static int Vector_init(VectorObject* self, PyObject* args, PyObject* kwds) {
    static char* kwlist[] = {"x", "y", "z", NULL};
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|ddd", kwlist,
                                      &self->x, &self->y, &self->z))
        return -1;
    return 0;
}

static PyObject* Vector_magnitude(VectorObject* self, PyObject* Py_UNUSED(ignored)) {
    double mag = sqrt(self->x*self->x + self->y*self->y + self->z*self->z);
    return PyFloat_FromDouble(mag);
}

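static PyObject* Vector_repr(VectorObject* self) {
    // PyUnicode_FromFormat has no floating-point conversions, so format in C first.
    // PyOS_snprintf always NUL-terminates and truncates rather than overflowing.
    char buf[1024];
    PyOS_snprintf(buf, sizeof(buf), "Vector(%.4f, %.4f, %.4f)",
                  self->x, self->y, self->z);
    return PyUnicode_FromString(buf);
}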

static PyMethodDef Vector_methods[] = {
    {"magnitude", (PyCFunction)Vector_magnitude, METH_NOARGS, "Compute magnitude"},
    {NULL}
};

static PyMemberDef Vector_members[] = {
    {"x", Py_T_DOUBLE, offsetof(VectorObject, x), 0, "x component"},
    {"y", Py_T_DOUBLE, offsetof(VectorObject, y), 0, "y component"},
    {"z", Py_T_DOUBLE, offsetof(VectorObject, z), 0, "z component"},
    {NULL}
};

static PyType_Slot Vector_slots[] = {
    {Py_tp_new, Vector_new},
    {Py_tp_init, Vector_init},
    {Py_tp_repr, Vector_repr},
    {Py_tp_methods, Vector_methods},
    {Py_tp_members, Vector_members},
    {Py_tp_doc, "3D vector type"},
    {0, NULL}
};

static PyType_Spec Vector_spec = {
    .name = "vecmodule.Vector",
    .basicsize = sizeof(VectorObject),
    .flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .slots = Vector_slots,
};

Using PyType_Spec with slots is required for stable ABI compatibility and is the recommended approach for all new code.

Multi-Phase Initialization (PEP 489)

Single-phase initialization (PyModule_Create in PyInit_*) creates the module immediately. Multi-phase initialization defers creation, supporting sub-interpreters and per-interpreter module state:

static int vecmodule_exec(PyObject* module) {
    PyObject* VectorType = PyType_FromSpec(&Vector_spec);
    if (!VectorType) return -1;
    
    if (PyModule_AddObjectRef(module, "Vector", VectorType) < 0) {
        Py_DECREF(VectorType);
        return -1;
    }
    Py_DECREF(VectorType);
    return 0;
}

static PyModuleDef_Slot vecmodule_slots[] = {
    {Py_mod_exec, vecmodule_exec},
    {Py_mod_multiple_interpreters, Py_MOD_PER_INTERPRETER_GIL_SUPPORTED},
    {0, NULL}
};

static struct PyModuleDef vecmodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "vecmodule",
    .m_doc = "Vector math module",
    .m_size = 0,  // No per-module state
    .m_methods = NULL,
    .m_slots = vecmodule_slots,
};

PyMODINIT_FUNC PyInit_vecmodule(void) {
    return PyModuleDef_Init(&vecmodule);
}

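Multi-phase initialization is a prerequisite for opting in to Python 3.12's per-interpreter GIL (PEP 684), which the Py_mod_multiple_interpreters slot above declares. The free-threaded build (PEP 703, Python 3.13+) does not strictly require it, but the Py_mod_gil slot used to declare Py_MOD_GIL_NOT_USED is only available to multi-phase modules; without that declaration, importing the extension re-enables the GIL.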

Stable ABI and Limited API

What It Means

The Limited API (enabled by defining Py_LIMITED_API) is a subset of the C API whose compiled form, the Stable ABI, stays binary-compatible across Python 3.x releases. An extension built against it runs without recompilation on the targeted version and every later 3.x release.

#define Py_LIMITED_API 0x030c0000  // Target Python 3.12+
#include <Python.h>

What You Give Up

No access to PyObject struct internals (can’t read ob_refcnt directly)
No PyTupleObject, PyListObject internal fields
Must use accessor functions (PyTuple_GetItem, PyList_Size)
No PyTypeObject struct fields — use PyType_Spec slots instead

Build Configuration

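from setuptools import Extension, setup

ext = Extension(
    "mymodule",
    sources=["mymodule.c"],
    define_macros=[("Py_LIMITED_API", "0x030c0000")],
    py_limited_api=True,  # build mymodule.abi3.so (or .pyd) instead of a versioned name
)
# The wheel tag is set separately, via bdist_wheel's py_limited_api option.
setup(ext_modules=[ext], options={"bdist_wheel": {"py_limited_api": "cp312"}})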

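When the wheel is also built with bdist_wheel's py_limited_api option set to cp312, it gets a cp312-abi3 tag instead of a version-specific one such as cp312-cp312, so a single wheel per platform covers Python 3.12 and later.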
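
The hex constant in the macro and the version part of the wheel tag both derive from the minimum supported Python version. As a small sketch of that mapping (the helper names are invented for illustration and are not part of setuptools or CPython):

```python
def limited_api_macro(major: int, minor: int) -> str:
    """Return the Py_LIMITED_API value for a minimum Python version.

    The layout follows PY_VERSION_HEX: one byte each for major and minor,
    with the micro and release-level bytes left at zero.
    """
    return f"0x{major:02x}{minor:02x}0000"


def abi3_wheel_tag(major: int, minor: int) -> str:
    """Return the python/abi tag pair for an abi3 wheel."""
    return f"cp{major}{minor}-abi3"


print(limited_api_macro(3, 12))  # 0x030c0000
print(abi3_wheel_tag(3, 12))     # cp312-abi3
```

Keeping both values derived from one version pair avoids a wheel tagged for an older Python than the macro actually permits.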

Reference Counting Patterns

The Ownership Protocol

// PATTERN 1: New reference — caller owns it
PyObject* result = PyLong_FromLong(42);
// Caller must eventually Py_DECREF(result)

// PATTERN 2: Borrowed reference — caller does NOT own it
PyObject* item = PyList_GetItem(list, 0);  // Borrowed!
// Do NOT Py_DECREF(item) unless you Py_INCREF'd it first

// PATTERN 3: Stealing a reference — function takes ownership
PyList_SetItem(list, 0, PyLong_FromLong(99));
// PyList_SetItem steals the new reference — do NOT Py_DECREF the value

Safe Cleanup with goto

static PyObject* complex_function(PyObject* self, PyObject* args) {
    PyObject* list = NULL;
    PyObject* item = NULL;
    PyObject* result = NULL;
    
    list = PyList_New(0);
    if (!list) goto error;
    
    item = PyLong_FromLong(42);
    if (!item) goto error;
    
    if (PyList_Append(list, item) < 0) goto error;
    Py_CLEAR(item);  // Append doesn't steal; we're done with item
    
    result = list;
    list = NULL;  // Transfer ownership to result
    goto done;
    
error:
    Py_XDECREF(list);
    result = NULL;
    
done:
    Py_XDECREF(item);
    return result;
}

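Py_CLEAR(ptr) sets the pointer to NULL before decrementing the old value, so code triggered by the decrement (a destructor, for instance) never sees a dangling pointer, and a later Py_XDECREF on the same variable is a harmless no-op. Py_XDECREF accepts NULL; Py_DECREF does not.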

Implementing Protocols

Number Protocol (Operator Overloading)

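// Assumption: VectorType is a PyTypeObject* holding the type returned by
// PyType_FromSpec (for example, cached in module state during Py_mod_exec).
static PyObject* Vector_add(PyObject* left, PyObject* right) {
    // Either operand may be the non-Vector. PyObject_TypeCheck cannot fail,
    // unlike PyObject_IsInstance, whose -1 error return a bare `!` misreads.
    if (!PyObject_TypeCheck(left, VectorType) ||
        !PyObject_TypeCheck(right, VectorType)) {
        Py_RETURN_NOTIMPLEMENTED;
    }
    VectorObject* a = (VectorObject*)left;
    VectorObject* b = (VectorObject*)right;

    VectorObject* result = (VectorObject*)PyType_GenericAlloc(VectorType, 0);
    if (!result) return NULL;
    result->x = a->x + b->x;
    result->y = a->y + b->y;
    result->z = a->z + b->z;
    return (PyObject*)result;
}

// With PyType_Spec, register the function as a slot in Vector_slots:
//     {Py_nb_add, Vector_add},
// A statically defined type would fill PyNumberMethods instead:
static PyNumberMethods Vector_as_number = {
    .nb_add = Vector_add,
};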
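
The effect of Py_RETURN_NOTIMPLEMENTED is easiest to see from the Python side. The class below is a pure-Python stand-in for the C type (it is not the extension itself), sketching how the interpreter reacts when an operand is declined:

```python
class Vector:
    """Pure-Python model of the C Vector type, for illustration only."""

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = float(x), float(y), float(z)

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # same role as Py_RETURN_NOTIMPLEMENTED
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)


total = Vector(1, 2, 3) + Vector(4, 5, 6)
print(total.x, total.y, total.z)  # 5.0 7.0 9.0

try:
    Vector(1, 2, 3) + 1
except TypeError as exc:
    # Both operands declined, so the interpreter raises TypeError itself.
    print(type(exc).__name__)  # TypeError
```

One difference from the model: a C nb_add slot is called for both operand orders, which is why the C function checks the type of left as well as right.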

Sequence Protocol

static Py_ssize_t Vector_length(VectorObject* self) { return 3; }

static PyObject* Vector_item(VectorObject* self, Py_ssize_t i) {
    switch (i) {
        case 0: return PyFloat_FromDouble(self->x);
        case 1: return PyFloat_FromDouble(self->y);
        case 2: return PyFloat_FromDouble(self->z);
        default:
            PyErr_SetString(PyExc_IndexError, "index out of range");
            return NULL;
    }
}
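
Raising IndexError for out-of-range indices matters beyond error reporting: a type that offers item access but no iterator is iterated by calling the item function with 0, 1, 2, ... until IndexError appears. A pure-Python stand-in (not the extension type itself) sketches what these two functions enable:

```python
class Vector:
    """Pure-Python model of the length and item functions, for illustration only."""

    def __init__(self, x, y, z):
        self._items = (float(x), float(y), float(z))

    def __len__(self):
        return 3

    def __getitem__(self, i):
        if not 0 <= i < 3:
            raise IndexError("index out of range")
        return self._items[i]


v = Vector(1, 2, 3)
x, y, z = v      # unpacking iterates until IndexError is raised
print(list(v))   # [1.0, 2.0, 3.0]
print(len(v))    # 3
```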

Buffer Protocol

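static int Vector_getbuffer(VectorObject* self, Py_buffer* view, int flags) {
    // shape and strides must outlive this call. Compound literals have
    // automatic storage and would dangle once the function returns.
    static Py_ssize_t shape[1] = {3};
    static Py_ssize_t strides[1] = {sizeof(double)};

    view->obj = (PyObject*)self;
    Py_INCREF(self);
    view->buf = &self->x;  // assumes x, y, z are laid out without padding
    view->len = 3 * sizeof(double);
    view->itemsize = sizeof(double);
    view->format = "d";
    view->ndim = 1;
    view->shape = shape;
    view->strides = strides;
    view->suboffsets = NULL;
    view->readonly = 0;
    view->internal = NULL;
    return 0;
}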

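Once registered through the Py_bf_getbuffer slot (part of the Limited API since Python 3.11), this lets consumers such as memoryview and NumPy create zero-copy views of Vector objects.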
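
To see what a consumer reads from those Py_buffer fields, the sketch below uses the standard library's array.array as a stand-in exporter with the same layout (three C doubles); it does not use the Vector type itself:

```python
from array import array

data = array("d", [1.0, 2.0, 3.0])  # stand-in for a Vector instance

# memoryview() calls the exporter's getbuffer function and exposes the
# Py_buffer fields it filled in.
with memoryview(data) as view:
    layout = (view.format, view.itemsize, view.ndim, view.shape, view.strides)
    view[0] = 10.0  # writes straight into the exporter's memory, no copy

print(layout)   # ('d', 8, 1, (3,), (8,))
print(data[0])  # 10.0
```

Leaving the with block releases the view, which is the point at which the exporter's reference taken in getbuffer is dropped.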

GIL and Threading

Releasing the GIL

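static PyObject* compute_primes(PyObject* self, PyObject* args) {
    long limit;
    long count = 0;
    int failed = 0;
    if (!PyArg_ParseTuple(args, "l", &limit)) return NULL;
    if (limit < 2) return PyLong_FromLong(0);

    Py_BEGIN_ALLOW_THREADS
    // Pure C computation: no Python API calls allowed here
    int* sieve = malloc((size_t)limit * sizeof(int));
    if (sieve) {
        // ... sieve of Eratosthenes, storing the result in count ...
        free(sieve);
    } else {
        failed = 1;  // cannot raise without the GIL; report after reacquiring it
    }
    Py_END_ALLOW_THREADS

    // GIL is held again: safe to raise or create Python objects
    if (failed) return PyErr_NoMemory();
    return PyLong_FromLong(count);
}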
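
From the caller's side, a function that releases the GIL can be driven from several threads at once. The sketch below uses hashlib as a stand-in, since its documentation states that the GIL is released while hashing inputs larger than 2047 bytes; the digest helper and block sizes are chosen only for illustration:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # The C implementation releases the GIL while hashing a large buffer,
    # so other threads can run during this call.
    return hashlib.sha256(block).hexdigest()


blocks = [bytes([i]) * 1_000_000 for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

serial = [digest(b) for b in blocks]
print(threaded == serial)  # True
```

The results are identical either way; what changes is that the threaded calls can overlap on multiple cores, which is the benefit Py_BEGIN_ALLOW_THREADS buys an extension.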

Thread State for Callbacks

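void c_callback(void* userdata) {
    // Acquire the GIL: this may run on a thread Python did not create.
    PyGILState_STATE gstate = PyGILState_Ensure();

    PyObject* callback = (PyObject*)userdata;
    PyObject* result = PyObject_CallNoArgs(callback);
    if (!result) {
        // No Python caller to propagate to, so report and clear the exception.
        PyErr_WriteUnraisable(callback);
    }
    Py_XDECREF(result);

    PyGILState_Release(gstate);
}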

Debugging Extensions

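sys.getrefcount(obj): Check reference counts from Python. The result is generally one higher than expected, because the call's own argument holds a temporary reference.
python -X dev: Enable development mode (extra checks, warnings).
Valgrind: Detect memory leaks and invalid accesses. Use --suppressions with CPython’s suppression file (Misc/valgrind-python.supp) and set PYTHONMALLOC=malloc so pymalloc's arenas don't hide individual allocations.
Py_DEBUG build: Compile Python in debug mode (./configure --with-pydebug) for extra assertions, allocator debug hooks that catch buffer overruns and use-after-free, and sys.gettotalrefcount() for spotting leaks.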
Address Sanitizer: Compile your extension with -fsanitize=address to catch buffer overflows.
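
A quick leak check built on sys.getrefcount is to call the function under test many times with the same argument and see whether the argument's reference count grows. A minimal sketch, where refcount_growth and leaky are hypothetical helpers and leaky merely imitates an extension function with a missing Py_DECREF:

```python
import sys


def refcount_growth(func, arg, iterations=1000):
    """Return how much arg's reference count grew over repeated calls."""
    before = sys.getrefcount(arg)
    for _ in range(iterations):
        func(arg)
    return sys.getrefcount(arg) - before


_kept = []


def leaky(obj):
    _kept.append(obj)  # retains one reference per call, like a missing Py_DECREF


probe = object()
print(refcount_growth(id, probe))     # 0
print(refcount_growth(leaky, probe))  # 1000
```

Growth proportional to the iteration count points at a leaked reference per call; a small constant offset usually indicates caching rather than a leak.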

Performance Benchmarking

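The figures below are rough, order-of-magnitude estimates of per-call overhead for the same trivial function exposed through the C API, pybind11, ctypes, and cffi; exact numbers vary with Python version, compiler, and argument types: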

Approach	Call overhead (ns)	Suitable for
C API	~30-50	Hot paths called millions of times
pybind11	~80-150	Most extensions
ctypes	~500-1000	Quick prototyping
cffi (API mode)	~100-200	Wrapping existing C libraries

The C API wins on raw call overhead, but the difference only matters when the function body itself is very fast (nanosecond-scale).
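
Per-call overhead of this kind can be estimated with timeit by timing a callable that does almost no work. The sketch below is one way to do it, with per_call_ns as a hypothetical helper; it compares a C-implemented callable (bool) with a trivial Python function rather than the binding layers in the table, and the measurement includes timeit's own loop overhead:

```python
import timeit


def per_call_ns(func, number=200_000, repeat=5):
    """Estimate nanoseconds per call, taking the best of several runs."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


def py_noop():
    return None


print(f"C callable  : {per_call_ns(bool):.0f} ns per call")
print(f"pure Python : {per_call_ns(py_noop):.0f} ns per call")
```

Taking the minimum over repeats reduces noise from other processes; to compare binding layers, pass the same function wrapped by each layer to the helper.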

One Thing to Remember

The C Extension API is CPython’s contract with native code. It’s verbose, demands manual reference counting, and punishes mistakes with segfaults — but it gives you direct access to everything Python is, at the speed of C. Every abstraction layer above it trades some of that control for safety and convenience.