Python Buffer Protocol — Deep Dive

Implement the buffer protocol in C extensions, handle non-contiguous memory layouts, manage buffer lifecycle, and build zero-copy bridges between Python and native code.

The buffer protocol is one of CPython’s most performance-critical interop mechanisms. Frameworks that handle large data, such as NumPy, Arrow, Pillow, and PyTorch, rely on it for zero-copy data exchange. Understanding the protocol at the C level is essential for writing extensions that participate in this ecosystem.

The Py_buffer Structure

The complete buffer description:

typedef struct {
    void      *buf;          // Pointer to logical start of buffer
    PyObject  *obj;          // Owning object (keeps it alive)
    Py_ssize_t len;          // Total bytes (product of shape × itemsize)
    Py_ssize_t itemsize;     // Size of one element
    int        readonly;     // 0 = writable, 1 = read-only
    int        ndim;         // Number of dimensions
    char      *format;       // struct-style format string
    Py_ssize_t *shape;       // Array of dimension sizes
    Py_ssize_t *strides;     // Bytes between consecutive elements per dim
    Py_ssize_t *suboffsets;  // For PIL-style indirect arrays (usually NULL)
    void      *internal;     // Implementation-specific data
} Py_buffer;
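
From Python, most of these fields can be inspected as attributes of a memoryview. The following is a minimal sketch that uses the standard-library array module as the producer; the 2×3 shape is an arbitrary choice for illustration.

```python
from array import array

# array.array exports a 1-D buffer of C doubles.
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
flat = memoryview(values)

# Reinterpret the same memory as a 2x3 C-contiguous matrix (no copy).
# Multi-dimensional casts must go through a byte format first.
matrix = flat.cast("B").cast("d", shape=[2, 3])

print(matrix.format)          # struct-style format of one element
print(matrix.itemsize)        # bytes per element
print(matrix.ndim, matrix.shape, matrix.strides)
print(matrix.nbytes)          # corresponds to Py_buffer.len
print(matrix.readonly)
print(matrix.obj is values)   # the owning object kept alive by the view
```

The attribute names map almost one to one onto the C fields; only len is exposed under a different name (nbytes).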

Format Strings

Format strings follow the struct module conventions:

Format	C Type	Size (bytes, native mode)
`b`	signed char	1
`B`	unsigned char	1
`h`	short	2
`i`	int	4
`l`	long	4/8
`q`	long long	8
`f`	float	4
`d`	double	8
`?`	_Bool	1
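
The sizes in the table can be checked on a given platform with struct.calcsize. This short sketch also shows that a buffer's itemsize agrees with the size implied by its format string.

```python
import struct
from array import array

# Native-mode sizes for the codes in the table; "l" depends on the platform ABI.
sizes = {code: struct.calcsize(code) for code in "bBhilqfd?"}
for code, size in sizes.items():
    print(code, size)

# A buffer's format and itemsize are consistent with the struct module.
view = memoryview(array("d", [1.5, 2.5]))
print(view.format, view.itemsize, struct.calcsize(view.format))
```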

Implementing the Protocol in C

Producer: Exporting a Buffer

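typedef struct {
    PyObject_HEAD
    double *data;
    Py_ssize_t rows;
    Py_ssize_t cols;
    Py_ssize_t shape[2];
    Py_ssize_t strides[2];
    int readonly;      // Nonzero if the data must not be modified
    int export_count;  // Track active exports
} MatrixObject;

static int Matrix_getbuffer(MatrixObject *self, Py_buffer *view, int flags) {
    if ((flags & PyBUF_WRITABLE) == PyBUF_WRITABLE && self->readonly) {
        view->obj = NULL;  // Required whenever getbuffer fails
        PyErr_SetString(PyExc_BufferError, "Object is not writable");
        return -1;
    }
    if ((flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS) {
        view->obj = NULL;
        PyErr_SetString(PyExc_BufferError, "Matrix data is row-major");
        return -1;
    }
    
    self->shape[0] = self->rows;
    self->shape[1] = self->cols;
    self->strides[0] = self->cols * sizeof(double);
    self->strides[1] = sizeof(double);
    
    view->obj = (PyObject*)self;
    Py_INCREF(self);
    view->buf = self->data;
    view->len = self->rows * self->cols * sizeof(double);
    view->itemsize = sizeof(double);
    view->readonly = self->readonly;
    view->ndim = 2;
    // Fields the consumer did not request must be NULL
    view->format = (flags & PyBUF_FORMAT) ? "d" : NULL;
    view->shape = (flags & PyBUF_ND) ? self->shape : NULL;
    view->strides = ((flags & PyBUF_STRIDES) == PyBUF_STRIDES) ? self->strides : NULL;
    view->suboffsets = NULL;
    view->internal = NULL;
    
    self->export_count++;
    return 0;
}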

static void Matrix_releasebuffer(MatrixObject *self, Py_buffer *view) {
    self->export_count--;
}

static PyBufferProcs Matrix_as_buffer = {
    .bf_getbuffer = (getbufferproc)Matrix_getbuffer,
    .bf_releasebuffer = (releasebufferproc)Matrix_releasebuffer,
};

Key Implementation Details

Export counting: Track how many buffers are currently exported. While any buffer is active, the object must not reallocate or free its memory. Raise BufferError if a resize is attempted while exports are active:

static PyObject* Matrix_resize(MatrixObject *self, PyObject *args) {
    if (self->export_count > 0) {
        PyErr_SetString(PyExc_BufferError,
            "cannot resize while buffer is exported");
        return NULL;
    }
    // ... perform resize ...
    Py_RETURN_NONE;
}

This is why bytearray raises BufferError if you try to resize it while a memoryview exists.
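
The bytearray behaviour is easy to observe from Python. A minimal sketch: the resize is refused while a view exists and succeeds once the view has been released.

```python
buf = bytearray(b"abcdef")
view = memoryview(buf)       # one active export

try:
    buf.extend(b"gh")        # would need to reallocate the storage
    resized = True
except BufferError:
    resized = False          # refused while the view is alive

view.release()               # no exports remain
buf.extend(b"gh")            # now allowed
print(resized, buf)
```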

Format negotiation: The flags parameter tells you what the consumer needs:

Flag	Meaning
`PyBUF_SIMPLE`	Contiguous bytes; no format, shape, or strides (writability is requested separately with `PyBUF_WRITABLE`)
`PyBUF_WRITABLE`	Must be writable
`PyBUF_FORMAT`	Consumer wants format string
`PyBUF_ND`	Consumer wants shape
`PyBUF_STRIDES`	Consumer wants strides (implies `PyBUF_ND`)
`PyBUF_C_CONTIGUOUS`	Must be C-contiguous (row-major)
`PyBUF_F_CONTIGUOUS`	Must be Fortran-contiguous (column-major)

If you can’t satisfy the flags, set view->obj to NULL, set a BufferError, and return -1.
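
The effect of a refused request is visible from Python whenever an API needs a writable buffer. The sketch below uses io.BytesIO.readinto as the consumer, which is just one convenient example: a bytearray can satisfy the request, while an immutable bytes object cannot. The exact exception type reported to Python code is left open here, so the example catches both candidates.

```python
import io

source = io.BytesIO(b"xyz")

target = bytearray(3)
source.readinto(target)          # bytearray can be exported as writable

source.seek(0)
try:
    source.readinto(b"abc")      # bytes cannot satisfy a writable request
    refused = False
except (TypeError, BufferError):
    refused = True

print(bytes(target), refused)
```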

Non-Contiguous Memory

Strided Arrays

A transposed view of a matrix has non-standard strides:

// Original 3×4 matrix (row-major):
// strides = [4*sizeof(double), sizeof(double)]
//         = [32, 8]

// Transposed view (same memory, swapped strides):
// shape   = [4, 3]
// strides = [8, 32]  // Column-major traversal

To access element [i][j] in strided memory:

char *ptr = (char*)buf + i * strides[0] + j * strides[1];
double value = *(double*)ptr;
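
The same address arithmetic can be tried out in Python with the struct module. This is an illustrative sketch; the helper name element is hypothetical and the matrix contents (0.0 to 11.0) are sample data.

```python
import struct

ITEM = struct.calcsize("d")           # bytes per double
rows, cols = 3, 4

# Row-major 3x4 matrix holding 0.0 .. 11.0.
buf = struct.pack(f"{rows * cols}d", *map(float, range(rows * cols)))

def element(buf, strides, i, j):
    """Read element [i][j] using byte strides, like the C expression above."""
    offset = i * strides[0] + j * strides[1]
    return struct.unpack_from("d", buf, offset)[0]

original_strides = (cols * ITEM, ITEM)     # (32, 8)
transposed_strides = (ITEM, cols * ITEM)   # (8, 32), same memory

# Element [1][2] of the original is element [2][1] of the transposed view.
print(element(buf, original_strides, 1, 2))
print(element(buf, transposed_strides, 2, 1))
```

Both calls read the same eight bytes: only the strides differ, not the underlying buffer.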

Suboffsets (PIL-Style)

Some image libraries store rows as an array of pointers (each row may be at a different memory location). The suboffsets array handles this:

// suboffsets[0] = 0 means: dereference after applying strides[0]
// Effectively: row_ptr = *(char**)((char*)buf + i * strides[0]) + suboffsets[0]
//              pixel   = row_ptr + j * strides[1]
// A negative suboffset means: no dereference in that dimension

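Most consumers (including NumPy) don’t support suboffsets: they either leave PyBUF_INDIRECT out of their request, so a producer that can only export indirect memory must fail, or they refuse buffers with non-NULL suboffsets.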
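
To make the pointer chasing concrete, here is a small simulation with ctypes. It is a sketch under stated assumptions: the two separately allocated rows, the pointer table, and the helper names item_address and read_double are all made up for illustration. The loop follows the rule described above, dereferencing after the stride step wherever the suboffset is non-negative.

```python
import ctypes

# Two rows allocated separately, as an indirect (PIL-style) layout would have.
Row = ctypes.c_double * 3
rows = [Row(1.0, 2.0, 3.0), Row(4.0, 5.0, 6.0)]

# "buf" is a table of row pointers, not the element data itself.
table = (ctypes.c_void_p * len(rows))(*[ctypes.addressof(r) for r in rows])

strides = (ctypes.sizeof(ctypes.c_void_p), ctypes.sizeof(ctypes.c_double))
suboffsets = (0, -1)   # dereference after dim 0; dim 1 is plain strided data

def item_address(buf, indices, strides, suboffsets):
    """Walk the dimensions, dereferencing wherever a suboffset is >= 0."""
    ptr = buf
    for index, stride, suboffset in zip(indices, strides, suboffsets):
        ptr += index * stride
        if suboffset >= 0:
            ptr = ctypes.c_void_p.from_address(ptr).value + suboffset
    return ptr

def read_double(i, j):
    address = item_address(ctypes.addressof(table), (i, j), strides, suboffsets)
    return ctypes.c_double.from_address(address).value

print(read_double(1, 2))
```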

Consumer: Reading a Buffer

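static PyObject* sum_buffer(PyObject *self, PyObject *arg) {
    Py_buffer view;
    // PyBUF_STRIDES implies PyBUF_ND and asks the producer to fill in strides
    if (PyObject_GetBuffer(arg, &view, PyBUF_FORMAT | PyBUF_STRIDES) < 0)
        return NULL;
    
    // A NULL format means unsigned bytes ("B")
    if (view.format == NULL || strcmp(view.format, "d") != 0) {
        PyErr_SetString(PyExc_TypeError, "Expected float64 buffer");
        PyBuffer_Release(&view);
        return NULL;
    }
    if (view.ndim != 1 && view.ndim != 2) {
        PyErr_SetString(PyExc_ValueError, "Expected a 1D or 2D buffer");
        PyBuffer_Release(&view);
        return NULL;
    }
    
    double total = 0.0;
    
    if (view.ndim == 1 && view.strides[0] == view.itemsize) {
        // Contiguous 1D case: fast path
        double *data = (double*)view.buf;
        for (Py_ssize_t i = 0; i < view.shape[0]; i++) {
            total += data[i];
        }
    } else if (view.ndim == 1) {
        // Strided 1D case (for example, a sliced view)
        for (Py_ssize_t i = 0; i < view.shape[0]; i++) {
            total += *(double*)((char*)view.buf + i * view.strides[0]);
        }
    } else {
        // Strided 2D case
        for (Py_ssize_t i = 0; i < view.shape[0]; i++) {
            for (Py_ssize_t j = 0; j < view.shape[1]; j++) {
                char *ptr = (char*)view.buf + i * view.strides[0] + j * view.strides[1];
                total += *(double*)ptr;
            }
        }
    }
    
    PyBuffer_Release(&view);  // MUST release when done
    return PyFloat_FromDouble(total);
}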

Critical: Always call PyBuffer_Release when done. Failing to release prevents the producer from resizing or freeing its memory.
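
In Python code the counterpart is memoryview.release(), most conveniently reached by using the view as a context manager. A minimal sketch:

```python
data = bytearray(16)

# The with-block releases the view on exit, even if the body raises.
with memoryview(data) as view:
    view[0] = 0xFF

data.extend(b"\x01")        # allowed: no exports remain

try:
    view[0]                 # a released view can no longer be used
    usable = True
except ValueError:
    usable = False

print(len(data), usable)
```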

Advanced: Buffer Protocol in Python (PEP 688)

Python classes can implement the buffer protocol via __buffer__ and __release_buffer__ (Python 3.12+, PEP 688):

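import ctypes

class SharedMemory:
    def __init__(self, size):
        self._buf = (ctypes.c_byte * size)()
        self._exports = 0
    
    def __buffer__(self, flags):
        # flags carries the consumer's request (see inspect.BufferFlags)
        self._exports += 1
        return memoryview(self._buf)
    
    def __release_buffer__(self, mv):
        # mv is the memoryview that __buffer__ returned
        mv.release()
        self._exports -= 1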

Before PEP 688, only C extensions could be buffer producers. Now pure Python classes can participate.

Performance Patterns

Zero-Copy Pipeline

import numpy as np

# Read binary data from file (length must be a multiple of 8 bytes)
with open("data.bin", "rb") as f:
    raw = f.read()  # bytes object

# Zero-copy view as float64 array
view = memoryview(raw).cast("d")

# Zero-copy NumPy array (read-only, because bytes is immutable)
arr = np.frombuffer(raw, dtype=np.float64)

# At this point, raw, view, and arr all point to the same memory

Avoiding Hidden Copies

# HIDDEN COPY: slicing bytes creates a new bytes object
chunk = data[1000:2000]  # Copies 1000 bytes

# ZERO COPY: slicing memoryview creates a view
view = memoryview(data)
chunk = view[1000:2000]  # No copy — same memory, different offset
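
The difference can be observed by mutating the underlying buffer after slicing. The sketch below uses a bytearray (so the data can be modified) filled with arbitrary sample bytes; slicing a bytearray copies in the same way as slicing bytes.

```python
data = bytearray(range(256)) * 8      # 2048 bytes of sample data

copied = data[1000:2000]              # slice of the object: independent copy
shared = memoryview(data)[1000:2000]  # slice of a view: same memory

data[1000] = 0                        # mutate the underlying buffer

print(copied[0])   # unchanged: the copy has its own storage
print(shared[0])   # reflects the change
```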

Buffer Protocol with Socket I/O

buf = bytearray(65536)
view = memoryview(buf)
offset = 0

while offset < len(buf):
    nbytes = sock.recv_into(view[offset:])
    if nbytes == 0:
        break
    offset += nbytes

recv_into writes directly into the buffer without creating intermediate bytes objects.
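
For a version of the loop that runs without a network, socket.socketpair() can stand in for a remote peer. This is a sketch for experimentation only; the payload and buffer size are arbitrary.

```python
import socket

payload = b"0123456789" * 10          # 100 bytes of sample data

# A connected pair of sockets stands in for a real network peer.
sender, receiver = socket.socketpair()
sender.sendall(payload)
sender.close()                        # lets recv_into eventually return 0

buf = bytearray(256)
view = memoryview(buf)
offset = 0

while offset < len(buf):
    nbytes = receiver.recv_into(view[offset:])
    if nbytes == 0:                   # peer closed the connection
        break
    offset += nbytes

receiver.close()
received = bytes(view[:offset])       # copy out only the filled part
print(offset)
```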

Interaction with Other Protocols

Array Interface (`__array_interface__`)

NumPy’s older protocol, predating PEP 3118. Still supported for compatibility:

class MyArray:
    @property
    def __array_interface__(self):
        return {
            'shape': (100,),
            'typestr': '<f8',
            'data': (self._ptr, False),  # (address, readonly)
            'version': 3,
        }

DLPack

The modern cross-framework tensor exchange protocol, which plays the buffer protocol’s role for device (GPU) memory:

import numpy as np  # np.from_dlpack requires NumPy 1.22 or newer
import torch

# Share tensor between PyTorch and NumPy via DLPack
torch_tensor = torch.randn(1000)
np_array = np.from_dlpack(torch_tensor)  # Zero-copy

DLPack records which device the memory lives on (CPU, CUDA, ROCm), so it can describe GPU tensors. The buffer protocol can only describe host memory.

One Thing to Remember

The buffer protocol is CPython’s contract for zero-copy memory sharing. Implementing it correctly — with proper export tracking, stride handling, and format negotiation — lets your extension participate in Python’s data ecosystem at native speed, moving gigabytes between libraries without copying a single byte.
