Python Buffer Protocol — Deep Dive
The buffer protocol is one of CPython's most performance-critical interop mechanisms. Libraries that deal with large data, including NumPy, Arrow, Pillow, and PyTorch, use it for zero-copy data exchange. Understanding the protocol at the C level is essential for writing extensions that participate in this ecosystem.
The Py_buffer Structure
The complete buffer description:
typedef struct {
    void *buf;               // Pointer to logical start of buffer
    PyObject *obj;           // Owning object (keeps it alive)
    Py_ssize_t len;          // Total bytes (product of shape × itemsize)
    Py_ssize_t itemsize;     // Size of one element
    int readonly;            // 0 = writable, 1 = read-only
    int ndim;                // Number of dimensions
    char *format;            // struct-style format string
    Py_ssize_t *shape;       // Array of dimension sizes
    Py_ssize_t *strides;     // Bytes between consecutive elements per dim
    Py_ssize_t *suboffsets;  // For PIL-style indirect arrays (usually NULL)
    void *internal;          // Implementation-specific data
} Py_buffer;
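Most of these fields can be inspected from Python through a memoryview, which is a convenient way to check what an exporter reports. The following is a minimal sketch, assuming a 96-byte bytearray reinterpreted as a 3×4 array of doubles; the names `raw`, `view`, and `fields` are illustrative only.

```python
# Inspect Py_buffer-style metadata through memoryview attributes.
raw = bytearray(3 * 4 * 8)                      # room for 12 doubles, zero-filled
view = memoryview(raw).cast("d", shape=[3, 4])  # reinterpret as a 3x4 float64 array

fields = {
    "len": view.nbytes,              # corresponds to Py_buffer.len
    "itemsize": view.itemsize,
    "readonly": view.readonly,
    "ndim": view.ndim,
    "format": view.format,
    "shape": view.shape,
    "strides": view.strides,
    "suboffsets": view.suboffsets,   # empty tuple when the C field is NULL
}
for name, value in fields.items():
    print(f"{name:>10}: {value}")
```

The printed strides are byte counts, not element counts, which matches how the C structure stores them.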
Format Strings
Format strings follow the struct module conventions:
| Format | C Type | Size (bytes, native) |
|---|---|---|
| b | signed char | 1 |
| B | unsigned char | 1 |
| h | short | 2 |
| i | int | 4 |
| l | long | 4 or 8 (platform-dependent) |
| q | long long | 8 |
| f | float | 4 |
| d | double | 8 |
| ? | _Bool | 1 |
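The sizes in the table can be checked on the current platform with the struct module, and built-in exporters report the same codes through memoryview.format. This is a small sketch for verification, not part of the protocol itself; the variable names are illustrative.

```python
import struct
from array import array

# Native sizes of the format codes listed in the table.
for code in "bBhilqfd?":
    print(code, struct.calcsize(code))

# Built-in exporters report the same codes through memoryview.format.
doubles = memoryview(array("d", [1.0, 2.0]))
int8s = memoryview(array("b", [1, 2, 3]))
print(doubles.format, doubles.itemsize)
print(int8s.format, int8s.itemsize)
```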
Implementing the Protocol in C
Producer: Exporting a Buffer
typedef struct {
    PyObject_HEAD
    double *data;
    Py_ssize_t rows;
    Py_ssize_t cols;
    Py_ssize_t shape[2];
    Py_ssize_t strides[2];
    int readonly;      // 1 if the data must not be modified (checked in getbuffer)
    int export_count;  // Track active exports
} MatrixObject;
static int Matrix_getbuffer(MatrixObject *self, Py_buffer *view, int flags) {
    // Refuse writable requests when the matrix is read-only
    // (self->readonly is assumed to be a 0/1 flag on MatrixObject)
    if ((flags & PyBUF_WRITABLE) && self->readonly) {
        view->obj = NULL;  // Required on failure: nothing to release
        PyErr_SetString(PyExc_BufferError, "Object is not writable");
        return -1;
    }
    // Row-major layout: one row is cols doubles long
    self->shape[0] = self->rows;
    self->shape[1] = self->cols;
    self->strides[0] = self->cols * sizeof(double);
    self->strides[1] = sizeof(double);
    // The view holds a strong reference to the exporter
    view->obj = (PyObject*)self;
    Py_INCREF(self);
    view->buf = self->data;
    view->len = self->rows * self->cols * sizeof(double);
    view->itemsize = sizeof(double);
    view->readonly = self->readonly;  // Report the real writability
    view->ndim = 2;
    view->format = "d";
    view->shape = self->shape;
    view->strides = self->strides;
    view->suboffsets = NULL;
    view->internal = NULL;
    self->export_count++;
    return 0;
}
static void Matrix_releasebuffer(MatrixObject *self, Py_buffer *view) {
    self->export_count--;
}
static PyBufferProcs Matrix_as_buffer = {
    .bf_getbuffer = (getbufferproc)Matrix_getbuffer,
    .bf_releasebuffer = (releasebufferproc)Matrix_releasebuffer,
};
Key Implementation Details
Export counting: Track how many buffers are currently exported. While any buffer is active, the object must not reallocate or free its memory. Raise BufferError if a resize is attempted while exports are active:
static PyObject* Matrix_resize(MatrixObject *self, PyObject *args) {
    if (self->export_count > 0) {
        PyErr_SetString(PyExc_BufferError,
                        "cannot resize while buffer is exported");
        return NULL;
    }
    // ... perform resize ...
    Py_RETURN_NONE;
}
This is why bytearray raises BufferError if you try to resize it while a memoryview exists.
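The same behavior is easy to observe from Python. The sketch below uses only built-in types; the flag `resized_while_exported` is an illustrative name used to record what happened.

```python
buf = bytearray(b"abcd")
view = memoryview(buf)            # one active export

try:
    buf.extend(b"efgh")           # would reallocate the underlying memory
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()                    # drop the export
buf.extend(b"efgh")               # now allowed
print(resized_while_exported, buf)
```

Once the view is released, the export count drops back to zero and the resize succeeds.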
Format negotiation: The flags parameter tells you what the consumer needs:
| Flag | Meaning |
|---|---|
| PyBUF_SIMPLE | Contiguous bytes; no shape, strides, or format; writability not required |
| PyBUF_WRITABLE | Must be writable |
| PyBUF_FORMAT | Consumer wants format string |
| PyBUF_ND | Consumer wants shape |
| PyBUF_STRIDES | Consumer wants strides (implies PyBUF_ND) |
| PyBUF_C_CONTIGUOUS | Must be C-contiguous (row-major) |
| PyBUF_F_CONTIGUOUS | Must be Fortran-contiguous (column-major) |
If you can't satisfy the flags, set view->obj to NULL, raise BufferError, and return -1.
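From Python, a memoryview exposes the properties these flags negotiate: writability, contiguity, format, and shape. The helper below is a hypothetical `describe` function, written as a rough sketch of what a consumer would check before deciding which flags an exporter could satisfy.

```python
def describe(obj):
    """Report the properties a consumer typically negotiates via flags."""
    with memoryview(obj) as view:
        return {
            "writable": not view.readonly,       # what PyBUF_WRITABLE demands
            "c_contiguous": view.c_contiguous,   # what PyBUF_C_CONTIGUOUS demands
            "f_contiguous": view.f_contiguous,   # what PyBUF_F_CONTIGUOUS demands
            "format": view.format,
            "shape": view.shape,
        }

ro = describe(b"abc")                    # bytes: read-only exporter
rw = describe(bytearray(b"abc"))         # bytearray: writable exporter
grid = describe(memoryview(bytearray(6)).cast("B", shape=[2, 3]))
print(ro, rw, grid, sep="\n")
```

A one-dimensional contiguous buffer is both C- and Fortran-contiguous, while the 2×3 row-major view is only C-contiguous.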
Non-Contiguous Memory
Strided Arrays
A transposed view of a matrix has non-standard strides:
// Original 3×4 matrix (row-major):
// strides = [4*sizeof(double), sizeof(double)]
// = [32, 8]
// Transposed view (same memory, swapped strides):
// shape = [4, 3]
// strides = [8, 32] // Column-major traversal
To access element [i][j] in strided memory:
char *ptr = (char*)buf + i * strides[0] + j * strides[1];
double value = *(double*)ptr;
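The same offset arithmetic can be tried in Python with the struct module. This is a sketch under the assumption of native 8-byte doubles; `get_item` is a hypothetical helper, and the buffer holds the values 0.0 through 11.0 in row-major order.

```python
import struct

ITEMSIZE = struct.calcsize("d")

def get_item(buf, strides, i, j):
    """Read element [i][j] of a float64 array using byte strides."""
    offset = i * strides[0] + j * strides[1]
    return struct.unpack_from("d", buf, offset)[0]

# 3x4 row-major matrix holding 0.0 .. 11.0
buf = struct.pack("12d", *map(float, range(12)))
strides = (4 * ITEMSIZE, ITEMSIZE)          # row stride, column stride
transposed = (strides[1], strides[0])       # same bytes, swapped strides

original_value = get_item(buf, strides, 1, 2)       # element [1][2] of the matrix
transposed_value = get_item(buf, transposed, 2, 1)  # element [2][1] of the transpose
print(original_value, transposed_value)
```

Both lookups land on the same byte offset, which is why a transpose needs no data movement.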
Suboffsets (PIL-Style)
Some image libraries store rows as an array of pointers (each row may be at a different memory location). The suboffsets array handles this:
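// suboffsets[0] >= 0 means: after applying strides[0], dereference the
// pointer stored there, then add suboffsets[0] bytes (0 in this case)
// Effectively: row_ptr = *(char**)((char*)buf + i * strides[0]) + suboffsets[0]
//              pixel   = row_ptr + j * strides[1]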
Most consumers (including NumPy) don't support suboffsets. They omit PyBUF_INDIRECT from their request (asking for PyBUF_STRIDES at most) or refuse buffers with non-NULL suboffsets.
Consumer: Reading a Buffer
static PyObject* sum_buffer(PyObject *self, PyObject *arg) {
    Py_buffer view;
    // PyBUF_STRIDES implies PyBUF_ND, so shape and strides are both filled in
    if (PyObject_GetBuffer(arg, &view, PyBUF_FORMAT | PyBUF_STRIDES) < 0)
        return NULL;
    // A NULL format means unsigned bytes ("B"), so guard before strcmp
    if (view.format == NULL || strcmp(view.format, "d") != 0) {
        PyErr_SetString(PyExc_TypeError, "Expected float64 buffer");
        PyBuffer_Release(&view);
        return NULL;
    }
    double total = 0.0;
    if (view.ndim == 1) {
        // 1D case: honor the stride, since the exporter may be a slice
        for (Py_ssize_t i = 0; i < view.shape[0]; i++) {
            char *ptr = (char*)view.buf + i * view.strides[0];
            total += *(double*)ptr;
        }
    } else if (view.ndim == 2) {
        // Strided 2D case
        for (Py_ssize_t i = 0; i < view.shape[0]; i++) {
            for (Py_ssize_t j = 0; j < view.shape[1]; j++) {
                char *ptr = (char*)view.buf + i * view.strides[0] + j * view.strides[1];
                total += *(double*)ptr;
            }
        }
    } else {
        // Reject other dimensionalities instead of silently returning 0.0
        PyErr_SetString(PyExc_ValueError, "Expected a 1D or 2D buffer");
        PyBuffer_Release(&view);
        return NULL;
    }
    PyBuffer_Release(&view); // MUST release when done
    return PyFloat_FromDouble(total);
}
Critical: Always call PyBuffer_Release when done. Failing to release prevents the producer from resizing or freeing its memory.
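In Python, a with-statement on a memoryview plays the role of the PyBuffer_Release call. The sketch below is a rough Python counterpart of the consumer above, assuming native doubles; the function name `sum_buffer` is reused here for illustration and is not the C function.

```python
from array import array

def sum_buffer(obj):
    """Sum a 1D or 2D buffer of native doubles."""
    with memoryview(obj) as view:            # released on exit, even on error
        if view.format not in ("d", "@d"):
            raise TypeError("Expected float64 buffer")
        if view.ndim == 1:
            return sum(view)
        if view.ndim == 2:
            rows, cols = view.shape
            return sum(view[i, j] for i in range(rows) for j in range(cols))
        raise ValueError("Expected a 1D or 2D buffer")

data = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
total_1d = sum_buffer(data)

# A 2x3 view over a bytes copy of the same values
grid = memoryview(data.tobytes()).cast("d", shape=[2, 3])
total_2d = sum_buffer(grid)

data.append(7.0)   # allowed: the view taken inside sum_buffer was released
print(total_1d, total_2d, len(data))
```

If the view inside the function were still alive, the final append would raise BufferError.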
Advanced: Buffer Protocol in Python (PEP 688)
Python classes can implement the buffer protocol via __buffer__ and __release_buffer__ (Python 3.12+, PEP 688):
import ctypes

class SharedMemory:
    def __init__(self, size):
        self._buf = (ctypes.c_byte * size)()
        self._exports = 0

    def __buffer__(self, flags):
        self._exports += 1
        return memoryview(self._buf)

    def __release_buffer__(self, mv):
        self._exports -= 1
Before PEP 688, only C extensions could be buffer producers. Now pure Python classes can participate.
Performance Patterns
Zero-Copy Pipeline
import numpy as np

# Read binary data from file
with open("data.bin", "rb") as f:
    raw = f.read()  # bytes object

# Zero-copy view as float64 (len(raw) must be a multiple of 8)
view = memoryview(raw).cast("d")

# Zero-copy NumPy array (read-only, because bytes is immutable)
arr = np.frombuffer(raw, dtype=np.float64)

# At this point, raw, view, and arr all point to the same memory
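That the memory really is shared can be checked with the standard library alone. This sketch substitutes a bytearray built with struct.pack for the file contents (an assumption made so the example is self-contained and writable), then changes the bytes in place and reads them back through the typed view.

```python
import struct

raw = bytearray(struct.pack("3d", 1.0, 2.0, 3.0))  # stand-in for file contents
view = memoryview(raw).cast("d")                    # float64 view, no copy

struct.pack_into("d", raw, 8, 20.0)   # overwrite the second double in place
values = view.tolist()
print(values)
```

The view reflects the change without being recreated, since it never held its own copy of the data.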
Avoiding Hidden Copies
# HIDDEN COPY: slicing bytes creates a new bytes object
chunk = data[1000:2000] # Copies 1000 bytes
# ZERO COPY: slicing memoryview creates a view
view = memoryview(data)
chunk = view[1000:2000] # No copy — same memory, different offset
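The difference is observable by writing through each slice. The following is a small sketch using a bytearray so the data can be modified; the names `copied` and `window` are illustrative.

```python
data = bytearray(b"0123456789")

copied = data[2:5]                  # slicing the object: independent copy
window = memoryview(data)[2:5]      # slicing a memoryview: shares memory

window[0] = ord("X")                # write through the view
print(data, copied, bytes(window))
```

The write shows up in `data` but not in `copied`, which was detached from the original at the moment of slicing.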
Buffer Protocol with Socket I/O
buf = bytearray(65536)
view = memoryview(buf)
offset = 0
while offset < len(buf):
    nbytes = sock.recv_into(view[offset:])
    if nbytes == 0:
        break
    offset += nbytes
recv_into writes directly into the buffer without creating intermediate bytes objects.
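The loop can be exercised without a network by using socket.socketpair() as a stand-in for a real connection. This is a self-contained sketch; the payload and the 64-byte buffer size are arbitrary choices for illustration.

```python
import socket

sender, receiver = socket.socketpair()
sender.sendall(b"hello buffer protocol")
sender.close()                       # signals end-of-stream to the receiver

buf = bytearray(64)
view = memoryview(buf)
offset = 0
while offset < len(buf):
    nbytes = receiver.recv_into(view[offset:])
    if nbytes == 0:                  # peer closed the connection
        break
    offset += nbytes
receiver.close()

payload = bytes(view[:offset])       # copy out only the bytes received
print(offset, payload)
```

The only copy happens at the end, when the received prefix is materialized as a bytes object.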
Interaction with Other Protocols
Array Interface (__array_interface__)
NumPy’s older protocol, predating PEP 3118. Still supported for compatibility:
class MyArray:
    @property
    def __array_interface__(self):
        return {
            'shape': (100,),
            'typestr': '<f8',
            'data': (self._ptr, False),  # (address, readonly)
            'version': 3,
        }
DLPack
The modern cross-framework protocol, roughly the buffer protocol's counterpart for device memory such as GPUs:
import numpy as np
import torch

# Share tensor between PyTorch and NumPy via DLPack
torch_tensor = torch.randn(1000)
np_array = np.from_dlpack(torch_tensor)  # Zero-copy
DLPack handles device memory (CPU, CUDA, ROCm) which the buffer protocol cannot.
One Thing to Remember
The buffer protocol is CPython’s contract for zero-copy memory sharing. Implementing it correctly — with proper export tracking, stride handling, and format negotiation — lets your extension participate in Python’s data ecosystem at native speed, moving gigabytes between libraries without copying a single byte.