Python Buffer Protocol — Core Concepts
The buffer protocol is a CPython-level interface that allows objects to expose their internal memory to other objects without copying. It is the foundation for efficient data interchange between Python’s built-in types, NumPy, and C extensions.
The Problem: Data Copying
Consider converting a NumPy array to bytes:
import numpy as np
arr = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float64)
# Copy: bytes() reads arr through the buffer protocol and duplicates all 32 bytes
raw_bytes = bytes(arr) # New, independent copy in memory
# View: memoryview() exposes the same memory without copying
view = memoryview(arr) # Points to arr's actual memory
For a 1 GB array, copying costs time proportional to the size plus a second gigabyte of memory, while creating a view takes constant time and no extra data storage.
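The contrast can be checked without NumPy. The following is a minimal sketch that uses the standard library's `array` module as a stand-in for the NumPy array above: a copy made with `bytes()` is frozen at copy time, while a `memoryview` keeps reflecting the source.

```python
from array import array

# Four float64 values: 32 bytes, like the NumPy example above.
source = array("d", [1.0, 2.0, 3.0, 4.0])

copied = bytes(source)       # independent 32-byte copy
view = memoryview(source)    # zero-copy window onto source's memory

source[0] = 99.0             # mutate the original after the fact

copy_first = array("d", copied)[0]   # decoded from the copy: old value
view_first = view[0]                 # read through the view: new value
print(copy_first, view_first)
```

The copy still holds the old first element, while the view reports the updated one, because only the view shares memory with `source`.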
How It Works
An object that supports the buffer protocol implements two C-level methods:
- bf_getbuffer: Fills a Py_buffer struct with information about the memory.
- bf_releasebuffer: Releases the buffer when the consumer is done.
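The acquire/release pairing is observable from pure Python. As a small sketch (standard library only): while a `memoryview` holds a buffer obtained from a `bytearray`, the `bytearray` refuses to resize, since resizing could move the memory the view points to. Once the view is released, resizing works again.

```python
data = bytearray(b"abc")

with memoryview(data) as view:
    # While the view is alive, the bytearray has an active export,
    # so operations that would resize (and possibly move) its memory fail.
    try:
        data.append(ord("d"))
        resize_blocked = False
    except BufferError:
        resize_blocked = True

# Leaving the with-block calls view.release(), ending the export.
data.append(ord("d"))
print(resize_blocked, data)
```

Using the view as a context manager is one way to make the release point explicit rather than leaving it to garbage collection.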
The Py_buffer struct describes:
| Field | Meaning |
|---|---|
| buf | Pointer to the actual memory |
| len | Total size in bytes |
| itemsize | Size of each element |
| format | Element type (e.g., "d" for double, "i" for int) |
| ndim | Number of dimensions |
| shape | Size along each dimension |
| strides | Bytes to skip to reach the next element in each dimension |
| readonly | Whether the memory can be modified |
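Most of these fields are mirrored as attributes on `memoryview`, which makes them easy to inspect. A short sketch, using `array.array` from the standard library as the exporting object (the raw `buf` pointer is not exposed at the Python level, and `len` corresponds to `nbytes`):

```python
from array import array

values = array("d", [1.0, 2.0, 3.0, 4.0])
view = memoryview(values)

# Collect the Python-visible counterparts of the Py_buffer fields.
fields = {
    "len (nbytes)": view.nbytes,
    "itemsize": view.itemsize,
    "format": view.format,
    "ndim": view.ndim,
    "shape": view.shape,
    "strides": view.strides,
    "readonly": view.readonly,
}
for name, value in fields.items():
    print(f"{name:>12}: {value}")
```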
memoryview: The Python Interface
memoryview is the standard way to use the buffer protocol from Python code:
data = bytearray(b"Hello, World!")
view = memoryview(data)
# Access individual bytes
print(view[0]) # 72 (ASCII 'H')
# Slice without copying
sub = view[7:12] # Points to "World" in original memory
sub[0] = ord('E')
print(data) # bytearray(b'Hello, Eorld!')
Changes through the memoryview affect the original object. No data is duplicated.
Typed Views
You can cast memoryviews to interpret memory as different types:
raw = bytearray(16)
int_view = memoryview(raw).cast('i') # View as native C ints (4 bytes each on most platforms)
int_view[0] = 42
int_view[1] = 100
# raw now contains the binary representation of 42 and 100
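To confirm what the cast wrote, the same bytes can be decoded independently. This sketch uses `struct` with the same native `"i"` format; the element count is computed rather than hard-coded, since the size of a C int is platform-dependent.

```python
import struct

raw = bytearray(16)
int_view = memoryview(raw).cast("i")   # native C int, usually 4 bytes
int_view[0] = 42
int_view[1] = 100

# Decode the same bytes independently with struct, using the same native format.
count = len(raw) // struct.calcsize("i")
decoded = struct.unpack(f"{count}i", raw)
print(decoded)

# Casting back to unsigned bytes exposes the raw representation.
byte_view = int_view.cast("B")
print(bytes(byte_view[:4]))   # byte order depends on the platform
```

Both views and the `struct` decoding read the same 16 bytes of `raw`; nothing is copied until `bytes()` is called on the slice.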
Which Objects Support It?
| Object | Read | Write |
|---|---|---|
| bytes | ✅ | ❌ (immutable) |
| bytearray | ✅ | ✅ |
| array.array | ✅ | ✅ |
| numpy.ndarray | ✅ | ✅ |
| memoryview | ✅ | Depends on source |
| str | ❌ | ❌ |
| list | ❌ | ❌ |
Note: str and list do not support the buffer protocol. Lists contain pointers to objects, not contiguous data. Strings have internal encoding complexity that makes direct buffer access impractical.
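The table can be reproduced empirically for the built-in types. In this sketch, `buffer_access` is a hypothetical helper (not a standard function) that tries to create a `memoryview` and reports what it finds:

```python
from array import array

def buffer_access(obj):
    """Return 'read-write', 'read-only', or 'unsupported' for obj."""
    try:
        with memoryview(obj) as view:
            return "read-only" if view.readonly else "read-write"
    except TypeError:
        # memoryview() raises TypeError for objects without buffer support.
        return "unsupported"

samples = [b"abc", bytearray(b"abc"), array("i", [1, 2, 3]), "abc", [1, 2, 3]]
report = {type(obj).__name__: buffer_access(obj) for obj in samples}
print(report)
```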
Strides and Multi-Dimensional Arrays
Strides explain how to navigate multi-dimensional data in flat memory:
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.int32)
view = memoryview(arr)
print(view.shape) # (2, 3)
print(view.strides) # (12, 4) — 12 bytes per row, 4 bytes per element
Strides enable views like transpositions without moving data:
transposed = arr.T
t_view = memoryview(transposed)
print(t_view.strides) # (4, 12) — strides swapped, same memory
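The stride arithmetic can also be done by hand. The sketch below avoids NumPy and instead reshapes a flat `array.array` with `memoryview.cast`, then locates element `[row, col]` as `row * strides[0] + col * strides[1]` bytes from the start of the buffer.

```python
import struct
from array import array

flat = array("i", [1, 2, 3, 4, 5, 6])
# Reshape the flat buffer into 2 rows x 3 columns without copying.
# memoryview.cast() needs a byte format on one side, hence the "B" hop.
grid = memoryview(flat).cast("B").cast("i", shape=[2, 3])

row, col = 1, 2
offset = row * grid.strides[0] + col * grid.strides[1]

# Read the element straight from the flat bytes at the computed offset.
(by_offset,) = struct.unpack_from("i", flat, offset)
by_index = grid[row, col]
print(grid.strides, offset, by_offset, by_index)
```

Reading at the computed byte offset and indexing the two-dimensional view yield the same element, which is all that strides encode.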
Real-World Usage Patterns
Sending Data Over a Network
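import socket

def send_all(sock, data):
    """Send all of data over a connected socket without copying unsent bytes."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        sent += sock.send(view[sent:]) # Slicing a view copies no data

data = bytearray(1024 * 1024) # 1 MiB buffer
# send_all(sock, data) # sock: an already-connected socket.socket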
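The receiving side can avoid copies in the same way. The following sketch assumes a hypothetical helper named `recv_exactly` and uses `socket.socketpair()` only to keep the demonstration self-contained; `recv_into` writes incoming bytes directly into a slice of a preallocated buffer instead of returning a new `bytes` object per call.

```python
import socket

def recv_exactly(sock, nbytes):
    """Fill a preallocated buffer with exactly nbytes from sock."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    received = 0
    while received < nbytes:
        # recv_into writes straight into the unfilled tail of buf.
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("socket closed before all data arrived")
        received += n
    return buf

# A connected pair of local sockets stands in for a real network peer.
left, right = socket.socketpair()
payload = b"hello buffer"
left.sendall(payload)
result = recv_exactly(right, len(payload))
left.close()
right.close()
print(result)
```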
Sharing Between Libraries
import numpy as np
from PIL import Image
# NumPy array → PIL Image without copying
arr = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8) # Upper bound is exclusive
img = Image.frombuffer("RGB", (640, 480), arr, "raw", "RGB", 0, 1)
Common Misconception
“Using memoryview is always faster than slicing bytes.” It is not. For small data (roughly a kilobyte or less), the overhead of creating a memoryview object can outweigh the cost of the copy it avoids. The buffer protocol pays off with large data, megabytes and above, where avoiding copies makes a dramatic difference.
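Where the crossover falls depends on the machine and Python version, so it is worth measuring. A rough sketch with `timeit` (the helper name `compare` and the chosen sizes are illustrative assumptions, and no particular timings are implied):

```python
import timeit

def compare(size, number=2000):
    """Time taking the second half of a buffer: bytes slice vs. memoryview slice."""
    payload = bytes(size)
    half = size // 2
    copy_time = timeit.timeit(lambda: payload[half:], number=number)
    view_time = timeit.timeit(lambda: memoryview(payload)[half:], number=number)
    return copy_time, view_time

for size in (64, 1024, 1024 * 1024):
    copy_time, view_time = compare(size)
    print(f"{size:>8} bytes  copy={copy_time:.6f}s  view={view_time:.6f}s")
```

The copy time grows with the buffer size, while the view time stays roughly flat, so the view only wins once the copy is expensive enough.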
One Thing to Remember
The buffer protocol is Python’s mechanism for zero-copy data sharing. It lets objects expose their raw memory through a standard interface, enabling NumPy, Pandas, PIL, and dozens of other libraries to pass large datasets around without duplicating a single byte.