NumPy Structured Arrays — Core Concepts
Why this topic matters
Most NumPy tutorials focus on homogeneous arrays of floats or ints. But real data — sensor logs, database exports, binary file formats — mixes types. Structured arrays let you define a custom dtype with named fields of different types, giving you fast, typed, tabular storage without leaving NumPy.
How to create a structured array
Define a dtype as a list of (name, type) tuples:
import numpy as np
dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('score', 'f8')])
students = np.array([
('Alice', 22, 91.5),
('Bob', 19, 85.0),
('Carol', 21, 93.2),
], dtype=dt)
Each element of students is a record. Access fields by name:
print(students['name']) # ['Alice' 'Bob' 'Carol']
print(students['score']) # [91.5 85. 93.2]
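To see what the dtype definition above produces in memory, the sketch below inspects the same dtype. It assumes NumPy's default packed layout (no `align=True`), so fields sit back to back with no padding; the `offsets` dictionary is just an illustrative helper.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('score', 'f8')])

# Field names, in declaration order
print(dt.names)      # ('name', 'age', 'score')

# Bytes per record: 20 UCS-4 characters (80) + int32 (4) + float64 (8)
print(dt.itemsize)   # 92

# Byte offset of each field within one record (packed layout assumed)
offsets = {name: dt.fields[name][1] for name in dt.names}
print(offsets)       # {'name': 0, 'age': 80, 'score': 84}
```

Knowing the item size and offsets is mostly useful when the records have to match an external byte layout.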
Field access returns views
When you access a single field, NumPy returns a view into the original data — no copy. Modifying the field view modifies the structured array:
scores = students['score']
scores[0] = 99.0
print(students[0]) # ('Alice', 22, 99.), changed in place
This is efficient but requires awareness. If you need an independent copy, call .copy() explicitly.
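One way to check whether two arrays share a buffer is `np.shares_memory`. The sketch below contrasts a field view with an explicit copy; the variable names `view` and `snapshot` are illustrative only.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('score', 'f8')])
students = np.array([('Alice', 22, 91.5), ('Bob', 19, 85.0)], dtype=dt)

view = students['score']              # shares memory with students
snapshot = students['score'].copy()   # independent buffer

view[0] = 99.0                        # writes through to students

print(np.shares_memory(view, students))       # True
print(np.shares_memory(snapshot, students))   # False
print(students['score'][0], snapshot[0])      # 99.0 91.5
```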
Sorting and filtering
Structured arrays support sorting by field name:
sorted_by_age = np.sort(students, order='age')
Boolean filtering works the same as regular arrays:
high_scorers = students[students['score'] > 90]
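The same two operations extend to multiple keys and combined conditions. The following is a small sketch, with one extra hypothetical record ('Dave') added so that the tie-breaking sort has something to do.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('score', 'f8')])
students = np.array([
    ('Alice', 22, 91.5),
    ('Bob', 19, 85.0),
    ('Carol', 21, 93.2),
    ('Dave', 19, 88.0),   # hypothetical extra record to create a tie on age
], dtype=dt)

# Sort by age, breaking ties by score (both ascending)
by_age_then_score = np.sort(students, order=['age', 'score'])
print(by_age_then_score['name'])   # ['Bob' 'Dave' 'Carol' 'Alice']

# Combine conditions with & (each comparison needs its own parentheses)
mask = (students['score'] > 90) & (students['age'] < 22)
print(students[mask]['name'])      # ['Carol']

# Descending order by score: argsort the field, then reverse the indices
desc = students[np.argsort(students['score'])[::-1]]
print(desc['name'])                # ['Carol' 'Alice' 'Dave' 'Bob']
```

`np.sort` with `order` only sorts ascending, which is why the descending case goes through `np.argsort` here.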
Record arrays — dot access
np.recarray wraps structured arrays to allow attribute-style access:
rec = students.view(np.recarray)
print(rec.name) # ['Alice' 'Bob' 'Carol']
print(rec.age) # [22 19 21]
This is convenient but slightly slower due to attribute lookup overhead. For performance-critical code, use index-style field access such as students['name'].
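The sketch below illustrates that a recarray view is only a different way of looking at the same data: both access styles return the same values, writes reach the original array, and the view can be turned back into a plain ndarray. The name `plain` is illustrative.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('score', 'f8')])
students = np.array([('Alice', 22, 91.5), ('Bob', 19, 85.0)], dtype=dt)

rec = students.view(np.recarray)

# Attribute access and index access return the same values
print(np.array_equal(rec.age, rec['age']))   # True

# The recarray is a view: writes through it reach the original array
rec.age[0] = 23
print(students['age'][0])                    # 23

# View the data as a plain ndarray again
plain = rec.view(np.ndarray)
print(type(plain) is np.ndarray)             # True
```

Index access also remains the safer choice when a field name could collide with an existing array attribute.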
Common misconception
People often assume structured arrays are interchangeable with pandas DataFrames. They serve different purposes. Structured arrays are for low-level, fixed-schema, high-performance data storage. DataFrames add indexing, missing value handling, groupby, and dozens of other features. If you need those, use pandas. If you need compact, fast storage for millions of fixed-format records, structured arrays are the better fit.
Nested dtypes
A field can itself be a structured dtype (sub-fields) or a fixed-size sub-array. This example uses a sub-array:
dt = np.dtype([
('label', 'U10'),
('readings', 'f4', (5,)), # each record holds 5 floats
])
data = np.zeros(3, dtype=dt)
data['readings'][0] = [1.1, 2.2, 3.3, 4.4, 5.5]
print(data['readings'].shape) # (3, 5) — a regular 2D array
This is useful for fixed-size vector fields like RGB colors, 3D coordinates, or multi-channel sensor readings.
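The sub-field case can be sketched in the same way by using one structured dtype as the type of a field in another. The `point` dtype and the field names `id`, `pos`, `x`, and `y` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Inner dtype: a 2D point with two float32 sub-fields
point = np.dtype([('x', 'f4'), ('y', 'f4')])

# Outer dtype: each record holds an id and one nested point
dt = np.dtype([('id', 'i4'), ('pos', point)])

data = np.zeros(2, dtype=dt)
data['pos']['x'] = [1.0, 2.0]   # assign a whole sub-field column
data['pos']['y'][1] = 5.0       # assign one element of a sub-field

print(data['pos'].dtype.names)  # ('x', 'y')
print(data['pos']['x'])         # [1. 2.]
print(data[1]['pos']['y'])      # 5.0
```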
When to use structured arrays
- Reading binary file formats (HDF5, FITS, custom protocols)
- Interfacing with C structs via ctypes or cffi
- Processing sensor data with fixed record layouts
- Memory-mapped files where each record has a known byte layout
- High-volume logging where row count matters more than feature richness
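As a rough illustration of the binary-format use case, the sketch below decodes packed records from a bytes buffer with `np.frombuffer`. The record layout (a little-endian uint16 sensor id followed by a float32 temperature) is hypothetical, and the buffer is built with the standard `struct` module only to stand in for data that would normally come from a file or a socket.

```python
import struct
import numpy as np

# Hypothetical packed layout: uint16 sensor id + float32 temperature, little-endian
record = np.dtype([('sensor_id', '<u2'), ('temp', '<f4')])

# Stand-in for bytes read from a file or network stream
samples = [(1, 20.5), (2, 21.25), (3, 19.0)]
raw = b''.join(struct.pack('<Hf', sid, temp) for sid, temp in samples)

# Interpret the buffer as records without copying it
readings = np.frombuffer(raw, dtype=record)

print(record.itemsize)                 # 6
print(readings['sensor_id'].tolist())  # [1, 2, 3]
print(readings['temp'].tolist())       # [20.5, 21.25, 19.0]
```

Because the array wraps an immutable bytes object, it is read-only; calling `.copy()` gives a writable array.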
The one thing to remember: Structured arrays combine the speed of NumPy with the flexibility of mixed types — they are your bridge between raw binary data and Python.