NumPy Structured Arrays — Core Concepts

Build mixed-type datasets in NumPy with named fields, nested dtypes, and record arrays for fast tabular computation.

Why this topic matters

Most NumPy tutorials focus on homogeneous arrays of floats or ints. But real data — sensor logs, database exports, binary file formats — mixes types. Structured arrays let you define a custom dtype with named fields of different types, giving you fast, typed, tabular storage without leaving NumPy.

How to create a structured array

Define a dtype as a list of (name, type) tuples:

import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('score', 'f8')])
students = np.array([
    ('Alice', 22, 91.5),
    ('Bob', 19, 85.0),
    ('Carol', 21, 93.2),
], dtype=dt)

Each element of students is a record. Access fields by name:

print(students['name'])   # ['Alice' 'Bob' 'Carol']
print(students['score'])  # [91.5 85.  93.2]

Field access returns views

When you access a single field, NumPy returns a view into the original data — no copy. Modifying the field view modifies the structured array:

scores = students['score']
scores[0] = 99.0
print(students[0])  # ('Alice', 22, 99.0) — changed in place

This is efficient but requires awareness. If you need an independent copy, call .copy() explicitly.

Sorting and filtering

Structured arrays support sorting by field name:

sorted_by_age = np.sort(students, order='age')

Boolean filtering works the same as regular arrays:

high_scorers = students[students['score'] > 90]

Record arrays — dot access

np.recarray wraps structured arrays to allow attribute-style access:

rec = students.view(np.recarray)
print(rec.name)   # ['Alice' 'Bob' 'Carol']
print(rec.age)    # [22 19 21]

This is convenient but slightly slower due to attribute lookup overhead. For performance-critical code, use dictionary-style field access.

Common misconception

People often assume structured arrays are the same as pandas DataFrames. They serve different purposes. Structured arrays are for low-level, fixed-schema, high-performance data storage. DataFrames add indexing, missing value handling, groupby, and dozens of other features. If you need those, use pandas. If you need raw speed with millions of fixed-format records, structured arrays win.

Nested dtypes

Fields can contain sub-fields or fixed-size sub-arrays:

dt = np.dtype([
    ('label', 'U10'),
    ('readings', 'f4', (5,)),  # each record holds 5 floats
])
data = np.zeros(3, dtype=dt)
data['readings'][0] = [1.1, 2.2, 3.3, 4.4, 5.5]
print(data['readings'].shape)  # (3, 5) — a regular 2D array

This is useful for fixed-size vector fields like RGB colors, 3D coordinates, or multi-channel sensor readings.

When to use structured arrays

Reading binary file formats (HDF5, FITS, custom protocols)
Interfacing with C structs via ctypes or cffi
Processing sensor data with fixed record layouts
Memory-mapped files where each record has a known byte layout
High-volume logging where row count matters more than feature richness

The one thing to remember: Structured arrays combine the speed of NumPy with the flexibility of mixed types — they are your bridge between raw binary data and Python.

pythonnumpydata-science