NumPy Advanced Indexing — Deep Dive
Technical foundation
NumPy distinguishes between basic indexing (slices, integers, np.newaxis) and advanced indexing (integer arrays, boolean arrays). The distinction is not cosmetic — it controls whether the result shares memory with the source, how assignment works, and which C-level code path executes.
Advanced indexing triggers whenever an index is a non-tuple sequence, an ndarray of integer or boolean type, or a tuple containing such objects.
How fancy indexing works internally
When you write a[idx] with an integer array idx, NumPy:
- Validates that all values in
idxare within[-len(a), len(a)). - Wraps negative indices.
- Allocates a new output array with shape
idx.shape + a.shape[1:]. - Copies elements from
ato the output using gathered memory reads.
The gather operation is fundamentally different from strided access. Each element requires an independent memory lookup, which is why fancy indexing is slower than slicing.
Broadcasting in multi-axis fancy indexing
When multiple index arrays are provided, they are broadcast together before indexing:
a = np.arange(24).reshape(4, 6)
rows = np.array([0, 2, 3])[:, np.newaxis] # shape (3, 1)
cols = np.array([1, 4]) # shape (2,)
result = a[rows, cols] # shape (3, 2)
# Equivalent to: for each (r, c) in broadcast(rows, cols): result[...] = a[r, c]
This is how np.ix_ works under the hood — it reshapes index arrays so they broadcast to form a grid.
def my_ix(*args):
"""Simplified np.ix_ implementation."""
out = []
for i, a in enumerate(args):
shape = [1] * len(args)
shape[i] = len(a)
out.append(np.asarray(a).reshape(shape))
return tuple(out)
Boolean indexing internals
Boolean indexing is syntactic sugar for np.nonzero(mask) followed by fancy indexing:
a = np.array([10, 20, 30, 40, 50])
mask = np.array([True, False, True, False, True])
# These are equivalent:
result1 = a[mask]
result2 = a[np.nonzero(mask)]
The nonzero call extracts the indices of True values, producing a tuple of integer arrays. This means boolean indexing inherits all the performance characteristics of fancy indexing — plus the overhead of the nonzero scan.
For very sparse masks (few True values in a large array), np.nonzero followed by fancy indexing can be faster because fewer elements are gathered. For dense masks, the nonzero scan dominates and direct boolean indexing is cleaner.
Assignment semantics and np.add.at
Fancy index assignment has a subtle trap with duplicate indices:
a = np.zeros(5)
idx = np.array([1, 1, 1])
a[idx] += 1
print(a) # [0, 1, 0, 0, 0] — NOT [0, 3, 0, 0, 0]
The += creates a temporary, gathers values, adds one, then scatters back. When the same index appears multiple times, only the last write wins. To accumulate properly, use unbuffered operations:
a = np.zeros(5)
idx = np.array([1, 1, 1])
np.add.at(a, idx, 1)
print(a) # [0, 3, 0, 0, 0] — correct accumulation
np.add.at uses unbuffered iteration, applying each operation in sequence. The tradeoff: it is significantly slower than vectorized += because it cannot use SIMD optimizations.
For high-performance accumulation, np.bincount is often the better tool:
idx = np.array([1, 1, 1, 3, 3])
values = np.array([10, 20, 30, 40, 50])
result = np.bincount(idx, weights=values, minlength=5)
# [0, 60, 0, 90, 0] — vectorized accumulation
Advanced patterns
Argpartition for top-k selection
Instead of sorting an entire array to find the top 10 values:
data = np.random.randn(1_000_000)
# Full sort: O(n log n)
top10_slow = np.sort(data)[-10:]
# Argpartition: O(n) average
idx = np.argpartition(data, -10)[-10:]
top10_fast = data[idx] # fancy indexing to gather results
top10_fast = top10_fast[np.argsort(top10_fast)] # sort only the 10
Structured filtering with multiple conditions
# Select rows where column 0 > threshold AND column 2 < limit
table = np.random.randn(10000, 5)
mask = (table[:, 0] > 1.5) & (table[:, 2] < -0.5)
filtered = table[mask] # boolean indexing returns copy
Chaining conditions with & and | creates temporary boolean arrays. For many conditions, precompute:
conditions = np.ones(len(table), dtype=bool)
for col, op, val in [(0, np.greater, 1.5), (2, np.less, -0.5)]:
conditions &= op(table[:, col], val)
filtered = table[conditions]
Scatter-gather for permutations
Fancy indexing naturally expresses permutations:
data = np.array(['a', 'b', 'c', 'd', 'e'])
perm = np.array([4, 2, 0, 3, 1])
shuffled = data[perm] # gather: read in permuted order
inverse = np.empty_like(perm)
inverse[perm] = np.arange(5) # scatter: write to permuted positions
restored = shuffled[inverse] # round-trip back to original order
Performance benchmarks
Relative timings for selecting 1% of elements from a 10-million-element array:
| Method | Relative time | Notes |
|---|---|---|
| Contiguous slice | 1x | Baseline — stride-based, near-instant |
| Boolean mask | ~8x | nonzero scan + gather |
| Sorted fancy index | ~10x | Cache-friendly gather |
| Random fancy index | ~25x | Cache-hostile gather |
| Python loop | ~500x | Interpreter overhead |
The takeaway: fancy indexing is much slower than slicing but still 20-50x faster than Python loops. Sort your indices when possible.
Interaction with memory layout
Fancy indexing always produces C-contiguous output, regardless of the source array’s layout:
a = np.asfortranarray(np.arange(12).reshape(3, 4))
print(a.flags['F_CONTIGUOUS']) # True
b = a[[0, 2]]
print(b.flags['C_CONTIGUOUS']) # True — fancy indexing normalizes layout
This can be beneficial (downstream code gets contiguous data) or costly (forces a layout conversion during the copy).
Debugging tips
Use np.ndindex to verify fancy indexing results element by element:
def verify_fancy(a, *indices):
"""Brute-force check that a[indices] matches element-wise access."""
result = a[indices]
broadcast_idx = np.broadcast(*indices)
for pos in np.ndindex(result.shape):
idx_tuple = tuple(ind[pos] for ind in np.broadcast_arrays(*indices))
assert result[pos] == a[idx_tuple], f"Mismatch at {pos}"
print(f"Verified {result.size} elements ✓")
The one thing to remember: Advanced indexing is a gather/scatter operation — understanding that mental model (not views, not strides, but explicit element copying) prevents every common pitfall.
See Also
- Python Bokeh Get an intuitive feel for Bokeh so Python behavior stops feeling unpredictable.
- Python Numpy Broadcasting Rules How NumPy magically makes different-sized arrays work together without you writing any loops.
- Python Numpy Einsum One tiny function that replaces dozens of NumPy operations — once you learn its shorthand, array math becomes a breeze.
- Python Numpy Fft Spectral How NumPy breaks apart a signal into its hidden frequencies — like separating a chord into individual notes.
- Python Numpy Memory Views Why NumPy arrays can share the same data without copying it — and how that makes your code fast but occasionally surprising.