NumPy Random Generator — Core Concepts

Why this topic matters

Random number generation underpins simulations, machine learning, statistical testing, and data augmentation. NumPy overhauled its random API in version 1.17, introducing Generator objects that are faster, more flexible, and safer for parallel code. Understanding the new API avoids subtle reproducibility bugs and global-state issues that plague the legacy interface.

Legacy vs modern API

Legacy (avoid in new code)

import numpy as np
np.random.seed(42)
data = np.random.randn(100)

Problems: np.random.seed() sets global state. Any library that also calls np.random functions shares that state, so import order and library internals can change which values your code receives.

Modern (recommended)

rng = np.random.default_rng(42)
data = rng.standard_normal(100)

Each Generator owns its own state. No global side effects. Multiple generators can coexist independently.
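
To make the isolation concrete, here is a minimal sketch. The variable names are illustrative, and the call into the legacy global API is included only to show that it leaves the generators untouched.

```python
import numpy as np

# Two generators built from the same seed each hold their own copy of the state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first_a = rng_a.standard_normal(3)

# Reseeding and drawing from the legacy global state advances neither generator.
np.random.seed(0)
_ = np.random.randn(1000)

first_b = rng_b.standard_normal(3)

# Same seed, untouched state: rng_b reproduces what rng_a produced.
same_stream = np.array_equal(first_a, first_b)
print(same_stream)  # True
```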

Creating generators

# Seeded — reproducible
rng = np.random.default_rng(seed=12345)

# Unseeded — different every run (entropy from OS)
rng = np.random.default_rng()

# From a SeedSequence — best for spawning independent streams
from numpy.random import SeedSequence
ss = SeedSequence(42)
child_seeds = ss.spawn(4)
generators = [np.random.default_rng(s) for s in child_seeds]

SeedSequence is the key to parallel reproducibility. Children spawned from one parent seed get, with very high probability, statistically independent streams, and rerunning with the same parent seed reproduces every one of them.
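
A small sketch of how spawned seeds might be handed to workers. The functions simulate_worker and run_all are hypothetical names, and the workers run sequentially here to keep the example short. In a real job each child seed would go to a separate process or thread.

```python
import numpy as np
from numpy.random import SeedSequence

def simulate_worker(seed_seq, n_samples=5):
    """Hypothetical worker: draws samples from its own private stream."""
    rng = np.random.default_rng(seed_seq)
    return rng.standard_normal(n_samples)

def run_all(parent_seed, n_workers=4):
    # Spawn one child SeedSequence per worker from a single parent seed.
    children = SeedSequence(parent_seed).spawn(n_workers)
    return [simulate_worker(child) for child in children]

run_1 = run_all(42)
run_2 = run_all(42)

# Re-running with the same parent seed reproduces every worker's stream.
reproducible = all(np.array_equal(a, b) for a, b in zip(run_1, run_2))

# Different children produce different streams.
distinct = not np.array_equal(run_1[0], run_1[1])
```

Only the parent seed needs to be recorded. The children are derived from it deterministically.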

Common distributions

| Method | Distribution | Example |
| --- | --- | --- |
| rng.random(n) | Uniform [0, 1) | Random probabilities |
| rng.integers(lo, hi, n) | Discrete uniform | Dice rolls |
| rng.standard_normal(n) | Normal (μ=0, σ=1) | Noise generation |
| rng.normal(μ, σ, n) | Normal (custom) | Simulating measurements |
| rng.exponential(scale, n) | Exponential | Wait times |
| rng.choice(arr, n) | Sampling from array | Bootstrap resampling |
| rng.shuffle(arr) | In-place permutation | Shuffling a dataset |
| rng.permutation(n) | Random permutation | Index shuffling |
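
The methods listed above can be exercised together in one short sketch. The seed and the numeric parameters (170.0, 10.0, 2.0) are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

uniform = rng.random(5)                     # floats in [0, 1)
dice = rng.integers(1, 7, size=10)          # upper bound is exclusive, so values are 1..6
noise = rng.standard_normal(5)              # mean 0, standard deviation 1
heights = rng.normal(170.0, 10.0, size=5)   # loc=170, scale=10
waits = rng.exponential(2.0, size=5)        # scale is the mean, i.e. 1 / rate

sample = np.arange(10)
boot = rng.choice(sample, size=10)          # samples with replacement by default
rng.shuffle(sample)                         # shuffles in place and returns None
order = rng.permutation(10)                 # returns a new shuffled array of 0..9
```

Note that rng.integers excludes the upper bound unless endpoint=True is passed, which is why a six-sided die uses (1, 7).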

Common misconception

A common belief is that setting a seed guarantees identical results across NumPy versions. It does not. NumPy explicitly does not guarantee cross-version stream compatibility for Generator, so the same seed may produce different sequences in, say, NumPy 1.24 and 1.26. If you need exact reproducibility across versions, save the generated data rather than relying on the seed alone.

The legacy RandomState does guarantee stream compatibility — which is one reason it still exists.
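
One way to follow the "save the data" advice is to persist the draws with np.save. This is only a sketch: the file name is hypothetical, and a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(2024)
samples = rng.standard_normal(1000)

# Persist the draws themselves so later runs do not depend on the stream.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "samples.npy")  # hypothetical file name
    np.save(path, samples)
    restored = np.load(path)

# The reloaded array matches the original draws exactly.
round_trip_ok = np.array_equal(samples, restored)
```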

Reproducibility best practices

  1. Always use explicit Generator objects — never rely on global state.
  2. Pass generators as function arguments — makes dependencies clear.
  3. Use SeedSequence.spawn() for parallel workloads.
  4. Log the seed alongside results so experiments can be replayed.
  5. Pin your NumPy version if exact stream reproducibility is critical.

def train_model(data, rng):
    """Accept an explicit generator — no hidden global state."""
    idx = rng.permutation(len(data))
    shuffled = data[idx]
    # ... training logic
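
Practices 2 and 4 can be combined as in the sketch below. The helper shuffled_copy is a hypothetical stand-in for a function like train_model, and printing stands in for whatever logging the project actually uses.

```python
import numpy as np
from numpy.random import SeedSequence

def shuffled_copy(data, rng):
    """Hypothetical helper: returns the rows of data in a random order."""
    return data[rng.permutation(len(data))]

# Draw fresh entropy from the OS, then record it so the run can be replayed.
seed = SeedSequence().entropy
print(f"seed used for this run: {seed}")

data = np.arange(20)
first = shuffled_copy(data, np.random.default_rng(seed))

# Replaying with the logged seed reproduces the shuffle on the same NumPy version.
replay = shuffled_copy(data, np.random.default_rng(seed))
replayed = np.array_equal(first, replay)
```

This keeps the benefit of an unseeded run (a fresh seed each time) while still leaving enough information in the log to reproduce it.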

Performance

The default bit generator (PCG64) is significantly faster than the legacy Mersenne Twister:

| Operation (1M samples) | Legacy RandomState | Modern Generator |
| --- | --- | --- |
| Uniform floats | 4.2 ms | 2.8 ms |
| Standard normal | 12.1 ms | 5.3 ms |
| Random integers | 6.8 ms | 3.1 ms |

The speedup comes from both the faster PCG64 algorithm and improved distribution sampling methods (e.g., Ziggurat for normals instead of Box-Muller).
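
Timings like these depend on hardware and NumPy version, so it is worth measuring locally. The following is a rough sketch using timeit. The helper best_ms and the integer range 0..99 are illustrative choices, not the setup behind the figures above.

```python
import timeit

import numpy as np

N = 1_000_000
legacy = np.random.RandomState(0)   # Mersenne Twister (MT19937)
modern = np.random.default_rng(0)   # PCG64 by default

def best_ms(fn, repeat=5):
    """Best wall-clock time of fn over several single runs, in milliseconds."""
    return min(timeit.repeat(fn, number=1, repeat=repeat)) * 1e3

timings = {
    "uniform floats": (best_ms(lambda: legacy.random_sample(N)),
                       best_ms(lambda: modern.random(N))),
    "standard normal": (best_ms(lambda: legacy.standard_normal(N)),
                        best_ms(lambda: modern.standard_normal(N))),
    "random integers": (best_ms(lambda: legacy.randint(0, 100, N)),
                        best_ms(lambda: modern.integers(0, 100, N))),
}

for name, (old, new) in timings.items():
    print(f"{name:16s} legacy {old:6.2f} ms   modern {new:6.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute numbers will differ from machine to machine.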

The one thing to remember: Use np.random.default_rng(seed) instead of np.random.seed() — it is faster, safer, and gives you independent random streams without global-state headaches.

