NumPy Random Generator — Core Concepts
Why this topic matters
Random number generation underpins simulations, machine learning, statistical testing, and data augmentation. NumPy overhauled its random API in version 1.17, introducing Generator objects that are faster, more flexible, and safer for parallel code. Understanding the new API avoids subtle reproducibility bugs and global-state issues that plague the legacy interface.
Legacy vs modern API
Legacy (avoid in new code)
import numpy as np
np.random.seed(42)
data = np.random.randn(100)
Problems: np.random.seed() sets global state that every caller of the module-level np.random functions shares. Any library that draws from or reseeds that state shifts the sequence your own code sees, so results can change with import order, library internals, or the order in which threads happen to run.
Modern (recommended)
rng = np.random.default_rng(42)
data = rng.standard_normal(100)
Each Generator owns its own state. No global side effects. Multiple generators can coexist independently.
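To make the difference concrete, here is a minimal sketch. The function noisy_helper is a hypothetical stand-in for any third-party code that draws from the global state; it is not part of NumPy.

```python
import numpy as np

def noisy_helper():
    # Hypothetical stand-in for library code that touches the global state.
    np.random.rand(3)

# Legacy: an unrelated call in between shifts what we draw next.
np.random.seed(42)
a = np.random.rand(2)
np.random.seed(42)
noisy_helper()
b = np.random.rand(2)
legacy_same = np.array_equal(a, b)   # False: the helper consumed three draws

# Modern: the Generator's private state is untouched by the helper.
rng = np.random.default_rng(42)
c = rng.random(2)
rng = np.random.default_rng(42)
noisy_helper()
d = rng.random(2)
modern_same = np.array_equal(c, d)   # True
```

The legacy result depends on what else ran between seeding and drawing; the Generator result does not.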
Creating generators
# Seeded — reproducible
rng = np.random.default_rng(seed=12345)
# Unseeded — different every run (entropy from OS)
rng = np.random.default_rng()
# From a SeedSequence — best for spawning independent streams
from numpy.random import SeedSequence
ss = SeedSequence(42)
child_seeds = ss.spawn(4)
generators = [np.random.default_rng(s) for s in child_seeds]
SeedSequence is the key to parallel reproducibility. Spawning children from one parent seed yields streams that are independent with very high probability, and it removes the need to pick and track a separate seed for each worker by hand.
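A rough sketch of how spawned seeds might feed a set of workers follows. The names simulate and run_all are illustrative, and the workers run in a plain loop here; in real code each child seed would be handed to a separate process or thread.

```python
import numpy as np
from numpy.random import SeedSequence

def simulate(child_seed, n=1000):
    # Hypothetical worker: builds its own Generator from its child seed.
    rng = np.random.default_rng(child_seed)
    return rng.standard_normal(n).mean()

def run_all(root_seed, n_workers=4):
    # One root seed fans out into one child SeedSequence per worker.
    children = SeedSequence(root_seed).spawn(n_workers)
    return [simulate(child) for child in children]

first = run_all(42)
second = run_all(42)   # same root seed, same per-worker results
```

Because each worker's stream depends only on the root seed and its position in the spawn list, the results do not depend on which worker finishes first.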
Common distributions
| Method | Distribution | Example |
|---|---|---|
| rng.random(n) | Uniform [0, 1) | Random probabilities |
| rng.integers(lo, hi, n) | Discrete uniform | Dice rolls |
| rng.standard_normal(n) | Normal (μ=0, σ=1) | Noise generation |
| rng.normal(μ, σ, n) | Normal (custom) | Simulating measurements |
| rng.exponential(scale, n) | Exponential | Wait times |
| rng.choice(arr, n) | Sampling from array | Bootstrap resampling |
| rng.shuffle(arr) | In-place permutation | Shuffling a dataset |
| rng.permutation(n) | Random permutation | Index shuffling |
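The calls in the table can be tried together in one short script. The parameter values (a six-sided die, a mean of 170 with standard deviation 8, a scale of 2) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

u = rng.random(5)                               # floats in [0, 1)
dice = rng.integers(1, 7, size=10)              # 1..6; the upper bound is excluded
noise = rng.standard_normal(5)                  # mean 0, standard deviation 1
heights = rng.normal(170.0, 8.0, size=5)        # mean, standard deviation
waits = rng.exponential(2.0, size=5)            # scale is the mean wait

data = np.arange(10)
boot = rng.choice(data, size=10, replace=True)  # bootstrap resample
rng.shuffle(data)                               # shuffles in place, returns None
perm = rng.permutation(10)                      # new array holding 0..9 in random order
```

Note that rng.integers excludes the upper bound by default, so a die roll needs 7 as the second argument.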
Common misconception
A common belief is that setting a seed guarantees identical results across NumPy versions. It does not. NumPy explicitly does not guarantee cross-version stream compatibility for Generator, so the same seed may produce a different sequence after an upgrade. If you need exact reproducibility across versions, save the generated data rather than relying on the seed alone.
The legacy RandomState does guarantee stream compatibility — which is one reason it still exists.
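One way to save the generated data is with np.save and np.load, sketched below. The in-memory buffer is a stand-in for a real file path, chosen only to keep the example self-contained.

```python
import io
import numpy as np

rng = np.random.default_rng(42)
samples = rng.standard_normal(1000)

# Persist the actual values. A BytesIO buffer stands in for a file on disk.
buf = io.BytesIO()
np.save(buf, samples)

# Later, possibly under a different NumPy version: load instead of regenerating.
buf.seek(0)
restored = np.load(buf)
```

The restored array is bit-for-bit what was generated, regardless of how a later NumPy release would interpret the seed.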
Reproducibility best practices
- Always use explicit Generator objects — never rely on global state.
- Pass generators as function arguments — makes dependencies clear.
- Use SeedSequence.spawn() for parallel workloads.
- Log the seed alongside results so experiments can be replayed.
- Pin your NumPy version if exact stream reproducibility is critical.
def train_model(data, rng):
    """Accept an explicit generator: no hidden global state."""
    idx = rng.permutation(len(data))
    shuffled = data[idx]
    # ... training logic
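The practices above might combine as in the following sketch: the generator is passed explicitly, and the seed is recorded with the result so the run can be replayed. The names shuffle_rows and run_experiment are hypothetical, and drawing a fresh seed from SeedSequence().entropy is one possible approach when the caller supplies none.

```python
import numpy as np

def shuffle_rows(data, rng):
    """Hypothetical helper: return the rows of `data` in an order drawn from `rng`."""
    return data[rng.permutation(len(data))]

def run_experiment(data, seed=None):
    # With no seed given, draw fresh entropy so there is still a value to log.
    if seed is None:
        seed = np.random.SeedSequence().entropy
    rng = np.random.default_rng(seed)
    result = shuffle_rows(data, rng)
    # Return the seed with the result so the run can be replayed later.
    return {"seed": seed, "result": result}

data = np.arange(20).reshape(10, 2)
run = run_experiment(data)
replay = run_experiment(data, seed=run["seed"])
```

Replaying with the logged seed reproduces the original shuffle, provided the NumPy version is unchanged.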
Performance
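Generator with its default bit generator (PCG64) is typically faster than the legacy RandomState, which is built on the Mersenne Twister (MT19937). Illustrative timings, which vary with hardware and NumPy version: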
| Operation (1M samples) | Legacy RandomState | Modern Generator |
|---|---|---|
| Uniform floats | 4.2 ms | 2.8 ms |
| Standard normal | 12.1 ms | 5.3 ms |
| Random integers | 6.8 ms | 3.1 ms |
The speedup comes from both the faster PCG64 algorithm and improved distribution sampling methods (e.g., Ziggurat for normals instead of the Box-Muller-style polar method used by RandomState).
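Timings like these are easy to check on your own machine. The sketch below uses timeit; the helper best_ms and the integer range 0 to 100 are arbitrary choices, and the printed numbers will differ from the table above.

```python
import timeit
import numpy as np

N = 1_000_000
legacy = np.random.RandomState(0)
modern = np.random.default_rng(0)

def best_ms(fn, repeat=5):
    # Best of several single runs, converted to milliseconds.
    return min(timeit.repeat(fn, number=1, repeat=repeat)) * 1e3

timings = {
    "uniform": (best_ms(lambda: legacy.random_sample(N)),
                best_ms(lambda: modern.random(N))),
    "normal": (best_ms(lambda: legacy.standard_normal(N)),
               best_ms(lambda: modern.standard_normal(N))),
    "integers": (best_ms(lambda: legacy.randint(0, 100, N)),
                 best_ms(lambda: modern.integers(0, 100, N))),
}

for name, (old, new) in timings.items():
    print(f"{name:10s} legacy {old:7.2f} ms   generator {new:7.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes on the machine.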
The one thing to remember: Use np.random.default_rng(seed) instead of np.random.seed() — it is faster, safer, and gives you independent random streams without global-state headaches.