Risk Analysis with Monte Carlo in Python — Deep Dive
Beyond normal distributions
The simplest Monte Carlo assumes normally distributed returns, but real financial returns exhibit fat tails and skew. Upgrading the distribution model dramatically improves tail risk estimates.
Student-t distribution
The Student-t distribution has heavier tails than the normal, controlled by the degrees of freedom parameter. Lower degrees of freedom mean fatter tails:
import numpy as np
from scipy.stats import t as student_t
def fit_student_t(returns: np.ndarray) -> tuple[float, float, float]:
"""Fit Student-t distribution and return (df, loc, scale)."""
params = student_t.fit(returns)
return params # (df, loc, scale)
def simulate_t_returns(df: float, loc: float, scale: float, shape: tuple) -> np.ndarray:
"""Generate random returns from a Student-t distribution."""
return student_t.rvs(df, loc=loc, scale=scale, size=shape)
Empirical studies find that equity returns typically fit with 4–6 degrees of freedom, meaning extreme moves are 3–5× more likely than a normal distribution predicts.
GARCH volatility clustering
The Generalized Autoregressive Conditional Heteroskedasticity model captures the empirical fact that large moves cluster together:
import numpy as np
def simulate_garch_returns(
n_days: int,
n_sims: int,
mu: float = 0.0003,
omega: float = 0.00001,
alpha: float = 0.09,
beta: float = 0.90,
) -> np.ndarray:
"""
Simulate returns with GARCH(1,1) volatility dynamics.
sigma²_t = omega + alpha * r²_{t-1} + beta * sigma²_{t-1}
"""
returns = np.zeros((n_sims, n_days))
sigma2 = np.full(n_sims, omega / (1 - alpha - beta)) # unconditional variance
for day in range(n_days):
z = np.random.standard_normal(n_sims)
returns[:, day] = mu + np.sqrt(sigma2) * z
sigma2 = omega + alpha * returns[:, day] ** 2 + beta * sigma2
return returns
GARCH-based Monte Carlo produces volatility regimes naturally — quiet periods followed by turbulent ones — which is critical for realistic drawdown estimation.
Correlated multi-asset simulation
Portfolio risk depends on correlations between assets. The standard approach uses Cholesky decomposition to generate correlated random variables:
import numpy as np
import pandas as pd
def simulate_correlated_returns(
mean_returns: np.ndarray,
cov_matrix: np.ndarray,
n_days: int = 252,
n_sims: int = 10_000,
) -> np.ndarray:
"""
Generate correlated return paths for multiple assets.
Returns shape: (n_sims, n_days, n_assets)
"""
n_assets = len(mean_returns)
cholesky = np.linalg.cholesky(cov_matrix)
# Independent standard normals
z = np.random.standard_normal((n_sims, n_days, n_assets))
# Apply correlation structure
correlated = z @ cholesky.T
# Add mean returns
correlated += mean_returns
return correlated
def portfolio_var(
weights: np.ndarray,
simulated_returns: np.ndarray,
confidence: float = 0.95,
) -> dict:
"""Compute portfolio VaR and CVaR from simulated paths."""
# Portfolio returns per sim per day
port_returns = simulated_returns @ weights # (n_sims, n_days)
# Cumulative returns over the full period
cumulative = np.sum(port_returns, axis=1)
var = -np.percentile(cumulative, (1 - confidence) * 100)
cvar = -np.mean(cumulative[cumulative <= -var])
return {
"VaR": var,
"CVaR": cvar,
"median_return": np.median(cumulative),
"prob_loss": np.mean(cumulative < 0),
}
Dynamic correlations with DCC
Static correlations underestimate risk during crises. The Dynamic Conditional Correlation (DCC) model allows correlations to evolve over time. The arch library provides DCC estimation:
from arch import arch_model
# Fit univariate GARCH to each asset first
# Then use DCC to model time-varying correlations
# This feeds into scenario generation for more realistic crisis behavior
Variance reduction techniques
Naive Monte Carlo converges slowly — halving the estimation error requires 4× the simulations. Variance reduction techniques achieve the same accuracy with fewer simulations.
Antithetic variates
For every random draw Z, also use -Z. This guarantees the sample mean of the random inputs is exactly zero, reducing variance from the mean estimation:
def mc_antithetic(
initial: float,
mu: float,
sigma: float,
days: int,
n_sims: int,
) -> np.ndarray:
"""Monte Carlo with antithetic variates — effectively 2× simulations."""
z = np.random.standard_normal((n_sims // 2, days))
z_anti = np.vstack([z, -z]) # antithetic pairs
returns = mu + sigma * z_anti
paths = initial * np.exp(np.cumsum(returns, axis=1))
return paths
Stratified sampling
Divide the probability space into equal strata and sample from each, ensuring the tails are adequately represented:
from scipy.stats import norm
def stratified_normal(n_samples: int) -> np.ndarray:
"""Generate stratified normal samples for better tail coverage."""
u = (np.arange(n_samples) + np.random.uniform(size=n_samples)) / n_samples
return norm.ppf(u)
Importance sampling for tail risk
When estimating rare events (like 0.1% VaR), most simulations are wasted on normal outcomes. Importance sampling shifts the distribution toward the tail and corrects with likelihood ratios:
def importance_sampling_var(
mu: float,
sigma: float,
threshold: float,
n_sims: int = 100_000,
shift: float = -3.0,
) -> float:
"""
Estimate P(return < threshold) using importance sampling.
Shift the sampling distribution toward the tail for better precision.
"""
# Sample from shifted distribution
shifted_mu = mu + shift * sigma
samples = np.random.normal(shifted_mu, sigma, n_sims)
# Likelihood ratio correction
log_ratio = (
-0.5 * ((samples - mu) / sigma) ** 2
+ 0.5 * ((samples - shifted_mu) / sigma) ** 2
)
weights = np.exp(log_ratio)
# Weighted probability estimate
indicators = (samples < threshold).astype(float)
return np.mean(indicators * weights)
Scenario injection and stress testing
Pure random simulation misses specific historical crises. Hybrid approaches inject named scenarios alongside random ones:
def hybrid_simulation(
random_paths: np.ndarray,
stress_scenarios: dict[str, np.ndarray],
stress_weight: float = 0.05,
) -> np.ndarray:
"""
Combine random Monte Carlo paths with deterministic stress scenarios.
stress_scenarios: {"2008_crisis": array_of_daily_returns, ...}
"""
n_stress = int(len(random_paths) * stress_weight / (1 - stress_weight))
stress_arrays = []
for name, returns in stress_scenarios.items():
# Pad or trim to match path length
padded = np.zeros(random_paths.shape[1])
padded[: len(returns)] = returns
stress_arrays.extend([padded] * (n_stress // len(stress_scenarios)))
stress_matrix = np.array(stress_arrays)
return np.vstack([random_paths, stress_matrix])
This ensures the risk model accounts for 2008-style crashes, flash crashes, and pandemic selloffs even if the random draws never produce comparable events.
Production architecture
Computation at scale
A bank computing VaR across thousands of positions with 50,000 simulations and 10-day horizons faces billions of calculations. Production systems use:
- Vectorized NumPy for the simulation engine.
- Multiprocessing to split simulations across CPU cores.
- GPU acceleration via CuPy or PyTorch for massive parallelism.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
def run_chunk(args):
seed, n_sims, params = args
rng = np.random.default_rng(seed)
z = rng.standard_normal((n_sims, params["days"]))
returns = params["mu"] + params["sigma"] * z
paths = params["initial"] * np.exp(np.cumsum(returns, axis=1))
return paths[:, -1] # final values only
def parallel_monte_carlo(total_sims: int, n_workers: int = 8, **params) -> np.ndarray:
chunk_size = total_sims // n_workers
args = [(42 + i, chunk_size, params) for i in range(n_workers)]
with ProcessPoolExecutor(max_workers=n_workers) as executor:
results = list(executor.map(run_chunk, args))
return np.concatenate(results)
Reproducibility
Financial regulators require that risk calculations be reproducible. Always:
- Use
np.random.default_rng(seed)instead of the legacynp.random.seed(). - Store the seed alongside results.
- Version-control the model parameters and code.
- Log the library versions used for each run.
Validation
Compare Monte Carlo VaR against:
- Historical simulation: non-parametric VaR from actual historical returns.
- Analytical VaR: parametric calculation assuming normality (as a sanity check).
- Backtesting: count how often actual losses exceeded the predicted VaR. If 5% VaR is breached 8% of the time, the model underestimates risk.
The Basel III traffic light system flags models that breach too often: green (0–4 exceptions in 250 days), yellow (5–9), red (10+).
The one thing to remember: Production Monte Carlo risk analysis requires realistic distribution models (fat tails, volatility clustering, dynamic correlations), variance reduction for computational efficiency, and rigorous backtesting to prove the model actually captures the risks it claims to measure.
See Also
- Python Portfolio Optimization How Python helps you pick the right mix of investments so you get the best return for the risk you are willing to take.
- Python Backtesting Trading Strategies Why traders use Python to test their ideas on old data before risking real money, in plain language.
- Python Fraud Detection Patterns How Python helps banks and companies catch cheaters and thieves before they get away with it.
- Python Quantitative Finance How Python helps people use math and data to make smarter money decisions, explained without any jargon.
- Python Technical Indicators What technical indicators are and how Python calculates them, explained like you have never seen a stock chart.