Time Series Forecasting with Python — Deep Dive
Stationarity and why it matters
Most statistical forecasting methods assume stationarity — the statistical properties (mean, variance) do not change over time. Non-stationary series must be transformed before modeling.
Testing for stationarity
from statsmodels.tsa.stattools import adfuller, kpss
def stationarity_tests(series: pd.Series) -> dict:
"""Run ADF and KPSS tests; interpret jointly."""
adf_stat, adf_p, *_ = adfuller(series.dropna())
kpss_stat, kpss_p, *_ = kpss(series.dropna(), regression="c")
return {
"adf_statistic": adf_stat,
"adf_p_value": adf_p,
"adf_stationary": adf_p < 0.05,
"kpss_statistic": kpss_stat,
"kpss_p_value": kpss_p,
"kpss_stationary": kpss_p > 0.05,
}
When ADF says stationary but KPSS says non-stationary, the series is likely trend-stationary (stationary after removing a deterministic trend). When both say non-stationary, differencing is needed.
Fractional differencing for financial data
Standard integer differencing (d=1) removes too much memory from price series, destroying useful long-range autocorrelation. Fractional differencing preserves more signal:
import numpy as np
import pandas as pd
def frac_diff_weights(d: float, size: int, threshold: float = 1e-5) -> np.ndarray:
"""Compute weights for fractional differencing."""
weights = [1.0]
for k in range(1, size):
w = -weights[-1] * (d - k + 1) / k
if abs(w) < threshold:
break
weights.append(w)
return np.array(weights[::-1])
def frac_diff(series: pd.Series, d: float = 0.4) -> pd.Series:
"""Apply fractional differencing of order d."""
weights = frac_diff_weights(d, len(series))
width = len(weights)
result = pd.Series(index=series.index, dtype=float)
for i in range(width - 1, len(series)):
window = series.iloc[i - width + 1 : i + 1].values
result.iloc[i] = np.dot(weights, window)
return result
Choose d as the minimum value that makes the series stationary (ADF p-value < 0.05). Typically d is between 0.3 and 0.6 for financial returns.
GARCH for volatility forecasting
ARIMA forecasts the level of a series; GARCH forecasts its volatility. Together they model both the expected return and its uncertainty:
from arch import arch_model
def fit_garch(returns: pd.Series) -> dict:
"""Fit GARCH(1,1) and forecast next-day volatility."""
model = arch_model(returns * 100, vol="Garch", p=1, q=1, dist="t")
result = model.fit(disp="off")
forecast = result.forecast(horizon=1)
next_day_vol = np.sqrt(forecast.variance.iloc[-1, 0]) / 100
return {
"omega": result.params["omega"],
"alpha": result.params["alpha[1]"],
"beta": result.params["beta[1]"],
"persistence": result.params["alpha[1]"] + result.params["beta[1]"],
"next_day_volatility": next_day_vol,
"aic": result.aic,
}
Persistence (alpha + beta) near 1.0 indicates volatility shocks decay slowly — a hallmark of financial markets. EGARCH and GJR-GARCH variants capture the leverage effect (negative returns increase volatility more than positive returns of equal magnitude).
Vector Autoregression for multivariate series
When multiple time series influence each other (GDP, interest rates, inflation), VAR models capture the interdependencies:
from statsmodels.tsa.api import VAR
def fit_var_model(data: pd.DataFrame, maxlags: int = 12) -> dict:
"""Fit VAR model with automatic lag selection."""
model = VAR(data)
# Select optimal lag order
lag_order = model.select_order(maxlags=maxlags)
optimal_lag = lag_order.selected_orders["aic"]
result = model.fit(optimal_lag)
# Granger causality tests
causality = {}
for col in data.columns:
test = result.test_causality(col, causing=data.columns.drop(col).tolist())
causality[col] = test.pvalue
return {
"optimal_lag": optimal_lag,
"aic": result.aic,
"granger_causality_p_values": causality,
"model": result,
}
Granger causality tests reveal whether one series helps predict another beyond what the series predicts itself. This is particularly useful for identifying leading indicators.
Deep learning architectures
LSTM with attention
import torch
import torch.nn as nn
class LSTMForecaster(nn.Module):
def __init__(self, input_size: int, hidden_size: int = 64, num_layers: int = 2, horizon: int = 1):
super().__init__()
self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=0.2)
self.attention = nn.Linear(hidden_size, 1)
self.fc = nn.Linear(hidden_size, horizon)
def forward(self, x):
# x: (batch, seq_len, features)
lstm_out, _ = self.lstm(x) # (batch, seq_len, hidden)
# Attention weights
attn_weights = torch.softmax(self.attention(lstm_out), dim=1)
context = (lstm_out * attn_weights).sum(dim=1)
return self.fc(context)
Temporal Fusion Transformer
The Temporal Fusion Transformer (TFT) by Google handles multiple input types — static metadata (stock sector), known future inputs (day of week), and observed past values — through gated residual networks and multi-head attention:
from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
# TimeSeriesDataSet handles windowing, normalization, and batching
training = TimeSeriesDataSet(
data,
time_idx="time_idx",
target="value",
group_ids=["series_id"],
max_encoder_length=60,
max_prediction_length=20,
time_varying_known_reals=["day_of_week", "month"],
time_varying_unknown_reals=["value", "volume"],
)
model = TemporalFusionTransformer.from_dataset(
training,
hidden_size=32,
attention_head_size=4,
dropout=0.1,
learning_rate=0.001,
)
TFT provides interpretable attention weights showing which past time steps and features matter most for each prediction — a rare combination of accuracy and explainability.
Ensemble forecasting
No single model dominates across all series. Combining forecasts reduces variance:
import numpy as np
import pandas as pd
def ensemble_forecast(
forecasts: dict[str, np.ndarray],
weights: dict[str, float] | None = None,
) -> np.ndarray:
"""Weighted ensemble of multiple forecast models."""
if weights is None:
weights = {k: 1.0 / len(forecasts) for k in forecasts}
total_weight = sum(weights.values())
result = np.zeros_like(list(forecasts.values())[0])
for name, forecast in forecasts.items():
result += forecast * weights[name] / total_weight
return result
def optimize_ensemble_weights(
forecasts: dict[str, np.ndarray],
actuals: np.ndarray,
) -> dict[str, float]:
"""Find weights that minimize MAE on validation data."""
from scipy.optimize import minimize
names = list(forecasts.keys())
forecast_matrix = np.column_stack([forecasts[n] for n in names])
def objective(w):
combined = forecast_matrix @ w
return np.mean(np.abs(combined - actuals))
n = len(names)
constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1}]
bounds = [(0, 1)] * n
result = minimize(objective, x0=np.ones(n) / n, method="SLSQP",
bounds=bounds, constraints=constraints)
return dict(zip(names, result.x))
The M4 and M5 forecasting competitions consistently show ensembles outperforming individual models.
Forecast uncertainty quantification
Point forecasts without confidence intervals are dangerous. Two approaches:
Quantile regression
import lightgbm as lgb
def quantile_forecast(X_train, y_train, X_test, quantiles=(0.1, 0.5, 0.9)):
"""Produce prediction intervals via quantile regression."""
predictions = {}
for q in quantiles:
model = lgb.LGBMRegressor(objective="quantile", alpha=q, n_estimators=200)
model.fit(X_train, y_train)
predictions[f"q{int(q*100)}"] = model.predict(X_test)
return pd.DataFrame(predictions)
Conformal prediction
Conformal prediction provides distribution-free prediction intervals with guaranteed coverage:
def conformal_interval(
cal_predictions: np.ndarray,
cal_actuals: np.ndarray,
test_predictions: np.ndarray,
alpha: float = 0.1,
) -> tuple[np.ndarray, np.ndarray]:
"""
Conformal prediction interval with (1-alpha) coverage guarantee.
"""
residuals = np.abs(cal_actuals - cal_predictions)
quantile = np.quantile(residuals, 1 - alpha)
lower = test_predictions - quantile
upper = test_predictions + quantile
return lower, upper
Production deployment patterns
Automated retraining pipeline
from datetime import datetime, timedelta
class ForecastPipeline:
def __init__(self, model_factory, retrain_interval_days: int = 7):
self.model_factory = model_factory
self.retrain_interval = timedelta(days=retrain_interval_days)
self.last_trained = None
self.model = None
def predict(self, data: pd.DataFrame, horizon: int) -> dict:
if self.model is None or datetime.now() - self.last_trained > self.retrain_interval:
self.model = self.model_factory()
self.model.fit(data)
self.last_trained = datetime.now()
forecast = self.model.predict(horizon)
return {
"forecast": forecast,
"model_age_hours": (datetime.now() - self.last_trained).total_seconds() / 3600,
"training_samples": len(data),
}
Monitoring forecast quality
Track forecast error over time. A sudden increase in error signals regime change and should trigger model retraining or investigation:
- Rolling MAE: compute MAE over the most recent N predictions.
- Cumulative sum of errors (CUSUM): detects persistent directional bias.
- Alert thresholds: if rolling MAE exceeds 2× the historical baseline, flag for review.
The one thing to remember: Production time series forecasting combines domain-appropriate models (statistical for interpretability, deep learning for complex patterns), ensemble methods for robustness, and rigorous uncertainty quantification — with continuous monitoring to detect when the world changes faster than the model adapts.
See Also
- Python Backtesting Trading Strategies Why traders use Python to test their ideas on old data before risking real money, in plain language.
- Python Fraud Detection Patterns How Python helps banks and companies catch cheaters and thieves before they get away with it.
- Python Portfolio Optimization How Python helps you pick the right mix of investments so you get the best return for the risk you are willing to take.
- Python Quantitative Finance How Python helps people use math and data to make smarter money decisions, explained without any jargon.
- Python Risk Analysis Monte Carlo How rolling a virtual dice thousands of times helps investors understand what could go wrong with their money.