Stationarity Testing in Python — Deep Dive
The unit root problem formally
Consider the AR(1) process: yₜ = ρyₜ₋₁ + εₜ
If |ρ| < 1, the process is stationary. If ρ = 1, the process is a random walk (unit root). The Dickey-Fuller test directly estimates ρ by rewriting the equation:
Δyₜ = (ρ − 1)yₜ₋₁ + εₜ = γyₜ₋₁ + εₜ
The null hypothesis is γ = 0 (unit root). Under the null, the test statistic does not follow a standard t-distribution — it follows the Dickey-Fuller distribution, which has heavier left tails.
The Augmented version (ADF)
The ADF test adds lagged differences to control for serial correlation:
Δyₜ = α + βt + γyₜ₋₁ + Σᵢ δᵢΔyₜ₋ᵢ + εₜ
Where α is a constant, βt is a time trend, and the lagged differences absorb autocorrelation so that εₜ is approximately white noise.
from statsmodels.tsa.stattools import adfuller
# With constant only (most common)
result_c = adfuller(series.dropna(), regression="c", autolag="AIC")
# With constant and trend
result_ct = adfuller(series.dropna(), regression="ct", autolag="AIC")
# No constant, no trend (rare)
result_n = adfuller(series.dropna(), regression="n", autolag="AIC")
The choice of regression specification matters:
"c"— appropriate when the series fluctuates around a non-zero mean."ct"— appropriate when you suspect a deterministic trend."n"— appropriate only when the series has zero mean under the alternative.
Using the wrong specification reduces test power. When in doubt, use "c".
Lag selection
The autolag="AIC" option selects the number of augmenting lags by minimizing AIC. Alternatives include "BIC" (tends to select fewer lags) and "t-stat" (removes lags until the last one is significant). BIC is preferred for large samples; AIC for small ones.
Phillips-Perron test
The PP test is an alternative to ADF that uses a non-parametric correction for serial correlation instead of adding lagged differences:
from arch.unitroot import PhillipsPerron
pp = PhillipsPerron(series.dropna(), lags=None, trend="c")
print(f"PP Statistic: {pp.stat:.4f}")
print(f"p-value: {pp.pvalue:.4f}")
PP is more robust to heteroscedasticity and serial correlation of unknown form, but it can have worse finite-sample performance than ADF. In practice, ADF and PP usually agree.
KPSS test mechanics
KPSS decomposes the series as: yₜ = ξt + rₜ + εₜ
Where rₜ = rₜ₋₁ + uₜ is a random walk component. Under the null of stationarity, the variance of uₜ is zero (the random walk component vanishes). The test statistic is based on the cumulative sum of residuals from a regression:
from statsmodels.tsa.stattools import kpss
# Test for level stationarity
stat_c, p_c, lags_c, crit_c = kpss(series.dropna(), regression="c", nlags="auto")
# Test for trend stationarity
stat_ct, p_ct, lags_ct, crit_ct = kpss(series.dropna(), regression="ct", nlags="auto")
The nlags="auto" uses the Schwert criterion for bandwidth selection. The truncated p-values (KPSS only reports p between 0.01 and 0.10) are a known limitation.
Advanced: Zivot-Andrews test for structural breaks
Standard tests assume no structural breaks. If the series has a one-time level shift or trend change, ADF may fail to reject the unit root null even when the series is stationary around a break:
from arch.unitroot import ZivotAndrews
za = ZivotAndrews(series.dropna(), method="both", lags=None)
print(f"ZA Statistic: {za.stat:.4f}")
print(f"p-value: {za.pvalue:.4f}")
print(f"Breakpoint: {za.breakpoint}") # index of detected break
The method parameter specifies what kind of break:
"const"— break in intercept only"trend"— break in trend only"both"— break in both intercept and trend
Variance stationarity and ARCH effects
The tests above focus on mean stationarity. A series can have a constant mean but time-varying variance (ARCH effects), which is common in financial returns:
from arch.unitroot import DFGLS
from statsmodels.stats.diagnostic import het_arch
# Test for mean stationarity with better power
dfgls = DFGLS(series.dropna(), trend="c", lags=None)
print(f"DF-GLS Statistic: {dfgls.stat:.4f}")
# Test for ARCH effects (variance non-stationarity)
resid = series.diff().dropna()
arch_stat, arch_p, _, _ = het_arch(resid, nlags=5)
print(f"ARCH test p-value: {arch_p:.4f}")
# p < 0.05 → variance is time-varying
DF-GLS is a more powerful version of ADF that detrends the data first using GLS, producing better finite-sample performance.
Comprehensive stationarity testing function
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss
def comprehensive_stationarity_test(series, name="series"):
"""Run ADF + KPSS with joint interpretation."""
clean = series.dropna()
if len(clean) < 20:
return {"error": "Insufficient data (need ≥ 20 observations)"}
# ADF test
adf_stat, adf_p, adf_lags, adf_nobs, adf_crit, _ = adfuller(
clean, regression="c", autolag="AIC"
)
# KPSS test
kpss_stat, kpss_p, kpss_lags, kpss_crit = kpss(
clean, regression="c", nlags="auto"
)
adf_stationary = adf_p < 0.05
kpss_stationary = kpss_p > 0.05
# Joint interpretation
if adf_stationary and kpss_stationary:
conclusion = "STATIONARY"
action = "No transformation needed"
elif not adf_stationary and not kpss_stationary:
conclusion = "NON-STATIONARY"
action = "Apply differencing (d=1)"
elif adf_stationary and not kpss_stationary:
conclusion = "TREND-STATIONARY"
action = "Remove deterministic trend or use regression='ct' in KPSS"
else:
conclusion = "INCONCLUSIVE"
action = "Try larger sample, different lag selection, or PP test"
return {
"series": name,
"n_obs": len(clean),
"adf_statistic": round(adf_stat, 4),
"adf_p_value": round(adf_p, 4),
"adf_stationary": adf_stationary,
"kpss_statistic": round(kpss_stat, 4),
"kpss_p_value": round(kpss_p, 4),
"kpss_stationary": kpss_stationary,
"conclusion": conclusion,
"recommended_action": action,
}
Automated differencing pipeline
def auto_difference(series, max_d=2, max_D=1, seasonal_period=None):
"""Automatically determine and apply differencing."""
result = {"original_length": len(series)}
current = series.copy()
# Seasonal differencing first (if applicable)
D = 0
if seasonal_period and len(current) > 2 * seasonal_period:
for D_candidate in range(1, max_D + 1):
test_series = current.diff(seasonal_period).dropna()
kpss_stat, kpss_p, *_ = kpss(test_series, regression="c", nlags="auto")
if kpss_p > 0.05:
current = test_series
D = D_candidate
break
else:
current = current.diff(seasonal_period).dropna()
D = max_D
# Non-seasonal differencing
d = 0
for d_candidate in range(max_d + 1):
test_series = current.copy()
for _ in range(d_candidate):
test_series = test_series.diff().dropna()
adf_p = adfuller(test_series, autolag="AIC")[1]
if adf_p < 0.05:
for _ in range(d_candidate):
current = current.diff().dropna()
d = d_candidate
break
result.update({
"d": d,
"D": D,
"seasonal_period": seasonal_period,
"final_length": len(current),
"series": current,
})
return result
Pitfalls and edge cases
-
Sample size sensitivity — ADF has low power with short series (< 50 observations). The test may fail to reject the null even for stationary series. Consider using DF-GLS for better power.
-
Near-unit root processes — A series with ρ = 0.99 is technically stationary but behaves like a random walk over finite samples. Tests may give contradictory results.
-
Structural breaks — A stationary series with a level shift can appear non-stationary to ADF. Always plot the data and consider the Zivot-Andrews test.
-
Multiple testing — Running many stationarity tests increases the chance of false positives. If testing hundreds of series, apply Bonferroni or BH corrections.
-
Seasonal unit roots — Standard ADF does not detect seasonal unit roots. Use the OCSB or Canova-Hansen test from
pmdarima:
from pmdarima.arima import nsdiffs
D = nsdiffs(series, m=12, test="ocsb")
The one thing to remember: Stationarity testing is a statistical inference problem with real consequences for model selection — using the wrong test specification, ignoring structural breaks, or over-differencing can all lead to models that look good on paper but produce useless forecasts.
See Also
- Python Arima Forecasting How ARIMA models use patterns in past numbers to predict the future, explained like a bedtime story.
- Python Autocorrelation Analysis How today's number is connected to yesterday's, and why that connection is the secret weapon of time series analysis.
- Python Exponential Smoothing How exponential smoothing weighs recent events more heavily to predict what happens next, like trusting fresh memories more than old ones.
- Python Multivariate Time Series Why tracking multiple things at once gives you better predictions than tracking each one alone.
- Python Prophet Forecasting How Facebook's Prophet tool predicts the future by breaking data into easy-to-understand pieces.