Stationarity Testing in Python — Core Concepts
What stationarity actually means
A strictly stationary series has a joint probability distribution that is unchanged by shifts in time. In practice, we usually test for weak stationarity (also called covariance stationarity), which requires:
- Constant mean — the average does not drift.
- Constant variance — the spread of the data stays the same.
- Autocovariance depends only on lag — the relationship between any two points depends only on their distance apart, not when they occurred.
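The constant-variance condition is easy to see by simulation. The sketch below is illustrative only: it assumes NumPy, and the path counts, seed, and helper name `variance_across_paths` are made up for the example. It simulates many white-noise paths and many random walks, then compares the spread across paths at an early and a late time point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 500, 200
shocks = rng.normal(size=(n_paths, n_steps))

noise_paths = shocks                     # white noise: weakly stationary
walk_paths = np.cumsum(shocks, axis=1)   # random walks: cumulative sums of shocks

def variance_across_paths(paths, t):
    """Variance across the simulated paths at time index t."""
    return paths[:, t].var()

for name, paths in [("white noise", noise_paths), ("random walk", walk_paths)]:
    early = variance_across_paths(paths, 49)
    late = variance_across_paths(paths, 199)
    print(f"{name}: variance at step 50 = {early:.1f}, at step 200 = {late:.1f}")
```

The white-noise variance should stay near 1 at both time points, while the random-walk variance grows roughly in proportion to the number of steps, which is exactly what weak stationarity rules out.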
The two main tests
Augmented Dickey-Fuller (ADF) test
The ADF test checks for a unit root: a root of the series' autoregressive characteristic equation equal to 1, which means shocks never die out. A random walk is the simplest example.
- Null hypothesis: the series has a unit root (non-stationary)
- Alternative: the series is stationary
- Decision: reject the null (p-value < 0.05) → evidence of stationarity
from statsmodels.tsa.stattools import adfuller
result = adfuller(series.dropna(), autolag="AIC")
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
print(f"Lags used: {result[2]}")
print(f"Critical values: {result[4]}")
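To see what the reported statistic measures, here is a hand-rolled version of the simplest Dickey-Fuller regression. It is a sketch under stated assumptions, not what `adfuller` runs internally: it uses only a constant and the lagged level, with no lagged differences (the "augmented" part), and the function name `dickey_fuller_stat` is hypothetical. The statistic is the t-ratio on the lagged level when the first difference is regressed on it.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-ratio on rho in: diff(y)[t] = alpha + rho * y[t-1] + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                    # first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])    # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)      # ordinary least squares
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
noise = rng.normal(size=500)               # stationary
walk = np.cumsum(rng.normal(size=500))     # unit root

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print(f"white noise: {stat_noise:.2f}")
print(f"random walk: {stat_walk:.2f}")
```

A strongly negative statistic argues against a unit root. Note that it has to be compared with Dickey-Fuller critical values (the ones `adfuller` returns), not with the usual t-distribution.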
KPSS test
The KPSS test reverses the logic:
- Null hypothesis: the series is stationary (or trend-stationary)
- Alternative: the series has a unit root
- Decision: fail to reject (p-value > 0.05) → no evidence against stationarity
from statsmodels.tsa.stattools import kpss
stat, p_value, n_lags, critical_values = kpss(series.dropna(), regression="c")
print(f"KPSS Statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")
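The KPSS null covers trend stationarity as well as level stationarity. To make the distinction concrete, the sketch below builds a made-up series (a straight line plus noise, using NumPy only) whose mean clearly drifts, then removes a fitted line. The slope, seed, and helper name `half_means` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(400)
trended = 0.05 * t + rng.normal(size=t.size)   # linear trend plus stationary noise

# Fit a straight line in t and subtract it.
slope, intercept = np.polyfit(t, trended, deg=1)
detrended = trended - (slope * t + intercept)

def half_means(x):
    """Mean of the first and second half of x."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

raw_first, raw_second = half_means(trended)
det_first, det_second = half_means(detrended)
print(f"raw halves:       {raw_first:.2f}, {raw_second:.2f}")
print(f"detrended halves: {det_first:.2f}, {det_second:.2f}")
```

The raw series fails the constant-mean condition, but what remains after removing the deterministic trend does not drift. That is what "stationary around a deterministic trend" means.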
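Note that statsmodels interpolates the KPSS p-value from a lookup table, so reported values are clipped to the range 0.01 to 0.10, and an InterpolationWarning is issued when the statistic falls outside the table.
The regression parameter matters:
- "c" tests for level stationarity (constant mean)
- "ct" tests for trend stationarity (stationary around a deterministic trend)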
Why use both tests together
Running both tests reveals four possible outcomes:
| ADF result | KPSS result | Interpretation |
|---|---|---|
| Stationary | Stationary | Series is stationary ✓ |
| Non-stationary | Non-stationary | Series is non-stationary — difference it |
| Stationary | Non-stationary | Difference-stationary: difference the series and retest |
| Non-stationary | Stationary | Likely trend-stationary: remove the deterministic trend and retest (in short samples this can also reflect low test power) |
The combined approach catches cases that either test alone would miss.
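Because the two tests have opposite null hypotheses, it is easy to flip one of them by mistake. The sketch below turns a pair of p-values into the "ADF result" and "KPSS result" columns of the table above. The helper name `stationarity_verdicts`, the 0.05 threshold, and the p-values are assumptions for illustration; in practice the p-values would come from `adfuller` and `kpss`.

```python
def stationarity_verdicts(adf_p, kpss_p, alpha=0.05):
    """Translate ADF and KPSS p-values into the two result columns.

    ADF:  rejecting its null (p < alpha) points to stationarity.
    KPSS: rejecting its null (p < alpha) points to non-stationarity.
    """
    adf_result = "Stationary" if adf_p < alpha else "Non-stationary"
    kpss_result = "Non-stationary" if kpss_p < alpha else "Stationary"
    return adf_result, kpss_result

# Hypothetical p-values, chosen only to produce all four combinations.
for adf_p, kpss_p in [(0.01, 0.10), (0.60, 0.01), (0.01, 0.01), (0.60, 0.10)]:
    print(adf_p, kpss_p, stationarity_verdicts(adf_p, kpss_p))
```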
Making non-stationary data stationary
Differencing
The most common transformation. First differencing replaces each value with the change from the previous value:
diff_1 = series.diff().dropna() # first difference
diff_2 = series.diff().diff().dropna() # second difference (rarely needed)
First differencing removes a linear trend; second differencing removes a quadratic one. Orders above d=2 are rarely needed.
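The effect on polynomial trends can be checked on noise-free toy data. This sketch uses `np.diff` on made-up arrays rather than pandas; unlike `Series.diff()`, `np.diff` drops the first element instead of producing a NaN.

```python
import numpy as np

t = np.arange(10, dtype=float)
linear = 3.0 + 2.0 * t            # linear trend with slope 2
quadratic = 1.0 + 0.5 * t**2      # quadratic trend

first_of_linear = np.diff(linear)             # constant, equal to the slope
first_of_quadratic = np.diff(quadratic)       # still rising linearly in t
second_of_quadratic = np.diff(quadratic, n=2) # constant after two differences

print(first_of_linear)
print(first_of_quadratic)
print(second_of_quadratic)
```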
Log transformation
For series where variance grows with the level (multiplicative patterns), take the log first. This requires strictly positive values:
import numpy as np
log_series = np.log(series)
log_diff = log_series.diff().dropna()
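This combination (log, then difference) is standard in financial data analysis: the log difference of a price series is its log return, which approximates the percentage change when moves are small.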
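A quick numeric check of how close log differences are to ordinary percentage changes, using a handful of made-up prices (NumPy only):

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.0, 102.0, 103.5])  # made-up values

log_returns = np.diff(np.log(prices))          # log, then difference
pct_returns = prices[1:] / prices[:-1] - 1     # simple percentage change

for lr, pr in zip(log_returns, pct_returns):
    print(f"log return {lr:+.4f}   percentage change {pr:+.4f}")
```

For moves of a few percent the two agree to about three decimal places; the gap widens as the moves get larger.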
Seasonal differencing
For monthly data with yearly seasonality:
seasonal_diff = series.diff(12).dropna() # remove yearly seasonal pattern
You might need both seasonal and non-seasonal differencing for data with trend and seasonality.
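As a toy illustration of combining the two, the sketch below assumes a noise-free series with a quadratic trend and a fixed 12-month pattern (both invented for the example, NumPy only). The seasonal difference removes the repeating pattern but leaves a trend, and a further first difference removes that.

```python
import numpy as np

t = np.arange(48)   # four years of monthly observations
pattern = np.array([0, 1, 3, 5, 4, 2, 0, -2, -4, -5, -3, -1], dtype=float)
y = 0.01 * t**2 + pattern[t % 12]           # made-up trend plus seasonality

seasonal_diff = y[12:] - y[:-12]            # same values as a pandas .diff(12).dropna()
both = np.diff(seasonal_diff)               # followed by a first difference

print(np.round(seasonal_diff[:5], 2))       # pattern gone, linear trend remains
print(np.round(both[:5], 2))                # constant
```

Real data is noisy, so the result will not be exactly constant; the stationarity tests should be rerun after each step.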
Determining how many differences
A practical workflow:
from statsmodels.tsa.stattools import adfuller

def find_differencing_order(series, max_d=2):
    """Find the minimum differencing order for stationarity."""
    s = series.dropna()
    for d in range(max_d + 1):
        # s holds the series after d rounds of differencing
        adf_p = adfuller(s, autolag="AIC")[1]
        if adf_p < 0.05:
            return d
        s = s.diff().dropna()
    return max_d  # fallback: ADF never rejected the unit root up to max_d
The pmdarima library automates this with ndiffs() and nsdiffs():
from pmdarima.arima import ndiffs, nsdiffs
d = ndiffs(series, test="adf")
D = nsdiffs(series, m=12, test="ocsb") # seasonal differencing order
Common misconception
People often difference until the ADF p-value is tiny, but over-differencing is a real problem. Differencing a series that is already stationary increases its variance and introduces artificial negative autocorrelation at lag 1, which leads to worse forecasts. The goal is the minimum differencing order that achieves stationarity, not the one that makes the test statistic most extreme.
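The cost of an unnecessary difference shows up clearly on white noise, which needs no differencing at all. A minimal sketch, assuming NumPy and a hypothetical helper `lag1_autocorr`:

```python
import numpy as np

rng = np.random.default_rng(7)
noise = rng.normal(size=5000)    # already stationary
over = np.diff(noise)            # one difference too many

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return (x[:-1] @ x[1:]) / (x @ x)

print(f"variance: {noise.var():.2f} -> {over.var():.2f}")
print(f"lag-1 autocorrelation: {lag1_autocorr(noise):+.2f} -> {lag1_autocorr(over):+.2f}")
```

The variance roughly doubles and the lag-1 autocorrelation moves from about 0 to about -0.5. A strongly negative lag-1 autocorrelation after differencing is a common sign that the series has been differenced once too often.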
The one thing to remember: Stationarity testing is not a formality — it directly determines how you transform your data before modeling, and using ADF and KPSS together gives you a much clearer picture than either test alone.