Stationarity Testing in Python — Core Concepts

What stationarity actually means

A strictly stationary series is one whose joint distribution is unchanged by shifts in time, so none of its statistical properties depend on when you observe it. In practice, we usually test for weak stationarity (also called covariance stationarity), which requires:

  1. Constant mean — the average does not drift.
  2. Constant variance — the spread of the data stays the same.
  3. Autocovariance depends only on lag — the relationship between any two points depends only on their distance apart, not when they occurred.
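An informal way to see the first two conditions is to split a series in half and compare the mean and variance of each half. The sketch below does this with NumPy on simulated data. The helper name `half_stats` is hypothetical, and the comparison is only a rough diagnostic, not a formal test.

```python
import numpy as np


def half_stats(values):
    """Return (mean, variance) for the first and second half of a series."""
    x = np.asarray(values, dtype=float)
    mid = len(x) // 2
    first, second = x[:mid], x[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())


rng = np.random.default_rng(0)

# White noise: mean and variance should look similar in both halves.
noise = rng.normal(size=2000)
noise_first, noise_second = half_stats(noise)

# Deterministic upward trend: the mean clearly drifts between halves.
trend = np.arange(100, dtype=float)
trend_first, trend_second = half_stats(trend)

print("noise:", noise_first, noise_second)
print("trend:", trend_first, trend_second)
```

For the trending series the two half-means differ by construction, which is exactly what the constant-mean condition rules out.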

The two main tests

Augmented Dickey-Fuller (ADF) test

The ADF test checks for a unit root: a feature of the series' autoregressive structure that makes shocks persist indefinitely instead of dying out, as in a random walk.

  • Null hypothesis: the series has a unit root (non-stationary)
  • Alternative: the series is stationary
  • Decision: reject the null (p-value < 0.05) → evidence of stationarity

from statsmodels.tsa.stattools import adfuller

result = adfuller(series.dropna(), autolag="AIC")

print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
print(f"Lags used: {result[2]}")
print(f"Critical values: {result[4]}")
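To make the unit-root idea concrete, the sketch below fits the simple (non-augmented) Dickey-Fuller regression, diff(y)_t = alpha + gamma * y_{t-1} + error, by ordinary least squares with NumPy. It is an illustration on simulated data, not a substitute for adfuller: the real test adds lagged differences and compares the t-statistic on gamma against special critical values. The helper name `df_gamma` is hypothetical.

```python
import numpy as np


def df_gamma(values):
    """Estimate gamma in diff(y)_t = alpha + gamma * y_{t-1} + error by OLS."""
    y = np.asarray(values, dtype=float)
    dy = np.diff(y)      # y_t - y_{t-1}
    lagged = y[:-1]      # y_{t-1}
    design = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
    return coef[1]


rng = np.random.default_rng(42)
shocks = rng.normal(size=5000)

# Random walk: y_t = y_{t-1} + e_t, so the true gamma is 0 (unit root).
random_walk = np.cumsum(shocks)

# Stationary AR(1): y_t = 0.5 * y_{t-1} + e_t, so the true gamma is -0.5.
ar1 = np.zeros_like(shocks)
for t in range(1, len(shocks)):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

gamma_rw = df_gamma(random_walk)
gamma_ar1 = df_gamma(ar1)
print(f"random walk gamma: {gamma_rw:.4f}")
print(f"AR(1) gamma:       {gamma_ar1:.4f}")
```

A gamma estimate near zero is what the null hypothesis describes; a clearly negative one means the series is pulled back toward its mean.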

KPSS test

The KPSS test reverses the logic:

  • Null hypothesis: the series is stationary (or trend-stationary)
  • Alternative: the series has a unit root
  • Decision: fail to reject (p-value > 0.05) → no evidence against stationarity

from statsmodels.tsa.stattools import kpss

stat, p_value, n_lags, critical_values = kpss(series.dropna(), regression="c")

print(f"KPSS Statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")  # interpolated from a lookup table, so reported only between 0.01 and 0.10

The regression parameter matters:

  • "c" tests for level stationarity (constant mean)
  • "ct" tests for trend stationarity (stationary around a deterministic trend)
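What "stationary around a deterministic trend" means can be sketched without running the test: fit a straight line and look at what is left over. The example below uses np.polyfit on simulated data. The helper name `detrend_linear` and the chosen slope and intercept are assumptions made for illustration.

```python
import numpy as np


def detrend_linear(values):
    """Subtract a least-squares straight line; return residuals and slope."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (intercept + slope * t), slope


rng = np.random.default_rng(1)
t = np.arange(500)

# Trend-stationary series: a deterministic line plus stationary noise.
series = 2.0 + 0.5 * t + rng.normal(size=500)

residuals, slope = detrend_linear(series)
print(f"estimated slope: {slope:.3f}")
print(f"residual mean:   {residuals.mean():.3e}")
```

A series like this is not level-stationary, because its mean rises over time, but the residuals after removing the line are. That is the situation regression="ct" is designed for.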

Why use both tests together

Running both tests reveals four possible outcomes:

ADF result      | KPSS result     | Interpretation
Stationary      | Stationary      | Series is stationary ✓
Non-stationary  | Non-stationary  | Series is non-stationary; difference it
Stationary      | Non-stationary  | Trend-stationary; remove deterministic trend
Non-stationary  | Stationary      | Inconclusive; collect more data or try other tests

The combined approach catches cases that either test alone would miss.
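The four outcomes can be written as a small lookup on the two p-values. This is a minimal sketch: the function name `interpret_adf_kpss` and the shared 0.05 threshold are assumptions, and the labels simply restate the interpretations given above.

```python
def interpret_adf_kpss(adf_p, kpss_p, alpha=0.05):
    """Map ADF and KPSS p-values to one of the four combined outcomes."""
    adf_stationary = adf_p < alpha       # ADF rejects the unit-root null
    kpss_stationary = kpss_p >= alpha    # KPSS does not reject stationarity

    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary: difference it"
    if adf_stationary:
        # ADF says stationary, KPSS says non-stationary.
        return "trend-stationary: remove deterministic trend"
    # ADF says non-stationary, KPSS says stationary.
    return "inconclusive"


print(interpret_adf_kpss(0.01, 0.10))  # stationary
print(interpret_adf_kpss(0.60, 0.01))  # non-stationary: difference it
print(interpret_adf_kpss(0.01, 0.01))  # trend-stationary: remove deterministic trend
print(interpret_adf_kpss(0.60, 0.10))  # inconclusive
```

In use, `adf_p` would be the second element returned by adfuller and `kpss_p` the second element returned by kpss.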

Making non-stationary data stationary

Differencing

The most common transformation. First differencing replaces each value with the change from the previous value:

diff_1 = series.diff().dropna()  # first difference
diff_2 = series.diff().diff().dropna()  # second difference (rarely needed)

First differencing removes linear trends. Second differencing removes quadratic trends. You should rarely need to go beyond d=2.
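The effect on trends is easy to check on noise-free data. The sketch below uses np.diff, which returns the same values as pandas' .diff() once the leading NaN is dropped. The particular slopes and coefficients are arbitrary choices for illustration.

```python
import numpy as np

t = np.arange(10, dtype=float)

linear = 3.0 + 2.0 * t       # linear trend with slope 2
quadratic = 1.0 + t ** 2     # quadratic trend

d1_linear = np.diff(linear)              # constant: equal to the slope
d1_quadratic = np.diff(quadratic)        # still trending: 1, 3, 5, ...
d2_quadratic = np.diff(quadratic, n=2)   # constant: 2

print(d1_linear)
print(d1_quadratic)
print(d2_quadratic)
```

One difference flattens the line completely, while the quadratic needs a second pass before it stops trending.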

Log transformation

For series where variance grows with the level (multiplicative patterns), take the log first:

import numpy as np

log_series = np.log(series)  # requires strictly positive values
log_diff = log_series.diff().dropna()

This combination (log + difference) is extremely common in financial data analysis.
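Part of the reason is that the log difference equals the log of the ratio between consecutive values, which stays close to the simple percentage change when moves are small. The sketch below compares the two on a made-up price series.

```python
import numpy as np

# Hypothetical prices, chosen so each step is a +10% or -10% move.
prices = np.array([100.0, 110.0, 99.0, 108.9])

log_returns = np.diff(np.log(prices))            # log(p_t / p_{t-1})
simple_returns = prices[1:] / prices[:-1] - 1    # (p_t - p_{t-1}) / p_{t-1}

for lr, sr in zip(log_returns, simple_returns):
    print(f"log return {lr:+.4f}   simple return {sr:+.4f}")
```

Even for moves as large as 10% the two measures differ only in the third decimal place, and the gap shrinks as the moves get smaller.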

Seasonal differencing

For monthly data with yearly seasonality:

seasonal_diff = series.diff(12).dropna()  # remove yearly seasonal pattern

You might need both seasonal and non-seasonal differencing for data with trend and seasonality.
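As a noise-free illustration, the sketch below builds a monthly series from a curved trend plus a repeating yearly pattern, then applies a lag-12 difference followed by a first difference. The pattern values and the quadratic coefficient are made up for the example.

```python
import numpy as np

months = np.arange(48)  # four years of monthly data
pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3], dtype=float)

# Quadratic trend plus a yearly seasonal pattern.
series = 0.1 * months ** 2 + np.tile(pattern, 4)

# Seasonal difference at lag 12: y_t - y_{t-12}.
# The repeating pattern cancels, but a linear trend is left behind.
seasonal_diff = series[12:] - series[:-12]

# A further first difference removes the remaining linear trend.
both = np.diff(seasonal_diff)

print(seasonal_diff[:5])
print(both[:5])
```

With pandas the same two steps would be series.diff(12) followed by .diff(), dropping the NaN values each time.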

Determining how many differences

A practical workflow:

from statsmodels.tsa.stattools import adfuller


def find_differencing_order(series, max_d=2):
    """Find the minimum differencing order for stationarity."""
    s = series.dropna()
    for d in range(max_d + 1):
        # At this point s is the series differenced d times.
        adf_p = adfuller(s, autolag="AIC")[1]
        if adf_p < 0.05:
            return d
        s = s.diff().dropna()

    return max_d  # fallback: no order up to max_d passed the ADF test

The pmdarima library automates this with ndiffs() and nsdiffs():

from pmdarima.arima import ndiffs, nsdiffs

d = ndiffs(series, test="adf")
D = nsdiffs(series, m=12, test="ocsb")  # seasonal differencing order

Common misconception

People often difference until the ADF p-value is tiny, but over-differencing is a real problem. It inflates noise, destroys signal, and leads to worse forecasts. The goal is the minimum differencing order that achieves stationarity — not the one that makes the test statistic most extreme.
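The cost of an unnecessary difference can be seen on white noise, which is already stationary. In theory, differencing it doubles the variance and introduces a lag-1 autocorrelation of -0.5. The sketch below checks this on simulated data; the helper name `lag1_autocorr` is hypothetical.

```python
import numpy as np


def lag1_autocorr(values):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(7)
noise = rng.normal(size=100_000)     # already stationary white noise

over_differenced = np.diff(noise)    # a difference that was not needed

variance_ratio = over_differenced.var() / noise.var()
acf1 = lag1_autocorr(over_differenced)

print(f"variance ratio after differencing: {variance_ratio:.2f}")
print(f"lag-1 autocorrelation:             {acf1:.2f}")
```

A strongly negative lag-1 autocorrelation in a differenced series is therefore a common hint that one difference too many has been applied.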

The one thing to remember: Stationarity testing is not a formality — it directly determines how you transform your data before modeling, and using ADF and KPSS together gives you a much clearer picture than either test alone.
