Autocorrelation Analysis in Python — Core Concepts
ACF vs PACF — two different views
ACF (Autocorrelation Function)
The ACF measures the total correlation between a series and its lagged version. At lag k, it answers: “How correlated is yₜ with yₜ₋ₖ?”
This includes both direct and indirect effects. If lag 1 is strong and lag 2 is also strong, the lag 2 correlation might just be because yₜ is correlated with yₜ₋₁, which is correlated with yₜ₋₂.
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 4))
plot_acf(series.dropna(), lags=50, ax=ax)
plt.tight_layout()
PACF (Partial Autocorrelation Function)
The PACF strips out intermediate effects. At lag k, it answers: “How correlated is yₜ with yₜ₋ₖ after removing the effect of all lags in between?”
This isolates the direct relationship at each lag.
from statsmodels.graphics.tsaplots import plot_pacf
fig, ax = plt.subplots(figsize=(12, 4))
plot_pacf(series.dropna(), lags=50, ax=ax, method="ywm")
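The method="ywm" option selects the Yule-Walker estimator without the sample-size adjustment. Unlike the adjusted "yw" variant, it keeps the estimates inside [-1, 1], which makes it the safer choice for some series. Recent statsmodels releases already use it as the default in plot_pacf, so passing it explicitly mainly guards against older versions.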
Reading ACF/PACF plots
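The shaded blue band represents the 95% confidence interval around zero autocorrelation. A bar that extends beyond the band is statistically significant at the 5% level. Because every lag is tested separately, roughly 1 in 20 bars will cross the band by chance even for pure noise, so an isolated, marginal spike at an arbitrary lag is weak evidence on its own.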
Common patterns and what they mean
Gradual decay in ACF + sharp cutoff in PACF after lag p → AR(p) process. The series is best modeled with p autoregressive terms.
Sharp cutoff in ACF after lag q + gradual decay in PACF → MA(q) process. The series is best modeled with q moving average terms.
Both decay gradually → ARMA(p, q) process. Both AR and MA terms are needed, and the orders cannot be read directly off the plots, so compare a few small candidates.
Significant spikes at seasonal lags (7, 14, 21… or 12, 24, 36…) → Seasonal pattern present. Consider SARIMA or seasonal decomposition.
All lags significant and slowly decaying → Series is likely non-stationary. Difference first, then re-examine.
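The AR and MA signatures above can be checked against theoretical values rather than noisy samples. The sketch below uses arma_acf and arma_pacf from statsmodels.tsa.arima_process; the AR(2) and MA(1) coefficients are assumptions picked for illustration. Note the statsmodels convention: both polynomials start with 1, and AR coefficients enter with their sign flipped.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acf, arma_pacf

# AR(2): y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + e_t  (coefficients are illustrative)
ar2_ar, ar2_ma = [1, -0.6, -0.3], [1]
# MA(1): y_t = e_t + 0.7*e_{t-1}
ma1_ar, ma1_ma = [1], [1, 0.7]

# Index k of each returned array is lag k (index 0 is lag 0, always 1)
ar2_acf = arma_acf(ar2_ar, ar2_ma, lags=6)
ar2_pacf = arma_pacf(ar2_ar, ar2_ma, lags=6)
ma1_acf = arma_acf(ma1_ar, ma1_ma, lags=6)
ma1_pacf = arma_pacf(ma1_ar, ma1_ma, lags=6)

print("AR(2) ACF :", np.round(ar2_acf, 3))   # decays gradually
print("AR(2) PACF:", np.round(ar2_pacf, 3))  # cuts off after lag 2
print("MA(1) ACF :", np.round(ma1_acf, 3))   # cuts off after lag 1
print("MA(1) PACF:", np.round(ma1_pacf, 3))  # decays gradually
```

Real sample plots will show the same shapes blurred by estimation noise, which is where the confidence band comes in.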
Computing autocorrelation numerically
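import numpy as np
from statsmodels.tsa.stattools import acf, pacf
# Compute ACF values with 95% confidence intervals
acf_values, acf_confint = acf(series.dropna(), nlags=40, alpha=0.05)
# Compute PACF values with 95% confidence intervals
pacf_values, pacf_confint = pacf(series.dropna(), nlags=40, alpha=0.05)
# The intervals are centered on the estimates, so a lag is significant
# when |estimate| exceeds the interval half-width (zero lies outside it)
ci_half_width = (acf_confint[:, 1] - acf_confint[:, 0]) / 2
significant_lags = np.where(np.abs(acf_values) > ci_half_width)[0]
# Lag 0 is always 1 by definition, so drop it
significant_lags = significant_lags[significant_lags > 0]
print(f"Significant ACF lags: {significant_lags}")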
Using autocorrelation for model selection
A practical decision tree:
- Plot ACF of the raw series. If it decays very slowly, the series is likely non-stationary. Difference it.
- After differencing, plot ACF and PACF. Use the patterns above to identify candidate ARIMA orders.
- Check seasonal lags. Spikes at multiples of the seasonal period indicate seasonal terms are needed.
- Fit candidates and compare. Use AIC to choose among the models your ACF/PACF analysis suggested.
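# Quick model identification helper
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def identify_arima_order(series, max_lag=30):
    """Suggest a starting ARMA order from ACF/PACF analysis.

    Expects a stationary (already differenced) pandas Series.
    """
    clean = series.dropna()
    acf_vals = acf(clean, nlags=max_lag)
    pacf_vals = pacf(clean, nlags=max_lag)
    # Approximate 95% significance bound under white noise
    bound = 1.96 / np.sqrt(len(clean))
    sig_acf = [i for i in range(1, max_lag + 1) if abs(acf_vals[i]) > bound]
    sig_pacf = [i for i in range(1, max_lag + 1) if abs(pacf_vals[i]) > bound]
    # Crude heuristic: take the largest of the first three significant lags.
    # A stray spike at a high lag inflates the suggestion, so treat the
    # result as a starting point for candidate models, not a final answer.
    return {
        "significant_acf_lags": sig_acf[:5],
        "significant_pacf_lags": sig_pacf[:5],
        "suggested_p": max(sig_pacf[:3]) if sig_pacf else 0,
        "suggested_q": max(sig_acf[:3]) if sig_acf else 0,
    }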
Cross-correlation for two series
When analyzing the relationship between two time series (e.g., advertising spend and sales), use cross-correlation:
from statsmodels.tsa.stattools import ccf
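# ccf(x, y)[k] is the correlation of x_{t+k} with y_t for k = 0, 1, 2, ...
# (only non-negative lags are returned)
# Does ad_spend lead sales? Correlate sales_{t+k} with ad_spend_t
ad_leads_sales = ccf(sales, ad_spend, adjusted=False)
# Does sales lead ad_spend? Swap the arguments
sales_leads_ad = ccf(ad_spend, sales, adjusted=False)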
This reveals lead-lag relationships — does increased advertising precede higher sales, and by how many periods?
Common misconception
Many people assume that high autocorrelation at lag 1 means the series is “predictable.” It does not. A random walk has lag-1 autocorrelation close to 1 in levels, yet its next change is completely unpredictable: the best forecast is simply the last observed value. What matters is autocorrelation in the stationary (differenced) series. That is where predictable patterns live.
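The random walk claim is easy to verify with NumPy alone. This sketch assumes a Gaussian random walk of length 2000 and measures lag-1 correlation directly with np.corrcoef; the helper name lag1_corr is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
steps = rng.normal(size=2000)    # unpredictable increments
walk = np.cumsum(steps)          # random walk in levels

def lag1_corr(x):
    """Pearson correlation between x_t and x_{t-1} (illustrative helper)."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

r_levels = lag1_corr(walk)
r_diffs = lag1_corr(np.diff(walk))

print(f"lag-1 correlation, levels:      {r_levels:.3f}")
print(f"lag-1 correlation, differences: {r_diffs:.3f}")
```

The levels look highly "predictable" only because each value is the previous one plus a small step; once that trivial dependence is differenced away, nothing is left to forecast.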
The one thing to remember: ACF and PACF plots are the diagnostic X-ray of time series analysis — they reveal the memory structure, seasonal patterns, and stationarity issues in your data, and reading them correctly is the skill that separates guess-and-check modeling from informed model selection.
See Also
- Python Arima Forecasting: How ARIMA models use patterns in past numbers to predict the future, explained like a bedtime story.
- Python Exponential Smoothing: How exponential smoothing weighs recent events more heavily to predict what happens next, like trusting fresh memories more than old ones.
- Python Multivariate Time Series: Why tracking multiple things at once gives you better predictions than tracking each one alone.
- Python Prophet Forecasting: How Facebook's Prophet tool predicts the future by breaking data into easy-to-understand pieces.
- Python Seasonal Decomposition: How Python breaks apart time data into trend, seasonal patterns, and leftover noise, like separating ingredients in a smoothie.