Markov Chains — Core Concepts
The Markov property
A Markov chain is a sequence of random states where the probability of each state depends only on the previous state. Formally: P(Xₙ₊₁ | Xₙ, Xₙ₋₁, …, X₀) = P(Xₙ₊₁ | Xₙ).
This “memoryless” property makes Markov chains tractable. Instead of tracking entire histories, you only need the current state and a set of transition probabilities.
Transition matrix
The transition probabilities are collected into a transition matrix T, where T[i][j] is the probability of moving from state i to state j. Each row sums to 1.
import numpy as np
# Weather: states = [Sunny, Rainy]
T = np.array([
[0.7, 0.3], # Sunny → 70% Sunny, 30% Rainy
[0.4, 0.6], # Rainy → 40% Sunny, 60% Rainy
])
# Simulate 10 days starting from Sunny (state 0)
state = 0
states = ['Sunny', 'Rainy']
for _ in range(10):
print(states[state], end=' → ')
state = np.random.choice(len(states), p=T[state])
print(states[state])
Multi-step transitions
To find the probability of being in each state after n steps, multiply the transition matrix by itself n times:
# Start: 100% Sunny
initial = np.array([1.0, 0.0])
# After 5 days
prob_5_days = initial @ np.linalg.matrix_power(T, 5)
print(f"After 5 days: Sunny={prob_5_days[0]:.3f}, Rainy={prob_5_days[1]:.3f}")
Stationary distribution
If you run a Markov chain long enough, the probability of being in each state converges to a stationary distribution π, where π = πT. This distribution does not depend on the starting state (for ergodic chains).
Finding it analytically:
# The stationary distribution is the left eigenvector with eigenvalue 1
eigenvalues, eigenvectors = np.linalg.eig(T.T)
# Find the eigenvector for eigenvalue ≈ 1
idx = np.argmin(np.abs(eigenvalues - 1))
stationary = np.real(eigenvectors[:, idx])
stationary = stationary / stationary.sum() # Normalize
print(f"Stationary: Sunny={stationary[0]:.3f}, Rainy={stationary[1]:.3f}")
# ≈ Sunny=0.571, Rainy=0.429
This means in the long run, it is sunny about 57% of the time and rainy 43%.
Types of states
Absorbing states
A state is absorbing if once entered, you never leave. Example: “Game Over” in a board game. The probability of eventually reaching an absorbing state can be computed from the fundamental matrix.
Transient vs recurrent
A state is recurrent if the chain is guaranteed to return to it. It is transient if there is a nonzero probability of never returning. In finite chains, all states in a communicating class are either all recurrent or all transient.
Periodic states
A state has period d if it can only be revisited at multiples of d steps. A chain where all states have period 1 is called aperiodic. Aperiodic + recurrent = ergodic, which guarantees convergence to the stationary distribution.
Building a text generator
A classic Markov chain application — generate text from a training corpus:
from collections import defaultdict
import random
def build_chain(text, order=2):
words = text.split()
chain = defaultdict(list)
for i in range(len(words) - order):
key = tuple(words[i:i + order])
chain[key].append(words[i + order])
return chain
def generate(chain, order=2, length=50):
key = random.choice(list(chain.keys()))
output = list(key)
for _ in range(length):
if key not in chain:
break
next_word = random.choice(chain[key])
output.append(next_word)
key = tuple(output[-order:])
return ' '.join(output)
Higher-order chains (order 3, 4) produce more coherent text but need more training data.
Common misconception
People sometimes think Markov chains cannot model long-range dependencies at all. While a first-order chain only looks one step back, an nth-order Markov chain looks n steps back. You can convert any nth-order chain into a first-order chain by expanding the state space (each “state” becomes the last n observations). The tradeoff: exponentially more states, but the same simple framework.
One thing to remember: The power of Markov chains comes from the memoryless property — by knowing only the current state and transition probabilities, you can predict long-term behavior, find steady states, and simulate complex systems.
See Also
- Python Bayesian Inference How updating your beliefs with new evidence works — and why it helps computers make smarter guesses.
- Python Convolution Operations The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
- Python Fourier Transforms How breaking any sound, image, or signal into simple waves reveals hidden patterns invisible to the naked eye.
- Python Genetic Algorithms How computers borrow evolution's playbook — survival of the fittest, mutation, and reproduction — to solve problems too complicated for brute force.
- Python Linear Algebra Numpy Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.