Markov Chains — Deep Dive

Eigenvalue analysis of transition matrices

The spectral properties of a transition matrix reveal the chain’s dynamics. For a transition matrix T:

  • The largest eigenvalue is always 1 (Perron-Frobenius theorem for stochastic matrices).
  • The second-largest eigenvalue λ₂ determines the mixing time — how quickly the chain converges to the stationary distribution. Mixing time ≈ 1/(1 - |λ₂|).
import numpy as np

T = np.array([
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.3, 0.3, 0.4],
])

eigenvalues = np.linalg.eigvals(T)
sorted_eigs = np.sort(np.abs(eigenvalues))[::-1]
spectral_gap = 1 - sorted_eigs[1]
mixing_time_approx = 1 / spectral_gap
print(f"Spectral gap: {spectral_gap:.4f}")
print(f"Approximate mixing time: {mixing_time_approx:.1f} steps")

A small spectral gap means slow mixing — the chain gets “stuck” in regions of state space. This is critical for MCMC algorithm design.

Hidden Markov Models (HMMs)

In an HMM, the Markov chain’s states are not directly observed. Instead, each hidden state emits an observation according to an emission probability distribution. The challenge: infer the hidden state sequence from observations.

Three fundamental problems:

  1. Evaluation — P(observations | model): Forward algorithm, O(NT²)
  2. Decoding — most likely state sequence: Viterbi algorithm, O(NT²)
  3. Learning — estimate model parameters: Baum-Welch (EM), iterative
from hmmlearn import hmm

# Stock market: hidden states = [Bull, Bear, Stagnant]
model = hmm.GaussianHMM(n_components=3, covariance_type='full', n_iter=100)

# returns is a (T, 1) array of daily returns
model.fit(returns.reshape(-1, 1))

# Decode most likely regime sequence
hidden_states = model.predict(returns.reshape(-1, 1))

# Transition matrix between regimes
print(model.transmat_)

# Emission parameters per regime
for i in range(3):
    print(f"State {i}: mean={model.means_[i][0]:.4f}, "
          f"var={model.covars_[i][0][0]:.6f}")

HMMs power speech recognition (phoneme sequences), bioinformatics (gene finding), and financial regime detection.

PageRank as a Markov chain

Google’s PageRank models a “random surfer” who follows links with probability α and jumps to a random page with probability (1-α):

def pagerank(adjacency_matrix, alpha=0.85, tol=1e-8, max_iter=100):
    """Compute PageRank via power iteration."""
    n = adjacency_matrix.shape[0]
    
    # Normalize to column-stochastic
    out_degree = adjacency_matrix.sum(axis=1)
    out_degree[out_degree == 0] = 1  # Dangling nodes
    T = adjacency_matrix / out_degree[:, np.newaxis]
    
    # Google matrix: M = α*T + (1-α)/n * ones
    rank = np.ones(n) / n
    
    for _ in range(max_iter):
        new_rank = alpha * (T.T @ rank) + (1 - alpha) / n
        if np.linalg.norm(new_rank - rank) < tol:
            break
        rank = new_rank
    
    return rank / rank.sum()

The damping factor α prevents the chain from getting trapped in cycles. The resulting stationary distribution is the PageRank vector.

Continuous-time Markov chains (CTMCs)

Discrete-time chains transition at fixed time steps. CTMCs transition at random times governed by exponential distributions. The transition rates are stored in a rate matrix Q where:

  • Off-diagonal Q[i][j] ≥ 0: rate of transitioning from i to j
  • Diagonal Q[i][i] = -Σⱼ≠ᵢ Q[i][j]: negative sum of outgoing rates
from scipy.linalg import expm

# Rate matrix: chemical reaction system
Q = np.array([
    [-0.5,  0.3,  0.2],
    [ 0.1, -0.4,  0.3],
    [ 0.2,  0.2, -0.4],
])

# Probability of being in each state at time t
initial = np.array([1.0, 0.0, 0.0])  # Start in state 0
t = 10.0
P_t = initial @ expm(Q * t)
print(f"State probabilities at t={t}: {P_t}")

CTMCs model queuing systems (M/M/1 queues), chemical kinetics, and reliability engineering.

Absorbing chain analysis

For chains with absorbing states, the fundamental matrix N = (I - Q)⁻¹ (where Q is the transient-to-transient submatrix) gives:

  • N[i][j] = expected number of times the chain visits transient state j starting from transient state i
  • Row sums of N = expected number of steps to absorption from each transient state
  • B = NR gives absorption probabilities (R is the transient-to-absorbing submatrix)
def absorbing_analysis(T, absorbing_states):
    """Analyze an absorbing Markov chain."""
    n = T.shape[0]
    transient = [i for i in range(n) if i not in absorbing_states]
    
    # Extract Q (transient → transient) and R (transient → absorbing)
    Q = T[np.ix_(transient, transient)]
    R = T[np.ix_(transient, absorbing_states)]
    
    # Fundamental matrix
    N = np.linalg.inv(np.eye(len(transient)) - Q)
    
    # Expected steps to absorption
    expected_steps = N.sum(axis=1)
    
    # Absorption probabilities
    B = N @ R
    
    return N, expected_steps, B

# Example: Gambler's ruin with 5 states (0 and 4 are absorbing)
T_gambler = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],  # State 0: absorbing (broke)
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],  # State 4: absorbing (goal)
])

N, steps, B = absorbing_analysis(T_gambler, [0, 4])
print(f"Expected steps from state 2: {steps[1]:.1f}")
print(f"P(win | start at 2): {B[1, 1]:.3f}")

MCMC: Markov chains for sampling

The Metropolis-Hastings algorithm constructs a Markov chain whose stationary distribution equals a target distribution you want to sample from:

def metropolis_hastings(target_log_prob, initial, n_samples, proposal_std=1.0):
    """Generic Metropolis-Hastings sampler."""
    samples = np.zeros(n_samples)
    current = initial
    current_log_prob = target_log_prob(current)
    accepted = 0
    
    for i in range(n_samples):
        # Propose
        proposal = current + np.random.normal(0, proposal_std)
        proposal_log_prob = target_log_prob(proposal)
        
        # Accept/reject
        log_alpha = proposal_log_prob - current_log_prob
        if np.log(np.random.uniform()) < log_alpha:
            current = proposal
            current_log_prob = proposal_log_prob
            accepted += 1
        
        samples[i] = current
    
    print(f"Acceptance rate: {accepted/n_samples:.2%}")
    return samples

# Sample from a mixture of Gaussians
def target(x):
    from scipy.stats import norm
    return np.log(0.3 * norm.pdf(x, -2, 0.5) + 0.7 * norm.pdf(x, 3, 1))

samples = metropolis_hastings(target, initial=0.0, n_samples=50000, proposal_std=2.0)

The proposal standard deviation controls the exploration-exploitation tradeoff. Too small: slow exploration (high acceptance, correlated samples). Too large: frequent rejection (low acceptance, stuck).

Markov chain mixing and coupling

Two theoretical tools for analyzing convergence:

Total variation distance

The total variation distance between the chain’s distribution at time t and the stationary distribution measures how far from equilibrium the chain is. It decreases geometrically with rate |λ₂|.

Coupling

A coupling argument constructs two copies of the chain (starting from different states) and shows they merge. The coupling time upper-bounds the mixing time. This is the standard technique for proving rapid mixing in theoretical computer science.

Scaling to large state spaces

For chains with millions of states (web graphs, protein conformations):

from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigs

# Sparse transition matrix
T_sparse = csr_matrix(T_large)

# Find stationary distribution via sparse eigensolver
eigenvalues, eigenvectors = eigs(T_sparse.T, k=1, which='LM')
stationary = np.real(eigenvectors[:, 0]).flatten()
stationary = np.abs(stationary) / np.abs(stationary).sum()

Sparse storage and iterative eigensolvers handle matrices that would not fit in memory as dense arrays.

One thing to remember: Markov chains are the computational bridge between probability theory and practical algorithms — from MCMC sampling to PageRank to HMMs, the memoryless property turns intractable problems into matrix operations.

pythonmathprobabilityalgorithmsmachine-learning

See Also

  • Python Bayesian Inference How updating your beliefs with new evidence works — and why it helps computers make smarter guesses.
  • Python Convolution Operations The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
  • Python Fourier Transforms How breaking any sound, image, or signal into simple waves reveals hidden patterns invisible to the naked eye.
  • Python Genetic Algorithms How computers borrow evolution's playbook — survival of the fittest, mutation, and reproduction — to solve problems too complicated for brute force.
  • Python Linear Algebra Numpy Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.