Bayesian Inference — Core Concepts

Bayes’ theorem

At its core, Bayesian inference uses a single formula:

P(hypothesis | data) = P(data | hypothesis) × P(hypothesis) / P(data)

In plain language:

  • P(hypothesis | data) — the posterior: what you believe after seeing the data
  • P(data | hypothesis) — the likelihood: how probable the data is if the hypothesis is true
  • P(hypothesis) — the prior: what you believed before seeing the data
  • P(data) — the evidence: the overall probability of the data, averaged over all hypotheses and weighted by their priors (a normalizing constant that makes the posterior probabilities sum to 1)
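The evidence term sums likelihood × prior over every hypothesis, so the same formula works for any number of competing hypotheses, not just two. Below is a minimal sketch that assumes a discrete set of hypotheses stored in dicts. The function `bayes_update` and the three-coin example are hypothetical and used only for illustration:

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for a discrete set of hypotheses.

    priors: dict mapping hypothesis -> P(hypothesis), summing to 1
    likelihoods: dict mapping hypothesis -> P(data | hypothesis)
    """
    # P(data): the likelihood averaged over all hypotheses, weighted by the prior
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    # Bayes' theorem applied to each hypothesis in turn
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical example: three candidate coin biases, equally likely a priori,
# updated after observing a single "heads"
priors = {"p=0.25": 1 / 3, "p=0.5": 1 / 3, "p=0.75": 1 / 3}
likelihoods = {"p=0.25": 0.25, "p=0.5": 0.5, "p=0.75": 0.75}
posterior = bayes_update(priors, likelihoods)
print(posterior)  # posteriors are proportional to 0.25 : 0.5 : 0.75
```

With a uniform prior, the posterior is just the normalized likelihood. A non-uniform prior would shift weight toward the hypotheses you believed in beforehand.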

A concrete example

Suppose 1% of emails are spam. A certain word (“winner”) appears in 80% of spam emails and 5% of legitimate emails. An email contains “winner” — what is the probability it is spam?

prior_spam = 0.01
likelihood_word_given_spam = 0.80
likelihood_word_given_ham = 0.05

# P(data) = P(word|spam)*P(spam) + P(word|ham)*P(ham)
evidence = likelihood_word_given_spam * prior_spam + likelihood_word_given_ham * (1 - prior_spam)

posterior_spam = (likelihood_word_given_spam * prior_spam) / evidence
# posterior_spam ≈ 0.139 — about 14% chance it's spam

Even though “winner” appears in 80% of spam, the low base rate (1% spam) keeps the posterior moderate. This is why Bayesian thinking protects against overreacting to a single piece of evidence.
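The base-rate effect is easier to see if you hold the word likelihoods fixed and vary only the prior. The sketch below reuses the 80% / 5% figures from the example. The function name `posterior_spam` and the extra base rates of 10% and 50% are illustrative choices, not part of the original example:

```python
def posterior_spam(prior_spam, p_word_given_spam=0.80, p_word_given_ham=0.05):
    """P(spam | "winner") via Bayes' theorem for a given base rate of spam."""
    evidence = p_word_given_spam * prior_spam + p_word_given_ham * (1 - prior_spam)
    return p_word_given_spam * prior_spam / evidence


# Same evidence, different base rates
for prior in (0.01, 0.10, 0.50):
    print(f"prior {prior:.2f} -> posterior {posterior_spam(prior):.3f}")
# prior 0.01 -> posterior 0.139
# prior 0.10 -> posterior 0.640
# prior 0.50 -> posterior 0.941
```

The identical word carries very different weight depending on how common spam was to begin with.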

Priors: encoding what you already know

The prior represents your belief before data arrives. Common choices:

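Prior type   | When to use                                | Example
-------------+--------------------------------------------+-------------------------------------
Uniform      | No prior information                       | Any value in a range equally likely
Normal       | Centered around an expected value          | Temperature tomorrow
Beta         | Probability parameters (between 0 and 1)   | Click-through rates
Exponential  | Positive values, small values most likely  | Wait times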

Choosing priors is not cheating — it is encoding domain knowledge. A doctor diagnosing a rare disease should start with a low prior probability.
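Each prior family in the table corresponds to a distribution in `scipy.stats`. The parameter values below are hypothetical and were picked only to match the table's examples. Printing the mean and a central 90% range is a quick way to check that a prior says what you intend before you use it:

```python
from scipy import stats

# Illustrative parameter choices (hypothetical, not from the text)
priors = {
    "Uniform(0, 1)": stats.uniform(loc=0, scale=1),  # flat on [0, 1]
    "Normal(20, 5)": stats.norm(loc=20, scale=5),    # e.g. tomorrow's temperature in °C
    "Beta(2, 50)": stats.beta(2, 50),                # click-through rate, mean 2/52 ≈ 0.04
    "Exponential(mean 3)": stats.expon(scale=3),     # wait time, favours small values
}

for name, dist in priors.items():
    low, high = dist.ppf([0.05, 0.95])  # central 90% of the prior mass
    print(f"{name:>20}: mean={dist.mean():.3f}, 90% of prior mass in [{low:.3f}, {high:.3f}]")
```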

Conjugate priors

When the prior and likelihood belong to certain mathematical families, the posterior has the same form as the prior — making the math tractable. The most useful conjugate pairs:

  • Beta prior + Binomial likelihood → Beta posterior (coin flips, conversion rates)
  • Normal prior + Normal likelihood with known variance → Normal posterior (estimating the mean of noisy measurements)
  • Gamma prior + Poisson likelihood → Gamma posterior (event counts)

from scipy import stats

# Beta-Binomial: estimating a coin's fairness
# Prior: Beta(2, 2) — mild belief in fairness
# Data: 7 heads in 10 flips
alpha_prior, beta_prior = 2, 2
heads, tails = 7, 3

alpha_posterior = alpha_prior + heads  # 9
beta_posterior = beta_prior + tails    # 5
posterior = stats.beta(alpha_posterior, beta_posterior)

print(f"Posterior mean: {posterior.mean():.3f}")  # ~0.643
print(f"95% credible interval: {posterior.ppf([0.025, 0.975])}")
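The other two conjugate pairs from the list have equally short closed-form updates. The sketch below uses the standard formulas. Gamma(shape, rate) plus Poisson counts gives Gamma(shape + Σx, rate + n). For a Normal prior on the mean with known noise σ, precisions add. The helper names, the counts, and the measurements are hypothetical and serve only as illustration:

```python
import numpy as np
from scipy import stats


def gamma_poisson_update(shape, rate, counts):
    """Gamma(shape, rate) prior + Poisson counts -> Gamma posterior (shape, rate)."""
    counts = np.asarray(counts)
    return shape + counts.sum(), rate + counts.size


def normal_known_sigma_update(mu0, tau0, sigma, data):
    """Normal(mu0, tau0**2) prior on the mean + Normal likelihood with known sigma."""
    data = np.asarray(data, dtype=float)
    precision = 1 / tau0**2 + data.size / sigma**2  # prior and data precisions add
    mean = (mu0 / tau0**2 + data.sum() / sigma**2) / precision
    return mean, np.sqrt(1 / precision)  # posterior mean and standard deviation


# Hypothetical event counts per hour, prior Gamma(shape=2, rate=1)
a_post, b_post = gamma_poisson_update(2, 1, [3, 5, 4, 6, 2])
rate_posterior = stats.gamma(a=a_post, scale=1 / b_post)  # SciPy uses scale = 1 / rate
print(f"Event rate posterior mean: {rate_posterior.mean():.3f}")  # 22 / 6 ≈ 3.667

# Hypothetical measurements with known noise sigma = 2, prior Normal(0, 10**2)
mu_post, sd_post = normal_known_sigma_update(0.0, 10.0, 2.0, [4.6, 5.4, 5.1, 4.9])
print(f"Mean posterior: {mu_post:.3f} ± {sd_post:.3f}")
```

Note that `scipy.stats.gamma` is parameterized by scale, so the rate has to be inverted.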

Credible intervals vs confidence intervals

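A Bayesian 95% credible interval says “there is a 95% probability the parameter lies in this range,” given the model, the prior, and the observed data. A frequentist 95% confidence interval makes a statement about the procedure instead: if the experiment were repeated many times, about 95% of the intervals constructed this way would contain the true parameter, yet any single interval either contains it or does not. For most practical questions, the credible interval answers the question people actually want answered.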
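The repeated-experiments meaning of a confidence interval can be checked by simulation. This sketch assumes the simple Wald (normal-approximation) interval for a proportion, a hypothetical true probability of 0.3, and 100 flips per experiment. It counts how often the interval covers the true value. With very small samples such as 10 flips, Wald coverage can fall noticeably below 95%.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p, n_flips, n_experiments = 0.3, 100, 10_000  # hypothetical settings

# Run many independent experiments and build a Wald 95% interval for each
heads = rng.binomial(n_flips, true_p, size=n_experiments)
p_hat = heads / n_flips
half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n_flips)
covered = (p_hat - half_width <= true_p) & (true_p <= p_hat + half_width)

# The "95%" describes this long-run fraction, not any one interval
print(f"Fraction of intervals containing the true p: {covered.mean():.3f}")
```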

Markov Chain Monte Carlo (MCMC)

For complex models where no conjugate prior applies, MCMC algorithms draw samples from the posterior numerically instead of computing it in closed form. PyMC is a widely used Python library for this:

import pymc as pm
import numpy as np

data = np.random.normal(loc=5.0, scale=2.0, size=100)

with pm.Model() as model:
    # Priors
    mu = pm.Normal('mu', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=5)
    
    # Likelihood
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    
    # Sample from posterior
    trace = pm.sample(2000, tune=1000, cores=2)

# Posterior summary
print(pm.summary(trace))

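By default, PyMC samples continuous parameters with NUTS (the No-U-Turn Sampler), a variant of Hamiltonian Monte Carlo. During the tuning phase (the tune=1000 draws above), NUTS adapts its step size automatically, and it chooses trajectory lengths on its own, so most models need no manual tuning.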
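NUTS is too involved for a short example, but the core MCMC idea fits in a few lines. The sketch below uses random-walk Metropolis, a simpler and older algorithm than NUTS and not what PyMC uses by default. It targets the coin posterior from the Beta-Binomial example, so the result can be checked against the exact mean 9/14 ≈ 0.643. The step size of 0.1, the 20,000 draws, and the 2,000-draw burn-in are arbitrary assumptions.

```python
import numpy as np


def log_posterior(theta, heads=7, tails=3, a=2.0, b=2.0):
    """Unnormalized log posterior: Beta(a, b) prior times Binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -np.inf  # zero posterior density outside (0, 1)
    return (a - 1 + heads) * np.log(theta) + (b - 1 + tails) * np.log(1 - theta)


def metropolis(log_p, start, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept it or stay put."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current, current_lp = start, log_p(start)
    for i in range(n_samples):
        proposal = current + rng.normal(scale=step)
        proposal_lp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(current));
        # the unknown normalizing constant P(data) cancels in this ratio
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples


draws = metropolis(log_posterior, start=0.5, n_samples=20_000)
kept = draws[2_000:]  # discard burn-in draws taken before the chain settles
print(f"MCMC posterior mean: {kept.mean():.3f}")  # should be close to 9/14 ≈ 0.643
```

Because only ratios of posterior densities are needed, MCMC never has to compute the evidence term. This is what makes it practical for models without a conjugate shortcut.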

When to use Bayesian inference

Bayesian inference shines when:

  • You have prior knowledge worth incorporating
  • You need uncertainty quantification (not just point estimates)
  • Your data is limited and you want to avoid overfitting
  • You want to update incrementally as new data arrives
  • You need to compare models (via Bayes factors)
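Two items from this list can be shown with the coin example. First, updating one flip at a time ends at the same posterior as a single batch update. Second, a Bayes factor compares the marginal likelihoods of two models. The flip order and the fair-coin model H0 (p = 0.5 exactly) are hypothetical choices made for illustration. The Beta-Binomial marginal likelihood uses `scipy.special.betaln`.

```python
import numpy as np
from scipy.special import betaln

flips = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]  # hypothetical order: 7 heads, 3 tails

# Incremental updating: each posterior becomes the prior for the next flip
alpha, beta = 2, 2  # Beta(2, 2) prior, as in the conjugate example
for flip in flips:
    alpha += flip
    beta += 1 - flip
print(alpha, beta)  # 9 5, the same as updating with all 10 flips at once

# Bayes factor: H1 (p ~ Beta(2, 2)) vs H0 (fair coin, p = 0.5).
# The binomial coefficient appears in both marginal likelihoods and cancels.
heads, tails = sum(flips), len(flips) - sum(flips)
log_m1 = betaln(2 + heads, 2 + tails) - betaln(2, 2)
log_m0 = len(flips) * np.log(0.5)
bayes_factor = np.exp(log_m1 - log_m0)
print(f"Bayes factor (H1 vs H0): {bayes_factor:.3f}")  # ≈ 0.955
```

A Bayes factor this close to 1 means ten flips barely distinguish "fair coin" from "unknown bias". More data would be needed to favour either model.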

Common misconception

People often think Bayesian methods are “subjective” because they use priors. But priors are explicit and transparent — anyone can see and critique them. Frequentist methods hide their assumptions in the choice of model, test statistic, and significance threshold. Both approaches involve judgment calls; Bayesian inference just forces you to state yours upfront.

One thing to remember: Bayesian inference is not about getting “the right answer” — it is about honestly tracking how certain you are, and updating that certainty as evidence accumulates.


See Also

  • Python Convolution Operations: The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
  • Python Fourier Transforms: How breaking any sound, image, or signal into simple waves reveals hidden patterns invisible to the naked eye.
  • Python Genetic Algorithms: How computers borrow evolution's playbook (survival of the fittest, mutation, and reproduction) to solve problems too complicated for brute force.
  • Python Linear Algebra Numpy: Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.
  • Python Markov Chains: Why the next thing that happens often depends only on what is happening right now, and how that one rule generates text, predicts weather, and powers board games.