Bayesian Inference — Core Concepts

Bayes' theorem, priors, likelihoods, and posteriors explained with practical Python examples using PyMC and conjugate distributions.

Bayes’ theorem

At its core, Bayesian inference uses a single formula:

P(hypothesis | data) = P(data | hypothesis) × P(hypothesis) / P(data)

In plain language:

P(hypothesis | data) is the posterior: what you believe after seeing the data
P(data | hypothesis) is the likelihood: how probable the data is if the hypothesis is true
P(hypothesis) is the prior: what you believed before seeing the data
P(data) is the evidence: how probable the data is averaged over all hypotheses, each weighted by its prior (a normalizing constant)

A concrete example

Suppose 1% of emails are spam. A certain word (“winner”) appears in 80% of spam emails and 5% of legitimate emails. An email contains “winner” — what is the probability it is spam?

prior_spam = 0.01
likelihood_word_given_spam = 0.80
likelihood_word_given_ham = 0.05

# P(data) = P(word|spam)*P(spam) + P(word|ham)*P(ham)
evidence = likelihood_word_given_spam * prior_spam + likelihood_word_given_ham * (1 - prior_spam)

posterior_spam = (likelihood_word_given_spam * prior_spam) / evidence
# posterior_spam ≈ 0.139 — about 14% chance it's spam

Even though “winner” appears in 80% of spam, the low base rate (1% spam) keeps the posterior moderate. This is why Bayesian thinking protects against overreacting to a single piece of evidence.
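To see how strongly the base rate drives the result, the same calculation can be wrapped in a function and evaluated for several priors. This is a minimal sketch: the function name `spam_posterior` and the alternative priors (10% and 50%) are illustrative assumptions, not part of the example above.

```python
def spam_posterior(prior, p_word_given_spam=0.80, p_word_given_ham=0.05):
    """Return P(spam | word) for a given prior P(spam), using Bayes' theorem."""
    # Evidence: total probability of seeing the word, summed over spam and ham
    evidence = p_word_given_spam * prior + p_word_given_ham * (1 - prior)
    return p_word_given_spam * prior / evidence


# Same likelihoods, three different base rates (the last two are hypothetical)
for prior in (0.01, 0.10, 0.50):
    print(f"prior={prior:.2f} -> posterior={spam_posterior(prior):.3f}")
```

With the likelihoods held fixed, raising the prior from 1% to 10% lifts the posterior from about 14% to 64%, which is the base-rate effect described above.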

Priors: encoding what you already know

The prior represents your belief before data arrives. Common choices:

Prior type	When to use	Example
Uniform	No information	Any value equally likely
Normal	Centered around an expected value	Temperature tomorrow
Beta	Probability parameters (0 to 1)	Click-through rates
Exponential	Positive values, with small values most likely	Wait times

Choosing priors is not cheating — it is encoding domain knowledge. A doctor diagnosing a rare disease should start with a low prior probability.
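One way to check whether a prior says what you intend is to build it with `scipy.stats` and look at where it puts its probability mass. The sketch below does this for the four families in the table; every parameter value (and the unit mentioned in each comment) is a hypothetical choice made for illustration.

```python
from scipy import stats

# One hypothetical prior per row of the table; parameters are assumptions
priors = {
    "Uniform(0, 1)": stats.uniform(loc=0, scale=1),
    "Normal(20, 5)": stats.norm(loc=20, scale=5),   # e.g. temperature in degrees C
    "Beta(2, 50)": stats.beta(2, 50),               # e.g. a low click-through rate
    "Exponential(mean=3)": stats.expon(scale=3),    # e.g. wait time in minutes
}

for name, dist in priors.items():
    # Central 95% of the prior mass: the values the prior treats as plausible
    low, high = dist.ppf([0.025, 0.975])
    print(f"{name:<20} mean={dist.mean():.3f}  95% mass in [{low:.3f}, {high:.3f}]")
```

If the printed range excludes values you consider realistic, the prior is more informative than you meant it to be.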

Conjugate priors

When the prior and likelihood belong to certain mathematical families, the posterior has the same form as the prior — making the math tractable. The most useful conjugate pairs:

Beta prior + Binomial likelihood → Beta posterior (coin flips, conversion rates)
Normal prior + Normal likelihood with known variance → Normal posterior (measurements)
Gamma prior + Poisson likelihood → Gamma posterior (event counts)

from scipy import stats

# Beta-Binomial: estimating a coin's fairness
# Prior: Beta(2, 2) — mild belief in fairness
# Data: 7 heads in 10 flips
alpha_prior, beta_prior = 2, 2
heads, tails = 7, 3

alpha_posterior = alpha_prior + heads  # 9
beta_posterior = beta_prior + tails    # 5
posterior = stats.beta(alpha_posterior, beta_posterior)

print(f"Posterior mean: {posterior.mean():.3f}")  # ~0.643
print(f"95% credible interval: {posterior.ppf([0.025, 0.975])}")
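The Gamma-Poisson pair follows the same pattern: add the total count to the shape and the number of observation periods to the rate. A minimal sketch, assuming a Gamma(shape=2, rate=1) prior and made-up event counts; note that `scipy.stats.gamma` takes a scale argument, which is the reciprocal of the rate.

```python
from scipy import stats

# Hypothetical prior on the event rate: Gamma(shape=2, rate=1)
shape_prior, rate_prior = 2.0, 1.0

# Made-up data: events observed in each of 5 equal-length periods
counts = [3, 5, 4, 6, 2]

# Conjugate update: shape gains the total count, rate gains the number of periods
shape_post = shape_prior + sum(counts)   # 2 + 20 = 22
rate_post = rate_prior + len(counts)     # 1 + 5 = 6

# scipy parameterizes the Gamma by shape `a` and scale = 1 / rate
rate_posterior = stats.gamma(a=shape_post, scale=1.0 / rate_post)

print(f"Posterior mean rate: {rate_posterior.mean():.3f}")
print(f"95% credible interval: {rate_posterior.ppf([0.025, 0.975])}")
```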

Credible intervals vs confidence intervals

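A Bayesian credible interval says “there is a 95% probability the parameter lies in this range,” given the model and the prior. A frequentist confidence interval is a statement about the procedure instead: if the experiment were repeated many times, 95% of the intervals built this way would contain the true parameter. Credible intervals answer the question people actually want answered.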
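The two intervals can be computed side by side for the coin data used earlier (7 heads in 10 flips, Beta(2, 2) prior). This is a sketch under stated assumptions: the frequentist interval here is the simple Wald (normal-approximation) interval, chosen for brevity, and it is known to be rough at a sample size this small.

```python
import math

from scipy import stats

heads, n = 7, 10

# Bayesian: equal-tailed 95% credible interval from the Beta(2 + 7, 2 + 3) posterior
posterior = stats.beta(2 + heads, 2 + (n - heads))
cred_low, cred_high = posterior.ppf([0.025, 0.975])

# Frequentist: 95% Wald confidence interval around the sample proportion
p_hat = heads / n
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
conf_low, conf_high = p_hat - half_width, p_hat + half_width

print(f"95% credible interval:   [{cred_low:.3f}, {cred_high:.3f}]")
print(f"95% confidence interval: [{conf_low:.3f}, {conf_high:.3f}]")
```

The numbers may look similar, but only the first interval supports a direct probability statement about the parameter.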

Markov Chain Monte Carlo (MCMC)

For complex models where conjugate priors do not apply, MCMC algorithms draw samples from the posterior numerically. PyMC is a widely used Python library for this:

import pymc as pm
import numpy as np

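# Synthetic data: 100 draws from Normal(mean=5, sd=2), seeded for reproducibility
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=100)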

with pm.Model() as model:
    # Priors
    mu = pm.Normal('mu', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=5)
    
    # Likelihood
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    
    # Sample from posterior
    trace = pm.sample(2000, tune=1000, cores=2)

# Posterior summary
print(pm.summary(trace))

By default, PyMC samples continuous parameters with NUTS (the No-U-Turn Sampler), which adapts its step size automatically during the tuning phase and handles most models without manual configuration.
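To make the idea of sampling from a posterior concrete, here is a random-walk Metropolis sampler written with only the standard library. It is not NUTS and not how PyMC works internally; it is a much simpler MCMC algorithm, shown as a sketch. It targets the coin posterior from the conjugate example, so the result can be checked against the exact mean of 9/14. The step size, sample count, and seed are arbitrary assumptions.

```python
import math
import random


def log_posterior(theta, heads=7, tails=3, a=2, b=2):
    """Unnormalized log posterior: Beta(a, b) prior times Binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (a + heads - 1) * math.log(theta) + (b + tails - 1) * math.log(1.0 - theta)


def metropolis(n_samples=20000, burn_in=2000, step=0.2, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept by posterior ratio."""
    rng = random.Random(seed)
    theta = 0.5
    current_lp = log_posterior(theta)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, ratio); 1 - U lies in (0, 1], so log is safe
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            theta, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(theta)  # rejected proposals repeat the current value
    return samples


samples = metropolis()
print(f"MCMC posterior mean:  {sum(samples) / len(samples):.3f}")
print(f"Exact posterior mean: {9 / 14:.3f}")
```

The sampler never needs the normalizing constant P(data), because it cancels in the acceptance ratio. That is what makes MCMC usable when the evidence has no closed form.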

When to use Bayesian inference

Bayesian inference shines when:

You have prior knowledge worth incorporating
You need uncertainty quantification (not just point estimates)
Your data is limited and you want to avoid overfitting
You want to update incrementally as new data arrives
You need to compare models (via Bayes factors)
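The incremental-update point can be illustrated with the Beta-Binomial model: yesterday's posterior becomes today's prior, and the end result matches a single update on all the data. A minimal sketch; the helper `update_beta` and the way the 7 heads and 3 tails are split into batches are hypothetical.

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate Beta-Binomial update: add observed counts to the parameters."""
    return alpha + heads, beta + tails


prior = (2, 2)

# Hypothetical batches of (heads, tails), totaling 7 heads and 3 tails
batches = [(3, 1), (2, 2), (2, 0)]

state = prior
for heads, tails in batches:
    # The posterior from the previous batch serves as the prior for this one
    state = update_beta(*state, heads, tails)
    alpha, beta = state
    print(f"After batch ({heads}H, {tails}T): Beta({alpha}, {beta}), "
          f"mean={alpha / (alpha + beta):.3f}")

# Processing all the data at once gives the same posterior
all_at_once = update_beta(*prior, 7, 3)
print(f"Batched: Beta{state}, all at once: Beta{all_at_once}")
```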

Common misconception

People often think Bayesian methods are “subjective” because they use priors. But priors are explicit and transparent — anyone can see and critique them. Frequentist methods hide their assumptions in the choice of model, test statistic, and significance threshold. Both approaches involve judgment calls; Bayesian inference just forces you to state yours upfront.

One thing to remember: Bayesian inference is not about getting “the right answer” — it is about honestly tracking how certain you are, and updating that certainty as evidence accumulates.
