Simulated Annealing — Core Concepts
How simulated annealing works
Simulated annealing (SA) is a probabilistic optimization algorithm that searches for a global optimum by accepting both improving and worsening moves, with the probability of accepting worse moves decreasing over time.
The algorithm:
- Start with an initial solution and a high temperature T.
- Generate a random neighbor of the current solution.
- If the neighbor is better, accept it.
- If the neighbor is worse, accept it with probability exp(-ΔE / T), where ΔE is how much worse it is.
- Reduce the temperature.
- Repeat until cooled down.
import numpy as np
def simulated_annealing(cost_fn, initial, neighbor_fn,
T_start=100.0, T_min=1e-8, alpha=0.995, max_iter=100000):
current = initial
current_cost = cost_fn(current)
best = current.copy()
best_cost = current_cost
T = T_start
for i in range(max_iter):
if T < T_min:
break
candidate = neighbor_fn(current)
candidate_cost = cost_fn(candidate)
delta = candidate_cost - current_cost
# Accept if better, or with probability exp(-delta/T) if worse
if delta < 0 or np.random.random() < np.exp(-delta / T):
current = candidate
current_cost = candidate_cost
if current_cost < best_cost:
best = current.copy()
best_cost = current_cost
T *= alpha # Cool down
return best, best_cost
The acceptance probability
The Metropolis criterion — exp(-ΔE / T) — is the heart of the algorithm:
| ΔE (how much worse) | High T (hot) | Low T (cold) |
|---|---|---|
| Small (ΔE = 1) | ~99% accept | ~5% accept |
| Medium (ΔE = 10) | ~90% accept | ~0.005% accept |
| Large (ΔE = 100) | ~37% accept | ~0% accept |
At high temperatures, almost anything is accepted — the algorithm explores freely. At low temperatures, only improvements pass — the algorithm refines.
Temperature schedule
The cooling schedule determines how fast the temperature drops. Three common schedules:
Geometric (exponential) cooling
T = T_start * alpha ** iteration # alpha typically 0.99 to 0.999
Most common. Simple and effective. Slower cooling (alpha closer to 1) gives better results but takes longer.
Linear cooling
T = T_start - (T_start - T_min) * iteration / max_iter
Predictable runtime. Less common because it cools too fast early and too slow late.
Logarithmic cooling
T = T_start / (1 + np.log(1 + iteration))
Theoretically guaranteed to converge to the global optimum (given infinite time). Too slow in practice.
Choosing the initial temperature
T_start should be high enough that ~80% of moves are accepted initially. A practical approach:
def estimate_initial_temperature(cost_fn, initial, neighbor_fn, n_samples=1000):
"""Find T where acceptance rate ≈ 80%."""
deltas = []
current = initial
for _ in range(n_samples):
neighbor = neighbor_fn(current)
delta = cost_fn(neighbor) - cost_fn(current)
if delta > 0:
deltas.append(delta)
avg_worsening = np.mean(deltas) if deltas else 1.0
# Solve: exp(-avg_delta / T) = 0.8
T_start = -avg_worsening / np.log(0.8)
return T_start
Neighbor generation
The neighbor function defines how the algorithm moves through the search space. Good neighbors should be:
- Close to the current solution (small perturbation)
- Reachable — every solution must be reachable from every other through a sequence of neighbor moves
- Fast to generate
For continuous optimization:
def continuous_neighbor(x, step_size=1.0):
return x + np.random.normal(0, step_size, size=x.shape)
For combinatorial problems (permutations):
def swap_neighbor(permutation):
"""Swap two random elements."""
new = permutation.copy()
i, j = np.random.choice(len(new), size=2, replace=False)
new[i], new[j] = new[j], new[i]
return new
Example: Traveling Salesman Problem
def tsp_cost(route, distances):
total = sum(distances[route[i]][route[i+1]] for i in range(len(route)-1))
return total + distances[route[-1]][route[0]] # Return to start
# 50 cities with random coordinates
n_cities = 50
coords = np.random.rand(n_cities, 2)
distances = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
initial_route = np.arange(n_cities)
np.random.shuffle(initial_route)
best_route, best_cost = simulated_annealing(
cost_fn=lambda r: tsp_cost(r, distances),
initial=initial_route,
neighbor_fn=swap_neighbor,
T_start=100, alpha=0.9995, max_iter=500000
)
When to use simulated annealing
SA excels when:
- The search space is large and discrete (scheduling, routing, assignment)
- You need one good solution, not a population (unlike genetic algorithms)
- Computing neighbors is cheap
- The problem has many local optima
SA is less ideal when:
- The landscape is smooth and convex (use gradient descent)
- You need multiple diverse solutions (use a population-based method)
- The problem has hard constraints that are difficult to encode as penalties
Common misconception
People often think simulated annealing is just random search. It is not — the Metropolis criterion provides a controlled balance between exploration and exploitation that provably converges (under logarithmic cooling). Random search has no such property. The temperature schedule is what transforms random exploration into intelligent optimization.
One thing to remember: Simulated annealing works because strategic imperfection — accepting worse solutions with decreasing probability — prevents getting trapped in local optima while still converging to excellent solutions.
See Also
- Python Bayesian Inference How updating your beliefs with new evidence works — and why it helps computers make smarter guesses.
- Python Convolution Operations The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
- Python Fourier Transforms How breaking any sound, image, or signal into simple waves reveals hidden patterns invisible to the naked eye.
- Python Genetic Algorithms How computers borrow evolution's playbook — survival of the fittest, mutation, and reproduction — to solve problems too complicated for brute force.
- Python Linear Algebra Numpy Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.