Simulated Annealing — Core Concepts

Temperature schedules, acceptance probability, and neighbor generation — how to implement and tune simulated annealing in Python.

How simulated annealing works

Simulated annealing (SA) is a probabilistic optimization algorithm that searches for a global optimum by accepting both improving and worsening moves, with the probability of accepting worse moves decreasing over time.

The algorithm:

Start with an initial solution and a high temperature T.
Generate a random neighbor of the current solution.
If the neighbor is better, accept it.
If the neighbor is worse, accept it with probability exp(-ΔE / T), where ΔE is how much worse it is.
Reduce the temperature.
Repeat until cooled down.

import numpy as np

def simulated_annealing(cost_fn, initial, neighbor_fn,
                         T_start=100.0, T_min=1e-8, alpha=0.995, max_iter=100000):
    current = initial
    current_cost = cost_fn(current)
    best = current.copy()
    best_cost = current_cost
    T = T_start
    
    for i in range(max_iter):
        if T < T_min:
            break
        
        candidate = neighbor_fn(current)
        candidate_cost = cost_fn(candidate)
        delta = candidate_cost - current_cost
        
        # Accept if better, or with probability exp(-delta/T) if worse
        if delta < 0 or np.random.random() < np.exp(-delta / T):
            current = candidate
            current_cost = candidate_cost
            
            if current_cost < best_cost:
                best = current.copy()
                best_cost = current_cost
        
        T *= alpha  # Cool down
    
    return best, best_cost

The acceptance probability

The Metropolis criterion — exp(-ΔE / T) — is the heart of the algorithm:

ΔE (how much worse)	High T (hot)	Low T (cold)
Small (ΔE = 1)	~99% accept	~5% accept
Medium (ΔE = 10)	~90% accept	~0.005% accept
Large (ΔE = 100)	~37% accept	~0% accept

At high temperatures, almost anything is accepted — the algorithm explores freely. At low temperatures, only improvements pass — the algorithm refines.

Temperature schedule

The cooling schedule determines how fast the temperature drops. Three common schedules:

Geometric (exponential) cooling

T = T_start * alpha ** iteration  # alpha typically 0.99 to 0.999

Most common. Simple and effective. Slower cooling (alpha closer to 1) gives better results but takes longer.

Linear cooling

T = T_start - (T_start - T_min) * iteration / max_iter

Predictable runtime. Less common because it cools too fast early and too slow late.

Logarithmic cooling

T = T_start / (1 + np.log(1 + iteration))

Theoretically guaranteed to converge to the global optimum (given infinite time). Too slow in practice.

Choosing the initial temperature

T_start should be high enough that ~80% of moves are accepted initially. A practical approach:

def estimate_initial_temperature(cost_fn, initial, neighbor_fn, n_samples=1000):
    """Find T where acceptance rate ≈ 80%."""
    deltas = []
    current = initial
    for _ in range(n_samples):
        neighbor = neighbor_fn(current)
        delta = cost_fn(neighbor) - cost_fn(current)
        if delta > 0:
            deltas.append(delta)
    
    avg_worsening = np.mean(deltas) if deltas else 1.0
    # Solve: exp(-avg_delta / T) = 0.8
    T_start = -avg_worsening / np.log(0.8)
    return T_start

Neighbor generation

The neighbor function defines how the algorithm moves through the search space. Good neighbors should be:

Close to the current solution (small perturbation)
Reachable — every solution must be reachable from every other through a sequence of neighbor moves
Fast to generate

For continuous optimization:

def continuous_neighbor(x, step_size=1.0):
    return x + np.random.normal(0, step_size, size=x.shape)

For combinatorial problems (permutations):

def swap_neighbor(permutation):
    """Swap two random elements."""
    new = permutation.copy()
    i, j = np.random.choice(len(new), size=2, replace=False)
    new[i], new[j] = new[j], new[i]
    return new

Example: Traveling Salesman Problem

def tsp_cost(route, distances):
    total = sum(distances[route[i]][route[i+1]] for i in range(len(route)-1))
    return total + distances[route[-1]][route[0]]  # Return to start

# 50 cities with random coordinates
n_cities = 50
coords = np.random.rand(n_cities, 2)
distances = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)

initial_route = np.arange(n_cities)
np.random.shuffle(initial_route)

best_route, best_cost = simulated_annealing(
    cost_fn=lambda r: tsp_cost(r, distances),
    initial=initial_route,
    neighbor_fn=swap_neighbor,
    T_start=100, alpha=0.9995, max_iter=500000
)

When to use simulated annealing

SA excels when:

The search space is large and discrete (scheduling, routing, assignment)
You need one good solution, not a population (unlike genetic algorithms)
Computing neighbors is cheap
The problem has many local optima

SA is less ideal when:

The landscape is smooth and convex (use gradient descent)
You need multiple diverse solutions (use a population-based method)
The problem has hard constraints that are difficult to encode as penalties

Common misconception

People often think simulated annealing is just random search. It is not — the Metropolis criterion provides a controlled balance between exploration and exploitation that provably converges (under logarithmic cooling). Random search has no such property. The temperature schedule is what transforms random exploration into intelligent optimization.

One thing to remember: Simulated annealing works because strategic imperfection — accepting worse solutions with decreasing probability — prevents getting trapped in local optima while still converging to excellent solutions.

pythonalgorithmsoptimization