Genetic Algorithms — Deep Dive
Schema theorem and building blocks
Holland’s Schema Theorem provides theoretical grounding for why GAs work. A schema is a pattern with wildcards — for a binary chromosome, 1**0*1 matches any individual with 1 in positions 0 and 5, and 0 in position 3.
The theorem states: short, low-order schemata with above-average fitness increase exponentially in the population. This means GAs implicitly test enormous numbers of building blocks in parallel, combining good short patterns into complete solutions.
In practice, this suggests that encodings where good partial solutions can be meaningfully combined (high epistasis locality) work best with GAs.
DEAP: the standard Python GA framework
DEAP (Distributed Evolutionary Algorithms in Python) provides a flexible framework for evolutionary computation:
from deap import base, creator, tools, algorithms
import numpy as np
import random
# Define fitness and individual types
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)
# Rastrigin function
def rastrigin(individual):
A = 10
n = len(individual)
return (A * n + sum(x**2 - A * np.cos(2 * np.pi * x) for x in individual),)
# Setup toolbox
toolbox = base.Toolbox()
N_DIMS = 20
toolbox.register("attr_float", random.uniform, -5.12, 5.12)
toolbox.register("individual", tools.initRepeat, creator.Individual,
toolbox.attr_float, n=N_DIMS)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", rastrigin)
toolbox.register("mate", tools.cxBlend, alpha=0.5)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.5, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)
# Run
pop = toolbox.population(n=300)
hof = tools.HallOfFame(1)
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("min", np.min)
stats.register("avg", np.mean)
pop, logbook = algorithms.eaSimple(pop, toolbox, cxpb=0.7, mutpb=0.2,
ngen=500, stats=stats, halloffame=hof,
verbose=True)
print(f"Best fitness: {hof[0].fitness.values[0]:.6f}")
Multi-objective optimization (NSGA-II)
Real problems often have competing objectives (minimize cost AND maximize quality). NSGA-II maintains a population on the Pareto front — the set of solutions where no objective can be improved without worsening another:
creator.create("FitnessMulti", base.Fitness, weights=(-1.0, -1.0))
creator.create("IndividualMulti", list, fitness=creator.FitnessMulti)
def zdt1(individual):
"""ZDT1 benchmark: two competing objectives."""
n = len(individual)
f1 = individual[0]
g = 1 + 9 * sum(individual[1:]) / (n - 1)
f2 = g * (1 - np.sqrt(f1 / g))
return f1, f2
toolbox.register("select", tools.selNSGA2)
# After evolution, extract Pareto front
pareto_front = tools.sortNondominated(pop, len(pop), first_front_only=True)[0]
NSGA-II uses non-dominated sorting and crowding distance to maintain a diverse spread of solutions across the front.
Adaptive operator selection
Instead of fixed crossover and mutation rates, adapt them based on which operators produce the best offspring:
class AdaptiveGA:
def __init__(self, operators, window_size=50):
self.operators = operators
self.rewards = {op.__name__: [] for op in operators}
self.window_size = window_size
def select_operator(self):
scores = {}
for op in self.operators:
recent = self.rewards[op.__name__][-self.window_size:]
scores[op.__name__] = np.mean(recent) if recent else 1.0
# Softmax selection
total = sum(scores.values())
probs = [scores[op.__name__] / total for op in self.operators]
return np.random.choice(self.operators, p=probs)
def record_reward(self, operator, parent_fitness, child_fitness):
improvement = max(0, child_fitness - parent_fitness)
self.rewards[operator.__name__].append(improvement)
Hybrid GA: memetic algorithms
Pure GAs are good at global exploration but slow at local refinement. Memetic algorithms combine GA with local search:
from scipy.optimize import minimize
def memetic_step(individual, fitness_fn, local_budget=50):
"""Apply local search to refine a GA individual."""
result = minimize(
lambda x: -fitness_fn(x), # Minimize negative fitness
individual,
method='L-BFGS-B',
options={'maxiter': local_budget}
)
return list(result.x)
# In the GA loop, apply local search to top individuals
for i in range(min(10, len(population))):
population[i] = memetic_step(population[i], fitness_fn)
This strategy consistently outperforms pure GAs on continuous optimization benchmarks. The GA handles exploration; local search handles exploitation.
Island model parallelism
Run multiple GA populations (“islands”) independently, periodically migrating the best individuals between them:
from multiprocessing import Pool
def evolve_island(args):
"""Run GA on one island for a fixed number of generations."""
population, toolbox, generations = args
for _ in range(generations):
offspring = algorithms.varAnd(population, toolbox, cxpb=0.7, mutpb=0.2)
fits = map(toolbox.evaluate, offspring)
for ind, fit in zip(offspring, fits):
ind.fitness.values = fit
population = toolbox.select(offspring, k=len(population))
return population
def island_model(n_islands=4, island_size=100, total_gens=500, migration_interval=50):
islands = [toolbox.population(n=island_size) for _ in range(n_islands)]
for epoch in range(total_gens // migration_interval):
# Evolve islands in parallel
with Pool(n_islands) as pool:
args = [(isl, toolbox, migration_interval) for isl in islands]
islands = pool.map(evolve_island, args)
# Migrate best individuals (ring topology)
for i in range(n_islands):
best = tools.selBest(islands[i], k=2)
target = (i + 1) % n_islands
worst_indices = np.argsort([ind.fitness.values[0] for ind in islands[target]])[:2]
for j, idx in enumerate(worst_indices):
islands[target][idx] = toolbox.clone(best[j])
return islands
Island models improve diversity and enable linear speedup with more CPUs.
Constraint handling
Real-world optimization has constraints. Common approaches:
Penalty functions
Add a penalty term to the fitness for constraint violations:
def constrained_fitness(individual):
f = objective(individual)
penalty = 0
if constraint_1(individual) > 0:
penalty += 1000 * constraint_1(individual) ** 2
if constraint_2(individual) > 0:
penalty += 1000 * constraint_2(individual) ** 2
return (f + penalty,)
Feasibility rules (Deb’s method)
- Feasible solutions always beat infeasible ones
- Among feasible solutions, better fitness wins
- Among infeasible solutions, smaller constraint violation wins
def constrained_compare(ind1, ind2):
feas1 = is_feasible(ind1)
feas2 = is_feasible(ind2)
if feas1 and not feas2:
return ind1
if feas2 and not feas1:
return ind2
if feas1 and feas2:
return ind1 if ind1.fitness > ind2.fitness else ind2
# Both infeasible: prefer less violation
return ind1 if violation(ind1) < violation(ind2) else ind2
Real-world application: neural architecture search
GAs can evolve neural network architectures — selecting layer types, sizes, and connections:
# Chromosome encoding: list of layer specifications
# Each gene: (layer_type, units, activation, dropout)
LAYER_TYPES = ['dense', 'conv1d', 'lstm', 'skip']
ACTIVATIONS = ['relu', 'tanh', 'gelu', 'swish']
def decode_architecture(chromosome):
"""Convert chromosome to a Keras model specification."""
layers = []
for gene in chromosome:
layer_type = LAYER_TYPES[gene[0] % len(LAYER_TYPES)]
units = 2 ** (gene[1] % 8 + 3) # 8 to 1024
activation = ACTIVATIONS[gene[2] % len(ACTIVATIONS)]
dropout = gene[3] / 100.0 # 0 to 1
layers.append((layer_type, units, activation, dropout))
return layers
def fitness_nas(chromosome):
"""Train and evaluate a network architecture."""
spec = decode_architecture(chromosome)
model = build_model(spec)
model.fit(X_train, y_train, epochs=10, verbose=0)
val_loss = model.evaluate(X_val, y_val, verbose=0)
n_params = model.count_params()
return (-val_loss, -np.log10(n_params)) # Multi-objective: accuracy vs size
Google’s AmoebaNet used evolutionary methods to discover architectures competitive with hand-designed and reinforcement-learning-designed networks.
Performance tuning checklist
- Population size — larger populations explore better but cost more evaluations. Start with 10× the number of dimensions.
- Selection pressure — tournament size 2–5 for most problems; larger for highly multimodal landscapes.
- Crossover rate — 0.6–0.9; higher for problems with good building blocks.
- Mutation rate — 1/n (where n is chromosome length) is a good default.
- Elitism — always preserve the best 1–5% to prevent regression.
- Termination — stop when fitness improvement plateaus for 50+ generations or when a target is reached.
One thing to remember: The power of genetic algorithms comes from population-level search — they explore many regions of the solution space simultaneously, and the real art is balancing exploration (diversity) against exploitation (refining good solutions).
See Also
- Python Bayesian Inference How updating your beliefs with new evidence works — and why it helps computers make smarter guesses.
- Python Convolution Operations The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
- Python Fourier Transforms How breaking any sound, image, or signal into simple waves reveals hidden patterns invisible to the naked eye.
- Python Linear Algebra Numpy Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.
- Python Markov Chains Why the next thing that happens often depends only on what is happening right now — and how that one rule generates text, predicts weather, and powers board games.