Genetic Algorithms — Deep Dive

Schema theorem and building blocks

Holland’s Schema Theorem provides theoretical grounding for why GAs work. A schema is a pattern with wildcards — for a binary chromosome, 1**0*1 matches any individual with 1 in positions 0 and 5, and 0 in position 3.

The theorem states: short, low-order schemata with above-average fitness increase exponentially in the population. This means GAs implicitly test enormous numbers of building blocks in parallel, combining good short patterns into complete solutions.

In practice, this suggests that encodings where good partial solutions can be meaningfully combined (high epistasis locality) work best with GAs.

DEAP: the standard Python GA framework

DEAP (Distributed Evolutionary Algorithms in Python) provides a flexible framework for evolutionary computation:

from deap import base, creator, tools, algorithms
import numpy as np
import random

# Define fitness and individual types
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

# Rastrigin function
def rastrigin(individual):
    A = 10
    n = len(individual)
    return (A * n + sum(x**2 - A * np.cos(2 * np.pi * x) for x in individual),)

# Setup toolbox
toolbox = base.Toolbox()
N_DIMS = 20
toolbox.register("attr_float", random.uniform, -5.12, 5.12)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                  toolbox.attr_float, n=N_DIMS)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", rastrigin)
toolbox.register("mate", tools.cxBlend, alpha=0.5)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.5, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

# Run
pop = toolbox.population(n=300)
hof = tools.HallOfFame(1)
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("min", np.min)
stats.register("avg", np.mean)

pop, logbook = algorithms.eaSimple(pop, toolbox, cxpb=0.7, mutpb=0.2,
                                     ngen=500, stats=stats, halloffame=hof,
                                     verbose=True)

print(f"Best fitness: {hof[0].fitness.values[0]:.6f}")

Multi-objective optimization (NSGA-II)

Real problems often have competing objectives (minimize cost AND maximize quality). NSGA-II maintains a population on the Pareto front — the set of solutions where no objective can be improved without worsening another:

creator.create("FitnessMulti", base.Fitness, weights=(-1.0, -1.0))
creator.create("IndividualMulti", list, fitness=creator.FitnessMulti)

def zdt1(individual):
    """ZDT1 benchmark: two competing objectives."""
    n = len(individual)
    f1 = individual[0]
    g = 1 + 9 * sum(individual[1:]) / (n - 1)
    f2 = g * (1 - np.sqrt(f1 / g))
    return f1, f2

toolbox.register("select", tools.selNSGA2)

# After evolution, extract Pareto front
pareto_front = tools.sortNondominated(pop, len(pop), first_front_only=True)[0]

NSGA-II uses non-dominated sorting and crowding distance to maintain a diverse spread of solutions across the front.

Adaptive operator selection

Instead of fixed crossover and mutation rates, adapt them based on which operators produce the best offspring:

class AdaptiveGA:
    def __init__(self, operators, window_size=50):
        self.operators = operators
        self.rewards = {op.__name__: [] for op in operators}
        self.window_size = window_size
    
    def select_operator(self):
        scores = {}
        for op in self.operators:
            recent = self.rewards[op.__name__][-self.window_size:]
            scores[op.__name__] = np.mean(recent) if recent else 1.0
        
        # Softmax selection
        total = sum(scores.values())
        probs = [scores[op.__name__] / total for op in self.operators]
        return np.random.choice(self.operators, p=probs)
    
    def record_reward(self, operator, parent_fitness, child_fitness):
        improvement = max(0, child_fitness - parent_fitness)
        self.rewards[operator.__name__].append(improvement)

Hybrid GA: memetic algorithms

Pure GAs are good at global exploration but slow at local refinement. Memetic algorithms combine GA with local search:

from scipy.optimize import minimize

def memetic_step(individual, fitness_fn, local_budget=50):
    """Apply local search to refine a GA individual."""
    result = minimize(
        lambda x: -fitness_fn(x),  # Minimize negative fitness
        individual,
        method='L-BFGS-B',
        options={'maxiter': local_budget}
    )
    return list(result.x)

# In the GA loop, apply local search to top individuals
for i in range(min(10, len(population))):
    population[i] = memetic_step(population[i], fitness_fn)

This strategy consistently outperforms pure GAs on continuous optimization benchmarks. The GA handles exploration; local search handles exploitation.

Island model parallelism

Run multiple GA populations (“islands”) independently, periodically migrating the best individuals between them:

from multiprocessing import Pool

def evolve_island(args):
    """Run GA on one island for a fixed number of generations."""
    population, toolbox, generations = args
    for _ in range(generations):
        offspring = algorithms.varAnd(population, toolbox, cxpb=0.7, mutpb=0.2)
        fits = map(toolbox.evaluate, offspring)
        for ind, fit in zip(offspring, fits):
            ind.fitness.values = fit
        population = toolbox.select(offspring, k=len(population))
    return population

def island_model(n_islands=4, island_size=100, total_gens=500, migration_interval=50):
    islands = [toolbox.population(n=island_size) for _ in range(n_islands)]
    
    for epoch in range(total_gens // migration_interval):
        # Evolve islands in parallel
        with Pool(n_islands) as pool:
            args = [(isl, toolbox, migration_interval) for isl in islands]
            islands = pool.map(evolve_island, args)
        
        # Migrate best individuals (ring topology)
        for i in range(n_islands):
            best = tools.selBest(islands[i], k=2)
            target = (i + 1) % n_islands
            worst_indices = np.argsort([ind.fitness.values[0] for ind in islands[target]])[:2]
            for j, idx in enumerate(worst_indices):
                islands[target][idx] = toolbox.clone(best[j])
    
    return islands

Island models improve diversity and enable linear speedup with more CPUs.

Constraint handling

Real-world optimization has constraints. Common approaches:

Penalty functions

Add a penalty term to the fitness for constraint violations:

def constrained_fitness(individual):
    f = objective(individual)
    penalty = 0
    if constraint_1(individual) > 0:
        penalty += 1000 * constraint_1(individual) ** 2
    if constraint_2(individual) > 0:
        penalty += 1000 * constraint_2(individual) ** 2
    return (f + penalty,)

Feasibility rules (Deb’s method)

  1. Feasible solutions always beat infeasible ones
  2. Among feasible solutions, better fitness wins
  3. Among infeasible solutions, smaller constraint violation wins
def constrained_compare(ind1, ind2):
    feas1 = is_feasible(ind1)
    feas2 = is_feasible(ind2)
    
    if feas1 and not feas2:
        return ind1
    if feas2 and not feas1:
        return ind2
    if feas1 and feas2:
        return ind1 if ind1.fitness > ind2.fitness else ind2
    # Both infeasible: prefer less violation
    return ind1 if violation(ind1) < violation(ind2) else ind2

GAs can evolve neural network architectures — selecting layer types, sizes, and connections:

# Chromosome encoding: list of layer specifications
# Each gene: (layer_type, units, activation, dropout)
LAYER_TYPES = ['dense', 'conv1d', 'lstm', 'skip']
ACTIVATIONS = ['relu', 'tanh', 'gelu', 'swish']

def decode_architecture(chromosome):
    """Convert chromosome to a Keras model specification."""
    layers = []
    for gene in chromosome:
        layer_type = LAYER_TYPES[gene[0] % len(LAYER_TYPES)]
        units = 2 ** (gene[1] % 8 + 3)  # 8 to 1024
        activation = ACTIVATIONS[gene[2] % len(ACTIVATIONS)]
        dropout = gene[3] / 100.0  # 0 to 1
        layers.append((layer_type, units, activation, dropout))
    return layers

def fitness_nas(chromosome):
    """Train and evaluate a network architecture."""
    spec = decode_architecture(chromosome)
    model = build_model(spec)
    model.fit(X_train, y_train, epochs=10, verbose=0)
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    n_params = model.count_params()
    return (-val_loss, -np.log10(n_params))  # Multi-objective: accuracy vs size

Google’s AmoebaNet used evolutionary methods to discover architectures competitive with hand-designed and reinforcement-learning-designed networks.

Performance tuning checklist

  1. Population size — larger populations explore better but cost more evaluations. Start with 10× the number of dimensions.
  2. Selection pressure — tournament size 2–5 for most problems; larger for highly multimodal landscapes.
  3. Crossover rate — 0.6–0.9; higher for problems with good building blocks.
  4. Mutation rate — 1/n (where n is chromosome length) is a good default.
  5. Elitism — always preserve the best 1–5% to prevent regression.
  6. Termination — stop when fitness improvement plateaus for 50+ generations or when a target is reached.

One thing to remember: The power of genetic algorithms comes from population-level search — they explore many regions of the solution space simultaneously, and the real art is balancing exploration (diversity) against exploitation (refining good solutions).

pythonalgorithmsoptimizationevolutionary-computingdeap

See Also

  • Python Bayesian Inference How updating your beliefs with new evidence works — and why it helps computers make smarter guesses.
  • Python Convolution Operations The sliding-window trick that lets computers sharpen photos, recognize faces, and hear words in noisy audio.
  • Python Fourier Transforms How breaking any sound, image, or signal into simple waves reveals hidden patterns invisible to the naked eye.
  • Python Linear Algebra Numpy Why solving puzzles with rows and columns of numbers is the secret engine behind search engines, video games, and AI.
  • Python Markov Chains Why the next thing that happens often depends only on what is happening right now — and how that one rule generates text, predicts weather, and powers board games.