Graph Algorithms — Deep Dive

Technical framing

Graph theory is one of the richest areas of computer science. While BFS and DFS are the foundation, production systems regularly require more sophisticated algorithms: shortest paths with negative weights, minimum spanning trees, strongly connected components, and flow networks. Python’s standard library provides enough building blocks (heapq, collections.deque, defaultdict) to implement all of these efficiently.

Graph representations and their tradeoffs

Adjacency list with defaultdict

from collections import defaultdict

class Graph:
    def __init__(self, directed=False):
        self.adj = defaultdict(list)
        self.directed = directed
    
    def add_edge(self, u, v, weight=1):
        self.adj[u].append((v, weight))
        if not self.directed:
            self.adj[v].append((u, weight))
    
    def nodes(self):
        all_nodes = set(self.adj.keys())
        for neighbors in self.adj.values():
            for v, _ in neighbors:
                all_nodes.add(v)
        return all_nodes

Edge list (useful for Kruskal’s and Bellman-Ford)

edges = [(u, v, weight), ...]  # simple list of tuples

Adjacency matrix with NumPy (dense graphs)

import numpy as np

n = 100  # number of nodes
adj_matrix = np.full((n, n), np.inf)
np.fill_diagonal(adj_matrix, 0)
adj_matrix[0][1] = 5  # edge from 0 to 1 with weight 5

Space complexity: adjacency list is O(V + E), adjacency matrix is O(V²). For sparse graphs (E << V²), the list is vastly more efficient.

Advanced shortest path algorithms

Dijkstra with path reconstruction

import heapq

def dijkstra_with_path(graph, start, end):
    distances = {node: float('inf') for node in graph.nodes()}
    distances[start] = 0
    predecessors = {start: None}
    pq = [(0, start)]
    
    while pq:
        dist, current = heapq.heappop(pq)
        if current == end:
            break
        if dist > distances[current]:
            continue
        for neighbor, weight in graph.adj[current]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                predecessors[neighbor] = current
                heapq.heappush(pq, (new_dist, neighbor))
    
    # Reconstruct path
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = predecessors.get(node)
    path.reverse()
    
    if distances[end] == float('inf'):
        return None, float('inf')
    return path, distances[end]

Time complexity: O((V + E) log V) with a binary heap. For Fibonacci heaps (not in Python stdlib), it improves to O(V log V + E).

Bellman-Ford (handles negative weights)

def bellman_ford(nodes, edges, start):
    distances = {node: float('inf') for node in nodes}
    distances[start] = 0
    predecessors = {node: None for node in nodes}
    
    for _ in range(len(nodes) - 1):
        updated = False
        for u, v, weight in edges:
            if distances[u] + weight < distances[v]:
                distances[v] = distances[u] + weight
                predecessors[v] = u
                updated = True
        if not updated:
            break  # Early termination
    
    # Check for negative cycles
    for u, v, weight in edges:
        if distances[u] + weight < distances[v]:
            raise ValueError("Graph contains a negative-weight cycle")
    
    return distances, predecessors

Time complexity: O(V × E). Slower than Dijkstra but handles negative edge weights. The negative cycle detection makes it essential for financial arbitrage detection and network routing protocols (BGP uses a variant).

A* search (heuristic-guided)

import heapq

def a_star(graph, start, goal, heuristic):
    open_set = [(heuristic(start, goal), 0, start)]
    g_scores = {start: 0}
    came_from = {start: None}
    closed = set()
    
    while open_set:
        f_score, g_score, current = heapq.heappop(open_set)
        
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1], g_score
        
        if current in closed:
            continue
        closed.add(current)
        
        for neighbor, weight in graph.adj[current]:
            tentative_g = g_score + weight
            if tentative_g < g_scores.get(neighbor, float('inf')):
                g_scores[neighbor] = tentative_g
                f = tentative_g + heuristic(neighbor, goal)
                came_from[neighbor] = current
                heapq.heappush(open_set, (f, tentative_g, neighbor))
    
    return None, float('inf')

A* with an admissible heuristic (never overestimates) guarantees optimal paths while exploring fewer nodes than Dijkstra. Common heuristics: Manhattan distance for grids, Euclidean distance for geographic coordinates.

Minimum spanning trees

Kruskal’s algorithm with Union-Find

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]
    
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def kruskal(num_nodes, edges):
    edges.sort(key=lambda e: e[2])  # sort by weight
    uf = UnionFind(num_nodes)
    mst = []
    total_weight = 0
    
    for u, v, weight in edges:
        if uf.union(u, v):
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == num_nodes - 1:
                break
    
    return mst, total_weight

Time complexity: O(E log E) dominated by sorting. The Union-Find operations are nearly O(1) amortized with path compression and union by rank.

Prim’s algorithm

import heapq

def prim(graph, start):
    visited = set()
    mst = []
    total_weight = 0
    edges = [(weight, start, neighbor) for neighbor, weight in graph.adj[start]]
    heapq.heapify(edges)
    visited.add(start)
    
    while edges and len(visited) < len(graph.nodes()):
        weight, u, v = heapq.heappop(edges)
        if v in visited:
            continue
        visited.add(v)
        mst.append((u, v, weight))
        total_weight += weight
        for neighbor, w in graph.adj[v]:
            if neighbor not in visited:
                heapq.heappush(edges, (w, v, neighbor))
    
    return mst, total_weight

Prim’s is better for dense graphs; Kruskal’s is better for sparse graphs. Both produce the same MST cost.

Strongly connected components (Tarjan’s)

def tarjan_scc(graph):
    index_counter = [0]
    stack = []
    on_stack = set()
    index = {}
    lowlink = {}
    result = []
    
    def strongconnect(v):
        index[v] = lowlink[v] = index_counter[0]
        index_counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        
        for w, _ in graph.adj.get(v, []):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        
        if lowlink[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                component.append(w)
                if w == v:
                    break
            result.append(component)
    
    for v in graph.nodes():
        if v not in index:
            strongconnect(v)
    
    return result

Tarjan’s runs in O(V + E) and finds all strongly connected components in a single DFS pass. Applications: compiler optimization (finding loops), social network analysis (finding tightly-knit communities), and 2-SAT satisfiability.

Python’s networkx for production use

For production graph analysis, the networkx library provides battle-tested implementations:

import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 4), ("A", "C", 2), ("B", "D", 3), ("C", "D", 1)])

# Shortest path
path = nx.shortest_path(G, "A", "D", weight="weight")  # ['A', 'C', 'D']
length = nx.shortest_path_length(G, "A", "D", weight="weight")  # 3

# All SCCs
sccs = list(nx.strongly_connected_components(G))

# MST (for undirected)
UG = G.to_undirected()
mst = nx.minimum_spanning_tree(UG)

networkx is not the fastest (pure Python), but it handles correctness, edge cases, and has algorithms for nearly every graph problem. For large-scale graphs (millions of nodes), consider igraph (C backend) or graph-tool (C++/Boost backend).

Performance benchmarks

AlgorithmTime ComplexitySpaceBest Python Implementation
BFSO(V + E)O(V)collections.deque
DFSO(V + E)O(V)iterative with list as stack
DijkstraO((V+E) log V)O(V)heapq
Bellman-FordO(V × E)O(V)edge list iteration
A*O(E) best caseO(V)heapq with heuristic
Kruskal’s MSTO(E log E)O(V)Union-Find + sorted edges
Tarjan’s SCCO(V + E)O(V)recursive DFS

Tradeoffs and practical considerations

  • Sparse vs. dense: most real-world graphs are sparse. Use adjacency lists.
  • Directed vs. undirected: always be explicit about direction. Undirected graphs store each edge twice.
  • Weighted vs. unweighted: BFS is optimal for unweighted; Dijkstra for non-negative weights; Bellman-Ford for negative weights.
  • Recursion depth: Python’s default recursion limit (1000) limits recursive DFS on deep graphs. Use iterative implementations or sys.setrecursionlimit() with caution.
  • Memory for large graphs: for graphs with millions of nodes, adjacency lists with integer node IDs and array or NumPy-backed storage are essential.

One thing to remember: Graph algorithms are not exotic — they are the backbone of routing, scheduling, dependency resolution, and network analysis. Master BFS, DFS, Dijkstra, and topological sort, and you can solve most real-world graph problems.

pythonalgorithmsgraphs

See Also