Graph Algorithms — Deep Dive
Technical framing
Graph theory is one of the richest areas of computer science. While BFS and DFS are the foundation, production systems regularly require more sophisticated algorithms: shortest paths with negative weights, minimum spanning trees, strongly connected components, and flow networks. Python’s standard library provides enough building blocks (heapq, collections.deque, defaultdict) to implement all of these efficiently.
Graph representations and their tradeoffs
Adjacency list with defaultdict
from collections import defaultdict
class Graph:
def __init__(self, directed=False):
self.adj = defaultdict(list)
self.directed = directed
def add_edge(self, u, v, weight=1):
self.adj[u].append((v, weight))
if not self.directed:
self.adj[v].append((u, weight))
def nodes(self):
all_nodes = set(self.adj.keys())
for neighbors in self.adj.values():
for v, _ in neighbors:
all_nodes.add(v)
return all_nodes
Edge list (useful for Kruskal’s and Bellman-Ford)
edges = [(u, v, weight), ...] # simple list of tuples
Adjacency matrix with NumPy (dense graphs)
import numpy as np
n = 100 # number of nodes
adj_matrix = np.full((n, n), np.inf)
np.fill_diagonal(adj_matrix, 0)
adj_matrix[0][1] = 5 # edge from 0 to 1 with weight 5
Space complexity: adjacency list is O(V + E), adjacency matrix is O(V²). For sparse graphs (E << V²), the list is vastly more efficient.
Advanced shortest path algorithms
Dijkstra with path reconstruction
import heapq
def dijkstra_with_path(graph, start, end):
distances = {node: float('inf') for node in graph.nodes()}
distances[start] = 0
predecessors = {start: None}
pq = [(0, start)]
while pq:
dist, current = heapq.heappop(pq)
if current == end:
break
if dist > distances[current]:
continue
for neighbor, weight in graph.adj[current]:
new_dist = dist + weight
if new_dist < distances[neighbor]:
distances[neighbor] = new_dist
predecessors[neighbor] = current
heapq.heappush(pq, (new_dist, neighbor))
# Reconstruct path
path = []
node = end
while node is not None:
path.append(node)
node = predecessors.get(node)
path.reverse()
if distances[end] == float('inf'):
return None, float('inf')
return path, distances[end]
Time complexity: O((V + E) log V) with a binary heap. For Fibonacci heaps (not in Python stdlib), it improves to O(V log V + E).
Bellman-Ford (handles negative weights)
def bellman_ford(nodes, edges, start):
distances = {node: float('inf') for node in nodes}
distances[start] = 0
predecessors = {node: None for node in nodes}
for _ in range(len(nodes) - 1):
updated = False
for u, v, weight in edges:
if distances[u] + weight < distances[v]:
distances[v] = distances[u] + weight
predecessors[v] = u
updated = True
if not updated:
break # Early termination
# Check for negative cycles
for u, v, weight in edges:
if distances[u] + weight < distances[v]:
raise ValueError("Graph contains a negative-weight cycle")
return distances, predecessors
Time complexity: O(V × E). Slower than Dijkstra but handles negative edge weights. The negative cycle detection makes it essential for financial arbitrage detection and network routing protocols (BGP uses a variant).
A* search (heuristic-guided)
import heapq
def a_star(graph, start, goal, heuristic):
open_set = [(heuristic(start, goal), 0, start)]
g_scores = {start: 0}
came_from = {start: None}
closed = set()
while open_set:
f_score, g_score, current = heapq.heappop(open_set)
if current == goal:
path = []
while current is not None:
path.append(current)
current = came_from[current]
return path[::-1], g_score
if current in closed:
continue
closed.add(current)
for neighbor, weight in graph.adj[current]:
tentative_g = g_score + weight
if tentative_g < g_scores.get(neighbor, float('inf')):
g_scores[neighbor] = tentative_g
f = tentative_g + heuristic(neighbor, goal)
came_from[neighbor] = current
heapq.heappush(open_set, (f, tentative_g, neighbor))
return None, float('inf')
A* with an admissible heuristic (never overestimates) guarantees optimal paths while exploring fewer nodes than Dijkstra. Common heuristics: Manhattan distance for grids, Euclidean distance for geographic coordinates.
Minimum spanning trees
Kruskal’s algorithm with Union-Find
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry:
return False
if self.rank[rx] < self.rank[ry]:
rx, ry = ry, rx
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1
return True
def kruskal(num_nodes, edges):
edges.sort(key=lambda e: e[2]) # sort by weight
uf = UnionFind(num_nodes)
mst = []
total_weight = 0
for u, v, weight in edges:
if uf.union(u, v):
mst.append((u, v, weight))
total_weight += weight
if len(mst) == num_nodes - 1:
break
return mst, total_weight
Time complexity: O(E log E) dominated by sorting. The Union-Find operations are nearly O(1) amortized with path compression and union by rank.
Prim’s algorithm
import heapq
def prim(graph, start):
visited = set()
mst = []
total_weight = 0
edges = [(weight, start, neighbor) for neighbor, weight in graph.adj[start]]
heapq.heapify(edges)
visited.add(start)
while edges and len(visited) < len(graph.nodes()):
weight, u, v = heapq.heappop(edges)
if v in visited:
continue
visited.add(v)
mst.append((u, v, weight))
total_weight += weight
for neighbor, w in graph.adj[v]:
if neighbor not in visited:
heapq.heappush(edges, (w, v, neighbor))
return mst, total_weight
Prim’s is better for dense graphs; Kruskal’s is better for sparse graphs. Both produce the same MST cost.
Strongly connected components (Tarjan’s)
def tarjan_scc(graph):
index_counter = [0]
stack = []
on_stack = set()
index = {}
lowlink = {}
result = []
def strongconnect(v):
index[v] = lowlink[v] = index_counter[0]
index_counter[0] += 1
stack.append(v)
on_stack.add(v)
for w, _ in graph.adj.get(v, []):
if w not in index:
strongconnect(w)
lowlink[v] = min(lowlink[v], lowlink[w])
elif w in on_stack:
lowlink[v] = min(lowlink[v], index[w])
if lowlink[v] == index[v]:
component = []
while True:
w = stack.pop()
on_stack.remove(w)
component.append(w)
if w == v:
break
result.append(component)
for v in graph.nodes():
if v not in index:
strongconnect(v)
return result
Tarjan’s runs in O(V + E) and finds all strongly connected components in a single DFS pass. Applications: compiler optimization (finding loops), social network analysis (finding tightly-knit communities), and 2-SAT satisfiability.
Python’s networkx for production use
For production graph analysis, the networkx library provides battle-tested implementations:
import networkx as nx
G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 4), ("A", "C", 2), ("B", "D", 3), ("C", "D", 1)])
# Shortest path
path = nx.shortest_path(G, "A", "D", weight="weight") # ['A', 'C', 'D']
length = nx.shortest_path_length(G, "A", "D", weight="weight") # 3
# All SCCs
sccs = list(nx.strongly_connected_components(G))
# MST (for undirected)
UG = G.to_undirected()
mst = nx.minimum_spanning_tree(UG)
networkx is not the fastest (pure Python), but it handles correctness, edge cases, and has algorithms for nearly every graph problem. For large-scale graphs (millions of nodes), consider igraph (C backend) or graph-tool (C++/Boost backend).
Performance benchmarks
| Algorithm | Time Complexity | Space | Best Python Implementation |
|---|---|---|---|
| BFS | O(V + E) | O(V) | collections.deque |
| DFS | O(V + E) | O(V) | iterative with list as stack |
| Dijkstra | O((V+E) log V) | O(V) | heapq |
| Bellman-Ford | O(V × E) | O(V) | edge list iteration |
| A* | O(E) best case | O(V) | heapq with heuristic |
| Kruskal’s MST | O(E log E) | O(V) | Union-Find + sorted edges |
| Tarjan’s SCC | O(V + E) | O(V) | recursive DFS |
Tradeoffs and practical considerations
- Sparse vs. dense: most real-world graphs are sparse. Use adjacency lists.
- Directed vs. undirected: always be explicit about direction. Undirected graphs store each edge twice.
- Weighted vs. unweighted: BFS is optimal for unweighted; Dijkstra for non-negative weights; Bellman-Ford for negative weights.
- Recursion depth: Python’s default recursion limit (1000) limits recursive DFS on deep graphs. Use iterative implementations or
sys.setrecursionlimit()with caution. - Memory for large graphs: for graphs with millions of nodes, adjacency lists with integer node IDs and
arrayor NumPy-backed storage are essential.
One thing to remember: Graph algorithms are not exotic — they are the backbone of routing, scheduling, dependency resolution, and network analysis. Master BFS, DFS, Dijkstra, and topological sort, and you can solve most real-world graph problems.
See Also
- Python Backtracking Algorithms How computers solve puzzles by trying every path, backing up when stuck, and never giving up until they find the answer.
- Python Big O Complexity Analysis Why some programs finish in a blink while others take forever — explained with pizza delivery and toy cleanup.
- Python Binary Search Implementation Find anything in a sorted list insanely fast using the same trick you already use with dictionaries and phone books.
- Python Dynamic Programming The clever trick of remembering answers you already figured out so you never solve the same puzzle twice.
- Python Greedy Algorithms Why always grabbing the biggest piece sometimes gives you the best answer — and sometimes does not.