Python graphlib Topological Sort — Core Concepts
What graphlib solves
Given a set of tasks with dependencies (“B must come before A”, “C must come before A”), find an order that satisfies all constraints. This is topological sorting — ordering nodes of a directed acyclic graph (DAG) so every edge points forward.
Python 3.9 added graphlib.TopologicalSorter to the standard library for exactly this.
Basic usage
from graphlib import TopologicalSorter
ts = TopologicalSorter()
ts.add("install", "download") # install depends on download
ts.add("configure", "install") # configure depends on install
ts.add("run", "configure", "migrate") # run depends on configure AND migrate
ts.add("migrate", "install") # migrate depends on install
order = list(ts.static_order())
print(order)
# ['download', 'install', 'migrate', 'configure', 'run']
static_order() returns all nodes in a valid topological order. Dependencies always appear before dependents.
Constructor shorthand
You can pass the entire graph as a dictionary:
from graphlib import TopologicalSorter
graph = {
"run": {"configure", "migrate"},
"configure": {"install"},
"migrate": {"install"},
"install": {"download"},
"download": set(),
}
order = list(TopologicalSorter(graph).static_order())
The dictionary maps each node to its set of predecessors (dependencies).
Cycle detection
If there’s a circular dependency, topological sorting is impossible. TopologicalSorter raises CycleError:
from graphlib import TopologicalSorter, CycleError
ts = TopologicalSorter()
ts.add("A", "B")
ts.add("B", "C")
ts.add("C", "A") # creates a cycle: A → B → C → A
try:
list(ts.static_order())
except CycleError as e:
print(e)
# "nodes are in a cycle", with the cycle path available via e.args
This is useful for validating configuration files, build systems, and dependency graphs before execution.
Parallel execution with prepare/done
The real power of TopologicalSorter is its parallel scheduling API:
from graphlib import TopologicalSorter
import concurrent.futures
ts = TopologicalSorter({
"deploy": {"build", "test"},
"test": {"build"},
"build": {"lint"},
"lint": set(),
})
ts.prepare()
with concurrent.futures.ThreadPoolExecutor() as executor:
while ts.is_active():
ready = ts.get_ready()
futures = {executor.submit(run_task, t): t for t in ready}
for future in concurrent.futures.as_completed(futures):
task = futures[future]
future.result() # raise if failed
ts.done(task)
The flow:
prepare()— initialize the schedulerget_ready()— returns all nodes whose dependencies are satisfied- Execute those nodes (in parallel if desired)
done(node)— mark a node as complete, potentially unlocking new nodes- Repeat until
is_active()returns False
This is the pattern used by build systems (Make, Bazel) and CI/CD pipelines.
Common misconception
Many developers think topological sort produces a unique order. It usually doesn’t. If A and B have no dependency relationship, either can come first. static_order() returns one valid order, not the order. The parallel API (get_ready) makes this explicit by returning batches of independent nodes.
Practical use cases
Package installation order:
# pip-style dependency resolution
deps = {
"flask": {"werkzeug", "jinja2", "click"},
"werkzeug": set(),
"jinja2": {"markupsafe"},
"markupsafe": set(),
"click": set(),
}
install_order = list(TopologicalSorter(deps).static_order())
# Install in this order and every dependency is available when needed
Database migration order:
migrations = {
"003_add_index": {"002_add_column"},
"002_add_column": {"001_create_table"},
"001_create_table": set(),
}
Course prerequisite planning:
courses = {
"machine_learning": {"linear_algebra", "statistics"},
"linear_algebra": {"calculus"},
"statistics": {"calculus"},
"calculus": set(),
}
Adding nodes without dependencies
Nodes with no dependencies don’t need to be added explicitly — they’re discovered as predecessors. But if you have isolated nodes (no dependencies, not depended upon), add them with no predecessors:
ts = TopologicalSorter()
ts.add("standalone_task") # no dependencies
ts.add("other", "standalone_task")
Error handling
| Error | When | Fix |
|---|---|---|
CycleError | Circular dependency exists | Remove or break the cycle |
ValueError from done() | Marking unknown or already-done node | Check node tracking logic |
ValueError from get_ready() | Called without prepare() | Call prepare() first |
The one thing to remember: graphlib.TopologicalSorter handles both simple ordering (static_order) and parallel scheduling (prepare/get_ready/done) — it’s a dependency resolver and a task scheduler in one standard library class.
See Also
- Python Atexit How Python's atexit module lets your program clean up after itself right before it shuts down.
- Python Bisect Sorted Lists How Python's bisect module finds things in sorted lists the way you'd find a word in a dictionary — by jumping to the middle.
- Python Contextlib How Python's contextlib module makes the 'with' statement work for anything, not just files.
- Python Copy Module Why copying data in Python isn't as simple as it sounds, and how the copy module prevents sneaky bugs.
- Python Dataclass Field Metadata How Python dataclass fields can carry hidden notes — like sticky notes on a filing cabinet that tools read automatically.