Python graphlib Topological Sort — Core Concepts

What graphlib solves

Given a set of tasks with dependencies (“B must come before A”, “C must come before A”), find an order that satisfies all constraints. This is topological sorting — ordering nodes of a directed acyclic graph (DAG) so every edge points forward.

Python 3.9 added graphlib.TopologicalSorter to the standard library for exactly this.

Basic usage

from graphlib import TopologicalSorter

ts = TopologicalSorter()
ts.add("install", "download")         # install depends on download
ts.add("configure", "install")        # configure depends on install
ts.add("run", "configure", "migrate") # run depends on configure AND migrate
ts.add("migrate", "install")          # migrate depends on install

order = list(ts.static_order())
print(order)
# ['download', 'install', 'migrate', 'configure', 'run']

static_order() returns all nodes in a valid topological order. Dependencies always appear before dependents.

Constructor shorthand

You can pass the entire graph as a dictionary:

from graphlib import TopologicalSorter

graph = {
    "run": {"configure", "migrate"},
    "configure": {"install"},
    "migrate": {"install"},
    "install": {"download"},
    "download": set(),
}

order = list(TopologicalSorter(graph).static_order())

The dictionary maps each node to its set of predecessors (dependencies).

Cycle detection

If there’s a circular dependency, topological sorting is impossible. TopologicalSorter raises CycleError:

from graphlib import TopologicalSorter, CycleError

ts = TopologicalSorter()
ts.add("A", "B")
ts.add("B", "C")
ts.add("C", "A")  # creates a cycle: A → B → C → A

try:
    list(ts.static_order())
except CycleError as e:
    print(e)
    # "nodes are in a cycle", with the cycle path available via e.args

This is useful for validating configuration files, build systems, and dependency graphs before execution.

Parallel execution with prepare/done

The real power of TopologicalSorter is its parallel scheduling API:

from graphlib import TopologicalSorter
import concurrent.futures

ts = TopologicalSorter({
    "deploy": {"build", "test"},
    "test": {"build"},
    "build": {"lint"},
    "lint": set(),
})

ts.prepare()

with concurrent.futures.ThreadPoolExecutor() as executor:
    while ts.is_active():
        ready = ts.get_ready()
        futures = {executor.submit(run_task, t): t for t in ready}
        for future in concurrent.futures.as_completed(futures):
            task = futures[future]
            future.result()  # raise if failed
            ts.done(task)

The flow:

  1. prepare() — initialize the scheduler
  2. get_ready() — returns all nodes whose dependencies are satisfied
  3. Execute those nodes (in parallel if desired)
  4. done(node) — mark a node as complete, potentially unlocking new nodes
  5. Repeat until is_active() returns False

This is the pattern used by build systems (Make, Bazel) and CI/CD pipelines.

Common misconception

Many developers think topological sort produces a unique order. It usually doesn’t. If A and B have no dependency relationship, either can come first. static_order() returns one valid order, not the order. The parallel API (get_ready) makes this explicit by returning batches of independent nodes.

Practical use cases

Package installation order:

# pip-style dependency resolution
deps = {
    "flask": {"werkzeug", "jinja2", "click"},
    "werkzeug": set(),
    "jinja2": {"markupsafe"},
    "markupsafe": set(),
    "click": set(),
}
install_order = list(TopologicalSorter(deps).static_order())
# Install in this order and every dependency is available when needed

Database migration order:

migrations = {
    "003_add_index": {"002_add_column"},
    "002_add_column": {"001_create_table"},
    "001_create_table": set(),
}

Course prerequisite planning:

courses = {
    "machine_learning": {"linear_algebra", "statistics"},
    "linear_algebra": {"calculus"},
    "statistics": {"calculus"},
    "calculus": set(),
}

Adding nodes without dependencies

Nodes with no dependencies don’t need to be added explicitly — they’re discovered as predecessors. But if you have isolated nodes (no dependencies, not depended upon), add them with no predecessors:

ts = TopologicalSorter()
ts.add("standalone_task")  # no dependencies
ts.add("other", "standalone_task")

Error handling

ErrorWhenFix
CycleErrorCircular dependency existsRemove or break the cycle
ValueError from done()Marking unknown or already-done nodeCheck node tracking logic
ValueError from get_ready()Called without prepare()Call prepare() first

The one thing to remember: graphlib.TopologicalSorter handles both simple ordering (static_order) and parallel scheduling (prepare/get_ready/done) — it’s a dependency resolver and a task scheduler in one standard library class.

pythonstandard-libraryalgorithms

See Also

  • Python Atexit How Python's atexit module lets your program clean up after itself right before it shuts down.
  • Python Bisect Sorted Lists How Python's bisect module finds things in sorted lists the way you'd find a word in a dictionary — by jumping to the middle.
  • Python Contextlib How Python's contextlib module makes the 'with' statement work for anything, not just files.
  • Python Copy Module Why copying data in Python isn't as simple as it sounds, and how the copy module prevents sneaky bugs.
  • Python Dataclass Field Metadata How Python dataclass fields can carry hidden notes — like sticky notes on a filing cabinet that tools read automatically.