Python gc Module Internals — Core Concepts

Why the gc module matters

CPython’s primary memory management is reference counting: when an object’s reference count drops to zero, the object is freed immediately. This handles the large majority of memory cleanup in typical programs. But reference counting cannot reclaim circular references, where objects reference each other directly or through a chain and so never reach a count of zero. The gc module provides a cyclic garbage collector to handle these remaining cases.

The two-layer system

Layer 1: Reference counting (always on, not part of gc module)

Every Python object has a reference count. Creating a new reference to the object increases it; removing a reference decreases it. When it reaches zero, the object is freed immediately. This is fast and deterministic.

Layer 2: Cyclic garbage collector (the gc module)

The collector periodically scans for groups of objects that reference each other but are unreachable from the rest of the program. This is the collector you control through gc.
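
The two layers can be observed side by side. The following is a minimal sketch, assuming CPython; the class Node and the variable names are made up for illustration. It uses weak references to check whether an object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # keep the cyclic collector from running on its own during the demo
try:
    # Layer 1: no cycle, so dropping the last reference frees the object at once.
    plain = Node()
    plain_ref = weakref.ref(plain)
    del plain
    freed_by_refcount = plain_ref() is None

    # Layer 2: two objects that point at each other never reach a count of zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    alive_after_del = cycle_ref() is not None  # reference counting alone cannot free it

    gc.collect()  # the cyclic collector finds and frees the cycle
    freed_by_gc = cycle_ref() is None
finally:
    if was_enabled:
        gc.enable()

print(freed_by_refcount, alive_after_del, freed_by_gc)
```

The collector is disabled during the demo only so that an automatic collection cannot free the cycle before the check; the previous state is restored afterwards.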

Generational collection

The gc module uses a generational strategy with three generations (0, 1, 2):

  • Generation 0 — newly created objects. Collected most frequently.
  • Generation 1 — objects that survived one generation-0 collection.
  • Generation 2 — long-lived objects. Collected least frequently.

The idea behind generations is the “infant mortality” hypothesis: most objects are short-lived. By checking young objects frequently and old objects rarely, the collector avoids scanning the entire heap on every run.
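
One way to see the generations at work is gc.get_stats(), which reports cumulative counters per generation. This is a small sketch; the numbers depend entirely on what the interpreter has done so far, so no particular output should be expected.

```python
import gc

# One dict per generation, each with cumulative counters since interpreter start.
per_generation = gc.get_stats()

for generation, stats in enumerate(per_generation):
    print(
        f"gen {generation}: "
        f"collections={stats['collections']}, "
        f"collected={stats['collected']}, "
        f"uncollectable={stats['uncollectable']}"
    )
```

In a long-running process the youngest generation usually shows far more collections than the oldest, which is the behaviour the generational strategy aims for.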

Collection thresholds

Each generation has a threshold that triggers collection. You can see and set them:

import gc

# Typical default thresholds: (700, 10, 10); the values can differ between CPython versions
print(gc.get_threshold())

# Meaning: collect gen-0 after 700 new allocations (net),
# collect gen-1 after gen-0 has been collected 10 times,
# collect gen-2 after gen-1 has been collected 10 times

The generation-0 threshold counts the difference between allocations and deallocations of collector-tracked objects since the last collection. When this net count exceeds the threshold, a generation-0 collection runs.
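
The thresholds can be changed with gc.set_threshold(), and gc.get_count() shows the current counters that are compared against them. The sketch below is only an illustration: the value 5000 is an arbitrary assumption, not a recommendation, and the original settings are restored at the end.

```python
import gc

original = gc.get_threshold()  # three-tuple: (threshold0, threshold1, threshold2)
counts = gc.get_count()        # current per-generation counters
print("thresholds:", original)
print("current counts:", counts)

try:
    # Hypothetical tuning: make generation-0 collections rarer
    # during an allocation-heavy phase.
    gc.set_threshold(5000, original[1], original[2])
    tuned = gc.get_threshold()
    print("tuned thresholds:", tuned)
finally:
    gc.set_threshold(*original)  # always restore the previous settings

restored = gc.get_threshold()
```

Raising the first threshold trades fewer collection runs for more objects examined per run, so it is worth measuring before and after rather than assuming an improvement.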

Common gc operations

Force a collection

gc.collect()  # Runs a full collection (all generations)
gc.collect(0)  # Collect only generation 0
gc.collect(1)  # Collect generations 0 and 1
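
gc.collect() also returns the number of unreachable objects it found, which is handy for quick experiments. A minimal sketch, assuming CPython and using throwaway lists to build cycles:

```python
import gc

was_enabled = gc.isenabled()
gc.disable()  # avoid an automatic collection in the middle of the experiment
try:
    gc.collect()  # start from a clean slate
    for _ in range(10):
        first, second = [], []
        first.append(second)   # first -> second
        second.append(first)   # second -> first: a two-object cycle
    del first, second          # drop the names that still point at the last cycle
    found = gc.collect()       # number of unreachable objects found in this run
finally:
    if was_enabled:
        gc.enable()

print(f"unreachable objects found: {found}")
```

Each loop iteration rebinds the two names, so the previous pair becomes unreachable while still referencing itself; the final count covers all ten pairs.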

Find what the collector is tracking

# List of all objects tracked by the collector
tracked = gc.get_objects()
print(f"Tracked objects: {len(tracked)}")

Not all objects are tracked. Simple objects that cannot contain references (integers, strings, floats) are not tracked because they cannot participate in reference cycles.
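
Whether a particular object is tracked can be checked with gc.is_tracked(). The sample values below are arbitrary; the sketch sticks to types whose tracking behaviour is stable across CPython versions.

```python
import gc

samples = {
    "int": 42,
    "float": 3.14,
    "str": "hello",
    "list": [1, 2, 3],
}

# Atomic values cannot hold references, so the collector does not track them;
# a list can, so it is tracked.
tracked_by_type = {label: gc.is_tracked(value) for label, value in samples.items()}
print(tracked_by_type)
```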

Check for referrers and referents

my_list = [1, 2, 3]
# What objects refer to my_list?
print(gc.get_referrers(my_list))
# What objects does my_list refer to?
print(gc.get_referents(my_list))
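
The raw output of these two calls can be noisy, because referrers often include module or frame namespaces. The following sketch, with made-up names such as holder, shows how to test for a specific relationship by identity instead of printing everything.

```python
import gc

item_a, item_b = object(), object()
my_list = [item_a, item_b]
holder = [my_list]  # a second container that refers to my_list

referents = gc.get_referents(my_list)  # objects my_list points at directly
referrers = gc.get_referrers(my_list)  # tracked containers that point at my_list

# Compare by identity: equality checks could match unrelated but equal objects.
contains_items = (
    any(r is item_a for r in referents)
    and any(r is item_b for r in referents)
)
found_holder = any(r is holder for r in referrers)
print(contains_items, found_holder)
```

Note that gc.get_referrers() only reports containers the collector tracks, so it is a debugging aid rather than a complete picture of who holds a reference.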

What creates reference cycles?

The most common sources:

  • Objects with __del__ methods that end up in a cycle: before Python 3.4 the collector could not safely free these and left them in gc.garbage; since Python 3.4 they are normally collected like any other cycle
  • Parent-child relationships: a tree where each child holds a .parent reference back to the node that holds the child
  • Caches and registries: dicts or lists whose entries refer back to the container or to each other
  • Exception tracebacks: a traceback references frames, which reference local variables, which may in turn reference the exception and its traceback
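
For the parent-child case, one common way to avoid the cycle is to make the back-reference weak. The sketch below is an assumption about how such a tree might be written, not a prescribed pattern; the class TreeNode and its attributes are hypothetical.

```python
import gc
import weakref


class TreeNode:
    """Hypothetical tree node whose parent link is a weak reference."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # Weak back-reference: the child does not keep its parent alive.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None


was_enabled = gc.isenabled()
gc.disable()  # show that no cyclic collection is needed
try:
    root = TreeNode("root")
    child = TreeNode("child", parent=root)
    parent_name_before = child.parent.name

    root_ref = weakref.ref(root)
    del root
    root_freed_without_gc = root_ref() is None  # freed by reference counting alone
    parent_after = child.parent                 # the weak link now resolves to None
finally:
    if was_enabled:
        gc.enable()

print(parent_name_before, root_freed_without_gc, parent_after)
```

The trade-off is that code using .parent has to cope with None once the parent is gone.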

Disabling the collector

Some high-performance applications disable the gc and rely entirely on reference counting:

gc.disable()  # Turn off automatic cyclic collection
# ... performance-critical code ...
gc.enable()   # Turn it back on

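Instagram famously disabled the gc in production, mainly because collection runs were writing to memory pages shared between forked worker processes and defeating copy-on-write sharing. Disabling the collector works when your code avoids reference cycles (or you manually break them); otherwise cyclic garbage accumulates until the process exits.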
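
If the collector is disabled only around a critical section, it helps to restore whatever state was in effect before rather than enabling it unconditionally. A minimal sketch, assuming a hypothetical helper named gc_paused:

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical helper: pause automatic cyclic collection, then restore the prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()              # False: automatic collection is off here
    data = [[i] for i in range(1000)]    # stand-in for performance-critical work
after = gc.isenabled()

print(before, inside, after)
```

Using try/finally means the collector is re-enabled even if the critical section raises, and nesting the helper does not accidentally turn the collector on inside an outer paused block.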

Common misconception

Many developers think gc.collect() frees all unused memory. It only handles reference cycles. Objects with a reference count of zero are already freed by the reference counting system before gc even runs. If memory usage is high but gc.collect() frees nothing, the objects are still reachable — the problem is not cycles but actual live references holding onto data.
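
The difference between a cycle and a live reference can be shown in a few lines. This is an illustrative sketch, assuming CPython; Payload and cache are made-up names standing in for a large object and a long-lived registry.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object."""


cache = {}  # a long-lived registry that is never pruned

cached = Payload()
cache["report"] = cached
cached_ref = weakref.ref(cached)
del cached  # the name is gone, but the cache still holds the object

gc.collect()
still_alive = cached_ref() is not None  # reachable through the cache, so gc leaves it alone

del cache["report"]                      # remove the live reference...
freed_without_gc = cached_ref() is None  # ...and reference counting frees it immediately

print(still_alive, freed_without_gc)
```

In a situation like this, the fix is to find and drop the reference (here, the cache entry), not to call gc.collect() more often.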

Debug mode

gc.set_debug(gc.DEBUG_LEAK)
# Prints information about collectable and uncollectable objects found in cycles,
# and saves every unreachable object in gc.garbage instead of freeing it

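This is invaluable when tracking down memory leaks caused by reference cycles: after a collection you can inspect gc.garbage to see which objects were caught in a cycle. Turn it off again with gc.set_debug(0) and clear gc.garbage when you are done, because the saved objects stay alive for as long as they sit in that list.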
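
A quieter variant is to use only gc.DEBUG_SAVEALL, which saves unreachable objects in gc.garbage without printing anything. The sketch below assumes CPython; the class Leaky is hypothetical and exists only to create a cycle.

```python
import gc


class Leaky:
    """Hypothetical class whose instances end up in a cycle."""

    def __init__(self):
        self.partner = None


was_enabled = gc.isenabled()
gc.disable()
gc.collect()                    # clear out unrelated garbage first
gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage instead of freeing them
try:
    x, y = Leaky(), Leaky()
    x.partner, y.partner = y, x  # build the cycle
    del x, y
    gc.collect()
    # Filter the saved objects down to the type under investigation.
    saved_leaky = [obj for obj in gc.garbage if isinstance(obj, Leaky)]
finally:
    gc.set_debug(0)     # back to normal behaviour
    gc.garbage.clear()  # release the saved objects
    if was_enabled:
        gc.enable()

print(f"Leaky instances caught in cycles: {len(saved_leaky)}")
```

From the saved objects, gc.get_referrers() can then be used to work out which attribute closes the cycle.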

The one thing to remember: Python’s gc module handles only circular references, the small share of memory management that reference counting misses, and its generational approach makes this efficient by focusing most effort on newly created objects.

