Python gc Module Internals — Deep Dive

Explore CPython's garbage collector implementation — the tri-color marking algorithm, generational promotion, finalization ordering, and production tuning strategies.

The collection algorithm

CPython’s cyclic garbage collector does not use the classical mark-and-sweep algorithm. Instead, it uses a variant based on reference count subtraction (sometimes called trial deletion):

Build candidate set — all container objects in the target generation(s) are gathered.
Subtract internal references — for each object in the set, decrement a temporary copy of the reference count for every reference it holds to another object in the set. This removes “internal” references within the candidate set.
Find roots — objects whose adjusted reference count is still greater than zero are reachable from outside the set. They are “roots.”
Mark reachable — starting from roots, traverse all references and mark objects as reachable.
Collect unreachable — everything not marked as reachable is garbage. Free it.

import gc
import sys

class Node:
    def __init__(self, name):
        self.name = name
        self.ref = None

# Create a cycle
a = Node("A")
b = Node("B")
a.ref = b
b.ref = a

# Reference counts include the local variables
print(f"refcount(a): {sys.getrefcount(a)}")  # 2 (a + getrefcount arg)
print(f"refcount(b): {sys.getrefcount(b)}")  # 2

# Remove local references — objects only referenced by each other
del a, b

# Force collection and see what was collected
gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_STATS)
collected = gc.collect()
print(f"Collected: {collected} objects")
gc.set_debug(0)

Generational promotion mechanics

When a generation-0 collection runs:

Objects in generation 0 are scanned for cycles.
Unreachable objects are freed.
Surviving objects are promoted to generation 1.
The generation-0 collection counter resets, and the generation-1 counter increments.
When the generation-1 counter exceeds its threshold, generations 0 and 1 are collected together, and survivors are promoted to generation 2.

Generation 2 is special: it is the oldest generation and objects stay there permanently. To avoid scanning the entire heap too often, generation-2 collections use a ratio-based heuristic. The collector only runs a full (gen-2) collection when the ratio of long-lived objects to total objects justifies the work.

import gc

# Monitor collection counts
print(f"Collection counts: {gc.get_count()}")
# Returns (gen0_count, gen1_count, gen2_count)
# gen0_count = allocations - deallocations since last gen-0 collection

Finalization and del

Before Python 3.4, objects with __del__ methods involved in reference cycles were considered “uncollectable” — the collector could not determine a safe order to call their finalizers. These objects were placed in gc.garbage for manual inspection.

Python 3.4 (PEP 442) introduced safe finalization ordering:

import gc

class Resource:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        print(f"Finalizing {self.name}")

# Even with __del__ and a cycle, Python 3.4+ can collect these
r1 = Resource("R1")
r2 = Resource("R2")
r1.partner = r2
r2.partner = r1
del r1, r2

gc.collect()  # Prints finalization messages, objects are freed

The modern collector uses a topological sort on the reference graph to determine finalization order. If no safe order exists, it still collects the objects but may call __del__ in an arbitrary order.

Weakref integration

Weak references (via the weakref module) interact with the gc in important ways. A weak reference does not prevent garbage collection, and the gc clears weak references before calling __del__:

import weakref
import gc

class Tracked:
    pass

obj = Tracked()
ref = weakref.ref(obj, lambda r: print("Weak reference cleared"))

del obj
gc.collect()
# "Weak reference cleared" is printed
# The weakref callback fires BEFORE __del__ (if any) would be called

This ordering guarantees that __del__ methods never see dangling weak references.

Production tuning strategies

Strategy 1: Tune thresholds for your workload

import gc

# Default: (700, 10, 10)
# For a web server handling many short-lived requests:
gc.set_threshold(1000, 15, 15)
# Higher gen-0 threshold = fewer collections
# Higher gen-1/2 multipliers = less frequent full collections

Strategy 2: Disable gc during latency-sensitive operations

import gc

gc.disable()
try:
    process_critical_request()
finally:
    gc.enable()
    # Optionally run collection at a convenient time
    gc.collect()

Strategy 3: The Instagram approach

Instagram’s engineering team (2017) found that gc pauses caused p99 latency spikes. Their solution:

Disable gc entirely with gc.disable()
Avoid reference cycles in application code
Use weakref for parent-child relationships
Fork worker processes after gc is stable (copy-on-write optimization)

The copy-on-write aspect is crucial: when a process forks, the OS shares memory pages until one process modifies them. CPython’s gc modifies reference counts and object headers during collection, causing massive copy-on-write amplification. Disabling gc keeps shared pages intact.

Strategy 4: Manual collection timing

import gc
import time

class GCScheduler:
    def __init__(self, interval_seconds=30):
        self.interval = interval_seconds
        self.last_collect = time.monotonic()
        gc.disable()

    def maybe_collect(self):
        now = time.monotonic()
        if now - self.last_collect >= self.interval:
            gc.collect()
            self.last_collect = now

    def collect_between_requests(self):
        """Call between request handling for web servers."""
        gen0, gen1, gen2 = gc.get_count()
        if gen0 > 2000:  # Only if significant allocation pressure
            gc.collect(0)  # Quick gen-0 only

Debugging memory leaks with gc

A systematic approach to finding reference cycles:

import gc
import sys
from collections import Counter

def find_leaking_types():
    """Identify types that are accumulating in the gc."""
    gc.collect()
    type_counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return type_counts.most_common(20)

def find_cycle_for_object(target):
    """Find reference cycle involving a specific object."""
    gc.collect()
    referrers = gc.get_referrers(target)
    for ref in referrers:
        if ref is target:
            continue
        # Check if this referrer is also referred to by target
        target_referents = gc.get_referents(target)
        if ref in target_referents:
            print(f"Cycle: {type(target).__name__} <-> {type(ref).__name__}")
            return ref
    return None

def snapshot_diff():
    """Compare object counts between two points in time."""
    gc.collect()
    before = Counter(type(obj).__name__ for obj in gc.get_objects())

    yield  # Caller does work here

    gc.collect()
    after = Counter(type(obj).__name__ for obj in gc.get_objects())

    diff = after - before
    print("New objects since snapshot:")
    for type_name, count in diff.most_common(10):
        print(f"  {type_name}: +{count}")

GC callbacks (Python 3.3+)

import gc
import time

def gc_callback(phase, info):
    if phase == "start":
        gc_callback._start = time.perf_counter()
    elif phase == "stop":
        elapsed = time.perf_counter() - gc_callback._start
        gen = info["generation"]
        collected = info["collected"]
        print(f"GC gen-{gen}: collected {collected} objects "
              f"in {elapsed*1000:.2f}ms")

gc.callbacks.append(gc_callback)

This is the production-grade way to monitor gc impact on your application’s latency.

The freeze() function (Python 3.7+)

gc.freeze() moves all tracked objects into the permanent generation, preventing the collector from ever scanning them. This is designed for the fork-based server pattern:

import gc

# During application startup (before forking workers)
gc.collect()   # Clean up any startup garbage
gc.freeze()    # Move everything to permanent generation

# Now fork workers — the frozen objects won't cause
# copy-on-write when gc runs in child processes

gc.unfreeze() reverses this, and gc.get_freeze_count() tells you how many objects are frozen.

The one thing to remember: CPython’s cyclic gc uses reference-count subtraction (not mark-and-sweep) across three generations, and production tuning means controlling when collections happen relative to your application’s latency-sensitive paths — especially in forked process architectures where gc-triggered copy-on-write amplification is the real enemy.

pythonmemory-managementinternals