Python Garbage Collector Tuning — Core Concepts

Understand Python GC generations, thresholds, and when tuning collection frequency improves latency or memory behavior in real services.

CPython uses reference counting plus a cyclic garbage collector. Most objects are reclaimed immediately when their reference count drops to zero; the cyclic collector handles reference cycles, which reference counting alone cannot free.
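As a minimal sketch of that division of labor, the snippet below builds a two-object cycle and checks whether it is still alive before and after an explicit collection. Node, make_cycle, and cycle_survives_until_collected are illustrative names, not part of any library; the behavior described assumes CPython.

```python
import gc
import weakref


class Node:
    """Tiny container object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Build two nodes that reference each other; return a weak ref to one."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


def cycle_survives_until_collected():
    """Return (alive_before_collect, alive_after_collect) for a fresh cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from interfering with the demo
    try:
        ref = make_cycle()
        alive_before = ref() is not None  # the cycle keeps both refcounts above zero
        gc.collect()                      # the cyclic collector finds and frees it
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

The weak reference is only there to observe the object without keeping it alive.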

Generational Model

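CPython’s GC tracks container objects in generations (three of them, numbered 0 to 2, in most releases):

younger generation: newly allocated objects, most of which are short-lived
older generations: longer-lived objects that have survived earlier collections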

The intuition: most objects die young, so collect young generations more frequently.
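To watch the youngest generation fill up, a small sketch using gc.get_count(), which returns one counter per generation. The function name and the choice of 100 empty lists are assumptions for illustration. On a standard CPython build the first counter usually rises as new containers are allocated, though the exact numbers vary by version.

```python
import gc


def generation_counts_around_allocation(n=100):
    """Return gc.get_count() before and after allocating n container objects."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep a collection from resetting the counters mid-demo
    try:
        before = gc.get_count()
        survivors = [[] for _ in range(n)]  # n new GC-tracked lists, kept alive
        after = gc.get_count()
        return before, after
    finally:
        if was_enabled:
            gc.enable()
```

Comparing the two tuples shows which generation absorbed the new objects.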

Thresholds and Collection Frequency

You can inspect and adjust thresholds:

import gc
print(gc.get_threshold())
# gc.set_threshold(700, 10, 10)  # (threshold0, threshold1, threshold2); a long-standing CPython default

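In the classic CPython scheme, a collection of the youngest generation starts when container allocations minus deallocations since the last collection exceed threshold0; threshold1 and threshold2 control roughly how many collections of one generation happen before the next older generation is examined as well. Lower thresholds trigger collections more often; higher thresholds delay collections.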
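When experimenting, it helps to make a threshold change easy to undo. The context manager below is a sketch under that assumption; gc_thresholds is a hypothetical helper name, and it relies only on gc.get_threshold() and gc.set_threshold().

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_thresholds(threshold0, threshold1, threshold2):
    """Apply GC thresholds for the duration of a block, then restore the old ones."""
    previous = gc.get_threshold()
    gc.set_threshold(threshold0, threshold1, threshold2)
    try:
        yield previous  # expose the old values in case the caller wants to log them
    finally:
        gc.set_threshold(*previous)
```

A trial run can then be wrapped as `with gc_thresholds(5000, 20, 20): ...` without leaking the setting into the rest of the process.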

When Tuning Helps

Tuning can help when profiling shows:

frequent GC runs adding latency spikes
excessive memory growth between collections
workload-specific churn patterns (many temporary container objects)

If you have not measured GC impact, tuning is guesswork.

What to Measure Before and After

request latency percentiles (p95/p99)
throughput
RSS/heap trend
GC stats (gc.get_stats(), available since Python 3.4)

A threshold change that improves average latency but worsens p99 may not be acceptable.
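A rough sketch of how two of these measurements could be captured: the per-generation difference between two gc.get_stats() snapshots, and p95/p99 from a list of latency samples. Both function names are illustrative, and the latency samples are assumed to come from your own instrumentation.

```python
import gc
import statistics

STAT_KEYS = ("collections", "collected", "uncollectable")


def gc_stats_delta(before, after):
    """Per-generation difference between two gc.get_stats() snapshots."""
    return [
        {key: a[key] - b[key] for key in STAT_KEYS}
        for b, a in zip(before, after)
    ]


def latency_percentiles(samples_ms):
    """Return p95 and p99 of the samples (needs at least two data points)."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p95": cuts[94], "p99": cuts[98]}
```

Take one snapshot before the load test and one after, then compare the deltas between the baseline run and the tuned run.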

Temporary Disable Pattern (Careful)

For tight compute sections with no cycle creation, some teams temporarily disable GC:

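import gc

was_enabled = gc.isenabled()  # remember the prior state so it is restored exactly
gc.disable()                  # pauses only the cyclic collector; reference counting still runs
try:
    run_hot_section()         # placeholder for your own cycle-free hot path
finally:
    if was_enabled:
        gc.enable()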

Only do this with strong evidence and guardrails. Disabling GC globally in long-lived services is risky.
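One possible guardrail is to wrap the pattern in a reusable context manager that restores the previous state and, optionally, runs a full collection on exit to pick up any cycles created while the collector was paused. gc_paused and its collect_after parameter are hypothetical names for this sketch.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused(collect_after=True):
    """Pause the cyclic collector for a block, then restore its previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        if collect_after:
            gc.collect()  # sweep up cycles that accumulated during the pause
```

Keeping the pause scoped to a `with` block makes it harder to leave GC disabled by accident in a long-lived service.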

Common Misconception

Misconception: lowering thresholds always improves memory.

Reality: lower thresholds can reduce some memory growth but may introduce more pause overhead and CPU cost.
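The cost side of that tradeoff can be made visible by counting collections for the same workload under two threshold0 values. This is a sketch, not a benchmark: count_collections is an illustrative helper, the workload (a list of empty lists) is synthetic, and it uses gc.callbacks to observe each collection start.

```python
import gc


def count_collections(threshold0, n_objects=50_000):
    """Count GC runs triggered while allocating n_objects containers."""
    starts = []

    def on_gc(phase, info):
        if phase == "start":
            starts.append(info["generation"])

    previous = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.callbacks.append(on_gc)
    gc.enable()
    gc.set_threshold(threshold0, previous[1], previous[2])
    try:
        keep = [[] for _ in range(n_objects)]  # synthetic container churn
    finally:
        # Restore everything so the experiment leaves no trace behind.
        gc.set_threshold(*previous)
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return len(starts)
```

Running it with a low and a high threshold0 shows how many more collections the lower setting pays for on identical work.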

Practical Approach

baseline metrics under realistic load
change one threshold set at a time
run long enough to observe memory trend
keep settings only if net outcome improves service goals

Use Python Memory Profiling first; GC tuning should respond to measured allocation/churn patterns, not replace profiling.

A Small Experiment Template

Try this practical experiment:

baseline run with default GC thresholds for 30 minutes
run again with one threshold change
compare p95/p99 latency, CPU, and memory slope

If memory improves but latency worsens significantly, revert. If latency improves but memory drifts toward OOM, revert.
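The revert rules above can be written down as a small decision helper so that every experiment is judged the same way. Everything here is an assumption for illustration: the RunMetrics fields, the 5% relative tolerance, and the 0.1 MB/min slope tolerance should be replaced with limits that match your service goals.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    p95_latency_ms: float
    cpu_percent: float
    memory_slope_mb_per_min: float


def keep_change(baseline, candidate, rel_tol=0.05, slope_tol=0.1):
    """Keep a threshold change only if something improved and nothing regressed."""
    latency_worse = candidate.p95_latency_ms > baseline.p95_latency_ms * (1 + rel_tol)
    cpu_worse = candidate.cpu_percent > baseline.cpu_percent * (1 + rel_tol)
    memory_worse = (
        candidate.memory_slope_mb_per_min
        > baseline.memory_slope_mb_per_min + slope_tol
    )
    improved = (
        candidate.p95_latency_ms < baseline.p95_latency_ms
        or candidate.memory_slope_mb_per_min < baseline.memory_slope_mb_per_min
    )
    return improved and not (latency_worse or cpu_worse or memory_worse)
```

A False result means revert, whichever metric caused it.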

A tuning decision is successful only when overall service health improves, not just one isolated metric.

Another useful trick is to graph GC collection counts next to request traffic. If collection bursts line up with user-facing latency spikes, you have evidence that allocation patterns are worth optimizing before making further threshold changes.
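One way to collect the data for such a graph is a gc.callbacks hook that records when each collection happened and how long it took. GCEventRecorder is a hypothetical class for this sketch; exporting the events to your metrics system is left out.

```python
import gc
import time


class GCEventRecorder:
    """Record a timestamp, generation, and pause duration for each collection."""

    def __init__(self):
        self.events = []
        self._started = None

    def __call__(self, phase, info):
        now = time.perf_counter()
        if phase == "start":
            self._started = now
        elif phase == "stop" and self._started is not None:
            self.events.append({
                "at": time.time(),  # wall-clock time, for lining up with request logs
                "generation": info["generation"],
                "pause_s": now - self._started,
                "collected": info["collected"],
            })
            self._started = None

    def install(self):
        gc.callbacks.append(self)

    def uninstall(self):
        gc.callbacks.remove(self)
```

Install it at service start, then plot the "at" timestamps and "pause_s" values on the same time axis as request latency.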

One Thing to Remember

GC tuning is an empirical exercise: adjust thresholds with metrics, and choose settings that improve your real latency-memory tradeoff.
