Python Caching with lru_cache — Deep Dive

Inside lru_cache's doubly-linked list, hash map internals, typed mode, and advanced patterns for production Python caching.

Internal Data Structure

Under the hood, CPython’s lru_cache maintains two data structures working in tandem:

A dictionary mapping argument tuples to cached results (O(1) lookup).
A circular doubly-linked list that tracks access order for eviction.

When a cached function is called with known arguments, the dictionary provides the result immediately. The corresponding node in the linked list is then moved to the head (most recently used position). When the cache is full and a new entry arrives, the tail node (least recently used) is removed from both the list and the dictionary.

This design gives O(1) time complexity for both cache hits and evictions — the same strategy used by operating systems for page replacement and by databases like Redis for key eviction.

The Make Key Function

Before looking up arguments in the dictionary, lru_cache converts them into a hashable key using an internal _make_key function. This function:

Flattens positional and keyword arguments into a single tuple
Adds type markers between positional and keyword argument sections
Includes argument names for keyword arguments so that f(a=1, b=2) and f(b=2, a=1) produce the same key

from functools import lru_cache

@lru_cache(maxsize=32)
def add(a, b):
    return a + b

# These all produce the same cache key:
add(1, 2)
add(a=1, b=2)
add(1, b=2)

This normalization has a cost: _make_key runs on every call, cached or not. For functions with many arguments, this overhead can become significant relative to the function’s actual computation time.

Typed Mode

By default, lru_cache treats 3 (int) and 3.0 (float) as the same key because they compare equal in Python:

@lru_cache(maxsize=128)
def compute(x):
    return x * 2

compute(3)    # Computes and caches
compute(3.0)  # Returns cached result from compute(3)

Setting typed=True forces separate cache entries for different types:

@lru_cache(maxsize=128, typed=True)
def compute(x):
    return x * 2

compute(3)    # Caches under int key
compute(3.0)  # Separate entry, caches under float key

Use typed=True when your function behaves differently for different types (e.g., serialization functions) or when type correctness matters more than cache density.

Thread Safety

In CPython, lru_cache is thread-safe thanks to a C-level lock (not the GIL — there’s an actual dedicated lock). The lock is held during cache lookups and updates, meaning:

Multiple threads can safely call the same cached function concurrently
Under high contention, the lock becomes a bottleneck
The function itself may execute multiple times for the same arguments if several threads call it simultaneously before any result is cached (no thundering-herd protection)

For write-heavy workloads with many unique arguments, consider per-thread caching or a concurrent dictionary instead.

Advanced Patterns

Caching with TTL

lru_cache has no built-in TTL. A common workaround uses a time-bucketing trick:

import time
from functools import lru_cache

def ttl_hash(seconds=300):
    """Return the same value within each time window."""
    return round(time.time() / seconds)

@lru_cache(maxsize=128)
def get_config(key, _ttl_hash=None):
    return load_config_from_disk(key)

# Usage — cache refreshes every 5 minutes
result = get_config("database_url", _ttl_hash=ttl_hash())

The _ttl_hash parameter changes value every 300 seconds, causing a cache miss and re-computation. Old entries get evicted naturally by the LRU policy.

Caching Instance Methods

Decorating a method with lru_cache directly causes memory leaks because self becomes part of the cache key, holding a strong reference to every instance:

class DataProcessor:
    @lru_cache(maxsize=64)  # DON'T — leaks instances
    def process(self, data_id):
        return heavy_computation(data_id)

The standard fix uses __hash__ and __eq__ on the class, or moves the cache outside:

class DataProcessor:
    def __init__(self):
        self.process = lru_cache(maxsize=64)(self._process)

    def _process(self, data_id):
        return heavy_computation(data_id)

Or use a module-level function that accepts the relevant data instead of self.

Combining with `@property`

Caching a property computation avoids repeated work on attribute access:

from functools import cached_property  # Python 3.8+

class Report:
    @cached_property
    def summary(self):
        return expensive_aggregation(self.data)

cached_property computes the value once, stores it as an instance attribute, and never recomputes. Unlike lru_cache, it has no size limit or eviction — it’s a one-shot cache per instance.

Benchmarking the Overhead

The overhead of lru_cache comes from key generation and dictionary lookup. For a function that executes in under 100 nanoseconds (like adding two numbers), the caching overhead actually makes it slower. The break-even point is roughly:

Function cost	Cache benefit
< 1 μs	Overhead likely exceeds savings
1–100 μs	Moderate benefit if hit rate is high
> 100 μs	Almost always beneficial
> 1 ms	Dramatic speedup possible

Profile with cache_info() before assuming caching helps.

functools.cache vs lru_cache

Python 3.9 added @functools.cache, which is equivalent to @lru_cache(maxsize=None):

No eviction: every unique call is cached forever
Slightly faster per-call (no linked list maintenance)
Best for pure functions with bounded input domains (e.g., mathematical computations with known ranges)
Dangerous for functions called with unbounded unique inputs

Use @cache when you know the input space is finite. Use @lru_cache(maxsize=N) when you need bounded memory.

Real-World Performance

Instagram’s backend team reported that adding lru_cache to their Django template tag resolution reduced CPU usage by 12% on some endpoints. The pattern works especially well in request-response cycles where the same computations repeat across requests within a single process lifecycle.

Dropbox uses memoization extensively in their Python desktop client for file metadata lookups — computing hashes and checking modification times are expensive operations that benefit from per-session caching.

When Not to Use It

Side-effectful functions — Functions that write to databases, send emails, or modify global state should never be cached. The cache would suppress repeated side effects.
Functions with large return values — Caching 100MB DataFrames at maxsize=128 means potentially 12.8 GB of memory held in cache.
High cardinality inputs — If every call has unique arguments, the cache just wastes memory on entries that never get hit.

The one thing to remember: lru_cache is an O(1) memoization layer backed by a hash map and doubly-linked list — powerful for pure, deterministic functions with repeating inputs, but it requires careful attention to memory, mutability, and argument hashability in production.

pythonperformancecachingoptimizationinternals