functools Module — Deep Dive
lru_cache Internals
The C implementation of lru_cache uses a circular doubly-linked list for O(1) access ordering, combined with a dictionary for O(1) lookup:
Dictionary:   {args_key: linked_list_node}
Linked list:  HEAD ↔ node1 ↔ node2 ↔ ... ↔ TAIL
              (newest)                  (oldest)
When a cached function is called:
- Arguments are converted to a hashable key (a tuple of the positional args, then a kwargs sentinel followed by keyword names and values, plus the argument types when typed=True)
- If the key exists in the dict: move the node to the head (most recently used), return the cached value
- If not: call the function, create a new node at the head, evict the tail if at capacity
Key creation is usually the dominant per-call overhead. Every call must build the key and hash all arguments:
# What happens internally for cache key creation:
# f(1, 2, x=3) → key = (1, 2, _KWD_MARK, 'x', 3)
When lru_cache Breaks
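Arguments must be hashable. The decorator itself accepts any function; the error appears when the function is called with an unhashable argument:
from functools import lru_cache

@lru_cache
def process(data: list):
    return sum(data)

process([1, 2, 3])  # TypeError: unhashable type: 'list'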
Workaround — convert to a hashable type:
@lru_cache
def process(data: tuple):  # tuples are hashable
    return sum(data)

process(tuple([1, 2, 3]))
Thread Safety
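lru_cache is thread-safe in the sense that its internal data structure stays coherent under concurrent reads and writes (the pure-Python fallback in functools.py guards updates with a reentrant lock). However, the underlying function can still be called more than once for the same arguments if another thread makes the same call before the first one has finished and been cached. If the function has side effects, this matters: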
@lru_cache(maxsize=100)
def get_user(user_id):
    # Multiple threads might call this simultaneously for the same user_id
    # Only one result gets cached; others are discarded
    return database.fetch_user(user_id)
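If duplicate calls are unacceptable, one blunt option is to serialize access with a lock of your own. The sketch below is an assumption-laden illustration, not part of functools: locked_lru_cache is a hypothetical decorator, and the calls list stands in for the database round trip.

```python
import threading
from functools import lru_cache, wraps


def locked_lru_cache(maxsize=128):
    """lru_cache variant that never runs the wrapped function concurrently."""
    def decorator(func):
        cached = lru_cache(maxsize=maxsize)(func)
        lock = threading.Lock()

        @wraps(func)
        def wrapper(*args, **kwargs):
            with lock:  # serializes hits and misses alike
                return cached(*args, **kwargs)

        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper
    return decorator


calls = []


@locked_lru_cache(maxsize=100)
def get_user(user_id):
    calls.append(user_id)          # stand-in for a database round trip
    return {"id": user_id}


threads = [threading.Thread(target=get_user, args=(42,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))   # 1
```

The trade-off is that every call waits on the same lock, including cache hits, and a recursive cached function would deadlock on a plain Lock (an RLock avoids that). Per-key locks scale better but need more bookkeeping.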
Cache Invalidation
@lru_cache(maxsize=256)
def get_config(key):
    return load_from_file(key)

# Clear entire cache
get_config.cache_clear()
# Inspect cache
info = get_config.cache_info()
# CacheInfo(hits=45, misses=12, maxsize=256, currsize=12)
There’s no way to invalidate a single key. If you need that, use a dictionary-based cache instead.
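A dictionary-based cache with per-key invalidation can be quite small. This is a minimal sketch under stated assumptions: it is unbounded, handles positional hashable arguments only, and has no locking. The names invalidatable_cache and invalidate are hypothetical, and the loads list stands in for reading a file.

```python
from functools import wraps


def invalidatable_cache(func):
    """Unbounded dict cache for positional, hashable arguments only."""
    store = {}

    @wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]

    def invalidate(*args):
        """Forget one cached call; a no-op if it was never cached."""
        store.pop(args, None)

    wrapper.invalidate = invalidate
    return wrapper


loads = []


@invalidatable_cache
def get_config(key):
    loads.append(key)              # stand-in for reading a file
    return f"value-for-{key}"


get_config("db")
get_config("db")                   # cached, no second load
get_config.invalidate("db")
get_config("db")                   # reloaded after invalidation
print(loads)                       # ['db', 'db']
```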
cached_property: One-Time Computation
cached_property computes a value once and stores it as an instance attribute:
from functools import cached_property

class Dataset:
    def __init__(self, path):
        self.path = path

    @cached_property
    def data(self):
        """Loaded once, then stored as self.data."""
        print("Loading...")
        with open(self.path) as f:
            return f.read()
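Unlike @property with manual caching, cached_property stores the computed value in the instance’s __dict__ under the same name on first access. The descriptor itself stays on the class, but because it is a non-data descriptor, the instance attribute shadows it on later lookups. Subsequent accesses are ordinary attribute reads that never call the getter again.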
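The effect is easy to observe from the outside. The sketch below uses a hypothetical Report class and inspects the instance __dict__ before and after the first access; deleting the attribute clears the cached value so the next access recomputes it.

```python
from functools import cached_property


class Report:
    def __init__(self):
        self.computations = 0

    @cached_property
    def total(self):
        self.computations += 1
        return 42


r = Report()
print("total" in vars(r))   # False, nothing cached yet
r.total
r.total
print(r.computations)       # 1
print(vars(r)["total"])     # 42, stored in the instance __dict__
del r.total                 # drop the cached value
r.total
print(r.computations)       # 2
```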
Threading Caveat
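Before Python 3.12, cached_property held an undocumented lock while computing the value. That lock was per-property rather than per-instance, which could cause heavy contention, so Python 3.12 removed it. In multi-threaded code the getter may now run more than once for the same instance. If this is problematic, add your own locking: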
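import threading
from functools import cached_property

class ThreadSafeDataset:
    def __init__(self):
        self._lock = threading.Lock()  # per-instance, so instances don't contend

    @cached_property
    def data(self):
        with self._lock:
            # Double-check: another thread might have set it while we waited
            if 'data' not in self.__dict__:
                # Store inside the lock; cached_property itself only stores
                # the value after this method returns and the lock is released
                self.__dict__['data'] = expensive_load()
            return self.__dict__['data']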
singledispatch: Advanced Patterns
Method dispatch with singledispatchmethod
singledispatch works on functions. For methods, use singledispatchmethod (Python 3.8+):
from functools import singledispatchmethod

class Serializer:
    @singledispatchmethod
    def serialize(self, value):
        raise TypeError(f"Cannot serialize {type(value)}")

    @serialize.register
    def _(self, value: str):
        return f'"{value}"'

    @serialize.register
    def _(self, value: int):
        return str(value)

    @serialize.register
    def _(self, value: list):
        items = ", ".join(self.serialize(v) for v in value)
        return f"[{items}]"
Registration with type annotations
Since Python 3.7, you can register using type annotations instead of explicit type arguments:
from functools import singledispatch

@singledispatch
def process(value):
    raise TypeError(f"Unsupported: {type(value)}")

@process.register
def _(value: int):
    return value * 2

@process.register
def _(value: str):
    return value.upper()
Abstract base classes and virtual subclasses
from collections.abc import Mapping, Sequence

@process.register(Mapping)
def _(value):
    return {k: process(v) for k, v in value.items()}

@process.register(Sequence)
def _(value):
    return [process(v) for v in value]
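ABCs and virtual subclasses work with singledispatch: it resolves the implementation along the argument type’s MRO, extended with any registered ABCs the type implements, and picks the most specific match.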
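A small self-contained check makes the ABC behavior concrete. This is an illustrative sketch: describe and Playlist are hypothetical names, and Playlist is deliberately not a real Sequence subclass, only a virtual one registered through Sequence.register.

```python
from collections.abc import Mapping, Sequence
from functools import singledispatch


@singledispatch
def describe(value):
    return "object"


@describe.register(Mapping)
def _(value):
    return "mapping"


@describe.register(Sequence)
def _(value):
    return "sequence"


class Playlist:
    """Hypothetical class that does not inherit from Sequence."""
    def __init__(self, tracks):
        self.tracks = tracks


Sequence.register(Playlist)          # declare it a virtual subclass

print(describe({"a": 1}))            # mapping
print(describe([1, 2]))              # sequence
print(describe("abc"))               # sequence (str is a Sequence too)
print(describe(Playlist(["intro"]))) # sequence, via the virtual subclass
print(describe(42))                  # object, falls back to the default
```

The str result is a reminder that a Sequence handler also catches strings unless a more specific str implementation is registered.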
partial: Under the Hood
partial creates a functools.partial object (implemented in C) that stores the wrapped function, frozen args, and frozen kwargs:
from functools import partial

def connect(host, port, timeout=30):
    print(f"Connecting to {host}:{port} (timeout={timeout})")

local_connect = partial(connect, "localhost", timeout=5)

# Inspect the partial
print(local_connect.func)      # <function connect at 0x...>
print(local_connect.args)      # ('localhost',)
print(local_connect.keywords)  # {'timeout': 5}

local_connect(8080)  # Connecting to localhost:8080 (timeout=5)
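The call-time behavior can be approximated in pure Python. The sketch below follows the "roughly equivalent" description in the functools documentation; simple_partial is a hypothetical name, and the real partial is a C-level class rather than a closure.

```python
def simple_partial(func, /, *args, **keywords):
    """Rough pure-Python model of functools.partial."""
    def newfunc(*more_args, **more_keywords):
        merged = {**keywords, **more_keywords}   # call-time keywords win
        return func(*args, *more_args, **merged)

    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc


def connect(host, port, timeout=30):
    return f"{host}:{port} (timeout={timeout})"


local_connect = simple_partial(connect, "localhost", timeout=5)
print(local_connect(8080))              # localhost:8080 (timeout=5)
print(local_connect(8080, timeout=1))   # localhost:8080 (timeout=1)
```

Frozen positional arguments are always prepended, while frozen keyword arguments act as defaults that a caller can still override.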
partialmethod for descriptors
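partialmethod is designed for class definitions, where a plain partial traditionally falls short: through Python 3.12, partial does not implement the descriptor protocol, so a partial stored as a class attribute is not bound to the instance and never receives self. Python 3.13 emits a FutureWarning for that usage ahead of a behavior change, so partialmethod remains the explicit choice: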
from functools import partialmethod

class Connection:
    def set_state(self, state, reason=""):
        self.state = state
        self.reason = reason

    connect = partialmethod(set_state, "connected")
    disconnect = partialmethod(set_state, "disconnected")

conn = Connection()
conn.connect(reason="user request")
print(conn.state)   # "connected"
print(conn.reason)  # "user request"
reduce: When It’s Actually Useful
Beyond simple aggregation, reduce is handy for walking nested structures and for composing functions:
from functools import reduce

# Deep dictionary access
def deep_get(d, keys):
    return reduce(lambda obj, key: obj[key], keys, d)

config = {"database": {"primary": {"host": "db.example.com"}}}
deep_get(config, ["database", "primary", "host"])  # "db.example.com"

# Function composition: compose(f, g)(x) == f(g(x))
def compose(*funcs):
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

# Every function must take exactly one argument
transform = compose(str.upper, str.strip)
transform("  hello ")  # "HELLO"
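The same folding idea also runs in the other direction, building a nested dictionary from a key path. A short sketch, with nest as a hypothetical helper name:

```python
from functools import reduce


def nest(keys, value):
    """Wrap value in one dict per key, starting from the innermost key."""
    return reduce(lambda inner, key: {key: inner}, reversed(keys), value)


print(nest(["database", "primary", "host"], "db.example.com"))
# {'database': {'primary': {'host': 'db.example.com'}}}
```

Here the initializer is the leaf value, and each step wraps the accumulated result in one more layer.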
reduce with operator module
from functools import reduce
from operator import mul, or_
# Product of a list
reduce(mul, [1, 2, 3, 4, 5]) # 120
# Bitwise OR of flags
reduce(or_, [0x01, 0x02, 0x08]) # 0x0B (11)
cmp_to_key: Legacy Compatibility
Converts an old-style comparison function (one that returns a negative number, zero, or a positive number) into a key function for sorted(), list.sort(), min(), max(), and similar tools:
from functools import cmp_to_key

def compare_versions(a, b):
    a_parts = list(map(int, a.split(".")))
    b_parts = list(map(int, b.split(".")))
    if a_parts < b_parts:
        return -1
    elif a_parts > b_parts:
        return 1
    return 0

versions = ["1.2.3", "1.0.0", "2.1.0", "1.2.1"]
sorted(versions, key=cmp_to_key(compare_versions))
# ['1.0.0', '1.2.1', '1.2.3', '2.1.0']
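When the comparison boils down to comparing derived values, as it does here, a plain key function is usually simpler than cmp_to_key. A sketch of the same sort, with version_key as a hypothetical helper name:

```python
versions = ["1.2.3", "1.0.0", "2.1.0", "1.2.1"]


def version_key(version):
    """Turn '1.2.3' into (1, 2, 3) so tuples compare numerically."""
    return tuple(int(part) for part in version.split("."))


print(sorted(versions, key=version_key))
# ['1.0.0', '1.2.1', '1.2.3', '2.1.0']
```

A key function is computed once per element, whereas a comparison function runs for every pair the sort compares. cmp_to_key earns its place when the ordering cannot be expressed as a per-element key.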
Performance Patterns
Warming the cache
For web applications, pre-fill caches at startup:
@lru_cache(maxsize=1000)
def get_template(name):
    return load_and_compile_template(name)

def warm_caches():
    """Call at application startup."""
    for name in get_all_template_names():
        get_template(name)
Typed caching
lru_cache(typed=True) treats 3 and 3.0 as different arguments:
@lru_cache(typed=True)
def process(value):
    return type(value).__name__

process(3) # "int" — cached separately
process(3.0) # "float" — different cache entry
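Without typed=True, arguments that compare equal and hash equal are generally treated as the same call: 3 == 3.0 and hash(3) == hash(3.0), so the two can share a cache entry. This is not guaranteed, though. The documentation notes that some types, such as int and str, may be cached separately even when typed is false.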
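The difference shows up in cache_info(). The sketch below uses two hypothetical two-argument functions, label_untyped and label_typed; the second argument is there only so the example does not depend on how a lone int or str argument is keyed, which is an implementation detail.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def label_untyped(value, tag):
    return type(value).__name__


@lru_cache(maxsize=None, typed=True)
def label_typed(value, tag):
    return type(value).__name__


print(label_untyped(3, "x"))     # int
print(label_untyped(3.0, "x"))   # int, the entry cached for (3, "x") is reused
print(label_typed(3, "x"))       # int
print(label_typed(3.0, "x"))     # float
print(label_untyped.cache_info().misses)  # 1
print(label_typed.cache_info().misses)    # 2
```

The untyped version returning "int" for a float argument is exactly the kind of surprise typed=True exists to prevent.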
Monitoring cache effectiveness
import atexit
from functools import lru_cache

@lru_cache(maxsize=512)
def expensive_query(sql):
    return db.execute(sql)

def report_cache_stats():
    info = expensive_query.cache_info()
    total = info.hits + info.misses
    # Report 0% rather than 100% when the function was never called
    hit_rate = info.hits / total if total else 0.0
    print(f"Cache hit rate: {hit_rate:.1%} ({info.currsize}/{info.maxsize} entries)")

atexit.register(report_cache_stats)
If your hit rate is below 50%, your maxsize is likely too small or the input distribution is too spread out for caching to help.
One thing to remember: the hot paths of functools (lru_cache, partial, reduce, cmp_to_key) have C implementations in CPython, while tools such as singledispatch and cached_property are pure Python. lru_cache pairs a linked list with a dict for O(1) operations, partial creates lean callable wrappers, and singledispatch builds a type→function registry with MRO-aware lookup. Master the internals and you’ll know when each tool helps, and when a simpler approach is better.