Python API Caching Layers — Core Concepts
Why caching layers matter
A single database query might take 5–50 ms. Multiply that by hundreds of concurrent requests for the same data, and your database becomes the bottleneck. Caching layers intercept repeated queries at different points in the request path, reducing latency and database load simultaneously.
The four caching layers
Layer 1: In-process cache
The fastest cache lives inside your Python process. No network calls, no serialization — just a dictionary lookup.
from functools import lru_cache
from cachetools import TTLCache

# `db` is the application's data-access object, defined elsewhere.

# Simple function-level cache (entries are evicted by LRU, never by age)
@lru_cache(maxsize=1000)
def get_config_value(key: str) -> str:
    return db.query_config(key)

# TTL-based cache for data that changes
product_cache = TTLCache(maxsize=500, ttl=300)  # 5-minute TTL

def get_product(product_id: int):
    # A single get() avoids a KeyError if the entry expires between
    # a membership check and the read.
    product = product_cache.get(product_id)
    if product is not None:
        return product
    product = db.get_product(product_id)
    product_cache[product_id] = product
    return product
Strengths: Sub-microsecond lookups, zero network overhead.
Limitations: Each server process has its own copy. Cache invalidation only affects one process. Memory is limited to the process heap. Not suitable for data that changes frequently across multiple servers.
Best for: Configuration values, reference data, feature flags — things that change rarely and are read constantly.
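The per-process limitation above can be seen directly with functools.lru_cache: clearing the cache only resets the copy held by the process that calls cache_clear(). A minimal sketch, where load_flag and the _FLAGS dictionary are hypothetical stand-ins for a real database lookup:

```python
from functools import lru_cache

# Hypothetical stand-in for a database table of feature flags.
_FLAGS = {"new_checkout": "off"}

@lru_cache(maxsize=128)
def load_flag(name: str) -> str:
    # Only runs on a cache miss.
    return _FLAGS[name]

first = load_flag("new_checkout")   # miss: reads the backing store
_FLAGS["new_checkout"] = "on"       # the underlying data changes
stale = load_flag("new_checkout")   # hit: still returns the old value
stats = load_flag.cache_info()      # counters for this process only

load_flag.cache_clear()             # invalidates this process's copy
fresh = load_flag("new_checkout")   # miss again: sees the new value
```

Other worker processes would keep serving their own stale copy until they clear it themselves or restart, which is why this layer suits data that rarely changes.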
Layer 2: Distributed cache (Redis or Memcached)
A shared cache that all API servers can read from and write to.
import redis.asyncio as redis
import orjson

redis_client = redis.Redis(host="localhost", port=6379)

async def get_user_profile(user_id: int):
    cache_key = f"user:profile:{user_id}"
    cached = await redis_client.get(cache_key)
    if cached is not None:  # distinguish a miss from an empty payload
        return orjson.loads(cached)
    profile = await db.get_user_profile(user_id)
    await redis_client.setex(cache_key, 600, orjson.dumps(profile))  # 10 min TTL
    return profile
Strengths: Shared across all servers, supports expiration and eviction policies, handles millions of keys, and typically answers in about a millisecond or less over the local network.
Limitations: Adds a network hop (~0.5–2 ms), requires a running Redis instance, serialization/deserialization cost.
Best for: User profiles, product catalogs, session data, any frequently read data that multiple servers need.
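The read path above (check the cache, fall back to the source, store the result) can be factored into a reusable helper. The sketch below is one possible shape, not a library API: cached_fetch and FakeRedis are hypothetical names, and FakeRedis only mimics the get() and setex() calls so the example runs without a Redis server. It uses the standard json module instead of orjson to stay dependency-free.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional

class FakeRedis:
    """Hypothetical in-memory stand-in exposing only get() and setex()."""

    def __init__(self) -> None:
        self._data = {}

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    async def setex(self, key: str, ttl: int, value: bytes) -> None:
        # The TTL is accepted but not enforced in this stand-in.
        self._data[key] = value

async def cached_fetch(client, key: str, ttl: int,
                       loader: Callable[[], Awaitable[Any]]) -> Any:
    cached = await client.get(key)
    if cached is not None:            # hit, even if the payload is falsy
        return json.loads(cached)
    value = await loader()            # miss: go to the source of truth
    await client.setex(key, ttl, json.dumps(value).encode())
    return value

async def demo():
    client = FakeRedis()
    calls = 0

    async def load_profile() -> dict:
        nonlocal calls
        calls += 1                    # counts trips to the "database"
        return {"id": 42, "name": "Ada"}

    first = await cached_fetch(client, "user:profile:42", 600, load_profile)
    second = await cached_fetch(client, "user:profile:42", 600, load_profile)
    return first, second, calls

first, second, calls = asyncio.run(demo())
```

The second call is answered from the cache, so the loader runs only once. Passing a real redis.asyncio client in place of FakeRedis should work the same way, since only get() and setex() are used.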
Layer 3: HTTP caching
Use HTTP cache headers so clients, proxies, and CDNs can cache responses without hitting your API at all.
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/products/{product_id}")
async def get_product(product_id: int, response: Response):
    # cached_get_product is the application's own cache-backed loader.
    product = await cached_get_product(product_id)
    response.headers["Cache-Control"] = "public, max-age=300"
    response.headers["ETag"] = f'"{product.version}"'
    return product
Key headers:
- Cache-Control: public, max-age=300 (anyone can cache this for 5 minutes)
- Cache-Control: private, max-age=60 (only the browser can cache, not CDNs)
- ETag (a version identifier; clients send If-None-Match to check freshness)
- Vary: Authorization (cache separately per authenticated user)
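The ETag and If-None-Match exchange comes down to one comparison on the server: if the validator the client sends back still matches the current version, the server can answer 304 Not Modified with no body. A simplified, framework-independent sketch follows; revalidate is a hypothetical helper, and it assumes ETag values contain no commas.

```python
from typing import Optional

def strip_weak(tag: str) -> str:
    # If-None-Match uses weak comparison, so a leading "W/" is ignored.
    return tag[2:] if tag.startswith("W/") else tag

def revalidate(if_none_match: Optional[str], current_etag: str) -> int:
    """Return the status to send for a GET: 304 if the client copy is current."""
    if if_none_match is None:
        return 200                    # no validator sent: full response
    # The header may carry several comma-separated validators, or "*".
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return 304                    # "*" matches any existing representation
    current = strip_weak(current_etag)
    if any(strip_weak(tag) == current for tag in candidates):
        return 304
    return 200

status_first = revalidate(None, '"7"')
status_match = revalidate('"7"', '"7"')
status_stale = revalidate('"6"', '"7"')
status_weak = revalidate('W/"7", "3"', '"7"')
```

A 304 response still costs a round trip, but it skips serialization and the response body, which is where most of the savings come from for large payloads.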
Layer 4: CDN / edge cache
Services like Cloudflare, Fastly, or Amazon CloudFront cache responses at edge locations worldwide. A user in Tokyo gets a cached response from a Tokyo server instead of hitting your API in Frankfurt.
Best for: Public content (product pages, blog posts, images), any endpoint with Cache-Control: public.
Cache invalidation strategies
The hardest part of caching. Three main approaches:
Time-based (TTL): Set an expiration time. Data might be stale for up to TTL seconds. Simple and predictable.
Event-based: When data changes, explicitly delete or update the cache entry. More complex but ensures freshness.
async def update_product(product_id: int, data: dict):
    await db.update_product(product_id, data)
    # Invalidate the shared cache and the CDN. In-process caches on
    # other servers are not reached and still rely on their TTL.
    await redis_client.delete(f"product:{product_id}")
    await cdn_client.purge(f"/products/{product_id}")
Versioned keys: Include a version number in the cache key. When data changes, increment the version so readers build a new key. Old entries are never read again and eventually expire or get evicted.
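The versioned-key approach can be sketched with plain dictionaries. Everything here is a hypothetical stand-in: cache plays the role of the shared cache, versions holds the current version per product, and database is the source of truth.

```python
# Hypothetical stand-ins for the shared cache, version store, and database.
cache = {}
versions = {}                          # current version per product id
database = {1: {"name": "Lamp", "price": 30}}

def product_key(product_id: int) -> str:
    version = versions.get(product_id, 0)
    return f"product:{product_id}:v{version}"

def get_product(product_id: int) -> dict:
    key = product_key(product_id)
    if key not in cache:
        cache[key] = dict(database[product_id])   # copy the row on a miss
    return cache[key]

def update_product(product_id: int, **changes) -> None:
    database[product_id].update(changes)
    # Bumping the version makes every reader build a new key; the old
    # entry is never read again and can age out through its TTL.
    versions[product_id] = versions.get(product_id, 0) + 1

before = get_product(1)
update_product(1, price=25)
after = get_product(1)
```

In a multi-server deployment the version counter itself would need to live somewhere all servers can read, which adds one lookup per request. The trade-off is that no delete call has to reach every cache that holds the old entry.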
Common misconception
Developers often add caching to slow endpoints without measuring. Sometimes the slowness comes from an unindexed query, an N+1 problem, or excessive serialization — not cache absence. Always profile before caching. A well-indexed query that takes 2 ms does not need a cache. A report query that takes 3 seconds does.
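Measuring first does not need heavy tooling. The sketch below times a callable over several runs with time.perf_counter and reports the median and worst case; measure_ms and slow_report are hypothetical names, and slow_report simply sleeps to imitate an expensive query.

```python
import statistics
import time
from typing import Callable

def measure_ms(fn: Callable[[], object], runs: int = 20) -> dict:
    """Time fn over several runs; report median and worst case in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}

def slow_report() -> int:
    # Hypothetical stand-in for an expensive query.
    time.sleep(0.01)
    return 1

result = measure_ms(slow_report, runs=5)
```

If the median is already a few milliseconds, a cache is unlikely to pay for its added complexity. For a real endpoint, a profiler or database query plan would show whether the time goes to the query, serialization, or something else.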
Layering strategy
Use multiple layers together:
- In-process cache for ultra-hot data (config, feature flags).
- Redis for shared session and entity data.
- HTTP headers for client-side caching.
- CDN for public, static-ish content.
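Each layer absorbs part of the traffic before it reaches the next one. On the request path the order is reversed: the client cache and CDN answer first, then the in-process cache, then Redis. The result is a waterfall effect where only requests that miss every layer reach your database.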
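The server-side half of that waterfall can be sketched as a two-level lookup: check the in-process cache, then the shared cache, and only then the database. The dictionaries and load_from_db below are hypothetical stand-ins, with shared_cache playing the role of Redis.

```python
from typing import Callable

local_cache = {}      # layer 1: visible to this process only
shared_cache = {}     # hypothetical stand-in for Redis
db_reads = 0          # counts how often the "database" is hit

def load_from_db(key: str) -> str:
    global db_reads
    db_reads += 1
    return f"value-for-{key}"

def layered_get(key: str, loader: Callable[[str], str] = load_from_db) -> str:
    if key in local_cache:
        return local_cache[key]
    if key in shared_cache:
        value = shared_cache[key]
    else:
        value = loader(key)           # missed every layer
        shared_cache[key] = value
    local_cache[key] = value          # populate the faster layer on the way back
    return value

a = layered_get("feature:x")   # misses both layers, reads the database
b = layered_get("feature:x")   # served from the in-process cache
local_cache.clear()            # simulate a second server process
c = layered_get("feature:x")   # served from the shared cache
```

Only the first call reaches the database. A production version would also give each layer its own TTL, with the in-process TTL usually the shorter of the two so that stale local copies do not outlive the shared entry.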
The one thing to remember: Cache in layers from closest to your code (in-process) to farthest from it (CDN), use TTL for simplicity and event-based invalidation for accuracy, and always measure before adding cache complexity.
See Also
- Python Api Authentication Comparison: four ways Python APIs verify who is knocking at the door (API keys, JWTs, OAuth, and sessions).
- Python Api Error Handling Standards: why good error messages from your Python API are like clear road signs that tell callers exactly what went wrong and what to do next.
- Python Api Load Testing: testing how many people your Python API can handle at once, like stress-testing a bridge before opening it to traffic.
- Python Api Monitoring Observability: how Python APIs keep track of their own health, like a car dashboard that warns you before the engine overheats.
- Python Request Validation Patterns: how Python APIs check incoming data before trusting it, like a bouncer checking IDs at the door.