Python API Caching Layers — Core Concepts

Understand the caching layers in Python APIs — in-process, Redis, HTTP, and CDN — and when each layer delivers the most value.

Why caching layers matter

A single database query might take 5–50 ms. Multiply that by hundreds of concurrent requests for the same data, and your database becomes the bottleneck. Caching layers intercept repeated queries at different points in the request path, reducing latency and database load simultaneously.

The four caching layers

Layer 1: In-process cache

The fastest cache lives inside your Python process. No network calls, no serialization — just a dictionary lookup.

from functools import lru_cache
from cachetools import TTLCache

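# Simple function-level cache; db is the application's data-access object (not shown)
# Note: lru_cache entries never expire. Call get_config_value.cache_clear() to refresh.
@lru_cache(maxsize=1000)
def get_config_value(key: str) -> str:
    return db.query_config(key)

# TTL-based cache for data that changes
product_cache = TTLCache(maxsize=500, ttl=300)  # 5-minute TTL

def get_product(product_id: int):
    # .get() treats an expired entry as a miss and avoids a check-then-read race
    product = product_cache.get(product_id)
    if product is not None:
        return product
    product = db.get_product(product_id)
    product_cache[product_id] = product
    return product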

Strengths: Lookups cost about as much as a dictionary access (often under a microsecond), with zero network overhead.

Limitations: Each server process has its own copy. Cache invalidation only affects one process. Memory is limited to the process heap. Not suitable for data that changes frequently across multiple servers.

Best for: Configuration values, reference data, feature flags — things that change rarely and are read constantly.
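To make the TTL behavior concrete, here is a minimal sketch of an in-process TTL cache built only on the standard library. The class name SimpleTTLCache and its injectable clock parameter are illustrative assumptions, not part of cachetools; the sketch is not thread-safe and has no size limit, so treat it as a teaching aid rather than a replacement for TTLCache.

```python
import time
from typing import Any, Callable, Hashable

_MISSING = object()


class SimpleTTLCache:
    """Tiny in-process TTL cache (illustrative: not thread-safe, no size limit)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._data = {}      # key -> (expires_at, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key, _MISSING)
        if entry is _MISSING:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry outlived its TTL: drop it and report a miss.
            del self._data[key]
            return default
        return value

    def set(self, key: Hashable, value: Any) -> None:
        # Store the value together with the moment it stops being valid.
        self._data[key] = (self._clock() + self._ttl, value)
```

Because every process holds its own instance, a value written on one server stays invisible to the others, which is exactly the limitation described above.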

Layer 2: Distributed cache (Redis or Memcached)

A shared cache that all API servers can read from and write to.

import redis.asyncio as redis
import orjson

redis_client = redis.Redis(host="localhost", port=6379)

async def get_user_profile(user_id: int):
    cache_key = f"user:profile:{user_id}"
    cached = await redis_client.get(cache_key)
    if cached is not None:  # a missing key comes back as None
        return orjson.loads(cached)

    # Cache miss: load from the database and store the result for later readers
    profile = await db.get_user_profile(user_id)
    await redis_client.setex(cache_key, 600, orjson.dumps(profile))  # 10 min TTL
    return profile

Strengths: Shared across all servers, supports expiration and eviction policies, handles millions of keys, and typically answers in about a millisecond or less over a local network.

Limitations: Adds a network hop (~0.5–2 ms), requires a running Redis or Memcached instance, and adds serialization/deserialization cost.

Best for: User profiles, product catalogs, session data, any frequently read data that multiple servers need.
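The read path shown for user profiles is usually called cache-aside: check the cache, fall back to the source on a miss, then store the result. A minimal, backend-agnostic sketch follows. The names AsyncCache, InMemoryCache, and get_or_load are hypothetical, and the dict-backed class only stands in for Redis or Memcached so the example runs without a server (it ignores the TTL).

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional, Protocol


class AsyncCache(Protocol):
    """Minimal interface a distributed cache client would need to satisfy."""

    async def get(self, key: str) -> Optional[bytes]: ...
    async def set(self, key: str, value: bytes, ttl_seconds: int) -> None: ...


class InMemoryCache:
    """Dict-backed stand-in for a shared cache; it ignores the TTL."""

    def __init__(self) -> None:
        self._data = {}

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    async def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._data[key] = value


async def get_or_load(
    cache: AsyncCache,
    key: str,
    loader: Callable[[], Awaitable[Any]],
    ttl_seconds: int,
) -> Any:
    """Cache-aside read: return the cached value, or load, store, and return it."""
    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = await loader()
    await cache.set(key, json.dumps(value).encode("utf-8"), ttl_seconds)
    return value
```

Keeping the loader as a parameter means the same helper can wrap any database call, and swapping the stand-in for a real client only requires an adapter with the same two methods.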

Layer 3: HTTP caching

Use HTTP cache headers so clients, proxies, and CDNs can cache responses without hitting your API at all.

from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/products/{product_id}")
async def get_product(product_id: int, response: Response):
    # cached_get_product is the application's own cache-backed lookup (not shown)
    product = await cached_get_product(product_id)
    response.headers["Cache-Control"] = "public, max-age=300"
    response.headers["ETag"] = f'"{product.version}"'  # ETag values are quoted strings
    return product

Key headers:

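Cache-Control: public, max-age=300 lets any cache, shared or private, store the response for 5 minutes
Cache-Control: private, max-age=60 lets only the user's own cache (typically the browser) store it for 1 minute; shared caches such as CDNs must not
ETag is a version identifier for the response; clients send it back in If-None-Match so the server can answer 304 Not Modified when nothing changed
Vary: Authorization tells caches to keep a separate variant for each distinct Authorization header value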
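The ETag round trip can be sketched without any web framework. The helpers below, etag_matches and conditional_status, are hypothetical names; they apply the weak comparison that HTTP specifies for If-None-Match and decide between 200 and 304. The comma split is a simplification that assumes entity tags do not themselves contain commas.

```python
from typing import Optional


def _strip_weak(tag: str) -> str:
    """Drop surrounding whitespace and a leading W/ weak-validator prefix."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: Optional[str], current_etag: str) -> bool:
    """Return True when the If-None-Match header covers the current ETag."""
    if not if_none_match:
        return False
    # Simplification: assumes no commas inside the quoted entity tags.
    candidates = [_strip_weak(t) for t in if_none_match.split(",")]
    return "*" in candidates or _strip_weak(current_etag) in candidates


def conditional_status(if_none_match: Optional[str], current_etag: str) -> int:
    """304 lets the client reuse its cached copy; 200 means send the full body."""
    return 304 if etag_matches(if_none_match, current_etag) else 200
```

In a handler, a 304 result would be returned with the ETag header and an empty body, which saves the serialization and transfer cost even though the request still reaches the API.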

Layer 4: CDN / edge cache

Services like Cloudflare, Fastly, or Amazon CloudFront cache responses at edge locations worldwide. A user in Tokyo gets a cached response from a Tokyo server instead of hitting your API in Frankfurt.

Best for: Public content (product pages, blog posts, images), any endpoint with Cache-Control: public.

Cache invalidation strategies

Invalidation is the hardest part of caching. There are three main approaches:

Time-based (TTL): Set an expiration time. Data might be stale for up to TTL seconds. Simple and predictable.

Event-based: When data changes, explicitly delete or update the cache entry. More complex but ensures freshness.

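async def update_product(product_id: int, data: dict):
    await db.update_product(product_id, data)
    # Invalidate the shared layers (cdn_client stands in for your CDN's purge API).
    # In-process and browser caches cannot be purged from here; they age out via TTL.
    await redis_client.delete(f"product:{product_id}")
    await cdn_client.purge(f"/products/{product_id}")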

Versioned keys: Include a version number in the cache key. When data changes, increment the version so readers build a new key and miss the old entry. Old entries are never read again and disappear through TTL expiry or eviction.
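A minimal sketch of versioned keys, assuming a per-entity version counter. The VersionedKeys class is hypothetical, and it keeps counters in a local dict only so the example runs on its own; with several servers the counters would have to live in shared storage (for example a Redis counter incremented with INCR).

```python
class VersionedKeys:
    """Builds cache keys that embed a per-entity version number."""

    def __init__(self) -> None:
        # Stand-in for a shared counter store; local dict for illustration only.
        self._versions = {}

    def key(self, namespace: str, entity_id: int) -> str:
        # Entities that were never bumped start at version 1.
        version = self._versions.get(f"{namespace}:{entity_id}", 1)
        return f"{namespace}:{entity_id}:v{version}"

    def bump(self, namespace: str, entity_id: int) -> int:
        """Call after a write so later reads use a fresh key."""
        base = f"{namespace}:{entity_id}"
        self._versions[base] = self._versions.get(base, 1) + 1
        return self._versions[base]
```

After a bump, lookups for that entity use a key that no cache entry exists for yet, so the next read falls through to the database without any explicit delete.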

Common misconception

Developers often add caching to slow endpoints without measuring. Sometimes the slowness comes from an unindexed query, an N+1 problem, or excessive serialization — not cache absence. Always profile before caching. A well-indexed query that takes 2 ms does not need a cache. A report query that takes 3 seconds does.
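One rough way to measure before caching is to time the suspect call directly. The helper below is a sketch with a hypothetical name, measure_latency_ms; it reports wall-clock time only, so a real investigation would still look at the query plan or a profiler to find out where the time goes.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(fn: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Call fn repeatedly and report median and worst-case latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}
```

If the median is already a few milliseconds, a cache is unlikely to pay for its added complexity; if it is measured in seconds, check indexes and query shape first, then consider caching.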

Layering strategy

Use multiple layers together:

In-process cache for ultra-hot data (config, feature flags).
Redis for shared session and entity data.
HTTP headers for client-side caching.
CDN for public, static-ish content.

Each layer reduces the traffic that reaches the next one, creating a waterfall effect where only requests that miss every cache reach your database.
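The waterfall effect is simple arithmetic: each layer passes on only its misses. The sketch below assumes each hit rate is measured on the traffic that actually reaches that layer; the function name and the example rates are illustrative, not measurements.

```python
from typing import List


def requests_reaching_database(total_requests: float, hit_rates: List[float]) -> float:
    """Multiply the miss rate of every layer to get the traffic left at the end."""
    remaining = total_requests
    for rate in hit_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"hit rate must be between 0 and 1, got {rate}")
        remaining *= 1.0 - rate  # only misses continue to the next layer
    return remaining


# Hypothetical hit rates for three layers in front of the database.
left = requests_reaching_database(10_000, [0.5, 0.8, 0.9])
```

With these assumed rates, roughly 100 of 10,000 requests would reach the database, which shows why even moderate hit rates compound quickly when layers are stacked.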

The one thing to remember: Cache in layers from closest (in-process) to farthest (CDN), use TTL for simplicity, event-based invalidation for accuracy, and always measure before adding cache complexity.
