HTTP Caching Strategies — Deep Dive

HTTP caching in Python: library landscape

Python has several libraries for HTTP caching, each targeting different use cases. The choice depends on your HTTP client and storage requirements.

requests-cache: drop-in caching for requests

requests-cache provides CachedSession, a drop-in replacement for requests.Session that adds transparent caching with multiple backends:

import requests_cache

session = requests_cache.CachedSession(
    cache_name="api_cache",
    backend="sqlite",              # Also: redis, mongodb, filesystem
    expire_after=300,              # Default TTL: 5 minutes
    allowable_methods=["GET", "HEAD"],
    stale_if_error=True,           # Serve stale on errors
    urls_expire_after={
        "api.example.com/config": 3600,      # Config: 1 hour
        "api.example.com/users/*": 60,       # Users: 1 minute
        "api.example.com/realtime/*": requests_cache.DO_NOT_CACHE,  # Never cache
    },
)

# Usage is identical to requests.Session
resp = session.get("https://api.example.com/config")
print(resp.from_cache)  # True on subsequent calls while the entry is fresh

The urls_expire_after dictionary lets you set different TTLs per URL pattern. To disable caching for a pattern, use requests_cache.DO_NOT_CACHE; a value of -1 means the opposite (the entry never expires). stale_if_error=True returns expired cache entries when a refresh request fails, for example because the server is unreachable. This is a crucial resilience feature.
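To make the per-URL policy idea concrete, here is a minimal sketch of a pattern-to-TTL lookup. It is not requests-cache's actual matching code: the pick_ttl helper, the local DO_NOT_CACHE sentinel, and the first-match-wins rule built on fnmatch are all assumptions chosen for illustration.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical sentinel meaning "skip the cache entirely" (illustration only).
DO_NOT_CACHE = object()

# Illustrative policy table: the first matching pattern wins.
URL_TTLS = {
    "api.example.com/config": 3600,
    "api.example.com/users/*": 60,
    "api.example.com/realtime/*": DO_NOT_CACHE,
}


def pick_ttl(url, policies=URL_TTLS, default=300):
    """Return the TTL for a URL, ignoring its scheme and query string."""
    parts = urlsplit(url)
    target = f"{parts.netloc}{parts.path}"
    for pattern, ttl in policies.items():
        if fnmatch(target, pattern):
            return ttl
    return default  # no pattern matched: fall back to the session default


print(pick_ttl("https://api.example.com/config"))           # 3600
print(pick_ttl("https://api.example.com/users/42?full=1"))  # 60
print(pick_ttl("https://api.example.com/orders"))           # 300
```

Ordering matters with first-match-wins: put the most specific patterns first so a broad wildcard does not shadow them.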

hishel: HTTP caching for httpx

For httpx users, hishel provides RFC 9111-compliant HTTP caching:

import hishel
import httpx

storage = hishel.FileStorage(base_path="/tmp/http_cache")
controller = hishel.Controller(
    cacheable_methods=["GET", "HEAD"],
    cacheable_status_codes=[200, 203, 300, 301],
)

transport = hishel.CacheTransport(
    transport=httpx.HTTPTransport(),
    storage=storage,
    controller=controller,
)

client = httpx.Client(transport=transport)
resp = client.get("https://api.example.com/data")

Hishel follows the HTTP caching specification: Cache-Control, Vary, ETag, Last-Modified, and conditional requests. That makes it a good fit for clients that need spec-compliant behavior rather than a fixed TTL.
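As a rough illustration of what spec-driven freshness means, the sketch below parses a Cache-Control header and decides whether a stored response can be reused without revalidation. It is a simplified assumption-laden example, not hishel's implementation: the helper names are made up, and it ignores s-maxage, Expires, and heuristic freshness.

```python
from __future__ import annotations


def parse_cache_control(header: str) -> dict[str, str | None]:
    """Split a Cache-Control header into lowercase directive -> value."""
    directives: dict[str, str | None] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """Return True if a stored response may be reused without revalidation."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # no explicit lifetime: treated as stale in this sketch
    return age_seconds < int(max_age)


print(is_fresh("public, max-age=300", 120))   # True
print(is_fresh("max-age=300", 301))           # False
print(is_fresh("no-cache, max-age=300", 10))  # False
```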

For async:

import hishel
import httpx

async_storage = hishel.AsyncFileStorage(base_path="/tmp/http_cache")
async_transport = hishel.AsyncCacheTransport(
    transport=httpx.AsyncHTTPTransport(),
    storage=async_storage,
)

async_client = httpx.AsyncClient(transport=async_transport)

Implementing conditional requests manually

When you can’t use a caching library, implement conditional requests directly:

import hashlib
import json
from dataclasses import dataclass, field
from pathlib import Path

import httpx


@dataclass
class CacheEntry:
    url: str
    etag: str | None = None
    last_modified: str | None = None
    body: dict = field(default_factory=dict)


class ConditionalClient:
    def __init__(self, client: httpx.Client, cache_dir: str = "/tmp/cache"):
        self._client = client
        self._cache_dir = Path(cache_dir)
        self._cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, url: str) -> Path:
        # Hash the URL so it is safe to use as a filename
        key = hashlib.sha256(url.encode()).hexdigest()[:16]
        return self._cache_dir / f"{key}.json"

    def _load_entry(self, url: str) -> CacheEntry | None:
        path = self._cache_path(url)
        if path.exists():
            data = json.loads(path.read_text())
            return CacheEntry(**data)
        return None

    def _save_entry(self, entry: CacheEntry) -> None:
        path = self._cache_path(entry.url)
        path.write_text(json.dumps({
            "url": entry.url,
            "etag": entry.etag,
            "last_modified": entry.last_modified,
            "body": entry.body,
        }))

    def get(self, url: str, **kwargs) -> dict:
        cached = self._load_entry(url)
        # Copy the headers so the caller's dict is not mutated
        headers = dict(kwargs.pop("headers", None) or {})

        if cached:
            if cached.etag:
                headers["If-None-Match"] = cached.etag
            if cached.last_modified:
                headers["If-Modified-Since"] = cached.last_modified

        resp = self._client.get(url, headers=headers, **kwargs)

        if resp.status_code == 304 and cached:
            return cached.body

        resp.raise_for_status()
        body = resp.json()

        entry = CacheEntry(
            url=url,
            etag=resp.headers.get("ETag"),
            last_modified=resp.headers.get("Last-Modified"),
            body=body,
        )
        self._save_entry(entry)
        return body

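This pattern sends conditional requests using ETags and Last-Modified. On a 304 Not Modified, the server sends no body and the client returns the cached copy, so an unchanged payload costs only a round trip of headers. The bandwidth savings can be substantial for large payloads. Note that the cache key here is the URL alone, so requests that differ only in params or headers share one entry.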
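For context, this is roughly what happens on the other side of that exchange. The sketch below shows how a server could decide between 200 and 304 from an If-None-Match header. The make_etag and respond helpers are hypothetical, the ETag is simply a truncated content hash, and header lookup assumes the exact key "If-None-Match".

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a content hash (illustrative scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    raw = request_headers.get("If-None-Match", "")
    # If-None-Match uses weak comparison, so drop any W/ prefix.
    candidates = [tag.strip().removeprefix("W/") for tag in raw.split(",")]
    if "*" in candidates or etag in candidates:
        return 304, b""  # validator matched: no body is sent
    return 200, body


body = b'{"feature": "on"}'
print(respond(body, {})[0])                                  # 200
print(respond(body, {"If-None-Match": make_etag(body)})[0])  # 304
```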

Redis-backed shared cache

For distributed Python services, share cache across instances using Redis:

import requests_cache
from requests_cache.backends.redis import RedisCache

session = requests_cache.CachedSession(
    cache_name="shared_api_cache",
    backend=RedisCache(
        host="redis.internal",
        port=6379,
        db=0,
    ),
    expire_after=120,
    allowable_methods=["GET"],
)

Benefits of Redis over SQLite:

  • Shared across multiple service instances
  • Built-in TTL expiration (no cleanup jobs)
  • Individual Redis commands are atomic, which reduces the risk of race conditions between concurrent writers
  • Can be monitored with standard Redis tooling

Cache-aware API client architecture

Integrate caching into the API client from the architecture section:

import httpx
import hishel
from typing import Any


class CachedAPIClient:
    def __init__(
        self,
        api_key: str,
        base_url: str,
        cache_dir: str = "/tmp/api_cache",
    ):
        storage = hishel.FileStorage(base_path=cache_dir)
        controller = hishel.Controller(
            cacheable_methods=["GET", "HEAD"],
            allow_stale=True,
        )
        transport = hishel.CacheTransport(
            transport=httpx.HTTPTransport(
                limits=httpx.Limits(
                    max_connections=50,
                    max_keepalive_connections=10,
                )
            ),
            storage=storage,
            controller=controller,
        )
        self._client = httpx.Client(
            transport=transport,
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )

    def get(self, path: str, **kwargs: Any) -> dict:
        resp = self._client.get(path, **kwargs)
        resp.raise_for_status()
        return resp.json()

    def post(self, path: str, **kwargs: Any) -> dict:
        # POST requests bypass cache automatically
        resp = self._client.post(path, **kwargs)
        resp.raise_for_status()
        return resp.json()

    def close(self) -> None:
        self._client.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

GET requests are cached transparently. POST/PUT/DELETE bypass the cache because they are not listed in cacheable_methods. The allow_stale=True setting lets hishel serve expired entries, which keeps reads working when the server is unavailable.
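The stale-on-error idea is easier to reason about in isolation. The following is a library-independent sketch, assuming a hypothetical StaleOnErrorCache class with an injected fetch function and clock. It is not how hishel implements allow_stale, only an illustration of the fallback logic.

```python
import time
from typing import Callable


class StaleOnErrorCache:
    """Tiny in-memory cache that falls back to expired entries on failure."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        now = self._clock()
        if entry and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no upstream call
        try:
            value = self._fetch(key)
        except OSError:
            if entry:
                return entry[1]  # upstream failed: serve the stale copy
            raise  # nothing cached, so the error must surface
        self._entries[key] = (now, value)
        return value
```

Injecting the clock keeps the class testable without sleeping. In a real client you would also count each fallback so that stale responses show up in your metrics.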

Vary header: caching per-request variations

The Vary header tells caches that responses differ based on specific request headers. For example, Vary: Accept-Language means the French and English responses should be cached separately.

This matters for API clients that send different Accept headers or authentication tokens. A caching layer that ignores Vary might serve a JSON response when XML was requested, or leak one user’s data to another.

Both hishel and requests-cache handle Vary correctly. Custom implementations must include request headers specified in Vary as part of the cache key.
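For a custom implementation, one way to do this is to fold the varying request headers into the key. The sketch below assumes a hypothetical vary_cache_key helper and a SHA-256 key format. Raising on Vary: * is a design choice made for this example.

```python
from __future__ import annotations

import hashlib


def vary_cache_key(method: str, url: str,
                   request_headers: dict[str, str],
                   vary: str | None) -> str:
    """Build a cache key that includes the request headers named in Vary."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in (vary or "").split(",") if n.strip())
    if "*" in names:
        raise ValueError("Vary: * means the stored response must not be reused")
    parts = [method.upper(), url]
    parts += [f"{name}={lowered.get(name, '')}" for name in names]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


url = "https://api.example.com/greeting"
en = vary_cache_key("GET", url, {"Accept-Language": "en"}, "Accept-Language")
fr = vary_cache_key("GET", url, {"Accept-Language": "fr"}, "Accept-Language")
print(en != fr)  # True
```

Because the Vary value is only known once a response has arrived, a real cache typically stores it alongside the entry and applies this comparison as a second step after a URL-based lookup.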

Cache warming and prefetching

For latency-sensitive paths, pre-populate the cache during startup:

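import logging
from concurrent.futures import ThreadPoolExecutor

import httpx

logger = logging.getLogger(__name__)

# Relative paths: the client must be created with a base_url
WARM_URLS = [
    "/config",
    "/feature-flags",
    "/products?category=popular",
]


def warm_cache(client: httpx.Client, urls: list[str]) -> None:
    def fetch(url: str) -> None:
        try:
            client.get(url)
        except httpx.HTTPError as exc:
            # Log but don't fail startup
            logger.warning("Cache warm-up failed for %s: %s", url, exc)

    # Leaving the with-block waits for all submitted fetches to finish
    with ThreadPoolExecutor(max_workers=5) as pool:
        pool.map(fetch, urls)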

Run this during application startup, before the service accepts traffic, and pass it the same cache-enabled client the service will use. The first real request to any warmed URL then gets a cache hit instead of waiting for the upstream API.

Monitoring cache effectiveness

Track hit rates to validate your caching strategy:

from dataclasses import dataclass


@dataclass
class CacheMetrics:
    hits: int = 0
    misses: int = 0
    stale_served: int = 0
    errors_with_stale_fallback: int = 0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0

    def report(self) -> dict:
        return {
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": f"{self.hit_rate:.1%}",
            "stale_served": self.stale_served,
            "errors_with_stale_fallback": self.errors_with_stale_fallback,
        }

As a rule of thumb, a healthy cache for an API client maintains a hit rate above 60%. A rate below 30% suggests that the TTLs are too short, the request patterns are too varied, or the cached endpoints change frequently.

Metric              Healthy   Investigate   Critical
Hit rate            > 60%     30-60%        < 30%
Stale served        < 5%      5-15%         > 15%
Cache size growth   Stable    Linear        Unbounded
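These thresholds are easy to turn into an automated check. The sketch below is one possible encoding. The function names are hypothetical, and it assumes "stale served" is measured as a share of all requests, which the table does not state explicitly.

```python
def classify_hit_rate(hit_rate: float) -> str:
    """Map a hit rate (0.0 to 1.0) onto the table's three bands."""
    if hit_rate > 0.60:
        return "healthy"
    if hit_rate >= 0.30:
        return "investigate"
    return "critical"


def classify_stale_share(stale_served: int, total_requests: int) -> str:
    """Classify stale responses as a share of all requests (assumed basis)."""
    share = stale_served / total_requests if total_requests else 0.0
    if share < 0.05:
        return "healthy"
    if share <= 0.15:
        return "investigate"
    return "critical"


print(classify_hit_rate(0.72))        # healthy
print(classify_stale_share(8, 100))   # investigate
```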

The one thing to remember: effective HTTP caching in Python combines spec-compliant libraries (hishel or requests-cache), respect for Cache-Control and ETag headers, per-URL TTL policies, and monitoring. Together these turn your API client from a bandwidth consumer into an intelligent, resilient data accessor.
