HTTP Caching Strategies — Deep Dive
HTTP caching in Python: library landscape
Python has several libraries for HTTP caching, each targeting different use cases. The choice depends on your HTTP client and storage requirements.
requests-cache: drop-in caching for requests
requests-cache provides CachedSession, a drop-in replacement for requests.Session that adds transparent caching with a choice of storage backends:
import requests_cache

session = requests_cache.CachedSession(
    cache_name="api_cache",
    backend="sqlite",  # Also: redis, mongodb, filesystem
    expire_after=300,  # Default TTL: 5 minutes
    allowable_methods=["GET", "HEAD"],
    stale_if_error=True,  # Serve stale on errors
    urls_expire_after={
        "api.example.com/config": 3600,  # Config: 1 hour
        "api.example.com/users/*": 60,  # Users: 1 minute
        "api.example.com/realtime/*": requests_cache.DO_NOT_CACHE,  # Never cache
    },
)

# Usage is identical to requests.Session
resp = session.get("https://api.example.com/config")
print(resp.from_cache)  # True on subsequent calls

The urls_expire_after dictionary lets you set different TTLs per URL pattern. Patterns are checked in the order they are defined, so list the most specific ones first. The requests_cache.DO_NOT_CACHE constant disables caching for a pattern. Note that -1 means the opposite: it is NEVER_EXPIRE, which keeps entries indefinitely. stale_if_error=True returns expired cache entries when the refresh request fails, which is a crucial resilience feature.
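To make the per-URL lookup concrete, here is a minimal standard-library sketch of how a pattern table like this could be resolved. It is not requests-cache's actual implementation: the names resolve_ttl, URL_TTLS, and DEFAULT_TTL are hypothetical, and the sketch assumes patterns omit the scheme and that the first matching pattern wins.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

DEFAULT_TTL = 300  # seconds; plays the role of expire_after

# Hypothetical pattern table, checked in insertion order
URL_TTLS = {
    "api.example.com/config": 3600,
    "api.example.com/users/*": 60,
}


def resolve_ttl(
    url: str,
    patterns: dict[str, int] = URL_TTLS,
    default: int = DEFAULT_TTL,
) -> int:
    """Return the TTL in seconds for url, ignoring scheme and query string."""
    parts = urlsplit(url)
    target = f"{parts.netloc}{parts.path}"
    for pattern, ttl in patterns.items():
        if fnmatch(target, pattern):
            return ttl  # first match wins
    return default


print(resolve_ttl("https://api.example.com/users/42"))
print(resolve_ttl("https://api.example.com/orders"))
```

Because the first match wins in this sketch, a broad pattern placed before a narrow one would shadow it, which is why ordering the table from most to least specific matters.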
hishel: HTTP caching for httpx
For httpx users, hishel provides RFC 9111-compliant HTTP caching:
import hishel
import httpx

storage = hishel.FileStorage(base_path="/tmp/http_cache")
controller = hishel.Controller(
    cacheable_methods=["GET", "HEAD"],
    cacheable_status_codes=[200, 203, 300, 301],
)
transport = hishel.CacheTransport(
    transport=httpx.HTTPTransport(),
    storage=storage,
    controller=controller,
)
client = httpx.Client(transport=transport)
resp = client.get("https://api.example.com/data")
Hishel respects the full HTTP caching specification: Cache-Control, Vary, ETag, Last-Modified, and conditional requests. This is the correct approach for clients that need spec-compliant behavior.
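To illustrate what respecting Cache-Control involves, the following is a simplified standard-library sketch of a freshness check. It is not hishel's code: parse_cache_control and is_fresh are hypothetical helpers, and the sketch only looks at max-age, no-store, and no-cache. The Age and Expires headers, s-maxage, and heuristic freshness are deliberately left out.

```python
from __future__ import annotations

import time


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control header into lowercase directives."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(
    headers: dict[str, str],
    stored_at: float,
    now: float | None = None,
) -> bool:
    """Return True if a stored response may be reused without revalidation."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return False
    max_age = cc.get("max-age")
    if max_age is None or not max_age.isdigit():
        # No explicit lifetime: this sketch always revalidates
        return False
    age = (time.time() if now is None else now) - stored_at
    return age < int(max_age)
```

A response that fails this check is not necessarily discarded. A spec-compliant cache would normally revalidate it with a conditional request rather than fetch it from scratch.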
For async:
import hishel
import httpx

async_storage = hishel.AsyncFileStorage(base_path="/tmp/http_cache")
async_transport = hishel.AsyncCacheTransport(
    transport=httpx.AsyncHTTPTransport(),
    storage=async_storage,
)
async_client = httpx.AsyncClient(transport=async_transport)
Implementing conditional requests manually
When you can’t use a caching library, implement conditional requests directly:
import hashlib
import json
from dataclasses import dataclass, field
from pathlib import Path

import httpx


@dataclass
class CacheEntry:
    url: str
    etag: str | None = None
    last_modified: str | None = None
    body: dict = field(default_factory=dict)


class ConditionalClient:
    def __init__(self, client: httpx.Client, cache_dir: str = "/tmp/cache"):
        self._client = client
        self._cache_dir = Path(cache_dir)
        self._cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, url: str) -> Path:
        # A truncated SHA-256 of the URL gives a short, filesystem-safe name
        key = hashlib.sha256(url.encode()).hexdigest()[:16]
        return self._cache_dir / f"{key}.json"

    def _load_entry(self, url: str) -> CacheEntry | None:
        path = self._cache_path(url)
        if path.exists():
            data = json.loads(path.read_text())
            return CacheEntry(**data)
        return None

    def _save_entry(self, entry: CacheEntry) -> None:
        path = self._cache_path(entry.url)
        path.write_text(json.dumps({
            "url": entry.url,
            "etag": entry.etag,
            "last_modified": entry.last_modified,
            "body": entry.body,
        }))

    def get(self, url: str, **kwargs) -> dict:
        cached = self._load_entry(url)
        # Copy so the caller's headers dict is never mutated
        headers = dict(kwargs.pop("headers", None) or {})
        if cached:
            # Attach whichever validators the server gave us last time
            if cached.etag:
                headers["If-None-Match"] = cached.etag
            if cached.last_modified:
                headers["If-Modified-Since"] = cached.last_modified
        resp = self._client.get(url, headers=headers, **kwargs)
        if resp.status_code == 304 and cached:
            # Not modified: the stored body is still current
            return cached.body
        resp.raise_for_status()
        body = resp.json()
        entry = CacheEntry(
            url=url,
            etag=resp.headers.get("ETag"),
            last_modified=resp.headers.get("Last-Modified"),
            body=body,
        )
        self._save_entry(entry)
        return body
This pattern sends the stored ETag and Last-Modified values back as If-None-Match and If-Modified-Since validators. A 304 response carries no body, so the client returns the cached copy instead of downloading the payload again. The bandwidth savings can be substantial for large payloads.
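The exchange is easier to follow with the server side in view. The sketch below is a toy origin written with the standard library only: origin_response is a hypothetical function, not part of httpx or any framework, and it handles only If-None-Match, treating weak and strong ETags as equivalent.

```python
def _opaque(tag: str) -> str:
    """Drop the weak-validator prefix so W/"v1" and "v1" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def origin_response(
    request_headers: dict[str, str],
    current_etag: str,
    body: bytes,
) -> tuple[int, dict[str, str], bytes]:
    """Toy origin: answer 304 when the client's validator still matches."""
    sent = request_headers.get("If-None-Match", "")
    candidates = {_opaque(tag) for tag in sent.split(",") if tag.strip()}
    if "*" in candidates or _opaque(current_etag) in candidates:
        return 304, {"ETag": current_etag}, b""  # no body on 304
    return 200, {"ETag": current_etag}, body


# First request: no validator, so the full body is sent
first = origin_response({}, '"v1"', b'{"plan": "pro"}')

# Second request: the ETag is echoed back and the resource is unchanged
revalidated = origin_response(
    {"If-None-Match": first[1]["ETag"]}, '"v1"', b'{"plan": "pro"}'
)

# Third request: the resource changed, so the old ETag no longer matches
changed = origin_response(
    {"If-None-Match": first[1]["ETag"]}, '"v2"', b'{"plan": "team"}'
)
```

Only the second call produces a 304 with an empty body, and that is the case where the client falls back to its stored copy.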
Redis-backed shared cache
For distributed Python services, share cache across instances using Redis:
import requests_cache
from requests_cache.backends.redis import RedisCache

session = requests_cache.CachedSession(
    backend=RedisCache(
        namespace="shared_api_cache",  # Key prefix shared by every instance
        host="redis.internal",
        port=6379,
        db=0,
    ),
    expire_after=120,
    allowable_methods=["GET"],
)
Benefits of Redis over SQLite:
- Shared across multiple service instances
- Built-in TTL expiration (no cleanup jobs)
- Individual Redis commands are atomic, so concurrent writers cannot leave a half-written entry
- Can be monitored with standard Redis tooling
Cache-aware API client architecture
Integrate caching into the API client from the architecture section:
import httpx
import hishel
from typing import Any


class CachedAPIClient:
    def __init__(
        self,
        api_key: str,
        base_url: str,
        cache_dir: str = "/tmp/api_cache",
    ):
        storage = hishel.FileStorage(base_path=cache_dir)
        controller = hishel.Controller(
            cacheable_methods=["GET", "HEAD"],
            allow_stale=True,
        )
        transport = hishel.CacheTransport(
            transport=httpx.HTTPTransport(
                limits=httpx.Limits(
                    max_connections=50,
                    max_keepalive_connections=10,
                )
            ),
            storage=storage,
            controller=controller,
        )
        self._client = httpx.Client(
            transport=transport,
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )

    def get(self, path: str, **kwargs: Any) -> dict:
        resp = self._client.get(path, **kwargs)
        resp.raise_for_status()
        return resp.json()

    def post(self, path: str, **kwargs: Any) -> dict:
        # POST requests bypass cache automatically
        resp = self._client.post(path, **kwargs)
        resp.raise_for_status()
        return resp.json()

    def close(self) -> None:
        self._client.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
GET and HEAD requests are cached transparently. POST, PUT, and DELETE bypass the cache because they are not listed in cacheable_methods. The allow_stale=True setting permits serving expired entries, for example when the server is unavailable.
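The stale-on-failure idea can be sketched without any library. The example below is an illustration under stated assumptions, not how hishel implements allow_stale: Stored and get_with_stale_fallback are hypothetical names, the cache is a plain dict, and any OSError (which includes ConnectionError) is treated as the upstream being unavailable.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stored:
    body: dict
    expires_at: float


def get_with_stale_fallback(
    key: str,
    fetch: Callable[[], dict],
    cache: dict[str, Stored],
    ttl: float = 60.0,
    now: float | None = None,
) -> tuple[dict, str]:
    """Return (body, source), where source is 'hit', 'miss', or 'stale'."""
    current = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None and current < entry.expires_at:
        return entry.body, "hit"
    try:
        body = fetch()
    except OSError:
        # Assumption: network-level failures mean the upstream is down
        if entry is not None:
            return entry.body, "stale"
        raise  # nothing cached, so the error must surface
    cache[key] = Stored(body, current + ttl)
    return body, "miss"
```

Returning the source alongside the body makes it straightforward to count hits, misses, and stale responses separately.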
Vary header: caching per-request variations
The Vary header tells caches that responses differ based on specific request headers. For example, Vary: Accept-Language means the French and English responses should be cached separately.
This matters for API clients that send different Accept headers or authentication tokens. A caching layer that ignores Vary might serve a JSON response when XML was requested, or leak one user’s data to another.
Both hishel and requests-cache handle Vary correctly. Custom implementations must include request headers specified in Vary as part of the cache key.
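For a custom implementation, one way to fold the Vary-listed headers into the cache key is sketched below. The function vary_cache_key is hypothetical and simplified: it assumes the Vary value comes from a previously stored response for the same URL, and it treats Vary: * as uncacheable by returning None.

```python
from __future__ import annotations

import hashlib


def vary_cache_key(
    method: str,
    url: str,
    request_headers: dict[str, str],
    vary: str,
) -> str | None:
    """Build a cache key that includes every request header named in Vary."""
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None  # Vary: * can never be satisfied from the cache
    parts = [method.upper(), url]
    # A header missing from the request contributes an empty value
    parts += [f"{name}={lowered.get(name, '')}" for name in names]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


url = "https://api.example.com/greeting"
key_fr = vary_cache_key("GET", url, {"Accept-Language": "fr"}, "Accept-Language")
key_en = vary_cache_key("GET", url, {"Accept-Language": "en"}, "Accept-Language")
```

Here key_fr and key_en differ, so the French and English responses land in separate entries. Header names are lowercased first because HTTP header names are case-insensitive.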
Cache warming and prefetching
For latency-sensitive paths, pre-populate the cache during startup:
import logging
from concurrent.futures import ThreadPoolExecutor

import httpx

logger = logging.getLogger(__name__)

WARM_URLS = [
    "/config",
    "/feature-flags",
    "/products?category=popular",
]


def warm_cache(client: httpx.Client, urls: list[str]) -> None:
    def fetch(url: str) -> None:
        try:
            client.get(url)
        except httpx.HTTPError as exc:
            # Log but don't fail startup
            logger.warning("Cache warm-up failed for %s: %s", url, exc)

    with ThreadPoolExecutor(max_workers=5) as pool:
        # Leaving the with block waits for every fetch to finish
        pool.map(fetch, urls)
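Run this during application startup, before the service accepts traffic. Pass in a client that is backed by a caching transport and has base_url set, since the URLs above are relative. The first real request to any warmed URL then gets a cache hit instead of waiting for the upstream API.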
Monitoring cache effectiveness
Track hit rates to validate your caching strategy:
from dataclasses import dataclass


@dataclass
class CacheMetrics:
    hits: int = 0
    misses: int = 0
    stale_served: int = 0
    errors_with_stale_fallback: int = 0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0

    def report(self) -> dict:
        return {
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": f"{self.hit_rate:.1%}",
            "stale_served": self.stale_served,
            "errors_with_stale_fallback": self.errors_with_stale_fallback,
        }
As a rule of thumb, a healthy cache for an API client maintains a hit rate above 60%. A rate below 30% suggests that the TTLs are too short, the request patterns are too varied, or the cached endpoints change too often.
| Metric | Healthy | Investigate | Critical |
|---|---|---|---|
| Hit rate | > 60% | 30-60% | < 30% |
| Stale served | < 5% | 5-15% | > 15% |
| Cache size growth | Stable | Linear | Unbounded |
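The thresholds in the table can be turned into a simple check for dashboards or alerts. The function names below are hypothetical, the boundaries are taken directly from the table, and both inputs are assumed to be fractions between 0 and 1.

```python
def classify_hit_rate(rate: float) -> str:
    """Map a hit rate (0.0 to 1.0) onto the table's three bands."""
    if rate > 0.60:
        return "healthy"
    if rate >= 0.30:
        return "investigate"
    return "critical"


def classify_stale_share(share: float) -> str:
    """Map the share of responses served stale onto the same bands."""
    if share < 0.05:
        return "healthy"
    if share <= 0.15:
        return "investigate"
    return "critical"


print(classify_hit_rate(0.72), classify_stale_share(0.08))
```

Cache size growth is left out because it is a trend over time rather than a single ratio.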
The one thing to remember: Effective HTTP caching in Python combines spec-compliant libraries (hishel or requests-cache), proper Cache-Control/ETag respect, per-URL TTL policies, and monitoring — turning your API client from a bandwidth consumer into an intelligent, resilient data accessor.