HTTP Caching Strategies — Core Concepts

Master Cache-Control, ETags, and conditional requests to slash API latency and bandwidth in Python HTTP clients.

Why HTTP caching matters for API clients

Every API call has a cost: network latency, server processing, rate limit consumption, and bandwidth. Many API responses don’t change between calls — a user’s profile, a product catalog, configuration data. Caching these responses locally avoids redundant work on both sides.

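For Python services that call external APIs thousands of times per hour, caching can reduce latency by as much as 90% and cut rate limit usage by 50-80%. The actual savings depend on how often a response can be reused before it changes, so measure the hit rate for your own workload.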

Cache-Control: the server’s caching instructions

The Cache-Control response header tells clients how to cache:

max-age=300 — response is valid for 300 seconds; don’t ask again until then
no-cache — you may store it, but must revalidate before each use
no-store — never store this response (sensitive data)
private — only the end client may cache (not shared proxies)
public — any cache (including CDNs) may store this

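While a stored response is younger than its max-age, your client can serve it from local storage without any network call. This is the fastest form of caching. Once the response is older than max-age it is stale, and the client should revalidate it or fetch it again.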
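To make the directive list concrete, here is a minimal sketch of how a client might read a Cache-Control header. The function names parse_cache_control and freshness_lifetime are made up for this example, and the parser is deliberately simple: it splits on commas, so it does not handle quoted values that themselves contain commas.

```python
from __future__ import annotations


def parse_cache_control(header_value: str) -> dict[str, str | None]:
    """Split a Cache-Control header into {directive: value-or-None}."""
    directives: dict[str, str | None] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; flags like "public" have no value.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(header_value: str) -> int | None:
    """Return how many seconds the response may be reused without a network call.

    Returns 0 for no-cache (revalidate first) and no-store (do not keep it),
    and None when the header carries no usable max-age.
    """
    directives = parse_cache_control(header_value)
    if "no-store" in directives or "no-cache" in directives:
        return 0
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return None
    return int(max_age)
```

A client would compare the age of its stored copy against this lifetime and skip the request while the copy is still fresh.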

ETags: fingerprints for responses

An ETag is a unique identifier for a specific version of a response, like a fingerprint. The server sends it with the response:

HTTP/1.1 200 OK
ETag: "abc123def456"

On the next request, your client sends the ETag back:

GET /users/42 HTTP/1.1
If-None-Match: "abc123def456"

If nothing changed, the server responds with 304 Not Modified — no body, minimal data. Your client uses the cached version. If the data changed, the server sends the full new response with a new ETag.
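The exchange above can be sketched in a few lines of Python. This is an illustration, not a production client: the ETagCache class and the transport callable are hypothetical, and the transport stands in for whatever HTTP library you use. It is assumed to return a (status, headers, body) tuple with the ETag header under that exact key.

```python
from typing import Callable, Dict, Tuple

# (status, headers, body): the response shape assumed by this sketch.
Response = Tuple[int, Dict[str, str], bytes]
# Hypothetical transport: takes a URL and request headers, returns a Response.
Transport = Callable[[str, Dict[str, str]], Response]


class ETagCache:
    """Keeps one (etag, body) pair per URL and revalidates on every fetch."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Send the stored validator back verbatim, quotes included.
            headers["If-None-Match"] = cached[0]

        status, response_headers, body = self._transport(url, headers)

        if status == 304 and cached is not None:
            # Not modified: the server sent no body, so reuse the stored one.
            return cached[1]

        etag = response_headers.get("ETag")
        if status == 200 and etag:
            # New or changed resource: remember its body and validator.
            self._entries[url] = (etag, body)
        return body
```

Note that the client has to keep the previous body around. A 304 response only confirms that the stored copy is still good.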

Last-Modified: timestamp-based validation

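Last-Modified works like an ETag but uses a timestamp. The server sends Last-Modified: Wed, 15 Jan 2025 10:00:00 GMT, and the client echoes that value in If-Modified-Since on the next request. If the resource has not changed since then, the server answers 304 Not Modified, just as it does for a matching ETag.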

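ETags are more reliable than timestamps: HTTP dates have one-second resolution, so two changes within the same second are indistinguishable, and a file’s timestamp can change even when its content does not. Timestamps are simpler to generate, though, and servers commonly send them for static files.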
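The one-second resolution is easy to see in code. The sketch below uses only the standard library's email.utils helpers to format and parse HTTP dates; the function names to_http_date and modified_since are made up for this example, and both expect timezone-aware datetimes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP date, e.g. for Last-Modified."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def modified_since(last_modified: datetime, if_modified_since: str) -> bool:
    """Return True if the resource changed after the client's timestamp.

    HTTP dates carry whole seconds only, so sub-second precision is dropped
    before comparing. Two writes within the same second look identical.
    """
    client_time = parsedate_to_datetime(if_modified_since)
    return last_modified.replace(microsecond=0) > client_time
```

A change made half a second after the timestamp the client holds is invisible to this check, which is the gap an ETag closes.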

Caching strategies for Python clients

Strategy 1: In-memory cache — fastest, but lost when the process restarts. Good for short-lived data in web servers.

Strategy 2: File-based cache — survives restarts, good for CLI tools and scripts. Libraries like requests-cache use SQLite by default.

Strategy 3: Redis/Memcached cache — shared across multiple processes and servers. Best for distributed services.

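Strategy 4: Conditional requests only (always revalidate). Keep each response body together with its ETag or Last-Modified value, and send a conditional request every time instead of trusting a freshness lifetime. This saves bandwidth but still requires a network round-trip, so it is useful when data changes frequently but responses are large.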
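As an illustration of Strategy 1, here is a minimal in-memory cache with per-entry expiry. The TTLCache class is hypothetical and not taken from any library. The clock is injectable so the expiry logic can be tested without sleeping, and in a real client the ttl_seconds value would normally come from the server's max-age rather than a hard-coded number.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Process-local cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry time on the injected clock, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches.
            del self._entries[key]
            return None
        return value

    def invalidate(self, key: str) -> None:
        """Remove an entry explicitly, e.g. after a known write."""
        self._entries.pop(key, None)
```

Swapping the dictionary for SQLite or Redis changes where the entries live, not the expiry logic, which is why the other storage strategies follow the same shape.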

The cache invalidation problem

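The famous quote, usually attributed to Phil Karlton: “There are only two hard things in computer science: cache invalidation and naming things.” The difficulty is that when the source data changes, the cache has no way of knowing unless something tells it, so users keep seeing outdated information until the entry expires or is removed.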

Mitigation approaches:

Short TTLs — cache for seconds, not hours, for frequently changing data
Event-driven invalidation — when data changes, explicitly delete the cache entry
Stale-while-revalidate — serve the cached version immediately while fetching a fresh copy in the background
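Of the three approaches, stale-while-revalidate is the least obvious to implement, so here is a rough sketch using a background thread. The StaleWhileRevalidateCache class and its fetch callable are hypothetical. Error handling is omitted, and a failed refresh simply leaves the stale value in place.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Set, Tuple


class StaleWhileRevalidateCache:
    """Serves cached values immediately and refreshes stale ones in the background."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (fetched_at, value)
        self._refreshing: Set[str] = set()

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._entries.get(key)
        if entry is None:
            # Nothing cached yet: the first caller has to wait for the fetch.
            return self._refresh(key)
        fetched_at, value = entry
        if self._clock() - fetched_at >= self._ttl:
            # Stale: start a refresh, but do not make this caller wait for it.
            self._refresh_in_background(key)
        return value

    def _refresh(self, key: str) -> Any:
        value = self._fetch(key)
        with self._lock:
            self._entries[key] = (self._clock(), value)
        return value

    def _refresh_in_background(self, key: str) -> Optional[threading.Thread]:
        with self._lock:
            if key in self._refreshing:
                return None  # A refresh for this key is already running.
            self._refreshing.add(key)

        def worker() -> None:
            try:
                self._refresh(key)
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread
```

The trade-off is that one caller per stale period still sees the old value. That is acceptable for reference data but not for anything that must be current on every read.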

What to cache and what not to

Good candidates: API responses with Cache-Control headers, reference data (country lists, config), paginated list pages, user profiles in read-heavy apps.

Bad candidates: real-time data (stock prices, chat messages), POST/PUT/DELETE responses, authentication tokens (use dedicated token storage), and personalized data in any cache shared between users, unless the cache key includes the user’s identity.
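These rules of thumb can be encoded as a small guard that runs before anything is written to the cache. The should_cache function below is a hypothetical, deliberately conservative sketch: it only accepts 200 responses to GET and HEAD, even though HTTP permits caching some other status codes.

```python
SAFE_METHODS = {"GET", "HEAD"}


def should_cache(method: str, status: int, cache_control: str = "") -> bool:
    """Decide whether a response is a reasonable candidate for caching."""
    # Collect directive names only; values such as max-age=300 are not needed here.
    directives = {
        part.strip().split("=")[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if method.upper() not in SAFE_METHODS:
        return False  # POST/PUT/DELETE responses describe a change, not a resource.
    if status != 200:
        return False  # Conservative choice made for this sketch.
    if "no-store" in directives:
        return False  # The server asked for the response never to be stored.
    return True
```

For example, should_cache("GET", 200, "max-age=300") is accepted, while a POST response or anything marked no-store is rejected.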

Common misconception

Developers often implement their own caching with dictionaries, ignoring HTTP cache headers entirely. This misses the server’s guidance on cache duration and freshness. The HTTP caching protocol already solves most caching problems — use a library that respects these headers instead of reinventing cache logic from scratch.

The one thing to remember: HTTP caching combines time-based freshness (Cache-Control) with validation-based freshness (ETags) — respect both, and your Python client becomes faster while placing less load on the APIs it calls.
