HTTP/2 Server Push — Deep Dive
HTTP/2 binary framing
HTTP/2 replaces HTTP/1.1’s text-based protocol with a binary framing layer. Every piece of communication is broken into frames, which belong to streams. A single TCP connection carries multiple concurrent streams.
Key frame types:
- HEADERS — contains request or response headers (compressed with HPACK)
- DATA — carries the response body
- PUSH_PROMISE — announces a server-initiated resource push
- SETTINGS — connection-level configuration (max concurrent streams, window size)
- WINDOW_UPDATE — flow control adjustments
- RST_STREAM — cancel a single stream without closing the connection
Understanding this matters for Python developers because debugging HTTP/2 issues requires reading frame-level traces, not just HTTP headers.
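To make frame-level traces less opaque, here is a minimal sketch of decoding the fixed 9-byte header that precedes every HTTP/2 frame. The names `FrameHeader` and `parse_frame_header` are hypothetical helpers for illustration, and the type table covers only the frames listed above.

```python
import struct
from typing import NamedTuple

# Frame type codes from the HTTP/2 specification (RFC 9113, section 6)
FRAME_TYPES = {
    0x0: "DATA",
    0x1: "HEADERS",
    0x3: "RST_STREAM",
    0x4: "SETTINGS",
    0x5: "PUSH_PROMISE",
    0x8: "WINDOW_UPDATE",
}


class FrameHeader(NamedTuple):
    length: int
    type_name: str
    flags: int
    stream_id: int


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the 9-byte header: 24-bit length, type, flags, 31-bit stream ID."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    # The 24-bit length is read as a high byte plus a low 16-bit word
    length_hi, length_lo, frame_type, flags, raw_stream = struct.unpack(
        ">BHBBI", data[:9]
    )
    length = (length_hi << 16) | length_lo
    stream_id = raw_stream & 0x7FFFFFFF  # drop the reserved top bit
    name = FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:x})")
    return FrameHeader(length, name, flags, stream_id)


# An empty SETTINGS frame on stream 0 (the connection itself)
print(parse_frame_header(bytes.fromhex("000000040000000000")))
```

Stream 0 is reserved for connection-level frames such as SETTINGS, which is why the example header carries no stream of its own.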
Hypercorn: HTTP/2 in Python
Hypercorn is the primary Python ASGI server with native HTTP/2 support:
pip install hypercorn
hypercorn myapp:app --bind 0.0.0.0:8443 --certfile cert.pem --keyfile key.pem
HTTP/2 requires TLS in browsers (though the spec allows cleartext h2c). For local development:
# Generate self-signed cert
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes
# Run with HTTP/2
hypercorn myapp:app --bind 0.0.0.0:8443 --certfile cert.pem --keyfile key.pem
Hypercorn configuration for production:
# hypercorn_config.py
bind = "0.0.0.0:8443"
certfile = "/etc/ssl/certs/server.pem"
keyfile = "/etc/ssl/private/server-key.pem"
h2_max_concurrent_streams = 100
h2_max_header_list_size = 65536
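# Assumption to verify: this name is not among the h2_* options listed in
# Hypercorn's configuration docs (which include h2_max_inbound_frame_size)
h2_initial_window_size = 65535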
workers = 4
Implementing server push with Hypercorn
Although browser support has waned, server push remains useful for API-to-API communication and specialized clients. Hypercorn exposes push through the ASGI http.response.push extension:
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse

app = FastAPI()


@app.get("/", response_class=HTMLResponse)
async def index(request: Request):
    # The "http.response.push" entry in scope["extensions"] is only a marker,
    # not a callable. Starlette's Request.send_push_promise() sends the push
    # message and does nothing when the server lacks push support.
    # Push CSS before sending HTML
    await request.send_push_promise("/static/style.css")
    # Push JavaScript
    await request.send_push_promise("/static/app.js")
    return HTMLResponse(content="""
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="/static/style.css">
<script src="/static/app.js" defer></script>
</head>
<body><h1>Hello HTTP/2</h1></body>
</html>
""")
The push promise is sent as part of the response to the original request. The server then handles the pushed path as if it received a normal GET request, sending the response on a new stream.
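At the ASGI level, a push is just another message passed to `send`. The sketch below is a bare ASGI app, with no framework, that emits an `http.response.push` message when the server advertises the extension. The `record_messages` helper and its fake scope are hypothetical test scaffolding, not part of any server.

```python
import asyncio


async def app(scope, receive, send):
    """Bare ASGI app that promises a stylesheet before answering."""
    if scope["type"] != "http":
        return
    # Servers advertise push support per connection via scope["extensions"]
    if "http.response.push" in scope.get("extensions", {}):
        await send({
            "type": "http.response.push",
            "path": "/static/style.css",
            "headers": [(b"accept", b"text/css")],
        })
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/html; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": b"<h1>Hello HTTP/2</h1>"})


async def record_messages(extensions):
    """Drive the app with a fake server and return the message types it sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "method": "GET",
        "path": "/",
        "headers": [],
        "extensions": extensions,
    }
    await app(scope, receive, send)
    return [message["type"] for message in sent]


print(asyncio.run(record_messages({"http.response.push": {}})))
```

Running the same app with an empty extensions mapping skips the push message, which is the behaviour you want on HTTP/1.1 connections and on clients that disabled push.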
PUSH_PROMISE frame internals
When the server pushes, it sends a PUSH_PROMISE frame on the original stream (stream 1 for the HTML request) that includes:
PUSH_PROMISE frame:
    Stream ID: 1 (the original request's stream)
    Promised Stream ID: 2 (even-numbered, server-initiated)
    Header Block:
        :method: GET
        :path: /static/style.css
        :scheme: https
        :authority: example.com
Then the server sends HEADERS and DATA frames on stream 2 with the actual CSS content. The client can reject a push by sending RST_STREAM on the promised stream ID.
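The rejection described above can be sketched at the byte level. This is a minimal illustration using only the standard library; `build_rst_stream` and `is_server_initiated` are hypothetical helper names, and a real client would let its HTTP/2 library produce the frame.

```python
import struct

RST_STREAM = 0x3  # frame type code
CANCEL = 0x8      # error code: the stream is no longer needed


def build_rst_stream(stream_id: int, error_code: int = CANCEL) -> bytes:
    """Serialize an RST_STREAM frame: 9-byte header plus a 4-byte error code."""
    if not 0 < stream_id < 2**31:
        raise ValueError("stream_id must be non-zero and fit in 31 bits")
    payload = struct.pack(">I", error_code)
    length = len(payload)  # always 4 for RST_STREAM
    header = struct.pack(
        ">BHBBI", length >> 16, length & 0xFFFF, RST_STREAM, 0, stream_id
    )
    return header + payload


def is_server_initiated(stream_id: int) -> bool:
    """Pushed (server-initiated) streams use even IDs; client streams are odd."""
    return stream_id % 2 == 0


# Reject the promised stream from the example above
frame = build_rst_stream(2)
print(frame.hex())
```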
Why Chrome killed server push
Google’s analysis of server push in the wild revealed:
- 0.1% of HTTP/2 connections used push — adoption was minimal
- Over 50% of pushed resources were already cached — wasting bandwidth
- Push increased page load time in many cases — bandwidth contention with critical resources
- Cache digest proposals (an IETF Internet-Draft) stalled, leaving no standardized way for clients to tell servers what they have cached
The fundamental problem: server push is an optimization that requires knowing what the client already has. Without that knowledge, you’re gambling. Cache digests (a compact probabilistic summary of cached URLs sent by the client, similar in spirit to a Bloom filter) were proposed but never standardized.
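To illustrate the idea behind a cache digest, here is a toy Bloom filter over URLs. It is a conceptual sketch only: the class, its parameters, and `should_push` are hypothetical, and it does not reproduce the encoding used in the cache digest drafts.

```python
import hashlib


class UrlBloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, url: str):
        # Derive several bit positions by salting a single hash function
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits |= 1 << pos

    def __contains__(self, url: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(url))


def should_push(digest: UrlBloomFilter, url: str) -> bool:
    """Skip the push when the client's digest says the URL is probably cached."""
    return url not in digest


client_digest = UrlBloomFilter()
client_digest.add("/static/style.css")
print(should_push(client_digest, "/static/style.css"))
```

A false positive here only costs a skipped push, which the client recovers from with a normal request. That asymmetry is what makes a probabilistic digest acceptable for this purpose.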
103 Early Hints: the practical replacement
103 Early Hints (RFC 8297) solves the discovery latency problem without pushing actual bytes:
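from starlette.responses import Response


class EarlyHintsResponse(Response):
    def __init__(self, hints: list[str], final_response: Response):
        # Thin wrapper: all real work is delegated to final_response
        self.hints = hints
        self.final_response = final_response

    async def __call__(self, scope, receive, send):
        # ASGI allows only one "http.response.start" per request, so the 103
        # goes out through the Early Hints extension message instead, and
        # only when the server advertises that extension
        if "http.response.early_hint" in scope.get("extensions", {}):
            await send({
                "type": "http.response.early_hint",
                "links": [hint.encode() for hint in self.hints],
            })
        # Send final response
        await self.final_response(scope, receive, send)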
In practice, implementing 103 directly in Python is tricky because server support for informational responses is uneven. The pragmatic approach is to handle it at the reverse proxy level. Nginx does not generate a 103 from add_header; assuming version 1.29.0 or later, it can relay one produced by the backend through the early_hints directive:
# Nginx configuration for 103 Early Hints
# Assumes nginx 1.29.0 or later; check the directive against your build
server {
    listen 443 ssl http2;
    location / {
        # Pass 103 responses from the backend to HTTP/2 and HTTP/3 clients
        early_hints $http2$http3;
        proxy_pass http://localhost:8000;
    }
}
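Caddy can emit the 103 itself:
example.com {
    # Assumes a Caddy release whose respond directive treats 103 as an
    # informational response and then continues to the next handler
    header Link "</style.css>; rel=preload; as=style"
    respond 103
    reverse_proxy localhost:8000
}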
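Whichever layer sends the hint, the payload is a Link header value in the format shown in the configs above. A small sketch of building those values in Python follows; `preload_link` and `early_hint_links` are hypothetical helper names.

```python
def preload_link(path: str, as_type: str) -> str:
    """Format one Link header value that asks the client to preload a resource."""
    return f"<{path}>; rel=preload; as={as_type}"


def early_hint_links(resources: dict[str, str]) -> list[bytes]:
    """Encode preload links as byte strings, the form ASGI messages expect."""
    return [
        preload_link(path, as_type).encode("ascii")
        for path, as_type in resources.items()
    ]


links = early_hint_links({"/style.css": "style", "/app.js": "script"})
print(links)
```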
HTTP/2 connection management in Python clients
When consuming HTTP/2 APIs from Python, httpx provides transparent HTTP/2 support:
import asyncio
import httpx


async def main():
    # HTTP/2 client
    async with httpx.AsyncClient(http2=True) as client:
        # All requests to the same host share one connection
        responses = await asyncio.gather(
            client.get("https://api.example.com/users"),
            client.get("https://api.example.com/orders"),
            client.get("https://api.example.com/products"),
        )
    return responses


asyncio.run(main())
The key advantage: three concurrent requests over a single TCP connection. With HTTP/1.1, running them concurrently would require three separate connections. For API aggregation services making dozens of requests to the same backend, HTTP/2 reduces connection overhead dramatically.
Install with HTTP/2 support:
pip install "httpx[http2]"
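Passing `http2=True` only offers HTTP/2; the server may still negotiate HTTP/1.1. A quick way to confirm what was actually used is to tally the `http_version` attribute that httpx responses expose. The sketch below uses stand-in objects so it runs without a network, and `count_protocols` is a hypothetical helper name.

```python
from collections import Counter
from types import SimpleNamespace


def count_protocols(responses) -> Counter:
    """Tally the negotiated protocol of each response via its http_version."""
    return Counter(response.http_version for response in responses)


# Stand-ins for httpx.Response objects, which expose the same attribute
# with values such as "HTTP/1.1" or "HTTP/2"
fake_responses = [
    SimpleNamespace(http_version="HTTP/2"),
    SimpleNamespace(http_version="HTTP/2"),
    SimpleNamespace(http_version="HTTP/1.1"),
]
print(count_protocols(fake_responses))
```

In real code, pass the list returned by `asyncio.gather` directly to the helper.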
Stream prioritization
HTTP/2 allows clients to assign priorities to streams. In a web page context, CSS should load before images. The priority tree is complex:
# Using the h2 library (formerly hyper-h2) for low-level HTTP/2
import h2.connection
import h2.config
import h2.events

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()

# High-priority request for CSS (weight 256, highest)
conn.send_headers(
    stream_id=1,
    headers=[
        (":method", "GET"),
        (":path", "/style.css"),
        (":scheme", "https"),
        (":authority", "example.com"),
    ],
    end_stream=True,  # a GET has no body, so close our side of the stream
    priority_weight=256,
)

# Lower-priority request for an image (weight 16)
conn.send_headers(
    stream_id=3,
    headers=[
        (":method", "GET"),
        (":path", "/hero.jpg"),
        (":scheme", "https"),
        (":authority", "example.com"),
    ],
    end_stream=True,
    priority_weight=16,
    priority_depends_on=1,  # depends on CSS stream
)

# h2 only builds frames; the bytes still have to be written to a socket
outgoing = conn.data_to_send()
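Note: the current HTTP/2 specification (RFC 9113) deprecates this priority tree. HTTP/3 never adopted it and instead uses the simpler urgency-based Extensible Priorities scheme (RFC 9218), which signals priority through the Priority header.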
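For comparison with the tree-based API above, the urgency-based scheme defined in RFC 9218 fits in one header value: an urgency from 0 (most urgent) to 7, with 3 as the default, plus an optional incremental flag. A minimal sketch of building that value, with `priority_header` as a hypothetical helper name:

```python
def priority_header(urgency: int = 3, incremental: bool = False) -> str:
    """Build a Priority header value: u=0 is most urgent, u=7 least."""
    if not 0 <= urgency <= 7:
        raise ValueError("urgency must be between 0 and 7")
    value = f"u={urgency}"
    if incremental:
        # "i" marks a response that is useful as it arrives, e.g. an image
        value += ", i"
    return value


css_priority = priority_header(urgency=0)
image_priority = priority_header(urgency=5, incremental=True)
print(css_priority, "|", image_priority)
```

There are no dependencies between streams to maintain, which is the main simplification over the weight and parent model.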
Performance measurement
Measure HTTP/2 benefits with real timing data:
import httpx
import time
import asyncio


async def benchmark_http2(url: str, num_requests: int = 20):
    # HTTP/1.1
    start = time.perf_counter()
    async with httpx.AsyncClient(http1=True, http2=False) as client:
        await asyncio.gather(*[client.get(url) for _ in range(num_requests)])
    h1_time = time.perf_counter() - start

    # HTTP/2
    start = time.perf_counter()
    async with httpx.AsyncClient(http2=True) as client:
        await asyncio.gather(*[client.get(url) for _ in range(num_requests)])
    h2_time = time.perf_counter() - start

    print(f"HTTP/1.1: {h1_time:.3f}s")
    print(f"HTTP/2: {h2_time:.3f}s")
    print(f"Speedup: {h1_time/h2_time:.1f}x")
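Illustrative results for 20 concurrent requests to the same host (actual numbers depend on latency, TLS setup cost, and the server):
- HTTP/1.1: ~2.1s (one connection per in-flight request, each paying its own TCP and TLS handshake; the 6-connection cap applies to browsers, not to httpx's default pool)
- HTTP/2: ~0.8s (single multiplexed connection)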
The speedup is most dramatic when requests are small (API calls) and the connection setup cost is proportionally high. For large file downloads, the difference narrows because bandwidth is the bottleneck, not connection overhead.
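A single timing run is noisy, so it helps to repeat each measurement and compare medians. The sketch below is one way to do that with the standard library; `median_seconds` and `fake_workload` are hypothetical names, and the workload is a stand-in for a real batch of requests.

```python
import asyncio
import statistics
import time


async def median_seconds(make_run, repeats: int = 5) -> float:
    """Await make_run() several times and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        await make_run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


calls = []


async def fake_workload():
    # Placeholder for a batch of HTTP requests
    calls.append(1)
    await asyncio.sleep(0)


result = asyncio.run(median_seconds(fake_workload, repeats=5))
print(f"median: {result:.6f}s over {len(calls)} runs")
```

The median is less sensitive than the mean to one slow run caused by a cold DNS cache or a TLS session miss.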
HTTP/3 and QUIC: what’s next
HTTP/3 replaces TCP with QUIC (UDP-based). Key improvements over HTTP/2:
- No head-of-line blocking. In HTTP/2, a single lost TCP packet blocks all streams. QUIC streams are independent — a lost packet on stream 3 doesn’t affect stream 5.
- Faster connection setup. QUIC combines the transport and TLS handshakes into one round trip (0-RTT for repeat connections).
- Connection migration. Connections survive network changes (WiFi to cellular) because they’re identified by connection IDs, not IP addresses.
Python support for HTTP/3 is emerging, mainly through aioquic (which Hypercorn builds on for its HTTP/3 support), but production deployment typically uses Caddy or Nginx as the QUIC termination point.
One thing to remember: HTTP/2 multiplexing remains the biggest practical win for Python applications — server push was a failed experiment, but the underlying protocol improvements are substantial, and the future lies in HTTP/3’s QUIC-based transport.