HTTP/2 Server Push — Deep Dive

Implement HTTP/2 with Python using Hypercorn, understand PUSH_PROMISE internals, and build 103 Early Hints into your ASGI apps.

HTTP/2 binary framing

HTTP/2 replaces HTTP/1.1’s text-based protocol with a binary framing layer. Every piece of communication is broken into frames, which belong to streams. A single TCP connection carries multiple concurrent streams.

Key frame types:

HEADERS — contains request or response headers (compressed with HPACK)
DATA — carries the response body
PUSH_PROMISE — announces a server-initiated resource push
SETTINGS — connection-level configuration (max concurrent streams, window size)
WINDOW_UPDATE — flow control adjustments
RST_STREAM — cancel a single stream without closing the connection

Understanding this matters for Python developers because debugging HTTP/2 issues requires reading frame-level traces, not just HTTP headers.
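Reading those traces starts with the fixed 9-byte header that precedes every frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream ID (RFC 9113, Section 4.1). Below is a minimal stdlib decoder, sketched for illustration. Its type table also includes the PRIORITY, PING, GOAWAY, and CONTINUATION codes that the list above leaves out:

```python
import struct
from typing import NamedTuple

# Frame type codes defined by the HTTP/2 specification (RFC 9113, Section 6).
FRAME_TYPES = {
    0x0: "DATA",
    0x1: "HEADERS",
    0x2: "PRIORITY",
    0x3: "RST_STREAM",
    0x4: "SETTINGS",
    0x5: "PUSH_PROMISE",
    0x6: "PING",
    0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE",
    0x9: "CONTINUATION",
}


class FrameHeader(NamedTuple):
    length: int      # payload length in bytes (24 bits)
    type_name: str   # human-readable frame type
    flags: int       # type-specific flag bits
    stream_id: int   # 31-bit stream identifier (0 = whole connection)


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the fixed 9-byte header that precedes every HTTP/2 frame."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags = data[3], data[4]
    # The top bit of the stream identifier is reserved and must be ignored.
    stream_id = struct.unpack(">I", data[5:9])[0] & 0x7FFFFFFF
    name = FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:02x})")
    return FrameHeader(length, name, flags, stream_id)
```

For example, the 9 bytes `00 00 00 04 01 00 00 00 00` decode to a SETTINGS frame with the ACK flag (0x1) set, an empty payload, and stream 0.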

Hypercorn: HTTP/2 in Python

Hypercorn is one of the few Python ASGI servers with native HTTP/2 support (it is built on the h2 protocol library):

pip install hypercorn
hypercorn myapp:app --bind 0.0.0.0:8443 --certfile cert.pem --keyfile key.pem

HTTP/2 requires TLS in browsers (though the spec allows cleartext h2c). For local development:

# Generate self-signed cert
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes

# Run with HTTP/2
hypercorn myapp:app --bind 0.0.0.0:8443 --certfile cert.pem --keyfile key.pem
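Browsers select HTTP/2 through TLS ALPN: the client offers `h2` and `http/1.1`, and the server picks one. Below is a minimal stdlib sketch for checking what the local Hypercorn instance negotiates. It assumes the server is listening on localhost:8443 with the self-signed certificate generated above, and certificate verification is disabled only because that certificate is self-signed:

```python
import socket
import ssl
from typing import Optional


def build_dev_context() -> ssl.SSLContext:
    """TLS client context for local testing against a self-signed cert."""
    ctx = ssl.create_default_context()
    # Development only: the self-signed certificate cannot be verified.
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    # Offer h2 first; the server selects one protocol via ALPN.
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str = "localhost", port: int = 8443) -> Optional[str]:
    """Return the ALPN protocol the server selected, e.g. 'h2'."""
    ctx = build_dev_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

With the server running, `print(negotiated_protocol())` should print `h2`. A result of `http/1.1` or `None` means HTTP/2 is not being negotiated.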

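Hypercorn configuration for production (load it with hypercorn --config file:hypercorn_config.py myapp:app):

# hypercorn_config.py
bind = "0.0.0.0:8443"
certfile = "/etc/ssl/certs/server.pem"
keyfile = "/etc/ssl/private/server-key.pem"
h2_max_concurrent_streams = 100   # advertised to clients as SETTINGS_MAX_CONCURRENT_STREAMS
h2_max_header_list_size = 65536   # advertised to clients as SETTINGS_MAX_HEADER_LIST_SIZE
workers = 4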

Implementing server push with Hypercorn

Although mainstream browsers have removed support for it, server push remains useful for API-to-API communication and specialized clients. Hypercorn exposes push through the ASGI http.response.push extension:

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles

app = FastAPI()
# Pushed paths are served by this same app, so the assets must be routable.
# Assumes a local ./static directory containing style.css and app.js.
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.get("/", response_class=HTMLResponse)
async def index(request: Request):
    # Starlette's Request.send_push_promise() sends an ASGI
    # "http.response.push" message when the server advertises the
    # extension, and does nothing otherwise (e.g. on HTTP/1.1).
    # Push CSS and JavaScript before sending the HTML that references them.
    await request.send_push_promise("/static/style.css")
    await request.send_push_promise("/static/app.js")

    return HTMLResponse(content="""
    <!DOCTYPE html>
    <html>
    <head>
        <link rel="stylesheet" href="/static/style.css">
        <script src="/static/app.js" defer></script>
    </head>
    <body><h1>Hello HTTP/2</h1></body>
    </html>
    """)

The PUSH_PROMISE frames are sent on the original request's stream, ahead of the HTML body. Hypercorn then handles each pushed path as though it had received a normal GET request: it runs the path through the app and sends the response on a new, server-initiated stream.
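Here is a minimal sketch of the same push without a framework, following the ASGI server push extension. The server advertises support as `scope["extensions"]["http.response.push"]`, and the app sends an `http.response.push` message through `send()`. The asset list is hypothetical, and the `collect_messages` helper drives the app with fake `receive`/`send` callables instead of a real server:

```python
import asyncio

HTML = b"""<!DOCTYPE html>
<html><head>
<link rel="stylesheet" href="/static/style.css">
<script src="/static/app.js" defer></script>
</head><body><h1>Hello HTTP/2</h1></body></html>"""

# Hypothetical list of assets to push for this page.
PUSH_PATHS = ["/static/style.css", "/static/app.js"]


async def app(scope, receive, send):
    """Raw ASGI app that pushes assets when the server advertises support."""
    if scope["type"] != "http":
        return
    if "http.response.push" in scope.get("extensions", {}):
        for path in PUSH_PATHS:
            # The server turns each message into a PUSH_PROMISE and then
            # requests `path` from the app as a separate request.
            await send({
                "type": "http.response.push",
                "path": path,
                "headers": [(b"accept", b"*/*")],
            })
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/html; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": HTML})


async def collect_messages(extensions: dict) -> list[str]:
    """Drive the app with a fake send() and return the message types sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message["type"])

    await app({"type": "http", "extensions": extensions}, receive, send)
    return sent
```

Running `asyncio.run(collect_messages({"http.response.push": {}}))` returns two `http.response.push` entries followed by the start and body messages. With an empty extensions dict, as on an HTTP/1.1 connection, only the start and body messages are sent.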

PUSH_PROMISE frame internals

When the server pushes, it sends a PUSH_PROMISE frame on the original stream (stream 1 for the HTML request) that includes:

PUSH_PROMISE frame:
  Stream ID: 1 (the original request's stream)
  Promised Stream ID: 2 (even-numbered, server-initiated)
  Header Block:
    :method: GET
    :path: /static/style.css
    :scheme: https
    :authority: example.com

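Then the server sends HEADERS and DATA frames on stream 2 with the actual CSS content. The client can reject an individual push by sending RST_STREAM on the promised stream ID. It can also refuse pushes altogether by sending SETTINGS_ENABLE_PUSH = 0 in its SETTINGS frame.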

Why Chrome killed server push

Google’s analysis of server push in the wild revealed:

0.1% of HTTP/2 connections used push — adoption was minimal
Over 50% of pushed resources were already cached — wasting bandwidth
Push increased page load time in many cases — bandwidth contention with critical resources
The Cache Digests Internet-Draft stalled, so there is still no standardized way for clients to tell servers what they have cached

The fundamental problem: server push is an optimization that requires knowing what the client already has cached. Without that knowledge, the server is gambling with the client's bandwidth. Cache digests (a compact probabilistic summary of the URLs the client has cached, sent to the server) were meant to supply that knowledge, but they never made it past the draft stage.
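The idea behind a cache digest can be sketched with a Bloom filter. This is a toy illustration under stated assumptions, not the wire format from the draft: the bit count, the hash count, and the `assets_to_push` helper are all hypothetical:

```python
import hashlib


class CacheDigest:
    """Toy Bloom-filter digest of a client's cached URLs.

    Illustrative only: this is not the encoding from the Cache Digests draft.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a fixed-width bit array

    def _positions(self, url: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits |= 1 << pos

    def might_contain(self, url: str) -> bool:
        # False means "definitely not cached"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(url))


def assets_to_push(candidates: list[str], digest: CacheDigest) -> list[str]:
    """Push only the assets the digest says the client definitely lacks."""
    return [url for url in candidates if not digest.might_contain(url)]
```

The error direction is the useful property here. A false positive only causes a skipped push, after which the browser fetches the asset normally. A false negative, which would mean pushing bytes the client already holds, cannot happen.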

103 Early Hints: the practical replacement

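103 Early Hints (RFC 8297) solves the discovery latency problem without pushing actual bytes. While the application is still building the final response, the server sends an informational 103 response carrying Link preload headers. The browser can then start fetching critical assets early, and it skips any it already has cached: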

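from starlette.responses import Response
from starlette.types import Receive, Scope, Send


class EarlyHintsResponse(Response):
    def __init__(self, hints: list[str], final_response: Response):
        # Initialise Response state (e.g. .background) that frameworks such
        # as FastAPI inspect on returned responses.
        super().__init__()
        self.hints = hints
        self.final_response = final_response

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        # ASGI allows only one http.response.start per request, so a 103
        # cannot be sent that way. Assumption: the server advertises an
        # "http.response.early_hint" extension (recent Hypercorn releases do);
        # otherwise the hints are skipped and only the final response is sent.
        if "http.response.early_hint" in scope.get("extensions", {}):
            await send({
                "type": "http.response.early_hint",
                "links": [hint.encode() for hint in self.hints],
            })

        # Send final response
        await self.final_response(scope, receive, send)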

In practice, implementing 103 directly in Python is tricky. Core ASGI has no message for informational responses, and most ASGI servers don't yet support one through an extension. The pragmatic approach is to handle it at the reverse proxy level:

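# Nginx 1.29.0+ configuration for 103 Early Hints.
# Nginx relays 103 responses produced by the upstream application; add_header
# cannot generate them. The map block belongs in the http context.
map $http_sec_fetch_mode $early_hints {
    navigate $http2$http3;   # only page navigations over HTTP/2 or HTTP/3
}

server {
    listen 443 ssl;
    http2 on;

    location / {
        early_hints $early_hints;
        proxy_pass http://localhost:8000;
    }
}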

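Caddy can emit the 103 itself. respond 103 writes an Early Hints response carrying the headers set so far, then passes the request on to the next handler:

example.com {
    # Sent with the 103 and again on the final response
    header Link "</style.css>; rel=preload; as=style"
    respond 103
    reverse_proxy localhost:8000
}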
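Whichever layer emits the 103, the hints are ordinary Link header values (RFC 8288), like the ones in the configs above. Below is a small, hypothetical pair of helpers for building them consistently in Python, for example to feed the `hints` list of the EarlyHintsResponse class:

```python
def preload_link(url: str, as_type: str, crossorigin: bool = False) -> str:
    """Format one preload hint, e.g. '</style.css>; rel=preload; as=style'."""
    value = f"<{url}>; rel=preload; as={as_type}"
    if crossorigin:
        # Fonts are fetched in CORS mode, so their preload must say so too.
        value += "; crossorigin"
    return value


def link_header(entries: list[tuple[str, str]]) -> str:
    """Join several hints into one comma-separated Link header value."""
    return ", ".join(preload_link(url, as_type) for url, as_type in entries)
```

Passing `[("/style.css", "style"), ("/app.js", "script")]` to `link_header` produces the same two hints used in the proxy configurations, combined into a single header value.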

HTTP/2 connection management in Python clients

When consuming HTTP/2 APIs from Python, httpx provides transparent HTTP/2 support:

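import asyncio

import httpx

async def main():
    # HTTP/2 client
    async with httpx.AsyncClient(http2=True) as client:
        # All requests to the same host share one connection
        responses = await asyncio.gather(
            client.get("https://api.example.com/users"),
            client.get("https://api.example.com/orders"),
            client.get("https://api.example.com/products"),
        )
        for response in responses:
            # "HTTP/2" if h2 was negotiated, otherwise "HTTP/1.1"
            print(response.url, response.http_version)

asyncio.run(main())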

The key advantage: three concurrent requests over a single TCP connection. With HTTP/1.1, this would require three separate connections. For API aggregation services making dozens of requests to the same backend, HTTP/2 reduces connection overhead dramatically.

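Install with HTTP/2 support. The extra pulls in the h2 package, and the quotes keep shells such as zsh from expanding the brackets:

pip install "httpx[http2]"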

Stream prioritization

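HTTP/2 as originally specified (RFC 7540) lets clients assign each stream a weight from 1 to 256 and a parent stream it depends on. In a web page context, CSS should load before images. The resulting priority tree is complex, servers were free to ignore it, and RFC 9113 has since deprecated the scheme: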

# Using the h2 (hyper-h2) library for low-level HTTP/2
import h2.config
import h2.connection

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()

# High-priority request for CSS (weight 256, the maximum)
conn.send_headers(
    stream_id=1,
    headers=[
        (":method", "GET"),
        (":path", "/style.css"),
        (":scheme", "https"),
        (":authority", "example.com"),
    ],
    end_stream=True,  # a GET has no request body
    priority_weight=256,
)

# Lower-priority request for an image (weight 16)
conn.send_headers(
    stream_id=3,
    headers=[
        (":method", "GET"),
        (":path", "/hero.jpg"),
        (":scheme", "https"),
        (":authority", "example.com"),
    ],
    end_stream=True,
    priority_weight=16,
    priority_depends_on=1,  # child of the CSS stream
)

# Connection preface, SETTINGS, and both HEADERS frames, ready to be
# written to a TLS socket that negotiated "h2"
outbound = conn.data_to_send()
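Weights only matter between siblings, meaning streams with the same parent. Under RFC 7540, siblings split their parent's resources in proportion to their weights. A dependent stream, like the image in the h2 example above, is served only when its parent cannot make progress. Here is a small sketch of the sibling arithmetic:

```python
def sibling_shares(weights: dict[int, int]) -> dict[int, float]:
    """Share of bandwidth each sibling stream gets under RFC 7540 weighting.

    Siblings (streams with the same parent) split the parent's resources in
    proportion to their weights, which must be in the range 1..256.
    """
    for stream_id, weight in weights.items():
        if not 1 <= weight <= 256:
            raise ValueError(f"stream {stream_id}: weight {weight} not in 1..256")
    total = sum(weights.values())
    return {stream_id: weight / total for stream_id, weight in weights.items()}
```

If both streams from the example hung directly off the root, `sibling_shares({1: 256, 3: 16})` would give the CSS stream 256/272 (about 94%) of the bandwidth and the image the remaining 16/272 (about 6%). Servers are free to apply this only approximately, or not at all.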

Note: HTTP/3 (QUIC) never adopted this tree. RFC 9218 defines a simpler urgency-based scheme, carried in the Priority header (and in PRIORITY_UPDATE frames), that works over both HTTP/2 and HTTP/3.
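Below is a minimal sketch of serializing an RFC 9218 Priority value. It covers the two parameters the RFC defines: urgency `u` (0 is most urgent, 7 least, default 3) and the boolean incremental flag `i` (default false). Parameters equal to their defaults are omitted, because a receiver treats missing parameters as defaults:

```python
def priority_header(urgency: int = 3, incremental: bool = False) -> str:
    """Serialize an RFC 9218 Priority field value.

    urgency: 0 (most urgent) .. 7 (least urgent); 3 is the default.
    incremental: True if the response is useful as chunks arrive.
    Returns "" when both parameters are at their defaults.
    """
    if not 0 <= urgency <= 7:
        raise ValueError("urgency must be between 0 and 7")
    params = []
    if urgency != 3:
        params.append(f"u={urgency}")
    if incremental:
        params.append("i")  # a bare key is a Structured Fields boolean true
    return ", ".join(params)
```

A client could send `headers={"Priority": priority_header(0)}` for a render-blocking stylesheet, and `priority_header(5, True)` for a progressively rendered image. Whether a server or CDN acts on the header is server-specific.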

Performance measurement

Measure HTTP/2 benefits with real timing data:

import asyncio
import time

import httpx

async def benchmark_http2(url: str, num_requests: int = 20):
    # HTTP/1.1, capped at 6 connections per host as browsers do
    # (httpx's default pool would open up to 100 connections)
    limits = httpx.Limits(max_connections=6)
    start = time.perf_counter()
    async with httpx.AsyncClient(http1=True, http2=False, limits=limits) as client:
        await asyncio.gather(*[client.get(url) for _ in range(num_requests)])
    h1_time = time.perf_counter() - start

    # HTTP/2: all requests multiplexed over one connection
    start = time.perf_counter()
    async with httpx.AsyncClient(http2=True) as client:
        await asyncio.gather(*[client.get(url) for _ in range(num_requests)])
    h2_time = time.perf_counter() - start

    print(f"HTTP/1.1: {h1_time:.3f}s")
    print(f"HTTP/2:   {h2_time:.3f}s")
    print(f"Speedup:  {h1_time/h2_time:.1f}x")

asyncio.run(benchmark_http2("https://api.example.com/users"))

Typical results for 20 concurrent requests to the same host:

HTTP/1.1: ~2.1s (limited by 6 connections)
HTTP/2: ~0.8s (single multiplexed connection)

The speedup is most dramatic when requests are small (API calls) and the connection setup cost is proportionally high. For large file downloads, the difference narrows because bandwidth is the bottleneck, not connection overhead.
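That reasoning can be made concrete with a back-of-envelope model. This is a hypothetical sketch, not a measurement. It assumes two round trips of setup (TCP plus a TLS 1.3 handshake) and parallel connection setup, and it ignores bandwidth, server processing time, and congestion control:

```python
import math


def estimated_seconds(num_requests: int, rtt: float, max_parallel: int,
                      handshake_rtts: int = 2) -> float:
    """Setup cost plus one round trip per 'wave' of parallel requests.

    Connections open in parallel, so setup is paid once. Each wave of up to
    `max_parallel` in-flight requests then costs one round trip.
    """
    waves = math.ceil(num_requests / max_parallel)
    return handshake_rtts * rtt + waves * rtt


h1 = estimated_seconds(20, 0.05, max_parallel=6)    # six HTTP/1.1 connections
h2 = estimated_seconds(20, 0.05, max_parallel=100)  # h2_max_concurrent_streams
```

For 20 small requests at a 50 ms round trip, the model gives 0.30 s for HTTP/1.1 (four waves) versus 0.15 s for HTTP/2 (one wave). Once transfer time, which the model leaves out, dominates each request, the gap narrows.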

HTTP/3 and QUIC: what’s next

HTTP/3 replaces TCP with QUIC (UDP-based). Key improvements over HTTP/2:

No head-of-line blocking. In HTTP/2, a single lost TCP packet blocks all streams. QUIC streams are independent — a lost packet on stream 3 doesn’t affect stream 5.
Faster connection setup. QUIC combines the transport and TLS handshakes into one round trip (0-RTT for repeat connections).
Connection migration. Connections survive network changes (WiFi to cellular) because they’re identified by connection IDs, not IP addresses.
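To see the head-of-line difference concretely, here is a toy model. It illustrates the ordering rules and is not a transport implementation. Packets arrive in send order, one is lost, and the question is which of the others the application can receive before the retransmission arrives. Under TCP, all HTTP/2 streams share one ordered byte stream. Under QUIC, ordering is enforced only within each stream:

```python
def deliverable(packets: list[tuple[int, int]], lost_index: int,
                per_stream: bool) -> list[tuple[int, int]]:
    """Packets the application can receive before the lost one is resent.

    packets: (stream_id, sequence) pairs in send order.
    per_stream=False models TCP: everything after the gap waits.
    per_stream=True models QUIC: only the lost packet's stream waits.
    """
    lost_stream = packets[lost_index][0]
    ready = []
    for i, (stream_id, seq) in enumerate(packets):
        if i == lost_index:
            continue  # the lost packet itself has not arrived
        if i > lost_index and (not per_stream or stream_id == lost_stream):
            continue  # stuck behind the gap until retransmission
        ready.append((stream_id, seq))
    return ready


packets = [(3, 0), (5, 0), (3, 1), (5, 1)]
```

If the first packet (stream 3) is lost, the TCP model delivers nothing until the retransmission arrives. The QUIC model still delivers both stream 5 packets.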

Python support for HTTP/3 is still emerging. aioquic provides a QUIC and HTTP/3 stack, and Hypercorn can serve HTTP/3 on top of it. httpx has no built-in HTTP/3 support, however, so production deployments typically use Caddy or Nginx as the QUIC termination point.

One thing to remember: multiplexing is HTTP/2's biggest practical win for Python applications. Server push was a failed experiment, but the underlying protocol improvements are substantial, and the future lies in HTTP/3's QUIC-based transport.
