FastAPI Middleware Patterns — Deep Dive

Advanced middleware implementation in FastAPI: ASGI internals, streaming, body caching, error propagation, and production middleware stacks.

ASGI middleware internals

FastAPI is an ASGI application. Every middleware is an ASGI app that wraps another ASGI app. The signature:

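from starlette.types import ASGIApp, Receive, Scope, Send

class MyMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "http":
            # Custom logic for HTTP requests goes here
            await self.app(scope, receive, send)
        else:
            # Pass websocket and lifespan scopes through untouched
            await self.app(scope, receive, send)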

The scope dictionary contains request metadata (path, headers, query string). Awaiting receive returns the next incoming event, such as a request body chunk or a client disconnect. Awaiting send pushes a response event toward the client: the status and headers first, then the body chunks. Understanding these three parameters is the key to writing powerful middleware.
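
To make the three parameters concrete, here is a minimal sketch that drives a middleware by hand, with no server involved. The names EventLogMiddleware, hello_app, and run_once are invented for this illustration; only the event shapes come from the ASGI protocol.

```python
import asyncio


async def hello_app(scope, receive, send):
    """Minimal ASGI app: reads one request event, replies 200 with a fixed body."""
    await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"hello"})


class EventLogMiddleware:
    """Records the type of every event that crosses the middleware."""

    def __init__(self, app, log):
        self.app = app
        self.log = log

    async def __call__(self, scope, receive, send):
        async def logged_receive():
            message = await receive()
            self.log.append(("receive", message["type"]))
            return message

        async def logged_send(message):
            self.log.append(("send", message["type"]))
            await send(message)

        await self.app(scope, logged_receive, logged_send)


async def run_once():
    log, sent = [], []

    async def receive():
        # Stand-in for the server: a single, empty request body event.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        # Stand-in for the server: collect what would go to the client.
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    await EventLogMiddleware(hello_app, log)(scope, receive, send)
    return log, sent
```

Calling asyncio.run(run_once()) returns the event log alongside the messages the "server" received, which is a cheap way to unit-test middleware without starting uvicorn.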

The @app.middleware("http") decorator is syntactic sugar that wraps your function in a BaseHTTPMiddleware class. This class handles the ASGI protocol details and gives you a clean Request and Response interface, at the cost of running the inner app in a separate task and relaying its response through memory. Convenient, but with tradeoffs.

BaseHTTPMiddleware: convenience and its costs

BaseHTTPMiddleware (what @app.middleware("http") uses) has known limitations:

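Response buffering: call_next runs the inner app in a separate task and relays its response through an in-memory stream. Older Starlette releases used an unbounded queue for this, so a fast app could pile up much of a 500MB file download in memory while a slow client drained it. Newer releases apply backpressure, but every chunk still takes the extra hop. Pure ASGI middleware passes each chunk straight through.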

No WebSocket support: BaseHTTPMiddleware only handles http scope types. WebSocket connections bypass it entirely. If you need middleware that affects WebSockets, you must write raw ASGI middleware.

Exception leakage: Exceptions raised after call_next (in post-processing) can cause RuntimeError: Caught handled exception in certain Starlette versions. This has been a persistent source of bugs.

For production middleware that handles high throughput, large responses, or WebSockets, prefer raw ASGI middleware over BaseHTTPMiddleware.

Request body caching middleware

A common need: log the request body for debugging, but still let the route handler read it. The problem is that receive is a one-shot stream.

class BodyCacheMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        body_chunks = []
        async def cached_receive():
            message = await receive()
            if message["type"] == "http.request":
                body_chunks.append(message.get("body", b""))
            return message

        # Store body in scope for downstream access
        scope["_cached_body"] = body_chunks
        await self.app(scope, cached_receive, send)

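This wraps the receive callable, caches body chunks as the downstream app consumes them, and exposes the list through the scope. The route handler reads the body normally and never sees the wrapper. The cache fills only as the handler reads: it is complete once self.app returns, and it stays empty if the handler never reads the body.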
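
As a sketch of where the cached body becomes usable, the variant below hands the joined bytes to a callback once the downstream app has returned. The on_body parameter and the echo_length_app test app are assumptions made for this example, not part of the middleware above.

```python
import asyncio


class BodyCacheMiddleware:
    def __init__(self, app, on_body):
        self.app = app
        self.on_body = on_body  # hypothetical callback, e.g. a debug logger

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        chunks = []

        async def cached_receive():
            message = await receive()
            if message["type"] == "http.request":
                chunks.append(message.get("body", b""))
            return message

        scope["_cached_body"] = chunks
        await self.app(scope, cached_receive, send)
        # Only now has the downstream app consumed whatever it wanted to read.
        self.on_body(b"".join(chunks))


async def echo_length_app(scope, receive, send):
    """Reads the full body chunk by chunk, then replies with its length."""
    size = 0
    more_body = True
    while more_body:
        message = await receive()
        size += len(message.get("body", b""))
        more_body = message.get("more_body", False)
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": str(size).encode()})


async def demo():
    incoming = [
        {"type": "http.request", "body": b'{"user": ', "more_body": True},
        {"type": "http.request", "body": b'"ada"}', "more_body": False},
    ]
    seen, sent = [], []

    async def receive():
        return incoming.pop(0)

    async def send(message):
        sent.append(message)

    app = BodyCacheMiddleware(echo_length_app, on_body=seen.append)
    await app({"type": "http", "method": "POST", "path": "/"}, receive, send)
    return seen, sent
```

The handler still sees both chunks in order, and the callback receives the reassembled body only after the handler has finished with it.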

Response modification without buffering

Modifying response headers without buffering the entire body:

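import time

from starlette.datastructures import MutableHeaders
from starlette.types import ASGIApp

class TimingMiddleware: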
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()

        async def send_with_timing(message):
            if message["type"] == "http.response.start":
                duration = time.perf_counter() - start
                headers = MutableHeaders(scope=message)
                headers.append("X-Process-Time", f"{duration:.4f}")
            await send(message)

        await self.app(scope, receive, send_with_timing)

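This wraps the send callable to intercept the response start event and inject a header. The response body streams through untouched, with no buffering. Because headers go out before the body, the header reports the time until the response started, not the time needed to stream the full body.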
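
The pass-through behavior can be checked without a server. The sketch below is a stdlib-only variant that appends to the raw ASGI header list instead of using MutableHeaders (a substitution made so the example has no dependencies); streaming_app and demo are hypothetical helpers.

```python
import asyncio
import time


class TimingMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()

        async def send_with_timing(message):
            if message["type"] == "http.response.start":
                elapsed = time.perf_counter() - start
                # Raw ASGI headers are a list of (name, value) byte pairs.
                headers = list(message.get("headers", []))
                headers.append((b"x-process-time", f"{elapsed:.4f}".encode()))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_timing)


async def streaming_app(scope, receive, send):
    """Sends three body chunks followed by an empty terminating chunk."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for chunk in (b"a", b"b", b"c"):
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


async def demo():
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await TimingMiddleware(streaming_app)({"type": "http"}, receive, send)
    return sent
```

Each body event reaches the outer send as its own message, which is what lets a large download stream through without being held in memory.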

Error handling in middleware

Middleware error handling requires careful thought about what happens at each layer:

from starlette.responses import JSONResponse

class ErrorHandlingMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        response_started = False
        original_send = send

        async def tracked_send(message):
            nonlocal response_started
            if message["type"] == "http.response.start":
                response_started = True
            await original_send(message)

        try:
            await self.app(scope, receive, tracked_send)
        except Exception as exc:
            if response_started:
                # Too late to send an error response — headers already sent
                raise
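            # Send a structured error response. Log exc server-side rather than
            # echoing str(exc), which can leak internal details to clients.
            response = JSONResponse(
                status_code=500,
                content={"error": "internal_error", "detail": "Internal Server Error"}
            )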
            await response(scope, receive, send)

The critical detail: once http.response.start has been sent, you cannot send a different status code. The response has begun. You can only log the error and let the connection drop.
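
Both branches can be exercised with two tiny apps, one failing before the response starts and one failing after. This is a stdlib-only sketch that builds the 500 response from raw ASGI events instead of JSONResponse; fails_early, fails_late, and call are names invented for the example.

```python
import asyncio
import json


class ErrorHandlingMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        response_started = False

        async def tracked_send(message):
            nonlocal response_started
            if message["type"] == "http.response.start":
                response_started = True
            await send(message)

        try:
            await self.app(scope, receive, tracked_send)
        except Exception:
            if response_started:
                raise  # status and headers are already on the wire
            body = json.dumps({"error": "internal_error"}).encode()
            await send({
                "type": "http.response.start",
                "status": 500,
                "headers": [(b"content-type", b"application/json")],
            })
            await send({"type": "http.response.body", "body": body})


async def fails_early(scope, receive, send):
    raise ValueError("before any response event")


async def fails_late(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    raise ValueError("after the response started")


async def call(app):
    """Runs app behind the middleware; returns (sent events, escaped exception)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    try:
        await ErrorHandlingMiddleware(app)({"type": "http"}, receive, send)
    except ValueError as exc:
        return sent, exc
    return sent, None
```

With fails_early the client sees a clean 500 and no exception escapes. With fails_late the 200 has already gone out, so the exception propagates to the server.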

Middleware ordering: a production stack

A well-ordered middleware stack for a production FastAPI application:

# add_middleware wraps whatever is already registered, so the calls
# below run from the innermost layer (6) to the outermost layer (1).

# 6. Auth (innermost security layer)
app.add_middleware(APIKeyMiddleware, header="X-API-Key")

# 5. Rate limiting (before auth to protect against brute force)
app.add_middleware(RateLimitMiddleware, limit=100, window=60)

# 4. CORS (must run before auth to handle preflight OPTIONS)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://app.example.com"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# 3. Error handling (catches exceptions from inner layers)
app.add_middleware(ErrorHandlingMiddleware)

# 2. Timing (measures total time including all inner middleware)
app.add_middleware(TimingMiddleware)

# 1. Outermost: Request ID (traces everything)
app.add_middleware(RequestIDMiddleware)

Starlette processes middleware in reverse addition order: the last add_middleware call is the outermost layer during execution. This is counterintuitive, which is why the calls above are written from layer 6 down to layer 1. Always verify your ordering with a test that prints middleware names.

FastAPI follows the same convention. It subclasses Starlette and inherits add_middleware, so middleware added last executes outermost. Test your stack to confirm this on the versions you deploy.
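
One way to run such an ordering test is a tracing middleware that records when it is entered and exited. The sketch below builds a two-layer stack by hand so that it runs without a framework; TraceMiddleware, endpoint, and trace_order are hypothetical names.

```python
import asyncio


class TraceMiddleware:
    """Appends 'enter <name>' and 'exit <name>' to a shared log."""

    def __init__(self, app, name, log):
        self.app = app
        self.name = name
        self.log = log

    async def __call__(self, scope, receive, send):
        self.log.append(f"enter {self.name}")
        try:
            await self.app(scope, receive, send)
        finally:
            self.log.append(f"exit {self.name}")


async def endpoint(scope, receive, send):
    await send({"type": "http.response.start", "status": 204, "headers": []})
    await send({"type": "http.response.body", "body": b""})


async def trace_order():
    log = []
    # Hand-built stack: "outer" wraps "inner", which wraps the endpoint.
    app = TraceMiddleware(TraceMiddleware(endpoint, "inner", log), "outer", log)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # discard response events; only the order matters here

    await app({"type": "http"}, receive, send)
    return log
```

In a real application the same class could be registered several times with add_middleware(TraceMiddleware, name=..., log=...) and the resulting log compared against the order you intended.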

Middleware for structured logging

A production-grade logging middleware that captures request/response metadata:

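import time

import structlog
from starlette.requests import Request

# Assumes structlog: the keyword-argument calls below are not valid
# with the standard library's logging.Logger.
logger = structlog.get_logger()

class StructuredLogMiddleware: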
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request = Request(scope)
        start = time.perf_counter()
        status_code = 500  # Default if response never starts

        async def capture_status(message):
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, capture_status)
        finally:
            duration = time.perf_counter() - start
            logger.info(
                "request_completed",
                method=request.method,
                path=request.url.path,
                status=status_code,
                duration_ms=round(duration * 1000, 2),
                client=request.client.host if request.client else None,
                request_id=scope.get("request_id"),
            )

This captures the status code by intercepting send, then logs a structured event in finally so it runs even if the app raises an exception.
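
If a structured logging library is not available, roughly the same event can be emitted through the standard library by serializing a dict to JSON. This is a sketch under that assumption; JsonLogMiddleware, ListHandler, crashing_app, and demo are invented for the example, and the "access" logger name is arbitrary.

```python
import asyncio
import json
import logging
import time

logger = logging.getLogger("access")


class JsonLogMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status_code = 500  # stays 500 if the app raises before responding

        async def capture_status(message):
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, capture_status)
        finally:
            event = {
                "event": "request_completed",
                "method": scope.get("method"),
                "path": scope.get("path"),
                "status": status_code,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }
            logger.info(json.dumps(event))


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


async def crashing_app(scope, receive, send):
    raise RuntimeError("boom")


def demo():
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass

    scope = {"type": "http", "method": "GET", "path": "/items"}
    try:
        asyncio.run(JsonLogMiddleware(crashing_app)(scope, receive, send))
    except RuntimeError:
        pass  # the middleware logs, then lets the exception propagate
    finally:
        logger.removeHandler(handler)
    return [json.loads(line) for line in handler.lines]
```

The demo shows the finally block at work: the app raises before sending anything, and the event is still logged with the default status of 500.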

Conditional middleware

Sometimes you want middleware to apply only to certain paths:

class PathFilteredMiddleware:
    def __init__(self, app: ASGIApp, paths: list[str]):
        self.app = app
        self.paths = paths

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["path"] in self.paths:
            # Apply custom logic
            ...
        await self.app(scope, receive, send)
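
The membership test above matches exact paths only, so "/admin/users" would not match a configured "/admin". If prefix matching is what you want, a sketch along these lines may help; PrefixFilteredMiddleware and its on_match hook are assumptions for illustration.

```python
import asyncio


class PrefixFilteredMiddleware:
    def __init__(self, app, prefixes, on_match):
        self.app = app
        # Normalize "/admin/" and "/admin" to the same prefix.
        self.prefixes = tuple(p.rstrip("/") for p in prefixes)
        self.on_match = on_match  # hypothetical hook standing in for custom logic

    def matches(self, path):
        # Match the prefix itself or anything below it, but not "/adminx".
        return any(path == p or path.startswith(p + "/") for p in self.prefixes)

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and self.matches(scope["path"]):
            self.on_match(scope["path"])
        await self.app(scope, receive, send)


async def demo():
    hits = []

    async def app(scope, receive, send):
        pass  # the inner app is irrelevant to the matching logic

    mw = PrefixFilteredMiddleware(app, ["/admin/"], on_match=hits.append)
    for path in ("/admin", "/admin/users", "/adminx", "/public"):
        await mw({"type": "http", "path": path}, None, None)
    return hits
```

Comparing against the prefix plus a trailing slash avoids the common bug where a plain startswith("/admin") also captures unrelated routes such as "/adminx".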

Alternatively, use FastAPI’s sub-application mounting to apply different middleware stacks to different route groups:

from fastapi import FastAPI

admin_app = FastAPI()
admin_app.add_middleware(StrictAuthMiddleware)

public_app = FastAPI()
# No auth middleware

main_app = FastAPI()
main_app.mount("/admin", admin_app)
main_app.mount("/public", public_app)

Performance benchmarks

Middleware adds measurable overhead. Benchmarks on a typical FastAPI app (uvicorn, 4 workers):

Configuration	Requests/sec	p99 Latency
No middleware	12,400	4.2ms
1 lightweight middleware	11,800	4.5ms
5 middleware (timing, logging, CORS, auth, compression)	9,200	6.1ms
Same 5 as raw ASGI (not BaseHTTPMiddleware)	10,900	5.0ms

Rewriting the same five middleware as raw ASGI instead of BaseHTTPMiddleware recovers ~18% throughput for this stack (10,900 vs. 9,200 req/s). For most applications this doesn’t matter. For high-throughput APIs (>5,000 req/s), consider raw ASGI.
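
The percentages can be reproduced from the table with a few lines of arithmetic. The dictionary keys are labels chosen for this sketch, and the "five_base_http" label assumes the third row used BaseHTTPMiddleware, as the fourth row's description implies.

```python
# Requests/sec figures copied from the benchmark table above.
RPS = {
    "none": 12_400,
    "one_lightweight": 11_800,
    "five_base_http": 9_200,
    "five_raw_asgi": 10_900,
}


def pct_drop(rps: int, baseline: int = RPS["none"]) -> float:
    """Throughput lost relative to the no-middleware baseline, in percent."""
    return round((baseline - rps) / baseline * 100, 1)


def pct_gain(new: int, old: int) -> float:
    """Throughput gained by moving from `old` to `new`, in percent."""
    return round((new - old) / old * 100, 1)


drops = {name: pct_drop(rps) for name, rps in RPS.items()}
raw_vs_base = pct_gain(RPS["five_raw_asgi"], RPS["five_base_http"])
```

Against the no-middleware baseline, the five-middleware stack loses 25.8% of throughput with BaseHTTPMiddleware and 12.1% as raw ASGI; the raw ASGI version is 18.5% faster than the BaseHTTPMiddleware version.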

GZip middleware gotchas

Starlette’s GZipMiddleware compresses responses above a minimum size. Gotchas:

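It holds back the response start until the first body chunk arrives so it can check the size, and it runs streamed chunks through the compressor, which can delay small chunks such as server-sent events depending on the Starlette version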
It adds Vary: Accept-Encoding which can interfere with CDN caching if not configured correctly
It compresses already-compressed content (images, PDFs) wasting CPU for zero benefit

A smarter approach: let your reverse proxy (nginx, Cloudflare) handle compression. It’s faster, more configurable, and doesn’t affect your application’s streaming capabilities.
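
If compression has to stay in the application, the wasted work on already-compressed content can be avoided with a content-type check before compressing. The helper below is a hypothetical sketch, not part of Starlette; the type lists are illustrative rather than exhaustive, and the default of 500 bytes mirrors GZipMiddleware's default minimum_size.

```python
# Media types that are already compressed; gzipping them costs CPU
# and saves little or nothing. Illustrative, not exhaustive.
COMPRESSED_PREFIXES = ("image/", "video/", "audio/")
COMPRESSED_TYPES = {"application/pdf", "application/zip", "application/gzip"}


def should_compress(
    content_type: str,
    content_length: int,
    accept_encoding: str,
    minimum_size: int = 500,
) -> bool:
    """Decide whether gzip is worth applying to a response."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # Simplified check: ignores q-values such as "gzip;q=0".
    if "gzip" not in accept_encoding.lower():
        return False
    if content_length < minimum_size:
        return False
    if media_type == "image/svg+xml":
        return True  # SVG is text-based and compresses well
    if media_type.startswith(COMPRESSED_PREFIXES) or media_type in COMPRESSED_TYPES:
        return False
    return True
```

A raw ASGI middleware could call a check like this when it sees http.response.start and decide per response whether to wrap the body in a compressor at all.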

The one thing to remember: For production middleware, prefer raw ASGI over BaseHTTPMiddleware to avoid response buffering, carefully order your stack (request ID outermost, auth innermost), and always test the execution order because Starlette’s addition semantics are counterintuitive.
