FastAPI Middleware Patterns — Deep Dive
ASGI middleware internals
FastAPI is an ASGI application. Every middleware is an ASGI app that wraps another ASGI app. The signature:
```python
from starlette.types import ASGIApp, Receive, Scope, Send


class MyMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send):
        if scope["type"] == "http":
            # Custom logic here
            await self.app(scope, receive, send)
        else:
            # Non-HTTP scopes (websocket, lifespan) pass through untouched
            await self.app(scope, receive, send)
```
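The scope dictionary contains connection metadata (type, path, headers, query string). The receive callable is awaited to get the next incoming event, such as a request body chunk or a disconnect. The send callable is awaited to push outgoing events: the response start, then the body chunks. Understanding these three parameters is the key to writing powerful middleware.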
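To make the three parameters concrete, here is a minimal sketch that drives a raw ASGI middleware by hand, using only the standard library. The names `echo_app`, `PathRecorder`, and `run_once` are hypothetical, and the scope is a small subset of what a real server such as uvicorn would supply.

```python
import asyncio


async def echo_app(scope, receive, send):
    """Tiny ASGI app: reads one request event and echoes its body back."""
    event = await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": event.get("body", b"")})


class PathRecorder:
    """Hypothetical middleware: records the path of every HTTP request."""

    def __init__(self, app):
        self.app = app
        self.seen = []

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            self.seen.append(scope["path"])
        await self.app(scope, receive, send)


async def run_once(app, path, body):
    """Call an ASGI app the way a server would, with stub receive/send."""
    scope = {"type": "http", "method": "POST", "path": path, "headers": []}
    sent = []

    async def receive():
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


middleware = PathRecorder(echo_app)
messages = asyncio.run(run_once(middleware, "/items", b"hello"))
```

After the run, `messages` holds the two response events in the order the app sent them, which is what a server would turn into bytes on the wire.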
The @app.middleware("http") decorator is syntactic sugar that wraps your function in a BaseHTTPMiddleware class. This class handles the ASGI protocol details and gives you a clean Request and Response interface. Convenient, but with tradeoffs.
BaseHTTPMiddleware: convenience and its costs
BaseHTTPMiddleware (what @app.middleware("http") uses) has known limitations:
- Streaming overhead: call_next runs the inner app in a separate task and relays the response through an in-memory stream. The Response object it returns is a streaming wrapper, so inspecting or rewriting the body means consuming it into memory yourself. For a 500MB file download, that means 500MB in memory. Pure ASGI middleware can forward each chunk without the extra hop.
- No WebSocket support: BaseHTTPMiddleware only handles http scope types. WebSocket connections bypass it entirely. If you need middleware that affects WebSockets, you must write raw ASGI middleware.
- Exception leakage: Exceptions raised after call_next (in post-processing) can cause RuntimeError: Caught handled exception in certain Starlette versions. This has been a persistent source of bugs.
For production middleware that handles high throughput, large responses, or WebSockets, prefer raw ASGI middleware over BaseHTTPMiddleware.
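As a rough illustration of the WebSocket point, the sketch below shows a raw ASGI middleware that sees every scope type, because nothing filters scopes before `__call__` runs. `ScopeCounter` and `noop_app` are hypothetical names, and the scopes are stubs rather than real connections.

```python
import asyncio
from collections import Counter


class ScopeCounter:
    """Hypothetical raw ASGI middleware that counts HTTP and WebSocket scopes."""

    def __init__(self, app):
        self.app = app
        self.counts = Counter()

    async def __call__(self, scope, receive, send):
        # "http", "websocket" and "lifespan" scopes all arrive here;
        # the middleware itself decides which ones to act on.
        if scope["type"] in ("http", "websocket"):
            self.counts[scope["type"]] += 1
        await self.app(scope, receive, send)


async def noop_app(scope, receive, send):
    """Stand-in for the wrapped application; does nothing."""


async def main():
    mw = ScopeCounter(noop_app)
    for scope_type in ("http", "websocket", "websocket", "lifespan"):
        await mw({"type": scope_type}, None, None)
    return mw.counts


counts = asyncio.run(main())
```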
Request body caching middleware
A common need: log the request body for debugging, but still let the route handler read it. The problem is that receive is a one-shot stream.
```python
from starlette.types import ASGIApp


class BodyCacheMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        body_chunks = []

        async def cached_receive():
            message = await receive()
            if message["type"] == "http.request":
                # Record each chunk as the downstream app consumes it
                body_chunks.append(message.get("body", b""))
            return message

        # Store body in scope for downstream access
        scope["_cached_body"] = body_chunks
        await self.app(scope, cached_receive, send)
```
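This wraps the receive callable, records body chunks as the downstream app consumes them, and exposes the list through the scope. The route handler reads the body as usual; it is simply calling cached_receive instead of the server's receive. Note that scope["_cached_body"] stays empty until something downstream actually reads the body, so log it after self.app returns, not before.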
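The timing of the cache is easy to get wrong, so here is a stdlib-only sketch that feeds a two-chunk body through the same wrapping idea and records what the cache holds before and after the downstream app reads. `BodyCache`, `reading_app`, and `observed` are hypothetical names used only for this illustration.

```python
import asyncio


class BodyCache:
    """Stdlib-only variant of the body-caching idea (hypothetical name)."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        chunks = []

        async def cached_receive():
            message = await receive()
            if message["type"] == "http.request":
                chunks.append(message.get("body", b""))
            return message

        scope["_cached_body"] = chunks
        await self.app(scope, cached_receive, send)


observed = {}


async def reading_app(scope, receive, send):
    """Downstream app that drains the whole body, as a handler would."""
    observed["before_read"] = b"".join(scope["_cached_body"])
    more = True
    while more:
        message = await receive()
        more = message.get("more_body", False)
    observed["after_read"] = b"".join(scope["_cached_body"])


async def main():
    events = iter([
        {"type": "http.request", "body": b"hello ", "more_body": True},
        {"type": "http.request", "body": b"world", "more_body": False},
    ])

    async def receive():
        return next(events)

    async def send(message):
        pass

    await BodyCache(reading_app)({"type": "http"}, receive, send)


asyncio.run(main())
```

The cache is empty on entry to the app and complete only once the app has drained the stream, which is why logging belongs after the wrapped app returns.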
Response modification without buffering
Modifying response headers without buffering the entire body:
```python
import time

from starlette.datastructures import MutableHeaders
from starlette.types import ASGIApp


class TimingMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()

        async def send_with_timing(message):
            if message["type"] == "http.response.start":
                duration = time.perf_counter() - start
                # MutableHeaders edits the header list of the message in place
                headers = MutableHeaders(scope=message)
                headers.append("X-Process-Time", f"{duration:.4f}")
            await send(message)

        await self.app(scope, receive, send_with_timing)
```
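This wraps the send callable to intercept the response start event and inject a header. The response body streams through untouched, with no buffering. Because headers go out before the body, the value measures the time until the response starts, not the time spent streaming the body.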
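The same send-wrapping technique can be sketched without Starlette by editing the raw ASGI header list directly, which makes it easy to check that body chunks pass through unchanged. `RawTiming` and `streaming_app` are hypothetical names; the header list format (pairs of byte strings) comes from the ASGI message layout.

```python
import asyncio
import time


class RawTiming:
    """Stdlib-only sketch: adds a timing header on response start."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()

        async def send_with_timing(message):
            if message["type"] == "http.response.start":
                elapsed = time.perf_counter() - start
                # ASGI headers are a list of (name, value) byte pairs
                headers = list(message.get("headers", []))
                headers.append((b"x-process-time", f"{elapsed:.4f}".encode()))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_timing)


async def streaming_app(scope, receive, send):
    """Sends a response in three chunks plus a terminating empty chunk."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for chunk in (b"a", b"b", b"c"):
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


async def main():
    sent = []

    async def send(message):
        sent.append(message)

    await RawTiming(streaming_app)({"type": "http"}, None, send)
    return sent


sent = asyncio.run(main())
```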
Error handling in middleware
Middleware error handling requires careful thought about what happens at each layer:
```python
from starlette.responses import JSONResponse
from starlette.types import ASGIApp


class ErrorHandlingMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        response_started = False
        original_send = send

        async def tracked_send(message):
            nonlocal response_started
            if message["type"] == "http.response.start":
                response_started = True
            await original_send(message)

        try:
            await self.app(scope, receive, tracked_send)
        except Exception as exc:
            if response_started:
                # Too late to send an error response: headers already sent
                raise
            # Send a structured error response.
            # Note: str(exc) can expose internal details to clients.
            response = JSONResponse(
                status_code=500,
                content={"error": "internal_error", "detail": str(exc)},
            )
            await response(scope, receive, send)
```
The critical detail: once http.response.start has been sent, you cannot send a different status code. The response has begun. You can only log the error and let the connection drop.
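Both branches of that rule can be exercised with a stdlib-only sketch: one app that fails before sending anything and one that fails after the response has started. `RawErrorHandler`, `fails_early`, `fails_late`, and `call` are hypothetical names, and the error body here is deliberately generic.

```python
import asyncio
import json


class RawErrorHandler:
    """Stdlib-only sketch of the response_started guard."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        started = False

        async def tracked_send(message):
            nonlocal started
            if message["type"] == "http.response.start":
                started = True
            await send(message)

        try:
            await self.app(scope, receive, tracked_send)
        except Exception:
            if started:
                raise  # headers are already out; a 500 can no longer be sent
            body = json.dumps({"error": "internal_error"}).encode()
            await send({"type": "http.response.start", "status": 500,
                        "headers": [(b"content-type", b"application/json")]})
            await send({"type": "http.response.body", "body": body})


async def fails_early(scope, receive, send):
    raise ValueError("boom before any response")


async def fails_late(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    raise ValueError("boom mid-stream")


async def call(app):
    """Run one request and report what was sent and whether it re-raised."""
    sent = []

    async def send(message):
        sent.append(message)

    try:
        await RawErrorHandler(app)({"type": "http"}, None, send)
        raised = False
    except ValueError:
        raised = True
    return sent, raised


early_sent, early_raised = asyncio.run(call(fails_early))
late_sent, late_raised = asyncio.run(call(fails_late))
```

The early failure turns into a clean 500 response, while the late failure propagates to the caller with the original 200 start event already sent.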
Middleware ordering: a production stack
A well-ordered middleware stack for a production FastAPI application. Because add_middleware prepends to the stack, the calls run from innermost to outermost:
```python
# 6. Auth (innermost security layer): added first
app.add_middleware(APIKeyMiddleware, header="X-API-Key")
# 5. Rate limiting (before auth to protect against brute force)
app.add_middleware(RateLimitMiddleware, limit=100, window=60)
# 4. CORS (must run before auth to handle preflight OPTIONS)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://app.example.com"],
    allow_methods=["*"],
    allow_headers=["*"],
)
# 3. Error handling (catches exceptions from inner layers)
app.add_middleware(ErrorHandlingMiddleware)
# 2. Timing (measures total time including all inner middleware)
app.add_middleware(TimingMiddleware)
# 1. Outermost: Request ID (traces everything): added last
app.add_middleware(RequestIDMiddleware)
```
Starlette processes middleware in reverse addition order: the last add_middleware call is the outermost layer during execution. This is counterintuitive, which is why the calls above are listed from innermost to outermost. FastAPI inherits add_middleware from Starlette and follows the same rule; it does not reverse it. Always verify your ordering with a test that records middleware names in execution order.
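One way to run such an ordering check is a tracing middleware that records its name on entry. The sketch below is a toy model, not Starlette's code: `MiniStack` is a hypothetical stand-in that prepends on each `add_middleware` call and wraps from the end of the list toward the front, and `Trace` is a hypothetical middleware. Against a real application, the same `Trace` class could be registered with `app.add_middleware(Trace, name=...)` and the recorded order inspected after a test request.

```python
import asyncio


class Trace:
    """Hypothetical middleware that records its name when a request enters."""

    def __init__(self, app, name):
        self.app = app
        self.name = name

    async def __call__(self, scope, receive, send):
        scope.setdefault("trace", []).append(self.name)
        await self.app(scope, receive, send)


class MiniStack:
    """Toy model: add_middleware prepends, build wraps innermost first."""

    def __init__(self, app):
        self.app = app
        self.user_middleware = []

    def add_middleware(self, cls, **kwargs):
        self.user_middleware.insert(0, (cls, kwargs))

    def build(self):
        app = self.app
        for cls, kwargs in reversed(self.user_middleware):
            app = cls(app, **kwargs)
        return app


async def endpoint(scope, receive, send):
    """Stand-in for the route handler."""


stack = MiniStack(endpoint)
stack.add_middleware(Trace, name="auth")        # added first
stack.add_middleware(Trace, name="timing")
stack.add_middleware(Trace, name="request_id")  # added last

scope = {"type": "http"}
asyncio.run(stack.build()(scope, None, None))
order = scope["trace"]
```

Under this model the middleware added last is entered first, so `order` lists the layers from outermost to innermost.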
Middleware for structured logging
A production-grade logging middleware that captures request/response metadata:
```python
import time

import structlog
from starlette.requests import Request
from starlette.types import ASGIApp

# Assumption: the keyword-argument logging calls below imply a structlog-style
# logger; the standard library logger does not accept arbitrary kwargs.
logger = structlog.get_logger()


class StructuredLogMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request = Request(scope)
        start = time.perf_counter()
        status_code = 500  # Default if response never starts

        async def capture_status(message):
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, capture_status)
        finally:
            duration = time.perf_counter() - start
            logger.info(
                "request_completed",
                method=request.method,
                path=request.url.path,
                status=status_code,
                duration_ms=round(duration * 1000, 2),
                client=request.client.host if request.client else None,
                request_id=scope.get("request_id"),
            )
```
This captures the status code by intercepting send, then logs a structured event in finally so it runs even if the app raises an exception.
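For readers without a structured logging library, the same capture-then-log-in-finally pattern can be sketched with the standard library by emitting one JSON string per request. `JsonAccessLog`, `ListHandler`, and the `access_sketch` logger name are hypothetical, and the list-backed handler exists only so the records can be inspected.

```python
import asyncio
import json
import logging
import time

records = []


class ListHandler(logging.Handler):
    """Collects parsed log records in memory for inspection."""

    def emit(self, record):
        records.append(json.loads(record.getMessage()))


logger = logging.getLogger("access_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())
logger.propagate = False


class JsonAccessLog:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        status = 500  # reported if the app raises before starting a response

        async def capture_status(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, capture_status)
        finally:
            logger.info(json.dumps({
                "event": "request_completed",
                "method": scope["method"],
                "path": scope["path"],
                "status": status,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }))


async def ok_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 204, "headers": []})
    await send({"type": "http.response.body", "body": b""})


async def broken_app(scope, receive, send):
    raise RuntimeError("handler crashed")


async def drive(app):
    async def send(message):
        pass

    scope = {"type": "http", "method": "GET", "path": "/health"}
    try:
        await JsonAccessLog(app)(scope, None, send)
    except RuntimeError:
        pass  # the exception still propagates; only the log record matters here


asyncio.run(drive(ok_app))
asyncio.run(drive(broken_app))
```

The crashing app still produces a log record, with the default status of 500, because the logging call sits in the finally block.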
Conditional middleware
Sometimes you want middleware to apply only to certain paths:
```python
from starlette.types import ASGIApp


class PathFilteredMiddleware:
    def __init__(self, app: ASGIApp, paths: list[str]):
        self.app = app
        self.paths = paths

    async def __call__(self, scope, receive, send):
        # Exact match only: "/admin" does not match "/admin/users"
        if scope["type"] == "http" and scope["path"] in self.paths:
            # Apply custom logic
            ...
        await self.app(scope, receive, send)
```
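Exact path matching misses nested routes, so a prefix-based variant is often closer to what is wanted. The following is a sketch under that assumption, not part of the middleware above: `PrefixFiltered` is a hypothetical name, and it matches on whole path segments so that "/admin" covers "/admin/users" but not "/administrator".

```python
import asyncio


class PrefixFiltered:
    """Hypothetical middleware that applies only under given path prefixes."""

    def __init__(self, app, prefixes):
        self.app = app
        self.prefixes = [p.rstrip("/") for p in prefixes]
        self.matched = []

    def applies_to(self, scope):
        if scope["type"] != "http":
            return False
        path = scope["path"]
        # Match the prefix itself or anything below it, on a segment boundary
        return any(path == p or path.startswith(p + "/") for p in self.prefixes)

    async def __call__(self, scope, receive, send):
        if self.applies_to(scope):
            self.matched.append(scope["path"])  # placeholder for custom logic
        await self.app(scope, receive, send)


async def noop_app(scope, receive, send):
    """Stand-in for the wrapped application."""


async def main():
    mw = PrefixFiltered(noop_app, prefixes=["/admin"])
    for path in ("/admin", "/admin/users", "/administrator", "/public"):
        await mw({"type": "http", "path": path}, None, None)
    return mw.matched


matched = asyncio.run(main())
```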
Alternatively, use FastAPI’s sub-application mounting to apply different middleware stacks to different route groups:
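```python
from fastapi import FastAPI

admin_app = FastAPI()
# StrictAuthMiddleware is your own middleware class, defined elsewhere
admin_app.add_middleware(StrictAuthMiddleware)

public_app = FastAPI()
# No auth middleware

main_app = FastAPI()
main_app.mount("/admin", admin_app)
main_app.mount("/public", public_app)
```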
Performance benchmarks
Middleware adds measurable overhead. Benchmarks on a typical FastAPI app (uvicorn, 4 workers):
| Configuration | Requests/sec | p99 Latency |
|---|---|---|
| No middleware | 12,400 | 4.2ms |
| 1 lightweight middleware | 11,800 | 4.5ms |
| 5 middleware (timing, logging, CORS, auth, compression) | 9,200 | 6.1ms |
| Same 5 as raw ASGI (not BaseHTTPMiddleware) | 10,900 | 5.0ms |
For this stack, raw ASGI middleware delivers roughly 18% more throughput than BaseHTTPMiddleware (10,900 vs 9,200 req/s). For most applications this doesn’t matter. For high-throughput APIs (>5,000 req/s), consider raw ASGI.
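The percentages follow directly from the table. As a quick check of the arithmetic, using only the requests/sec figures listed above (the dictionary keys are labels made up for this snippet):

```python
# Requests/sec taken from the benchmark table
rows = {
    "no_middleware": 12_400,
    "one_lightweight": 11_800,
    "five_base_http": 9_200,
    "five_raw_asgi": 10_900,
}


def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100


raw_vs_base = relative_change(rows["five_raw_asgi"], rows["five_base_http"])
base_vs_none = relative_change(rows["five_base_http"], rows["no_middleware"])
raw_vs_none = relative_change(rows["five_raw_asgi"], rows["no_middleware"])
```

Rounded to one decimal, raw ASGI is 18.5% faster than the BaseHTTPMiddleware stack, while the two five-middleware stacks sit 25.8% and 12.1% below the no-middleware baseline respectively.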
GZip middleware gotchas
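Starlette’s GZipMiddleware compresses responses above a minimum size. Gotchas:
- The minimum-size check only applies to responses sent as a single body message. Streamed responses are compressed chunk by chunk regardless of total size, and the compressor can hold small chunks back before emitting output, which can delay incremental streaming
- It adds Vary: Accept-Encoding, which can interfere with CDN caching if the CDN is not configured correctly
- It compresses already-compressed content (images, PDFs), wasting CPU for zero benefit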
A smarter approach: let your reverse proxy (nginx, Cloudflare) handle compression. It’s faster, more configurable, and doesn’t affect your application’s streaming capabilities.
The one thing to remember: For production middleware, prefer raw ASGI over BaseHTTPMiddleware to avoid its overhead and limitations, carefully order your stack (request ID outermost, auth innermost), and always test the execution order because Starlette’s addition semantics are counterintuitive.