API Gateway Patterns in Python — Deep Dive

Build production-grade Python API gateways with async proxying, circuit breakers, BFF patterns, and observability hooks.

Technical perspective

An API gateway sits at the most critical point in a distributed system — the boundary between untrusted external traffic and internal services. Getting the implementation wrong causes cascading failures, security holes, or latency spikes that ripple through every consumer. Python’s async capabilities, rich middleware ecosystem, and rapid prototyping speed make it a strong choice for custom gateways, provided you understand the tradeoffs.

Architecture: async-first gateway with FastAPI

A production Python gateway should be fully async. Blocking calls in a gateway are lethal: a single synchronous call to a slow upstream stalls the event loop, and every other request queues behind it.
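If a dependency only ships a blocking client (a legacy SDK, say), one stopgap is asyncio.to_thread, which runs the call in a worker thread so the event loop stays free. The sketch below assumes legacy_lookup is a hypothetical stand-in for such a blocking call. The default thread pool is bounded, so this buys time rather than replacing a proper async client.

```python
import asyncio
import time


def legacy_lookup(user_id: str) -> dict:
    # Hypothetical blocking call (e.g. a synchronous SDK); time.sleep stands in for network I/O
    time.sleep(0.2)
    return {"user_id": user_id}


async def handler(user_id: str) -> dict:
    # Calling legacy_lookup() directly here would freeze the event loop for 0.2 s;
    # to_thread runs it in a worker thread so other requests keep being served
    return await asyncio.to_thread(legacy_lookup, user_id)


async def main() -> float:
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(f"u{i}") for i in range(3)))
    assert [r["user_id"] for r in results] == ["u0", "u1", "u2"]
    return time.perf_counter() - start


if __name__ == "__main__":
    # The three lookups overlap, so this takes roughly 0.2 s rather than 0.6 s
    print(f"elapsed: {asyncio.run(main()):.2f}s")
```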

import time
import uuid

import httpx
from fastapi import Depends, FastAPI, Request, Response
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Shared connection pool, reused across requests; close it on shutdown with `await client.aclose()`
client = httpx.AsyncClient(
    timeout=httpx.Timeout(connect=2.0, read=10.0, write=5.0, pool=5.0),
    limits=httpx.Limits(max_connections=200, max_keepalive_connections=50),
)

SERVICE_MAP = {
    "/api/orders": "http://orders-service:8000",
    "/api/users": "http://users-service:8000",
    "/api/inventory": "http://inventory-service:8000",
}

async def resolve_upstream(path: str) -> str | None:
    for prefix, upstream in SERVICE_MAP.items():
        # Match whole path segments so "/api/usersettings" is not routed to users-service
        if path == prefix or path.startswith(prefix + "/"):
            return upstream + path
    return None

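The httpx.AsyncClient with explicit connection limits caps the number of open sockets, which prevents file descriptor exhaustion under load. The httpx.Timeout object gives fine-grained control: a short connect timeout fails fast on dead hosts, a longer read timeout accommodates legitimately slow queries, and the pool timeout bounds how long a request waits for a free connection when all 200 are in use.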
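First-match prefix routing also depends on dictionary order: with the loop above, a more specific prefix such as /api/users/admin listed after /api/users would never be reached. A sketch of longest-prefix matching on whole path segments follows; the admin-service entry is hypothetical and only there to show the effect. It is a plain function because it does no I/O.

```python
from __future__ import annotations

SERVICE_MAP = {
    "/api/orders": "http://orders-service:8000",
    "/api/users": "http://users-service:8000",
    "/api/users/admin": "http://admin-service:8000",  # hypothetical, for illustration
    "/api/inventory": "http://inventory-service:8000",
}

# Check longer prefixes first so the most specific route wins
_PREFIXES = sorted(SERVICE_MAP, key=len, reverse=True)


def resolve_upstream(path: str) -> str | None:
    for prefix in _PREFIXES:
        # Require a segment boundary: "/api/usersettings" must not match "/api/users"
        if path == prefix or path.startswith(prefix + "/"):
            return SERVICE_MAP[prefix] + path
    return None
```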

Reverse proxy implementation

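The minimal proxy forwards each request and returns the upstream response. It buffers the whole body in memory rather than streaming it, which is fine for typical JSON APIs but not for large uploads or downloads: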

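# Hop-by-hop headers describe a single connection and must not be forwarded
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}

@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE", "PATCH"])
async def proxy(request: Request, path: str):
    upstream_url = await resolve_upstream(f"/{path}")
    if not upstream_url:
        return Response(status_code=404, content="Service not found")

    # Starlette stores header names in lower case, so use lower-case keys throughout
    headers = {k: v for k, v in request.headers.items() if k not in HOP_BY_HOP}
    headers.pop("host", None)
    headers["x-correlation-id"] = str(uuid.uuid4())

    body = await request.body()

    try:
        resp = await client.request(
            method=request.method,
            url=upstream_url,
            headers=headers,
            content=body,
            params=request.query_params.multi_items(),  # keeps repeated keys like ?tag=a&tag=b
        )
    except httpx.TimeoutException:
        return Response(status_code=504, content="Upstream timed out")
    except httpx.TransportError:
        return Response(status_code=502, content="Upstream unavailable")

    # httpx has already decompressed resp.content, so drop the encoding and length
    # headers and let Starlette recompute Content-Length for the bytes actually sent
    excluded = HOP_BY_HOP | {"content-encoding", "content-length"}
    return Response(
        content=resp.content,
        status_code=resp.status_code,
        headers={k: v for k, v in resp.headers.items() if k not in excluded},
    )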

Correlation IDs injected at the gateway make distributed tracing possible without each service generating its own, provided every service logs the header and forwards it on its own outbound calls.
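Generating a fresh ID on every hop breaks the chain when an upstream caller (another gateway, a client SDK) has already assigned one. A common refinement is to reuse a well-formed incoming ID and mint one otherwise; parsing it as a UUID stops clients from injecting arbitrary strings into your logs. A sketch, assuming the header name x-correlation-id and a UUID-only policy:

```python
import uuid
from collections.abc import Mapping

CORRELATION_HEADER = "x-correlation-id"  # assumed header name, lower-cased as Starlette stores it


def correlation_id_for(headers: Mapping[str, str]) -> str:
    """Reuse a valid incoming correlation ID, otherwise generate a new UUID4."""
    incoming = headers.get(CORRELATION_HEADER)
    if incoming:
        try:
            # uuid.UUID() raises ValueError for non-UUID input; str() re-emits canonical form
            return str(uuid.UUID(incoming))
        except ValueError:
            pass  # malformed: fall through and mint a fresh ID
    return str(uuid.uuid4())
```

In the proxy, headers["x-correlation-id"] = correlation_id_for(request.headers) would replace the unconditional uuid4() call.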

Circuit breaker pattern

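When an upstream service is failing, hammering it with retries slows its recovery. A circuit breaker counts consecutive failures and, once a threshold is crossed, rejects requests immediately (open) until a cool-down has elapsed, then lets a single probe through (half-open) to test whether the service has recovered: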

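from dataclasses import dataclass, field
from time import monotonic

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 30.0
    # Internal state, excluded from the generated __init__
    _failures: int = field(default=0, init=False)
    _state: str = field(default="closed", init=False)  # closed, open, half-open
    _last_failure_time: float = field(default=0.0, init=False)
    _probe_in_flight: bool = field(default=False, init=False)

    def record_success(self):
        self._failures = 0
        self._state = "closed"
        self._probe_in_flight = False

    def record_failure(self):
        self._failures += 1
        self._last_failure_time = monotonic()
        self._probe_in_flight = False
        # A failed half-open probe reopens the circuit immediately
        if self._state == "half-open" or self._failures >= self.failure_threshold:
            self._state = "open"

    def allow_request(self) -> bool:
        if self._state == "closed":
            return True
        if self._state == "open":
            if monotonic() - self._last_failure_time < self.recovery_timeout:
                return False
            self._state = "half-open"
        # half-open: let exactly one probe through until it reports back
        if self._probe_in_flight:
            return False
        self._probe_in_flight = True
        return True

# One breaker per upstream service
breakers: dict[str, CircuitBreaker] = {
    prefix: CircuitBreaker() for prefix in SERVICE_MAP
}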

Netflix’s Hystrix library popularised this pattern. Keeping one breaker per upstream service means a single degraded dependency fails fast on its own routes instead of tying up connections and taking down the entire gateway.
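The breaker only helps if every upstream call goes through it. A minimal sketch of that wiring, assuming any object with the allow_request / record_success / record_failure methods above; CircuitOpenError and guarded_call are hypothetical names. Timeouts and exceptions count as failures, and the caller decides whether a returned result (such as a 5xx response) is one too.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Protocol, TypeVar

T = TypeVar("T")


class Breaker(Protocol):
    """Anything with the CircuitBreaker methods shown above."""

    def allow_request(self) -> bool: ...
    def record_success(self) -> None: ...
    def record_failure(self) -> None: ...


class CircuitOpenError(Exception):
    """Raised instead of contacting an upstream whose breaker is open (hypothetical name)."""


async def guarded_call(
    breaker: Breaker,
    call: Callable[[], Awaitable[T]],
    is_failure: Callable[[T], bool] = lambda result: False,
    timeout: float | None = None,
) -> T:
    if not breaker.allow_request():
        raise CircuitOpenError("upstream circuit is open")
    try:
        # A timeout raises TimeoutError, which is counted like any other error below
        result = await asyncio.wait_for(call(), timeout)
    except Exception:
        breaker.record_failure()
        raise
    # e.g. is_failure=lambda r: r.status_code >= 500 to count upstream 5xx responses
    if is_failure(result):
        breaker.record_failure()
    else:
        breaker.record_success()
    return result
```

The proxy would then catch CircuitOpenError and return a 503 without touching the upstream at all.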

Aggregation gateway (BFF)

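A backend-for-frontend (BFF) endpoint aggregates several upstream calls into one response shaped for a specific client, so it needs explicit endpoint definitions rather than generic proxying. Register these routes before the catch-all proxy route: FastAPI matches routes in the order they are added, so /{path:path} would otherwise capture them.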

import asyncio

import httpx

def succeeded(resp: httpx.Response | BaseException) -> bool:
    # gather() puts exceptions in place of results; 4xx/5xx responses count as failures too
    return isinstance(resp, httpx.Response) and resp.is_success

@app.get("/bff/mobile/home")
async def mobile_home(user_id: str = Depends(get_current_user)):  # get_current_user: your auth dependency (not shown)
    profile, orders, notifications = await asyncio.gather(
        client.get(f"http://users-service:8000/api/users/{user_id}"),
        # params= lets httpx URL-encode user_id instead of splicing it into the query string
        client.get("http://orders-service:8000/api/orders", params={"user": user_id, "limit": 5}),
        client.get(
            "http://notifications-service:8000/api/notifications",
            params={"user": user_id, "unread": "true"},
        ),
        return_exceptions=True,
    )

    result = {"user": None, "recent_orders": [], "unread_count": 0}

    if succeeded(profile):
        result["user"] = profile.json()
    if succeeded(orders):
        result["recent_orders"] = orders.json()[:5]
    if succeeded(notifications):
        result["unread_count"] = notifications.json().get("count", 0)

    return result

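Using return_exceptions=True with asyncio.gather is crucial. Without it, the first exception propagates out of gather and the whole endpoint fails, even though the other requests keep running and their results are simply discarded. Partial responses are almost always better than no response for mobile clients.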
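return_exceptions=True protects against failures but not against slowness: the aggregate still waits for the slowest call. One option is to give each fan-out call its own deadline with asyncio.wait_for, so a stalled service degrades to a default value instead of delaying the whole response. In this sketch, fetch_profile and fetch_orders are hypothetical stand-ins for the client.get(...) calls, and mobile_home is a plain coroutine rather than a FastAPI route.

```python
import asyncio


async def fetch_profile() -> dict:
    await asyncio.sleep(0.01)  # stand-in for a fast upstream call
    return {"name": "Ada"}


async def fetch_orders() -> list:
    await asyncio.sleep(5)  # stand-in for an upstream that has stalled
    return [{"id": 1}]


async def mobile_home(deadline: float = 0.5) -> dict:
    # Each call gets its own deadline: wait_for cancels an overrunning call and raises
    # TimeoutError, which gather(return_exceptions=True) returns in that call's slot
    profile, orders = await asyncio.gather(
        asyncio.wait_for(fetch_profile(), deadline),
        asyncio.wait_for(fetch_orders(), deadline),
        return_exceptions=True,
    )
    return {
        "user": None if isinstance(profile, BaseException) else profile,
        "recent_orders": [] if isinstance(orders, BaseException) else orders,
    }
```

Here the endpoint answers after about half a second with the profile and an empty order list, instead of waiting five seconds for the stalled service.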

Rate limiting at the gateway

A per-client token bucket caps each client's sustained request rate while still allowing short bursts, protecting upstream services from abuse:

from time import monotonic

class TokenBucket:
    """Per-key token bucket: refills at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        # key -> (tokens, last_refill_time); grows with distinct keys, so evict idle keys in production
        self._buckets: dict[str, tuple[float, float]] = {}

    def consume(self, key: str) -> bool:
        now = monotonic()
        # New keys start with a full bucket
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[key] = (tokens, now)
        return allowed

limiter = TokenBucket(rate=10.0, capacity=100)  # 10 req/s sustained, bursts of up to 100

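For distributed deployments (multiple gateway replicas), the limiter state has to move to a shared store such as Redis. The simplest approach is a fixed-window counter: INCR a per-client key for the current time window and set EXPIRE on it in the same MULTI/EXEC transaction, so a crash between the two commands cannot leave a key without a TTL. This is not a token bucket, and it can admit up to twice the limit across a window boundary; an exact token bucket in Redis typically uses a Lua script to keep its read-modify-write atomic.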
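A sketch of that fixed-window counter, written against the redis-py asyncio client (redis.asyncio.Redis) and taking the client as a parameter; the key layout, function name and limits are assumptions. Because the window index is part of the key name, EXPIRE only garbage-collects old windows, and wall-clock time.time() is used because replicas do not share a monotonic clock.

```python
import time


async def allow_in_window(redis, client_id: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed-window limiter: at most `limit` requests per client per `window` seconds."""
    # Every replica computes the same window index from wall-clock time
    window_index = int(time.time() // window)
    key = f"ratelimit:{client_id}:{window_index}"  # hypothetical key layout

    # MULTI/EXEC: INCR and EXPIRE are applied together, so no key is left without a TTL
    async with redis.pipeline(transaction=True) as pipe:
        pipe.incr(key)
        pipe.expire(key, window)
        count, _ = await pipe.execute()
    return count <= limit
```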

Observability hooks

A gateway is the ideal place to capture golden signals — latency, traffic, errors, and saturation:

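import logging
import time

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

# Requires logging configured at INFO level, e.g. logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gateway.access")

class MetricsMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start = time.perf_counter()
        status = 500  # reported if the handler raises before producing a response
        try:
            response = await call_next(request)
            status = response.status_code
            return response
        finally:
            duration = time.perf_counter() - start
            # Structured key=value log line; swap in a Prometheus or Datadog client as needed
            logger.info(
                "method=%s path=%s status=%d duration=%.4fs",
                request.method, request.url.path, status, duration,
            )

app.add_middleware(MetricsMiddleware)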

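Add the correlation ID to every log line (for example, assign it once per request in a middleware and store it on request.state, rather than generating it inside the proxy handler) and you can follow a single request through every service's logs by searching for one ID. That gives you request tracing across services without a full distributed tracing deployment, although without the per-hop timing spans a tracing system records.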
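One way to get the ID onto every log line without threading it through each function is a contextvars.ContextVar set per request, plus a logging.Filter that copies it onto each record. A sketch; the variable, logger and helper names are hypothetical, and in the gateway a middleware would set the variable to the correlation ID it assigns.

```python
import contextvars
import logging

# Holds the current request's correlation ID; each asyncio task sees its own copy
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Attach the ID so formatters can reference %(correlation_id)s
        record.correlation_id = correlation_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("cid=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger = logging.getLogger("gateway")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Setting correlation_id at the start of each request (and resetting it with the returned token afterwards) is enough; every logger call made while handling that request then carries the ID.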

Security at the boundary

The gateway should be the only publicly exposed component. Internal services listen on private networks or use mTLS. Key security responsibilities at the gateway:

JWT validation — Verify signatures and expiry before forwarding. Never pass raw tokens to internal services; extract the claims and forward them as trusted headers, after stripping any client-supplied copies of those headers so they cannot be spoofed.
Input sanitisation — Reject oversized payloads, malformed Content-Types, and suspicious path traversals.
TLS termination — Handle HTTPS at the gateway; traffic behind it can use plaintext only on a network you fully trust, otherwise use mTLS between the gateway and services.
CORS enforcement — Centralised CORS headers prevent inconsistencies across services.
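The claims-to-headers step is where spoofing creeps in: unless the gateway deletes client-supplied copies of the trusted headers first, a caller could simply send X-User-ID itself. A sketch, assuming the JWT has already been verified and its claims decoded into a dict; the header names and the "sub" / "roles" claim keys are hypothetical.

```python
from collections.abc import Mapping

# Headers that internal services trust; only the gateway may set them (hypothetical names)
TRUSTED_HEADERS = {"x-user-id", "x-user-roles"}


def forward_headers(incoming: Mapping[str, str], claims: Mapping[str, object]) -> dict[str, str]:
    headers = {
        k.lower(): v
        for k, v in incoming.items()
        # Drop spoofed identity headers and the raw bearer token
        if k.lower() not in TRUSTED_HEADERS and k.lower() != "authorization"
    }
    headers["x-user-id"] = str(claims["sub"])
    headers["x-user-roles"] = ",".join(claims.get("roles", []))
    return headers
```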

Tradeoffs and failure modes

Decision	Upside	Downside
Single gateway	Simple topology, one place for auth	Single point of failure; requires HA deployment
Per-client BFF	Tailored responses, less over-fetching	More gateway code to maintain per client type
Fat gateway (logic in gateway)	Fewer services to deploy	Gateway becomes a monolith, harder to scale
Thin gateway (proxy only)	Easy to replace	Clients need more round trips

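The biggest production failure mode is the gateway becoming a bottleneck. One Python process runs one event loop and effectively uses one CPU core for request handling, so scale on two levels: several worker processes per host (for example uvicorn --workers 4) and several gateway replicas behind a load balancer. Monitor event loop lag as well as CPU, so you notice saturation before latency spikes.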
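Event loop saturation shows up as lag: a task that asks to sleep for N ms wakes up later than requested because the loop is busy with other work. A stdlib-only sketch of a lag probe; the interval, the samples list and how you export the numbers (a gauge, a log line) are assumptions.

```python
from __future__ import annotations

import asyncio
import time


async def monitor_loop_lag(samples: list[float], interval: float = 0.1, count: int | None = None) -> None:
    """Record how late each wake-up is; sustained lag means the loop is saturated."""
    n = 0
    while count is None or n < count:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the loop spent busy with other work
        samples.append(max(0.0, time.perf_counter() - start - interval))
        n += 1
```

Started once as a background task at application startup, it produces a steady stream of lag samples to alert on.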

Real-world example: Stripe’s API design

Stripe’s API is effectively a gateway — one base URL (api.stripe.com), versioned endpoints, consistent error formats, and idempotency keys baked into the protocol. Python teams building B2B APIs often model their gateways on Stripe’s conventions: consistent envelope responses, pagination cursors, and explicit API versioning via headers.
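Idempotency keys let a client safely retry a non-idempotent request (say, a POST that creates a charge): the server stores the first response under the client-chosen key and replays it for repeats, refusing a reused key that arrives with a different body. A minimal in-memory sketch; the class name is hypothetical, and a real gateway would need a shared store with a TTL and handling for concurrent duplicates.

```python
import hashlib
from collections.abc import Callable
from typing import Any


class IdempotencyStore:
    """Replays the first response recorded for each idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        # key -> (fingerprint of the original request body, stored response)
        self._entries: dict[str, tuple[str, Any]] = {}

    def execute(self, key: str, body: bytes, handler: Callable[[], Any]) -> Any:
        fingerprint = hashlib.sha256(body).hexdigest()
        if key in self._entries:
            stored_fingerprint, response = self._entries[key]
            if stored_fingerprint != fingerprint:
                # Same key, different request: refuse rather than replay the wrong result
                raise ValueError("idempotency key reused with a different request body")
            return response
        response = handler()
        self._entries[key] = (fingerprint, response)
        return response
```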

The one thing to remember: A production Python gateway needs async I/O, per-service circuit breakers, and centralised observability — without these, the gateway becomes the weakest link instead of the strongest.
