API Gateway Patterns in Python — Deep Dive
Technical perspective
An API gateway sits at the most critical point in a distributed system — the boundary between untrusted external traffic and internal services. Getting the implementation wrong causes cascading failures, security holes, or latency spikes that ripple through every consumer. Python’s async capabilities, rich middleware ecosystem, and rapid prototyping speed make it a strong choice for custom gateways, provided you understand the tradeoffs.
Architecture: async-first gateway with FastAPI
A production Python gateway should be fully async. Blocking calls in a gateway are lethal: a single synchronous call to a slow upstream service stalls the event loop, and every other request queues behind it.
import time
import uuid

import httpx
from fastapi import Depends, FastAPI, Request, Response
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Shared connection pool, reused across requests
client = httpx.AsyncClient(
    timeout=httpx.Timeout(connect=2.0, read=10.0, write=5.0, pool=5.0),
    limits=httpx.Limits(max_connections=200, max_keepalive_connections=50),
)

SERVICE_MAP = {
    "/api/orders": "http://orders-service:8000",
    "/api/users": "http://users-service:8000",
    "/api/inventory": "http://inventory-service:8000",
}


async def resolve_upstream(path: str) -> str | None:
    """Return the full upstream URL for a request path, or None if nothing matches."""
    for prefix, upstream in SERVICE_MAP.items():
        # Match whole path segments so "/api/ordersX" does not reach the orders service
        if path == prefix or path.startswith(prefix + "/"):
            return upstream + path
    return None
Explicit connection limits on the httpx.AsyncClient cap the number of open sockets, which prevents file descriptor exhaustion under load. The httpx.Timeout object gives fine-grained control: a short connect timeout catches dead hosts fast, while a longer read timeout accommodates legitimately slow queries.
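The prefix lookup in resolve_upstream returns the first match in dictionary order, which becomes ambiguous once one prefix nests inside another. A minimal sketch of a longest-prefix variant follows. The name `match_upstream`, the `ROUTES` table, and the `/api/orders/archive` entry are illustrative assumptions, not part of the gateway described here.

```python
from typing import Optional

# Hypothetical route table; the archive entry is invented for illustration.
ROUTES = {
    "/api/orders": "http://orders-service:8000",
    "/api/orders/archive": "http://archive-service:8000",
    "/api/users": "http://users-service:8000",
}


def match_upstream(path: str, routes: dict = ROUTES) -> Optional[str]:
    """Return the upstream URL for the longest prefix that matches whole segments."""
    # Try longer prefixes first so nested routes win over their parents.
    for prefix in sorted(routes, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix] + path
    return None


print(match_upstream("/api/orders/archive/2020"))
print(match_upstream("/api/ordersX"))
```

Sorting on every call is fine for a handful of routes; a larger table would likely precompute the sorted order once at startup.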
Reverse proxy implementation
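The minimal proxy forwards each request and relays the upstream response. It buffers the full response body in memory, so large or long-lived responses would need a streaming variant: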
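# Hop-by-hop headers, plus headers invalidated when httpx decodes the upstream body
EXCLUDED_HEADERS = {
    "connection", "keep-alive", "transfer-encoding", "upgrade",
    "content-encoding", "content-length",
}


@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE", "PATCH"])
async def proxy(request: Request, path: str):
    upstream_url = await resolve_upstream(f"/{path}")
    if not upstream_url:
        return Response(status_code=404, content="Service not found")

    # Starlette lowercases header names, so a lowercase key replaces any
    # client-supplied correlation ID instead of sending a duplicate header
    headers = dict(request.headers)
    headers["x-correlation-id"] = str(uuid.uuid4())
    headers.pop("host", None)

    body = await request.body()
    try:
        resp = await client.request(
            method=request.method,
            url=upstream_url,
            headers=headers,
            content=body,
            params=request.query_params,
        )
    except httpx.TimeoutException:
        return Response(status_code=504, content="Upstream timed out")
    except httpx.RequestError:
        return Response(status_code=502, content="Upstream unreachable")

    response_headers = {
        k: v for k, v in resp.headers.items() if k.lower() not in EXCLUDED_HEADERS
    }
    return Response(
        content=resp.content,
        status_code=resp.status_code,
        headers=response_headers,
    )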
Correlation IDs injected at the gateway propagate through every service, provided each service forwards the header on its own outbound calls. This makes distributed tracing possible without each service generating its own ID.
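The proxy above always mints a fresh ID. When the gateway sits behind another trusted hop that already assigns IDs, an alternative policy is to keep a well-formed inbound value and replace anything else. The sketch below illustrates that policy; `ensure_correlation_id` and `CORRELATION_HEADER` are hypothetical names, and accepting only UUID-shaped values is an assumption made here to keep arbitrary client text out of logs.

```python
import uuid

CORRELATION_HEADER = "x-correlation-id"  # lowercase, matching Starlette's normalised names


def ensure_correlation_id(headers: dict) -> str:
    """Keep a well-formed inbound correlation ID, otherwise mint a new one.

    Mutates `headers` in place and returns the ID that upstream services will see.
    """
    candidate = headers.get(CORRELATION_HEADER, "")
    try:
        # Only accept values that parse as UUIDs; str() normalises the formatting.
        correlation_id = str(uuid.UUID(candidate))
    except ValueError:
        correlation_id = str(uuid.uuid4())
    headers[CORRELATION_HEADER] = correlation_id
    return correlation_id


inbound = {"x-correlation-id": "123e4567-e89b-12d3-a456-426614174000"}
print(ensure_correlation_id(inbound))
```

Whether to trust inbound IDs at all depends on who can reach the gateway; for a public edge, always generating the ID is the more conservative choice.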
Circuit breaker pattern
When an upstream service is failing, hammering it with retries makes recovery slower. A circuit breaker counts consecutive failures and short-circuits requests once a threshold is crossed:
from dataclasses import dataclass, field
from time import monotonic


@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 30.0
    # Internal state, excluded from the generated __init__
    _failures: int = field(default=0, init=False)
    _state: str = field(default="closed", init=False)  # closed, open, half-open
    _last_failure_time: float = field(default=0.0, init=False)

    def record_success(self) -> None:
        self._failures = 0
        self._state = "closed"

    def record_failure(self) -> None:
        self._failures += 1
        self._last_failure_time = monotonic()
        # A failed half-open probe reopens the circuit immediately
        if self._state == "half-open" or self._failures >= self.failure_threshold:
            self._state = "open"

    def allow_request(self) -> bool:
        if self._state == "closed":
            return True
        if self._state == "open":
            if monotonic() - self._last_failure_time > self.recovery_timeout:
                self._state = "half-open"
                return True  # let a single probe through
            return False
        # half-open: a probe is already in flight, so reject until it reports back.
        # Callers must always record the probe's outcome, or the breaker stays here.
        return False


# One breaker per upstream service
breakers: dict[str, CircuitBreaker] = {
    prefix: CircuitBreaker() for prefix in SERVICE_MAP
}
Netflix's Hystrix library popularised this pattern. Python teams at companies like Revolut use similar per-service breakers to prevent one degraded dependency from taking down the entire gateway.
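The breaker only helps if every upstream call consults it and reports the outcome. One way to wire that up is a small wrapper, sketched here under assumptions: `call_with_breaker`, `Breaker`, and `CircuitOpenError` are hypothetical names, and the wrapper relies only on the three method names the CircuitBreaker class exposes.

```python
import asyncio
from typing import Awaitable, Callable, Protocol, TypeVar

T = TypeVar("T")


class Breaker(Protocol):
    """Structural type covering the CircuitBreaker methods used by the wrapper."""

    def allow_request(self) -> bool: ...
    def record_success(self) -> None: ...
    def record_failure(self) -> None: ...


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without contacting the upstream."""


async def call_with_breaker(breaker: Breaker, call: Callable[[], Awaitable[T]]) -> T:
    """Run `call` only if the breaker allows it, and report the outcome back."""
    if not breaker.allow_request():
        raise CircuitOpenError("upstream circuit is open")
    try:
        result = await call()
    except Exception:
        # Any exception from the upstream call counts as a failure.
        breaker.record_failure()
        raise
    breaker.record_success()
    return result
```

In the proxy handler this could wrap the upstream request in a lambda, with CircuitOpenError mapped to a 503 response. An HTTP 5xx response does not raise, so a gateway that wants those counted as failures would need to check the status code and call record_failure itself.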
Aggregation gateway (BFF)
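The aggregation pattern requires explicit endpoint definitions rather than generic proxying. Register these endpoints before the catch-all proxy route, because FastAPI matches routes in declaration order: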
import asyncio


# get_current_user is an authentication dependency assumed to be defined elsewhere
@app.get("/bff/mobile/home")
async def mobile_home(user_id: str = Depends(get_current_user)):
    profile, orders, notifications = await asyncio.gather(
        client.get(f"http://users-service:8000/api/users/{user_id}"),
        client.get(
            "http://orders-service:8000/api/orders",
            params={"user": user_id, "limit": 5},
        ),
        client.get(
            "http://notifications-service:8000/api/notifications",
            params={"user": user_id, "unread": "true"},
        ),
        return_exceptions=True,
    )

    def ok(resp) -> bool:
        # gather hands back exceptions as values; HTTP error statuses are not exceptions
        return not isinstance(resp, BaseException) and resp.is_success

    result = {"user": None, "recent_orders": [], "unread_count": 0}
    if ok(profile):
        result["user"] = profile.json()
    if ok(orders):
        result["recent_orders"] = orders.json()[:5]
    if ok(notifications):
        result["unread_count"] = notifications.json().get("count", 0)
    return result
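Using return_exceptions=True with asyncio.gather is crucial. Without it, the first exception propagates straight out of gather and the whole endpoint fails, even if the other calls succeed. An upstream that answers with an HTTP error status does not raise, so the status still has to be checked separately. Partial responses are almost always better than no response for mobile clients.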
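The partial-response behaviour can be reproduced without any network. The sketch below uses three stand-in coroutines (all hypothetical: one succeeds, one raises, one is slower than the deadline) and wraps each in asyncio.wait_for so that a slow dependency degrades the same way a failing one does. The 50 ms deadline is an arbitrary value chosen for the demo.

```python
import asyncio


async def fetch_profile() -> dict:
    return {"name": "Ada"}


async def fetch_orders() -> list:
    # Simulates an upstream that fails outright
    raise ConnectionError("orders-service unreachable")


async def fetch_notifications() -> dict:
    await asyncio.sleep(1.0)  # slower than the deadline below
    return {"count": 3}


async def aggregate(deadline: float = 0.05) -> dict:
    profile, orders, notifications = await asyncio.gather(
        asyncio.wait_for(fetch_profile(), deadline),
        asyncio.wait_for(fetch_orders(), deadline),
        asyncio.wait_for(fetch_notifications(), deadline),
        return_exceptions=True,  # failures and timeouts come back as values
    )
    failed = lambda value: isinstance(value, BaseException)
    return {
        "user": None if failed(profile) else profile,
        "recent_orders": [] if failed(orders) else orders,
        "unread_count": 0 if failed(notifications) else notifications["count"],
    }


result = asyncio.run(aggregate())
print(result)  # {'user': {'name': 'Ada'}, 'recent_orders': [], 'unread_count': 0}
```

Only the profile call contributes data; the other two fall back to their defaults instead of failing the whole response.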
Rate limiting at the gateway
Token-bucket rate limiting protects upstream services from abuse:
from collections import defaultdict
from time import monotonic


class TokenBucket:
    """In-memory token bucket keyed by client; state is local to one process."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        # Entries are never evicted, so bound or expire keys in a long-running process
        self._tokens: dict[str, float] = defaultdict(lambda: float(capacity))
        self._last_refill: dict[str, float] = defaultdict(monotonic)

    def consume(self, key: str) -> bool:
        now = monotonic()
        elapsed = now - self._last_refill[key]
        # Refill in proportion to elapsed time, capped at capacity
        self._tokens[key] = min(
            self.capacity,
            self._tokens[key] + elapsed * self.rate,
        )
        self._last_refill[key] = now
        if self._tokens[key] >= 1:
            self._tokens[key] -= 1
            return True
        return False


limiter = TokenBucket(rate=10.0, capacity=100)  # 10 req/s, burst of 100
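For distributed deployments (multiple gateway replicas), move the limiter state to Redis. The simplest option is a fixed-window counter built on INCR with EXPIRE; weighting the previous window's count turns it into an approximate sliding-window counter that smooths bursts at window boundaries. A faithful token bucket needs an atomic read-modify-write, which in Redis usually means a Lua script.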
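The sliding-window counter idea can be shown without a Redis server. In the sketch below a plain dict stands in for the shared store, so this is an illustration of the arithmetic and not a Redis client; `SlidingWindowCounter` and its parameters are hypothetical names, and the injectable clock exists only to make the behaviour deterministic.

```python
import time
from typing import Callable, Dict, Tuple


class SlidingWindowCounter:
    """Approximate sliding-window limiter built from two fixed-window counters."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window = window
        self._clock = clock
        # (key, window index) -> request count; a shared store would expire old windows
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        current = self._counts.get((key, index), 0)
        previous = self._counts.get((key, index - 1), 0)
        # Weight the previous window by the share of it still inside the sliding window.
        estimated = previous * (1.0 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self._counts[(key, index)] = current + 1
        return True


now = [0.0]
limiter = SlidingWindowCounter(limit=10, window=10.0, clock=lambda: now[0])
first_burst = sum(limiter.allow("client-a") for _ in range(12))
now[0] = 15.0  # halfway through the next window
second_burst = sum(limiter.allow("client-a") for _ in range(12))
print(first_burst, second_burst)  # 10 5
```

Halfway into the second window, half of the previous window's 10 requests still count, so only 5 more are admitted. A pure fixed-window counter would have admitted a full 10 at that point.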
Observability hooks
A gateway is the ideal place to capture golden signals — latency, traffic, errors, and saturation:
import logging
import time

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

logger = logging.getLogger("gateway.metrics")


class MetricsMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start = time.perf_counter()
        response = await call_next(request)
        duration = time.perf_counter() - start
        # Swap the log call for a Prometheus or Datadog client in production
        logger.info(
            "method=%s path=%s status=%s duration=%.4fs",
            request.method, request.url.path, response.status_code, duration,
        )
        return response


app.add_middleware(MetricsMiddleware)
Add the correlation ID from the proxy layer to every log line, and you get end-to-end request tracing across services without a full distributed tracing deployment.
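One way to get the correlation ID onto every log line without threading it through each function call is a context variable plus a logging filter. The sketch below uses only the standard library; `correlation_id_var`, `CorrelationIdFilter`, and the `gateway.demo` logger name are assumptions made for the example, and in a real gateway a middleware would set the variable once per request.

```python
import contextvars
import io
import logging

# Holds the correlation ID for the request currently being handled
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("cid=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger = logging.getLogger("gateway.demo")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer)

token = correlation_id_var.set("req-123")  # a middleware would do this per request
log.info("forwarding to orders-service")
correlation_id_var.reset(token)
log.info("idle")

print(buffer.getvalue(), end="")
# cid=req-123 forwarding to orders-service
# cid=- idle
```

Because asyncio tasks each run in their own copy of the context, concurrent requests do not see each other's IDs.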
Security at the boundary
The gateway should be the only publicly exposed component. Internal services listen on private networks or use mTLS. Key security responsibilities at the gateway:
- JWT validation: Verify signatures and expiry before forwarding. Never pass raw tokens to internal services; extract claims and forward them as trusted headers, stripping any client-supplied copies of those headers first.
- Input sanitisation: Reject oversized payloads, malformed Content-Types, and suspicious path traversals.
- TLS termination: Handle HTTPS at the gateway; internal traffic can use plaintext only where the network is genuinely trusted.
- CORS enforcement: Centralised CORS headers prevent inconsistencies across services.
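The input sanitisation checks in the list above can be sketched as a pure function that runs before any upstream call. Everything named here is an assumption for illustration: the `MAX_BODY_BYTES` limit, the `ALLOWED_CONTENT_TYPES` set, and the helper names `is_safe_path` and `check_request`. It decodes percent-encoding once, so double-encoded traversal attempts would need additional handling.

```python
from typing import Optional
from urllib.parse import unquote

MAX_BODY_BYTES = 1_000_000  # assumed limit; tune per endpoint
ALLOWED_CONTENT_TYPES = {"application/json", "application/x-www-form-urlencoded"}  # assumed


def is_safe_path(raw_path: str) -> bool:
    """Reject paths with traversal segments, checked after one round of percent-decoding."""
    decoded = unquote(raw_path)
    if "\x00" in decoded or "\\" in decoded:
        return False
    return ".." not in decoded.split("/")


def check_request(
    path: str, content_type: Optional[str], content_length: Optional[str]
) -> Optional[int]:
    """Return an HTTP status code to reject with, or None if the request may proceed."""
    if not is_safe_path(path):
        return 400
    if content_length is not None:
        try:
            length = int(content_length)
        except ValueError:
            return 400
        if length < 0:
            return 400
        if length > MAX_BODY_BYTES:
            return 413  # payload too large
    if content_type is not None:
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type not in ALLOWED_CONTENT_TYPES:
            return 415  # unsupported media type
    return None


print(check_request("/api/%2e%2e/etc/passwd", None, None))  # 400
print(check_request("/api/users", "application/json; charset=utf-8", "128"))  # None
```

Content-Length can be absent for chunked uploads, so a real gateway would also need to enforce the size limit while reading the body.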
Tradeoffs and failure modes
| Decision | Upside | Downside |
|---|---|---|
| Single gateway | Simple topology, one place for auth | Single point of failure; requires HA deployment |
| Per-client BFF | Tailored responses, less over-fetching | More gateway code to maintain per client type |
| Fat gateway (logic in gateway) | Fewer services to deploy | Gateway becomes a monolith, harder to scale |
| Thin gateway (proxy only) | Easy to replace or swap | Clients need more round trips |
The biggest production failure mode is the gateway becoming a bottleneck. If every request passes through a single Python process, you need horizontal scaling (multiple Uvicorn workers behind a load balancer) and careful monitoring of event loop saturation.
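Event loop saturation can be estimated by measuring how late a short sleep wakes up: if the loop is busy or blocked, the wake-up is delayed. The following is a minimal sketch of that idea, with `measure_loop_lag` as a hypothetical helper name and the deliberate `time.sleep` standing in for accidental blocking work.

```python
import asyncio
import time
from typing import List


async def measure_loop_lag(interval: float, samples: int) -> List[float]:
    """Sleep for `interval` repeatedly and record how late each wake-up is, in seconds."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.perf_counter() - start - interval))
    return lags


async def demo() -> float:
    monitor = asyncio.create_task(measure_loop_lag(interval=0.01, samples=5))
    await asyncio.sleep(0)  # let the monitor start its first sleep
    time.sleep(0.1)         # deliberately block the event loop
    lags = await monitor
    return max(lags)


worst = asyncio.run(demo())
print(f"worst lag: {worst * 1000:.0f} ms")
```

In a running gateway the monitor would loop indefinitely and export the lag as a gauge; sustained lag of more than a few milliseconds suggests blocking code or too much work per process.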
Real-world example: Stripe’s API design
Stripe’s API is effectively a gateway — one base URL (api.stripe.com), versioned endpoints, consistent error formats, and idempotency keys baked into the protocol. Python teams building B2B APIs often model their gateways on Stripe’s conventions: consistent envelope responses, pagination cursors, and explicit API versioning via headers.
The one thing to remember: A production Python gateway needs async I/O, per-service circuit breakers, and centralised observability — without these, the gateway becomes the weakest link instead of the strongest.