REST API Best Practices — Deep Dive
Beyond the basics
Building a REST API that works in development is straightforward. Building one that survives production traffic, evolving requirements, and a team of developers working on it simultaneously requires deliberate architectural choices. This guide covers the patterns that separate toy APIs from systems handling millions of requests.
Idempotency design
A POST request that creates an order might be retried by the client due to a network timeout. Without idempotency, you get duplicate orders. The solution is an idempotency key — a client-generated unique identifier sent in a header:
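import json
from uuid import UUID

import redis
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
cache = redis.Redis()

@app.post("/orders", status_code=201)
async def create_order(
    order: OrderCreate,  # request model, defined elsewhere
    idempotency_key: UUID = Header(...),
):
    key = f"idempotency:{idempotency_key}"
    # Replay the stored response if this key was already processed.
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
    # Caveat: two concurrent retries can both miss the cache here; a production
    # version would claim the key atomically before processing.
    result = await process_order(order)  # must return JSON-serializable data
    # Keep the response for 24 hours so retries get an identical answer.
    cache.setex(key, 86400, json.dumps(result))
    return result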
Stripe popularized this pattern. The key insight: store the response, not just a flag. If the client retries, it gets the same response it would have received originally. Set a TTL (24 hours is a common choice) to prevent indefinite storage growth.
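One gap the handler above leaves open is a client reusing a key with a different request body. The sketch below shows one way to catch that, using an in-memory dictionary as a stand-in for Redis. The class name `IdempotencyStore`, its `run` method, and the fingerprinting scheme are illustrative assumptions, not part of any library.

```python
import hashlib
import json
import time


class IdempotencyStore:
    """In-memory stand-in for a shared cache such as Redis (illustrative only)."""

    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, request_fingerprint, response)

    @staticmethod
    def fingerprint(payload):
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, key, payload, handler):
        """Return (response, replayed). Calls handler at most once per live key."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            _, stored_fingerprint, response = entry
            if stored_fingerprint != self.fingerprint(payload):
                # Same key, different body: almost certainly a client bug.
                raise ValueError("idempotency key reused with a different payload")
            return response, True
        response = handler(payload)
        self._entries[key] = (now + self.ttl_seconds, self.fingerprint(payload), response)
        return response, False
```

An API handler would typically translate the `ValueError` into a 4xx response. Like the Redis version, this sketch is not safe under concurrent access without additional locking.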
Partial responses and field selection
Large resources waste bandwidth when clients only need a few fields. Google’s API design guidance describes partial responses and field masks for this case:
GET /users/5?fields=id,name,email
In FastAPI, implement this with response model filtering:
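from fastapi import Query

@app.get("/users/{user_id}")
async def get_user(
    user_id: int,
    fields: str | None = Query(None, description="Comma-separated field list"),
):
    user = await fetch_user(user_id)
    if fields:
        # Strip whitespace so "id, name" behaves like "id,name".
        requested = {name.strip() for name in fields.split(",")}
        # .dict() is the Pydantic v1 spelling; v2 renames it to model_dump().
        return {k: v for k, v in user.dict().items() if k in requested}
    return user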
Field selection can cut payload size substantially; figures of 60-80% are cited for mobile clients on Shopify’s API. The tradeoff is more complex serialization logic and harder caching, because every distinct field list produces a different response to cache.
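The handler above silently ignores unknown field names and would return any attribute the client asks for. A minimal sketch of a stricter variant follows, operating on plain dictionaries; the function name `select_fields` and the allow-list parameter are assumptions made for illustration.

```python
def select_fields(resource, fields, allowed):
    """Return only the requested top-level fields of a dict resource.

    `fields` is the raw query-string value (for example "id,name"), or None.
    `allowed` is the set of field names clients may ask for.
    """
    if not fields:
        # No selection requested: return every field that is safe to expose.
        return {k: v for k, v in resource.items() if k in allowed}
    requested = {name.strip() for name in fields.split(",") if name.strip()}
    unknown = requested - allowed
    if unknown:
        # An API handler would typically turn this into a 400 response.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {k: v for k, v in resource.items() if k in requested}
```

Rejecting unknown names surfaces client typos early, and the allow-list keeps internal attributes such as password hashes out of reach.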
Bulk operations
Individual CRUD endpoints break down when clients need to process hundreds of items. Two approaches work well:
Batch endpoint:
from pydantic import ValidationError

@app.post("/users/batch")
async def batch_create_users(users: list[UserCreate]):
    results = []
    for user in users:
        try:
            created = await create_user(user)
            results.append({"status": 201, "data": created})
        except ValidationError as e:
            # Record the failure and keep going; one bad item does not abort the batch.
            results.append({"status": 422, "error": str(e)})
    return {"results": results}
Each item in the batch gets its own status. The overall response is always 200 — individual failures don’t fail the batch. This is how Google’s Gmail API handles bulk operations.
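Two small additions often make batch responses easier to consume: an index on each result so clients can match outcomes to inputs, and a cap on batch size. The following is a framework-free sketch under those assumptions; `run_batch`, the `max_items` default, and the use of `ValueError` as the per-item failure are all hypothetical choices.

```python
def run_batch(items, create, max_items=100):
    """Apply `create` to each item and report a per-item outcome."""
    if len(items) > max_items:
        # Reject oversized batches up front instead of timing out midway.
        raise ValueError(f"batch too large: {len(items)} > {max_items}")
    results = []
    for index, item in enumerate(items):
        try:
            results.append({"index": index, "status": 201, "data": create(item)})
        except ValueError as exc:
            results.append({"index": index, "status": 422, "error": str(exc)})
    failed = sum(1 for r in results if r["status"] != 201)
    return {"succeeded": len(results) - failed, "failed": failed, "results": results}
```

With the index in place, a client can resubmit only the items that failed.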
JSON Patch (RFC 6902) for applying several changes to one resource in a single request:
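from fastapi import HTTPException

@app.patch("/users/{user_id}")
async def patch_user(user_id: int, operations: list[PatchOperation]):
    user = await fetch_user(user_id)
    for op in operations:
        # In production, check `field` against an allow-list before setattr.
        field = op.path.lstrip("/")
        if op.op == "replace":
            setattr(user, field, op.value)
        elif op.op == "remove":
            setattr(user, field, None)
        else:
            # Reject operations this handler does not implement instead of ignoring them.
            raise HTTPException(status_code=422, detail=f"Unsupported op: {op.op}")
    await user.save()
    return user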
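RFC 6902 treats a patch as all-or-nothing, and it requires the target of `replace` and `remove` to exist. The handler above maps `remove` to setting `None`, which is a simplification. Below is a minimal sketch closer to those rules, limited to top-level fields of a dictionary; `apply_patch` and `ALLOWED_FIELDS` are hypothetical names, and the other RFC operations (`move`, `copy`, `test`) are deliberately left out.

```python
import copy

ALLOWED_FIELDS = {"name", "email", "nickname"}  # hypothetical patchable fields


def apply_patch(document, operations):
    """Apply flat add/replace/remove operations and return a new dict."""
    patched = copy.deepcopy(document)  # work on a copy so a failure changes nothing
    for op in operations:
        path = op["path"]
        if not path.startswith("/") or "/" in path[1:]:
            raise ValueError(f"only top-level paths are supported: {path!r}")
        field = path[1:]
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"field may not be patched: {field!r}")
        kind = op["op"]
        if kind == "add":
            patched[field] = op["value"]
        elif kind == "replace":
            if field not in patched:
                raise ValueError(f"cannot replace missing field: {field!r}")
            patched[field] = op["value"]
        elif kind == "remove":
            if field not in patched:
                raise ValueError(f"cannot remove missing field: {field!r}")
            del patched[field]
        else:
            raise ValueError(f"unsupported operation: {kind!r}")
    return patched
```

Because the function returns a fresh dictionary, the caller persists the result only if every operation succeeded.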
Rate limiting implementation
Token bucket is one of the most common algorithms for API rate limiting. Here’s a Redis-based implementation:
import time

class TokenBucket:
    def __init__(self, redis_client, capacity, refill_rate):
        self.redis = redis_client  # assumes the default bytes responses
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second

    def allow(self, key: str) -> tuple[bool, dict]:
        now = time.time()
        bucket_key = f"ratelimit:{key}"
        # Caveat: this read-modify-write is not atomic. Under concurrency, two
        # requests can spend the same token; a server-side Lua script closes that gap.
        data = self.redis.hgetall(bucket_key)
        tokens = float(data.get(b"tokens", self.capacity))
        last_refill = float(data.get(b"last_refill", now))
        # Refill for the time elapsed since the last update, capped at capacity.
        elapsed = now - last_refill
        tokens = min(self.capacity, tokens + elapsed * self.refill_rate)
        if tokens >= 1:
            tokens -= 1
            # Write the new state and refresh the expiry in one round trip.
            pipe = self.redis.pipeline()
            pipe.hset(bucket_key, mapping={"tokens": tokens, "last_refill": now})
            pipe.expire(bucket_key, 3600)
            pipe.execute()
            return True, {
                "X-RateLimit-Remaining": int(tokens),
                "X-RateLimit-Limit": self.capacity,
            }
        # Not enough tokens: report how long until one full token is available.
        retry_after = (1 - tokens) / self.refill_rate
        return False, {
            "Retry-After": int(retry_after) + 1,
            "X-RateLimit-Remaining": 0,
            "X-RateLimit-Limit": self.capacity,
        }
Always return rate limit headers — even on successful requests. Clients need to know their budget without hitting the wall.
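The refill arithmetic is easier to follow without Redis in the way. Here is a single-process sketch of the same algorithm with an injectable clock, so the behavior can be checked deterministically. The class name `InMemoryTokenBucket` is an illustrative assumption, and this version is not suitable for multi-worker deployments.

```python
import time


class InMemoryTokenBucket:
    """Single-process token bucket; a sketch of the algorithm, not a drop-in limiter."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Return (allowed, seconds_until_next_token)."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.refill_rate
```

With `capacity=2` and `refill_rate=0.5`, a client can burst two requests immediately and then sustain one request every two seconds.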
API versioning in practice
URL-prefix versioning with FastAPI routers:
from fastapi import APIRouter

v1_router = APIRouter(prefix="/v1")
v2_router = APIRouter(prefix="/v2")

@v1_router.get("/users/{user_id}")
async def get_user_v1(user_id: int):
    user = await fetch_user(user_id)
    return UserResponseV1.from_orm(user)

@v2_router.get("/users/{user_id}")
async def get_user_v2(user_id: int):
    user = await fetch_user(user_id)
    return UserResponseV2.from_orm(user)  # includes new fields

app.include_router(v1_router)
app.include_router(v2_router)
The critical rule: never break v1 after it’s published. Adding fields to v1 is safe, but removing or renaming them is not. When v2 ships, publish a deprecation timeline for v1. Stripe is known for keeping old API versions working for years, which is generous but builds trust.
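The additive-versus-breaking distinction can be checked mechanically, for example in CI. The sketch below compares two versions of a response schema, each represented as a plain mapping of field name to type name. The function `breaking_changes` and that representation are assumptions chosen for illustration.

```python
def breaking_changes(old_fields, new_fields):
    """Compare two {field_name: type_name} maps for one response schema.

    Added fields are treated as safe; removed fields and changed types are reported.
    """
    problems = []
    for name, old_type in sorted(old_fields.items()):
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(f"changed type of {name}: {old_type} -> {new_fields[name]}")
    return problems
```

A rename shows up as a removal, which matches how existing clients experience it.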
Content negotiation and serialization
Support multiple response formats when your API serves diverse clients:
from fastapi import Request
from fastapi.responses import JSONResponse, Response
import msgpack

@app.get("/data/{data_id}")
async def get_data(data_id: int, request: Request):
    data = await fetch_data(data_id)
    accept = request.headers.get("accept", "application/json")
    if "application/msgpack" in accept:
        return Response(
            content=msgpack.packb(data),
            media_type="application/msgpack"
        )
    return JSONResponse(content=data)
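MessagePack responses are often smaller than JSON for typical payloads (20-30% is a commonly quoted range, though it depends heavily on the data). GitHub’s API uses the Accept header in the same way to select among its own media types.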
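A substring check on the Accept header ignores quality values, so a header such as `application/msgpack;q=0` (meaning "not acceptable") would still select MessagePack. Below is a minimal sketch of q-value-aware selection using only the standard library. The function name `choose_media_type` is hypothetical, and the parser handles only exact types and `*/*`, not partial wildcards such as `application/*`.

```python
def choose_media_type(accept_header, supported, default="application/json"):
    """Pick the supported media type with the highest q-value in an Accept header.

    Returns None when nothing acceptable is supported (a handler would send 406).
    """
    if not accept_header:
        return default
    best_type, best_q = None, 0.0
    for part in accept_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if media_type == "*/*":
            media_type = default
        # Strict comparison: on a tie, the type listed first wins.
        if media_type in supported and q > best_q:
            best_type, best_q = media_type, q
    return best_type
```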
HATEOAS — linking resources
Include navigation links in responses so clients don’t hardcode URLs:
{
  "id": 5,
  "name": "Alice",
  "links": {
    "self": "/users/5",
    "orders": "/users/5/orders",
    "profile": "/users/5/profile"
  }
}
Few APIs implement full HATEOAS, but including self, next, and prev links on paginated responses is nearly universal and helps clients avoid URL construction bugs.
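Pagination links are simple to generate on the server. The sketch below assumes offset-based pagination with `offset` and `limit` query parameters; the function name `pagination_links` and the parameter names are illustrative choices, and cursor-based APIs would build the links differently.

```python
from urllib.parse import urlencode


def pagination_links(base_path, offset, limit, total):
    """Build self/next/prev links for an offset-paginated collection."""

    def url(new_offset):
        return f"{base_path}?{urlencode({'offset': new_offset, 'limit': limit})}"

    links = {"self": url(offset)}
    if offset + limit < total:
        # More items remain after this page.
        links["next"] = url(offset + limit)
    if offset > 0:
        # Clamp at zero so the first page never gets a negative offset.
        links["prev"] = url(max(0, offset - limit))
    return links
```

Clients that follow `next` until it is absent never need to know how offsets are computed.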
Structured logging for API observability
Every request should generate a structured log entry:
import structlog
from uuid import uuid4
from fastapi import Request

logger = structlog.get_logger()

@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    # Reuse the caller's ID when present so one trace spans every service.
    request_id = request.headers.get("X-Request-ID") or str(uuid4())
    with structlog.contextvars.bound_contextvars(request_id=request_id):
        logger.info(
            "request_started",
            method=request.method,
            path=request.url.path,
            # request.client can be None, so guard before reading .host.
            client=request.client.host if request.client else None,
        )
        response = await call_next(request)
        logger.info(
            "request_completed",
            status=response.status_code,
        )
        response.headers["X-Request-ID"] = request_id
        return response
Correlation IDs (X-Request-ID) let you trace a request across microservices. Without them, debugging production issues in distributed systems is nearly impossible.
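Tracing across services only works if each service reuses the incoming ID and forwards it on outbound calls. Here is a standard-library sketch of that flow using `contextvars` and `logging`. The helper names `resolve_request_id` and `outgoing_headers` are hypothetical, and the header lookup assumes a plain dictionary with the exact key `X-Request-ID` (real frameworks provide case-insensitive headers).

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def resolve_request_id(headers):
    """Reuse the caller's X-Request-ID if present, otherwise mint a new one."""
    incoming = headers.get("X-Request-ID", "").strip()
    return incoming or str(uuid.uuid4())


def outgoing_headers(extra=None):
    """Headers for a downstream call, carrying the current correlation ID."""
    headers = dict(extra or {})
    headers["X-Request-ID"] = request_id_var.get()
    return headers
```

A middleware would call `request_id_var.set(resolve_request_id(request.headers))` at the start of each request, and a log format such as `%(request_id)s %(message)s` would then include the ID once the filter is attached to a handler.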
Common production pitfalls
N+1 queries in list endpoints. Returning /users with nested orders triggers a separate DB query per user. Use eager loading (SQLAlchemy’s joinedload) or dataloader patterns.
Unbounded list responses. An endpoint without pagination that returns 50,000 records will time out or crash the client. Default to limit=20 and require explicit pagination.
Leaking internal IDs. Auto-increment database IDs reveal business information (a competitor can estimate your user count from them). Use UUIDs for external-facing identifiers.
Missing request validation. Without Pydantic models (FastAPI) or marshmallow schemas (Flask), malformed input reaches your business logic and causes cryptic 500 errors instead of clean 422s.
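For the unbounded-list pitfall above, the usual guard is to default and clamp the page size before it reaches the query. A minimal sketch follows; the default of 20 comes from the text, while `MAX_LIMIT` and the function name `normalize_page` are assumptions.

```python
DEFAULT_LIMIT = 20
MAX_LIMIT = 100  # hypothetical ceiling; tune to your payload sizes


def normalize_page(limit=None, offset=None):
    """Return a safe (limit, offset) pair for a list query."""
    limit = DEFAULT_LIMIT if limit is None else limit
    offset = 0 if offset is None else offset
    if limit < 1 or offset < 0:
        # A handler would typically map this to a 422 response.
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    # Clamp rather than reject, so large requests still get a useful page.
    return min(limit, MAX_LIMIT), offset
```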
One thing to remember: Every API design decision is a tradeoff between simplicity and power — start simple, add complexity only when real usage demands it, and always document what you chose and why.