REST API Best Practices — Deep Dive

Production-grade REST API patterns in Python: idempotency, partial responses, bulk operations, and the design decisions behind APIs that scale.

Beyond the basics

Building a REST API that works in development is straightforward. Building one that survives production traffic, evolving requirements, and a team of developers working on it simultaneously requires deliberate architectural choices. This guide covers the patterns that separate toy APIs from systems handling millions of requests.

Idempotency design

A POST request that creates an order might be retried by the client due to a network timeout. Without idempotency, you get duplicate orders. The solution is an idempotency key — a client-generated unique identifier sent in a header:

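import json
from uuid import UUID

import redis
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
cache = redis.Redis()  # sync client for brevity; redis.asyncio avoids blocking the event loop

# OrderCreate (a Pydantic model) and process_order() are application code, not shown here
@app.post("/orders", status_code=201)
async def create_order(
    order: OrderCreate,
    idempotency_key: UUID = Header(...)  # sent as the "Idempotency-Key" header
):
    key = f"idempotency:{idempotency_key}"
    cached = cache.get(key)
    if cached:
        # A retry: replay the stored response instead of creating a duplicate order
        return json.loads(cached)

    # Claim the key so a concurrent retry can't process the same order twice
    if not cache.set(f"{key}:lock", 1, nx=True, ex=60):
        raise HTTPException(status_code=409, detail="A request with this key is in progress")

    try:
        result = await process_order(order)
    except Exception:
        cache.delete(f"{key}:lock")  # release the claim so the client can retry
        raise
    cache.setex(key, 86400, json.dumps(result))  # keep the response for 24 hours
    return result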

Stripe popularized this pattern. The key insight: store the response, not just a flag. If the client retries, it gets the same response it would have gotten originally. Set a TTL to keep storage from growing indefinitely; 24 hours is a common choice, and Stripe prunes keys once they are at least 24 hours old.
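Stripped of Redis and FastAPI, the pattern is a lookup table from key to stored response, with an expiry. The sketch below is a single-process stand-in for the Redis cache; IdempotencyStore, run(), and the injectable clock are hypothetical names chosen for illustration, and it does not guard against two concurrent requests carrying the same key.

```python
import time
from typing import Any, Callable


class IdempotencyStore:
    """In-memory stand-in for the Redis cache: maps key -> (expires_at, response)."""

    def __init__(self, ttl_seconds: float = 86400, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def run(self, key: str, handler: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # replay the stored response; handler is not called
        response = handler()  # first attempt (or expired key): do the real work
        self._entries[key] = (now + self.ttl, response)
        return response


calls = []


def create_order():
    calls.append(1)
    return {"order_id": len(calls)}


store = IdempotencyStore()
print(store.run("key-123", create_order))  # {'order_id': 1}
print(store.run("key-123", create_order))  # {'order_id': 1} (replayed)
print(len(calls))  # 1
```

Storing the full response, rather than a "seen" flag, is what lets the retry return the original order ID instead of an error.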

Partial responses and field selection

Large resources waste bandwidth when clients only need a few fields. Google's APIs popularized partial responses through a fields query parameter, formalized as field masks in Google's API design guidance:

GET /users/5?fields=id,name,email

In FastAPI, a simple version filters the serialized model by hand:

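from typing import Optional

from fastapi import Query

@app.get("/users/{user_id}")
async def get_user(
    user_id: int,
    fields: Optional[str] = Query(None, description="Comma-separated field list")
):
    user = await fetch_user(user_id)  # fetch_user returns a Pydantic model (not shown)
    if fields:
        requested = {f.strip() for f in fields.split(",")}  # tolerate "id, name"
        # model_dump() is Pydantic v2; on Pydantic v1 use user.dict()
        return {k: v for k, v in user.model_dump().items() if k in requested}
    return user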

This reduces payload size by 60-80% for mobile clients on Shopify's API. The tradeoffs: more complex serialization logic and less effective caching, because every distinct field combination becomes a separate cache entry.
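The endpoint above silently ignores unknown field names and will return any attribute the model exposes. A framework-agnostic sketch that validates the request against an allowlist, so a typo produces a clear error instead of a quietly smaller response; select_fields and the allowlist contents are hypothetical:

```python
from typing import Optional


def select_fields(resource: dict, fields: Optional[str], allowed: set) -> dict:
    """Return only the requested top-level fields, restricted to an allowlist.

    Raises ValueError for unknown names so the endpoint can answer 400.
    """
    if fields:
        requested = {f.strip() for f in fields.split(",") if f.strip()}
        unknown = requested - allowed
        if unknown:
            raise ValueError(f"Unknown fields: {', '.join(sorted(unknown))}")
    else:
        requested = allowed  # no selection: return every public field
    return {k: v for k, v in resource.items() if k in requested}


PUBLIC_USER_FIELDS = {"id", "name", "email"}
user = {"id": 5, "name": "Alice", "email": "alice@example.com", "password_hash": "x"}
print(select_fields(user, "id, name", PUBLIC_USER_FIELDS))  # {'id': 5, 'name': 'Alice'}
```

The allowlist doubles as a guard: fields such as password_hash can never be selected, even when the underlying object carries them.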

Bulk operations

Individual CRUD endpoints break down when clients need to process hundreds of items. Two approaches work well:

Batch endpoint:

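from pydantic import ValidationError

@app.post("/users/batch")  # no status_code set, so FastAPI answers 200 for the batch
async def batch_create_users(users: list[UserCreate]):
    results = []
    for user in users:
        try:
            created = await create_user(user)  # UserCreate and create_user are app code
            results.append({"status": 201, "data": created})
        except ValidationError as e:
            # Record this item's failure and carry on with the rest of the batch
            results.append({"status": 422, "error": str(e)})
    return {"results": results}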

Each item in the batch gets its own status, and the overall response is 200 even when individual items fail, so one bad record doesn't sink the rest. Google's batch APIs, Gmail's included, work the same way: each sub-request in the batch carries its own status code.
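Two details matter once batches get large: a hard cap on batch size, and a way for clients to match each result to the item they sent. A framework-agnostic sketch of the batch loop with both; run_batch, ItemError, and the default max_size=100 are hypothetical choices, not part of FastAPI:

```python
from typing import Any, Callable


class ItemError(Exception):
    """Hypothetical per-item failure carrying the status to report for that item."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


def run_batch(items: list, handler: Callable[[Any], Any], max_size: int = 100) -> dict:
    # Reject oversized batches before doing any work (map this to a 413 or 422)
    if len(items) > max_size:
        raise ValueError(f"batch of {len(items)} items exceeds the limit of {max_size}")
    results = []
    for index, item in enumerate(items):
        try:
            results.append({"index": index, "status": 201, "data": handler(item)})
        except ItemError as e:
            # Report the failure in place; the remaining items still run
            results.append({"index": index, "status": e.status, "error": str(e)})
    failed = sum(1 for r in results if r["status"] >= 400)
    return {"results": results, "succeeded": len(results) - failed, "failed": failed}


def create_user(payload: dict) -> dict:  # hypothetical handler for the demo
    if "@" not in payload.get("email", ""):
        raise ItemError(422, "invalid email")
    return {"email": payload["email"]}


report = run_batch([{"email": "a@example.com"}, {"email": "bad"}], create_user)
print(report["succeeded"], report["failed"])  # 1 1
```

The summary counts let a client check "did everything succeed?" without scanning every result.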

JSON Patch (RFC 6902) for applying several changes to one resource in a single request:

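from fastapi import HTTPException

PATCHABLE_PATHS = {"/name", "/email", "/bio"}  # hypothetical allowlist of editable fields

# PatchOperation: Pydantic model with op, path, and value fields (not shown)
@app.patch("/users/{user_id}")
async def patch_user(user_id: int, operations: list[PatchOperation]):
    # Validate every operation before touching the user: RFC 6902 patches are all-or-nothing
    for op in operations:
        if op.op not in ("add", "replace", "remove") or op.path not in PATCHABLE_PATHS:
            raise HTTPException(status_code=422, detail=f"Unsupported operation: {op.op} {op.path}")
    user = await fetch_user(user_id)
    for op in operations:
        field = op.path[1:]  # "/name" -> "name"; only single-level paths pass the check above
        # A fixed-schema model can't drop an attribute, so "remove" clears it to None
        setattr(user, field, None if op.op == "remove" else op.value)
    await user.save()
    return user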
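The PATCH handler maps operations straight onto model attributes. RFC 6902 specifies more: the whole patch succeeds or fails as a unit, remove deletes the target member, test aborts the patch when a value doesn't match, and paths are JSON Pointers (RFC 6901) in which ~1 and ~0 escape "/" and "~". A dict-based sketch of those rules follows; apply_patch is a hypothetical helper that skips arrays and move/copy, so a maintained library such as jsonpatch is the safer production choice.

```python
import copy


def _parse_pointer(path: str) -> list:
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if not path.startswith("/"):
        raise ValueError(f"invalid or root JSON Pointer: {path!r}")
    # Decode ~1 before ~0 so "~01" becomes "~1", not "/"
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_patch(doc: dict, operations: list) -> dict:
    """Apply add/remove/replace/test to nested dicts, all-or-nothing."""
    result = copy.deepcopy(doc)  # work on a copy so a failing op leaves doc untouched
    for op in operations:
        tokens = _parse_pointer(op["path"])
        parent = result
        for token in tokens[:-1]:
            parent = parent[token]  # KeyError if an intermediate member is missing
        last, kind = tokens[-1], op["op"]
        if kind == "add":
            parent[last] = op["value"]
        elif kind == "replace":
            if last not in parent:
                raise KeyError(op["path"])  # replace requires an existing target
            parent[last] = op["value"]
        elif kind == "remove":
            del parent[last]  # KeyError if missing, which must fail the patch
        elif kind == "test":
            if last not in parent or parent[last] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        else:
            raise ValueError(f"unsupported op: {kind}")
    return result


user = {"name": "Alice", "email": "alice@example.com", "bio": "Hi"}
patched = apply_patch(user, [
    {"op": "test", "path": "/name", "value": "Alice"},
    {"op": "replace", "path": "/email", "value": "alice@new.example"},
    {"op": "remove", "path": "/bio"},
])
print(patched)  # {'name': 'Alice', 'email': 'alice@new.example'}
```

Working on a deep copy is the simplest way to get atomicity: if any operation raises, the caller still holds the unmodified original.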

Rate limiting implementation

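Token bucket is one of the most common algorithms for API rate limiting. Here's a Redis-based sketch. The refill-check-spend step runs as a Lua script inside Redis, so two concurrent requests can't both read the same token count and both spend the last token:

import time

import redis

# Runs atomically inside Redis: refill, check, spend, and persist in one step
TOKEN_BUCKET_LUA = """
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local state = redis.call("HMGET", KEYS[1], "tokens", "last_refill")
local tokens = tonumber(state[1]) or capacity
local last_refill = tonumber(state[2]) or now
tokens = math.min(capacity, tokens + (now - last_refill) * refill_rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call("HSET", KEYS[1], "tokens", tokens, "last_refill", now)
redis.call("EXPIRE", KEYS[1], 3600)
-- Return tokens as a string: Redis truncates Lua numbers to integers in replies
return {allowed, tostring(tokens)}
"""

class TokenBucket:
    def __init__(self, redis_client, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.script = redis_client.register_script(TOKEN_BUCKET_LUA)

    def allow(self, key: str) -> tuple[bool, dict]:
        allowed, tokens = self.script(
            keys=[f"ratelimit:{key}"],
            args=[self.capacity, self.refill_rate, time.time()],
        )
        tokens = float(tokens)
        if allowed:
            return True, {
                "X-RateLimit-Remaining": int(tokens),
                "X-RateLimit-Limit": self.capacity,
            }
        # Seconds until the bucket holds one whole token again
        retry_after = (1 - tokens) / self.refill_rate
        return False, {
            "Retry-After": int(retry_after) + 1,
            "X-RateLimit-Remaining": 0,
            "X-RateLimit-Limit": self.capacity,
        }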

Always return rate limit headers — even on successful requests. Clients need to know their budget without hitting the wall.
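To see the refill arithmetic in isolation, here is a single-process version with an injectable clock. InMemoryTokenBucket and its parameters are hypothetical; state lives in a plain dict, so it only suits a single worker, but it makes the "tokens accrue with elapsed time, capped at capacity" rule easy to test:

```python
import math
import time
from typing import Callable


class InMemoryTokenBucket:
    """Single-process token bucket; the clock is injectable so refills can be tested."""

    def __init__(self, capacity: int, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.clock = clock
        self._buckets = {}  # key -> (tokens, time of last refill)

    def allow(self, key: str) -> tuple:
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        # Credit tokens for the time elapsed since the last call, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[key] = (tokens, now)
        headers = {
            "X-RateLimit-Limit": str(self.capacity),
            "X-RateLimit-Remaining": str(int(tokens)),
        }
        if not allowed:
            # Whole seconds until one token has accumulated again
            headers["Retry-After"] = str(math.ceil((1 - tokens) / self.refill_rate))
        return allowed, headers


clock = [0.0]  # fake time source for the demo
bucket = InMemoryTokenBucket(capacity=2, refill_rate=1.0, clock=lambda: clock[0])
print(bucket.allow("client-a")[0])  # True
print(bucket.allow("client-a")[0])  # True
ok, headers = bucket.allow("client-a")
print(ok, headers["Retry-After"])  # False 1
clock[0] = 1.0  # one second later, one token has been refilled
print(bucket.allow("client-a")[0])  # True
```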

API versioning in practice

URL-prefix versioning with FastAPI routers:

from fastapi import APIRouter

v1_router = APIRouter(prefix="/v1")
v2_router = APIRouter(prefix="/v2")

# UserResponseV1/V2: Pydantic v2 models with model_config = ConfigDict(from_attributes=True),
# which lets model_validate() read ORM objects (the v2 replacement for from_orm)
@v1_router.get("/users/{user_id}")
async def get_user_v1(user_id: int):
    user = await fetch_user(user_id)
    return UserResponseV1.model_validate(user)

@v2_router.get("/users/{user_id}")
async def get_user_v2(user_id: int):
    user = await fetch_user(user_id)
    return UserResponseV2.model_validate(user)  # includes new fields

app.include_router(v1_router)
app.include_router(v2_router)

The critical rule: never break v1 after it's published. Adding fields to v1 is generally safe, as long as clients ignore fields they don't recognize, but don't remove or rename existing ones. When v2 ships, set a deprecation timeline for v1; Stripe gives 2 years, which is generous but builds trust.
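One way to announce a deprecation timeline in-band is the Sunset response header (RFC 8594), which carries the HTTP-date after which a resource is expected to stop responding. A sketch of computing it per request path; SUNSET_DATES and its 2027 date are placeholders, and a middleware would copy the returned headers onto each response:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Placeholder retirement schedule: URL prefix -> date the version stops responding
SUNSET_DATES = {"/v1/": datetime(2027, 1, 1, tzinfo=timezone.utc)}


def deprecation_headers(path: str) -> dict:
    """Headers announcing that the API version serving `path` will be retired."""
    for prefix, sunset in SUNSET_DATES.items():
        if path.startswith(prefix):
            # Sunset takes an HTTP-date; usegmt=True requires a UTC-aware datetime
            return {"Sunset": format_datetime(sunset, usegmt=True)}
    return {}


print(deprecation_headers("/v1/users/5"))  # {'Sunset': 'Fri, 01 Jan 2027 00:00:00 GMT'}
print(deprecation_headers("/v2/users/5"))  # {}
```

Clients and API gateways can log or alert on the header long before the date arrives, which is cheaper than discovering the retirement through failures.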

Content negotiation and serialization

Support multiple response formats when your API serves diverse clients:

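from fastapi import Request
from fastapi.responses import JSONResponse, Response
import msgpack

@app.get("/data/{data_id}")
async def get_data(data_id: int, request: Request):
    data = await fetch_data(data_id)  # must return plain dicts, lists, and scalars
    accept = request.headers.get("accept", "application/json")
    # Vary tells shared caches that the body depends on the Accept header
    headers = {"Vary": "Accept"}

    # Simple substring match; it ignores q-values such as application/msgpack;q=0
    if "application/msgpack" in accept:
        return Response(
            content=msgpack.packb(data),
            media_type="application/msgpack",
            headers=headers,
        )
    return JSONResponse(content=data, headers=headers)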

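MessagePack responses are 20-30% smaller than JSON for typical payloads. GitHub's API relies on the same Accept-header mechanism to select alternative representations, such as raw or HTML-rendered file contents.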
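Checking for a substring of the Accept header ignores quality values, so a client sending application/msgpack;q=0 (explicitly refusing MessagePack) would still receive it. A sketch of q-value-aware selection following the simplified rule that the most specific matching media range decides; preferred_media_type is a hypothetical helper, and media-type parameters other than q are ignored:

```python
from typing import Optional


def _parse_accept(accept: str) -> list:
    """Split an Accept header into (media range, q) pairs; other parameters are ignored."""
    ranges = []
    for part in accept.split(","):
        pieces = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((pieces[0].lower(), q))
    return ranges


def preferred_media_type(accept: Optional[str], supported: list) -> Optional[str]:
    """Return the supported type the client weights highest, or None (answer 406)."""
    if not accept:
        return supported[0]  # no Accept header: any representation is acceptable
    ranges = _parse_accept(accept)
    best, best_q = None, 0.0
    for candidate in supported:  # earlier entries win ties (server preference)
        main_type = candidate.split("/")[0]
        q = None
        # The most specific matching range decides: exact type, then type/*, then */*
        for pattern in (candidate, f"{main_type}/*", "*/*"):
            weights = [w for media, w in ranges if media == pattern]
            if weights:
                q = max(weights)
                break
        if q is not None and q > best_q:
            best, best_q = candidate, q
    return best


SUPPORTED = ["application/json", "application/msgpack"]
print(preferred_media_type("application/msgpack", SUPPORTED))  # application/msgpack
print(preferred_media_type("application/msgpack;q=0.2, */*;q=0.8", SUPPORTED))  # application/json
print(preferred_media_type("text/html", SUPPORTED))  # None
```

Returning None rather than falling back to JSON lets the endpoint answer 406 Not Acceptable when nothing the client accepts is available.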

HATEOAS — linking resources

Include navigation links in responses so clients don’t hardcode URLs:

{
  "id": 5,
  "name": "Alice",
  "links": {
    "self": "/users/5",
    "orders": "/users/5/orders",
    "profile": "/users/5/profile"
  }
}

Few APIs implement full HATEOAS, but self, next, and prev links on paginated responses are common and help clients avoid URL construction bugs. GitHub, for example, returns its pagination links in a Link header rather than in the response body.
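For offset-based pagination, the self, next, and prev links follow directly from offset, limit, and the total count. A sketch; pagination_links is a hypothetical helper and the offset/limit query parameter names are assumptions:

```python
from urllib.parse import urlencode


def pagination_links(base_path: str, offset: int, limit: int, total: int) -> dict:
    """Build self/next/prev links for offset-based pagination."""
    def link(start: int) -> str:
        return f"{base_path}?{urlencode({'offset': start, 'limit': limit})}"

    links = {"self": link(offset)}
    if offset + limit < total:  # more items remain after this page
        links["next"] = link(offset + limit)
    if offset > 0:  # clamp so prev never points before the first item
        links["prev"] = link(max(offset - limit, 0))
    return links


print(pagination_links("/users", offset=20, limit=20, total=45))
# {'self': '/users?offset=20&limit=20', 'next': '/users?offset=40&limit=20', 'prev': '/users?offset=0&limit=20'}
```

Because the server builds every link, it can later switch to cursor-based pagination without clients changing how they walk the pages.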

Structured logging for API observability

Every request should generate a structured log entry:

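import time
from uuid import uuid4

import structlog
from fastapi import Request

# bound_contextvars only reaches the output if structlog.contextvars.merge_contextvars
# is in the configured processor chain
logger = structlog.get_logger()

@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    # Reuse an incoming correlation ID so one ID follows the request across services
    request_id = request.headers.get("x-request-id") or str(uuid4())
    start = time.perf_counter()
    with structlog.contextvars.bound_contextvars(request_id=request_id):
        logger.info(
            "request_started",
            method=request.method,
            path=request.url.path,
            client=request.client.host if request.client else None,
        )
        response = await call_next(request)
        logger.info(
            "request_completed",
            status=response.status_code,
            duration_ms=round((time.perf_counter() - start) * 1000, 1),
        )
        response.headers["X-Request-ID"] = request_id
        return response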

Correlation IDs (X-Request-ID) let you trace a request across microservices. Without them, debugging production issues in distributed systems is nearly impossible.
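Tracing across services only works if each service forwards the ID on its outbound calls. A stdlib sketch using contextvars, so each concurrent request (each asyncio task) keeps its own ID; start_request and outgoing_headers are hypothetical helpers, and the plain dicts assume the exact key X-Request-ID even though real HTTP header names are case-insensitive:

```python
import contextvars
import uuid
from typing import Optional

# Holds the current request's correlation ID; each task or context sees its own value
request_id_var = contextvars.ContextVar("request_id", default="")


def start_request(incoming_headers: dict) -> str:
    """Reuse the caller's X-Request-ID if present, otherwise mint a new one."""
    rid = incoming_headers.get("X-Request-ID") or str(uuid.uuid4())
    request_id_var.set(rid)
    return rid


def outgoing_headers(extra: Optional[dict] = None) -> dict:
    """Headers for calls to downstream services, carrying the same correlation ID."""
    headers = dict(extra or {})
    rid = request_id_var.get()
    if rid:
        headers["X-Request-ID"] = rid
    return headers


def handle_request(headers: dict) -> dict:
    start_request(headers)
    return outgoing_headers({"Accept": "application/json"})


print(contextvars.copy_context().run(handle_request, {"X-Request-ID": "abc-123"}))
# {'Accept': 'application/json', 'X-Request-ID': 'abc-123'}
```

Running each request in its own context (as asyncio does per task) keeps IDs from leaking between concurrent requests.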

Common production pitfalls

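N+1 queries in list endpoints. Returning /users with nested orders can trigger a separate DB query per user when the relationship is lazy-loaded. Use eager loading (SQLAlchemy's selectinload or joinedload) or the dataloader pattern to fetch the orders for the whole page at once.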

Unbounded list responses. An endpoint without pagination that returns 50,000 records can time out or crash the client. Default to limit=20, enforce a maximum page size, and require explicit pagination.
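A sketch of normalizing pagination input with a default and a hard ceiling; page_params, DEFAULT_LIMIT, and MAX_LIMIT = 100 are assumptions to tune for your payload sizes:

```python
DEFAULT_LIMIT = 20
MAX_LIMIT = 100  # assumed ceiling; pick one that keeps responses fast


def page_params(limit=None, offset=None):
    """Normalize client pagination input: apply the default, clamp, reject negatives."""
    limit = DEFAULT_LIMIT if limit is None else int(limit)
    offset = 0 if offset is None else int(offset)
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    return min(limit, MAX_LIMIT), offset


print(page_params())     # (20, 0)
print(page_params(500))  # (100, 0)
```

In FastAPI the same bounds can be declared on the parameter itself, for example limit: int = Query(20, ge=1, le=100); that rejects out-of-range values with a 422 instead of silently clamping them.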

Leaking internal IDs. Auto-increment database IDs reveal business information (a competitor can estimate your user count by signing up twice and comparing IDs). Use UUIDs for external-facing identifiers.
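One common arrangement keeps the auto-increment key for joins and adds a random UUID that is the only identifier ever exposed in URLs and payloads. UserRecord is a hypothetical stand-in for an ORM model:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """Hypothetical row: the sequential key stays internal, public_id goes in URLs."""
    internal_id: int  # auto-increment primary key, used only for joins
    name: str
    public_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_public(self) -> dict:
        # Never serialize internal_id; clients only ever see the random identifier
        return {"id": self.public_id, "name": self.name}


print(UserRecord(1, "Alice").to_public())
```

Keeping the integer primary key also avoids the B-tree index fragmentation that random UUIDv4 primary keys can cause; the public_id column just needs its own unique index.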

Missing request validation. Without Pydantic models (FastAPI) or marshmallow schemas (Flask), malformed input reaches your business logic and causes cryptic 500 errors instead of clean 422s.

One thing to remember: Every API design decision is a tradeoff between simplicity and power — start simple, add complexity only when real usage demands it, and always document what you chose and why.
