API Client Design — Deep Dive

Build production-grade Python API clients with connection pooling, typed responses, middleware hooks, and testable architecture.

Architecture overview

A production API client in Python is more than a convenience wrapper — it’s a contract between your application and an external service. The design choices you make here affect reliability, testability, and developer experience across your entire codebase.

The architecture follows a layered approach: transport → middleware → resource → response model.

Transport layer with httpx

Modern Python API clients benefit from httpx over requests because it ships a sync client (httpx.Client) and an async client (httpx.AsyncClient) with nearly identical APIs, so one design serves both:

import httpx
from typing import Any


class Transport:
    def __init__(
        self,
        base_url: str,
        timeout: float = 30.0,
        max_retries: int = 3,
    ):
        self._client = httpx.Client(
            base_url=base_url,
            timeout=httpx.Timeout(timeout),
            limits=httpx.Limits(
                max_connections=100,
                max_keepalive_connections=20,
            ),
        )
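        # Not used by request() below, which makes a single attempt;
        # kept so retry logic layered on top can read it.
        self._max_retries = max_retries

    def request(
        self, method: str, path: str, **kwargs: Any
    ) -> httpx.Response:
        # Relative paths are resolved against base_url by the client.
        return self._client.request(method, path, **kwargs)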

    def close(self) -> None:
        self._client.close()

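The httpx.Limits configuration controls the connection pool. max_connections caps the total number of concurrent connections, which is what guards against socket exhaustion; max_keepalive_connections caps how many idle connections stay open for reuse. The values shown (100 and 20) are the httpx defaults and work for most single-service clients. High-throughput services may need to raise max_keepalive_connections to 50-100.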

Authentication middleware

Authentication should be injected, not hardcoded. Use httpx’s auth protocol or a custom event hook:

import httpx
import time
from dataclasses import dataclass


@dataclass
class TokenAuth(httpx.Auth):
    access_token: str
    refresh_token: str
    expires_at: float
    token_url: str

    def auth_flow(self, request: httpx.Request):
        if time.time() >= self.expires_at - 30:
            self._refresh()
        request.headers["Authorization"] = f"Bearer {self.access_token}"
        yield request

    def _refresh(self) -> None:
        resp = httpx.post(
            self.token_url,
            data={
                "grant_type": "refresh_token",
                "refresh_token": self.refresh_token,
            },
        )
        resp.raise_for_status()
        data = resp.json()
        self.access_token = data["access_token"]
        self.expires_at = time.time() + data["expires_in"]

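This pattern handles token refresh transparently. httpx runs the auth_flow generator for every request, so the expiry check (with its 30-second safety margin) is automatic. Two caveats: _refresh makes a blocking httpx.post call, so this class suits the sync client only, and nothing here stops two threads from refreshing at the same time.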

Typed exception hierarchy

Map HTTP status codes to a typed exception tree:

import httpx


class APIError(Exception):
    def __init__(self, status_code: int, message: str, response_body: dict):
        self.status_code = status_code
        self.message = message
        self.response_body = response_body
        super().__init__(f"{status_code}: {message}")


class NotFoundError(APIError):
    pass


class ValidationError(APIError):
    pass


class RateLimitError(APIError):
    def __init__(self, retry_after: float, **kwargs):
        super().__init__(**kwargs)
        self.retry_after = retry_after


class ServerError(APIError):
    """5xx errors — typically retriable."""
    pass


def raise_for_status(response: httpx.Response) -> None:
    if response.is_success:
        return

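    # Parse the body defensively: error pages are not always JSON objects.
    body = {}
    if response.headers.get("content-type", "").startswith("application/json"):
        try:
            parsed = response.json()
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):
            body = parsed
    error = body.get("error")
    message = error.get("message") if isinstance(error, dict) else None
    message = message or response.reason_phrase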

    if response.status_code == 404:
        raise NotFoundError(
            status_code=404, message=message, response_body=body
        )
    elif response.status_code == 422:
        raise ValidationError(
            status_code=422, message=message, response_body=body
        )
    elif response.status_code == 429:
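        # Retry-After may also be an HTTP-date; fall back to 1 second
        # rather than crash on a value float() cannot parse.
        try:
            retry_after = float(response.headers.get("Retry-After", 1.0))
        except ValueError:
            retry_after = 1.0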
        raise RateLimitError(
            retry_after=retry_after,
            status_code=429,
            message=message,
            response_body=body,
        )
    elif response.status_code >= 500:
        raise ServerError(
            status_code=response.status_code,
            message=message,
            response_body=body,
        )
    else:
        raise APIError(
            status_code=response.status_code,
            message=message,
            response_body=body,
        )

Callers can now catch RateLimitError specifically and inspect retry_after, rather than parsing raw responses.
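The code above treats Retry-After as a number of seconds, but HTTP also allows the header to carry an HTTP-date. A minimal sketch of a parser that accepts both forms, using only the standard library, might look like this. The function name parse_retry_after and its default and now parameters are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 1.0,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    if value is None:
        return default
    value = value.strip()
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "120"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable: fall back instead of raising
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Passing now explicitly keeps the function deterministic in tests; production callers can leave it unset.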

Resource layer pattern

Group endpoints into resource classes that share the transport:

from dataclasses import dataclass


@dataclass
class User:
    id: int
    email: str
    name: str


class UsersResource:
    def __init__(self, transport: Transport):
        self._transport = transport

    def get(self, user_id: int) -> User:
        resp = self._transport.request("GET", f"/users/{user_id}")
        raise_for_status(resp)
        data = resp.json()
        return User(id=data["id"], email=data["email"], name=data["name"])

    def list(
        self, page: int = 1, per_page: int = 20
    ) -> list[User]:
        resp = self._transport.request(
            "GET", "/users", params={"page": page, "per_page": per_page}
        )
        raise_for_status(resp)
        return [
            User(id=u["id"], email=u["email"], name=u["name"])
            for u in resp.json()["data"]
        ]

    def create(self, email: str, name: str) -> User:
        resp = self._transport.request(
            "POST", "/users", json={"email": email, "name": name}
        )
        raise_for_status(resp)
        data = resp.json()
        return User(id=data["id"], email=data["email"], name=data["name"])
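Since list takes page and per_page, callers often want to walk every page without tracking the counter themselves. One way to sketch that is a small generator, shown below. It assumes a page shorter than per_page marks the end of the collection, which the source does not state; an API that returns an explicit total or next-page cursor would need a different stop condition. The name iter_pages is hypothetical.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(
    fetch_page: Callable[[int, int], List[T]],
    per_page: int = 20,
) -> Iterator[T]:
    """Yield items from successive pages until a short page is returned."""
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        # Assumption: fewer items than requested means this was the last page.
        if len(items) < per_page:
            return
        page += 1
```

With the resource class above, a caller could pass lambda page, per_page: client.users.list(page=page, per_page=per_page) as fetch_page and iterate lazily over all users.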

Composing the client

class MyServiceClient:
    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.myservice.com/v1",
        timeout: float = 30.0,
    ):
        self._transport = Transport(
            base_url=base_url, timeout=timeout
        )
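        # Sets a default header on the underlying client. This reaches into
        # the private state of Transport; a constructor argument on Transport
        # would be the cleaner way to inject auth.
        self._transport._client.headers["Authorization"] = (
            f"Bearer {api_key}"
        )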
        self.users = UsersResource(self._transport)

    def close(self) -> None:
        self._transport.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

Usage becomes clean and discoverable: client.users.get(42).

Retry strategy with backoff

Embed retry logic in the transport for idempotent methods:

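import random
import time

import httpx


IDEMPOTENT_METHODS = {"GET", "PUT", "DELETE", "HEAD", "OPTIONS"}


def _backoff(retries: int) -> float:
    # Exponential backoff with jitter, capped at 30 seconds.
    return min(2 ** retries + random.random(), 30)


def request_with_retry(
    client: httpx.Client,
    method: str,
    path: str,
    max_retries: int = 3,
    **kwargs,
) -> httpx.Response:
    retriable = method.upper() in IDEMPOTENT_METHODS
    retries = 0
    while True:
        try:
            resp = client.request(method, path, **kwargs)
        except httpx.TransportError:
            # The request may have reached the server, so only repeat it
            # when the method is idempotent.
            if not retriable or retries >= max_retries:
                raise
            retries += 1
            time.sleep(_backoff(retries))
            continue

        if resp.status_code == 429:
            # Check the budget before sleeping so the final failure
            # is raised immediately.
            if retries >= max_retries:
                raise_for_status(resp)
            retries += 1
            try:
                wait = float(resp.headers.get("Retry-After", 1.0))
            except ValueError:
                wait = 1.0  # Retry-After can also be an HTTP-date
            time.sleep(wait)
            continue
        if resp.status_code >= 500 and retriable:
            if retries >= max_retries:
                raise_for_status(resp)
            retries += 1
            time.sleep(_backoff(retries))
            continue
        return resp

Key detail: 5xx responses and transport errors are retried only for idempotent methods, because the server may already have processed the request. A 429 is retried for any method, since the server declined to process it. Retrying a POST without an idempotency key risks duplicate records.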
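To see what the backoff formula min(2 ** retries + jitter, 30) produces, the sketch below isolates it in a helper with an injectable jitter source. The helper name backoff_delay and its parameters are illustrative; fixing the jitter to zero makes the schedule deterministic.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    cap: float = 30.0,
    jitter: Callable[[], float] = random.random,
) -> float:
    """Delay in seconds before retry number `attempt` (1-based)."""
    return min(2 ** attempt + jitter(), cap)


# With jitter disabled the delays double until they hit the cap.
schedule = [backoff_delay(n, jitter=lambda: 0.0) for n in range(1, 6)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

The random component (up to one second here) keeps many clients that failed together from retrying in lockstep.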

Testing your client

Design the transport as an injectable dependency. In tests, replace it with a mock transport or use httpx.MockTransport:

import httpx


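def mock_handler(request: httpx.Request) -> httpx.Response:
    # httpx joins base_url and the relative path, so the handler sees
    # "/v1/users/1" rather than "/users/1".
    if request.url.path == "/v1/users/1":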
        return httpx.Response(
            200,
            json={"id": 1, "email": "test@example.com", "name": "Test"},
        )
    return httpx.Response(404, json={"error": {"message": "Not found"}})


def test_get_user():
    mock_client = httpx.Client(
        transport=httpx.MockTransport(mock_handler),
        base_url="https://api.test.com/v1",
    )
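    # Transport.__init__ always builds a real client, so bypass it and
    # attach the mock client directly.
    transport = Transport.__new__(Transport)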
    transport._client = mock_client
    transport._max_retries = 0

    users = UsersResource(transport)
    user = users.get(1)
    assert user.email == "test@example.com"

No network calls, fast tests, and you’re testing your parsing and error-handling logic in isolation.

Tradeoffs and design decisions

Decision	Tradeoff
Sync vs async	Sync is simpler; async needed for high-concurrency services
Pydantic models vs dataclasses	Pydantic validates at parse time; dataclasses are lighter
Retry in transport vs caller	Transport retry is invisible — callers can’t override per-call
Raising exceptions vs returning Result types	Exceptions are Pythonic but can be missed; Result forces handling
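On the Pydantic versus dataclasses row, a middle ground is a dataclass with a small validating constructor. The sketch below is one way to do that with the standard library only; the from_dict method and its specific checks are assumptions for illustration, and it covers far less than Pydantic does.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class User:
    id: int
    email: str
    name: str

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "User":
        """Build a User from a response payload, rejecting malformed input."""
        missing = [key for key in ("id", "email", "name") if key not in data]
        if missing:
            raise ValueError(f"user payload missing fields: {missing}")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(data["id"], int) or isinstance(data["id"], bool):
            raise ValueError("user id must be an integer")
        # Unknown keys are ignored so new server fields do not break parsing.
        return cls(id=data["id"], email=str(data["email"]), name=str(data["name"]))
```

Centralizing construction this way also removes the repeated User(id=..., email=..., name=...) calls from each resource method.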

Real-world reference implementations

Study these well-designed Python API clients for inspiration:

stripe-python: Clean resource grouping, automatic pagination, idempotency keys
google-cloud-python: gapic-generated clients with retry config objects
github3.py: Session-based auth, lazy iteration over paginated results
httpx itself: Its Client class is a masterclass in transport configuration

The one thing to remember: A production API client encodes domain knowledge (which calls are safe to retry, how auth refreshes, what errors mean) into reusable, testable layers — so every caller inherits correct behavior without thinking about it.
