Retry Libraries & Tenacity — Deep Dive

Production retry patterns with tenacity: async retries, custom callbacks, composing strategies, and avoiding the retry amplification trap.

Tenacity’s composable architecture

Tenacity’s power comes from its composable design. Instead of a single retry(attempts=3, backoff=2) function, it provides small, combinable pieces that handle complex real-world scenarios.

Basic patterns

The decorator approach is the most common:

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
import httpx


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception_type((httpx.TransportError, httpx.TimeoutException)),
)
def fetch_user(user_id: int) -> dict:
    resp = httpx.get(
        f"https://api.example.com/users/{user_id}",
        timeout=10.0,
    )
    resp.raise_for_status()
    return resp.json()

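This makes up to 4 attempts (3 retries) with delays of 1s, 2s, and 4s between them (the 30s cap is never reached), but only for transport and timeout errors. httpx.TimeoutException is itself a subclass of httpx.TransportError, so listing both is redundant but harmless. A 400 Bad Request makes raise_for_status() raise httpx.HTTPStatusError, which is not in the retry list, so it propagates immediately.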
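To see where the 1s, 2s, 4s schedule comes from, here is a small stdlib-only sketch. It assumes the formula used by current tenacity releases for wait_exponential, multiplier * 2 ** (attempt - 1) clamped to the min/max range; older releases computed the exponent differently. The helper exponential_delays is illustrative and not part of tenacity.

```python
def exponential_delays(
    attempts: int,
    multiplier: float = 1.0,
    min_wait: float = 0.0,
    max_wait: float = float("inf"),
) -> list[float]:
    """Return the waits between `attempts` tries (no wait follows the last try)."""
    delays = []
    for attempt in range(1, attempts):
        raw = multiplier * 2 ** (attempt - 1)
        # Clamp to [min_wait, max_wait], mirroring the min= and max= arguments.
        delays.append(max(min_wait, min(raw, max_wait)))
    return delays


# Same settings as the decorator above: 4 attempts, min=1, max=30.
schedule = exponential_delays(attempts=4, multiplier=1, min_wait=1, max_wait=30)
# With more attempts the 30s cap starts to matter.
capped = exponential_delays(attempts=8, multiplier=1, min_wait=1, max_wait=30)
```

With four attempts the cap is never reached; it only kicks in from the sixth wait onward.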

Retrying on specific HTTP status codes

Often you need to retry based on the response, not an exception. Use retry_if_result:

import httpx
from tenacity import retry, retry_if_result, stop_after_attempt, wait_exponential


def is_retriable_status(response: httpx.Response) -> bool:
    return response.status_code in {429, 502, 503, 504}


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_result(is_retriable_status),
)
def call_api(path: str) -> httpx.Response:
    return httpx.get(f"https://api.example.com{path}", timeout=10.0)

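The function returns the response without raising. Tenacity inspects the return value and retries if is_retriable_status returns True. If every attempt returns a retriable status, tenacity raises RetryError instead of handing back the last response, so callers need to handle that case.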

Respecting Retry-After headers

Rate-limited APIs often send a Retry-After header telling you how long to wait before trying again, either as a number of seconds or as an HTTP date. Ignoring it wastes both your time and the server’s patience:

from tenacity import (
    retry,
    stop_after_attempt,
    retry_if_exception_type,
)
import httpx


class RateLimitedError(Exception):
    def __init__(self, retry_after: float):
        self.retry_after = retry_after
        super().__init__(f"Rate limited, retry after {retry_after}s")


def wait_for_rate_limit(retry_state) -> float:
    exc = retry_state.outcome.exception()
    if isinstance(exc, RateLimitedError):
        return exc.retry_after
    # Fallback to exponential backoff
    return min(2 ** retry_state.attempt_number, 60)


@retry(
    stop=stop_after_attempt(5),
    wait=wait_for_rate_limit,
    retry=retry_if_exception_type((RateLimitedError, httpx.TransportError)),
)
def robust_api_call(method: str, url: str, **kwargs) -> dict:
    resp = httpx.request(method, url, timeout=15.0, **kwargs)

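    if resp.status_code == 429:
        # Retry-After may also be an HTTP date, which float() cannot parse.
        try:
            retry_after = float(resp.headers.get("Retry-After", 5.0))
        except ValueError:
            retry_after = 5.0
        raise RateLimitedError(retry_after)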

    resp.raise_for_status()
    return resp.json()

The custom wait function extracts the delay from the exception itself, so the retry respects the server’s guidance.
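Retry-After can carry either a number of seconds or an HTTP date, so a fuller parser has to handle both. The following is a stdlib-only sketch; the function name parse_retry_after, the 5-second default, and the now parameter (there to make the function testable) are all assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 5.0,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if value is None:
        return default
    try:
        # Form 1: delay in seconds, e.g. "120".
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", not a negative sleep.
    return max(0.0, (retry_at - now).total_seconds())
```

A function like this could replace the float(...) call when building RateLimitedError, so date-style headers produce a sensible delay instead of falling back to the default.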

Async retries

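The same @retry decorator works on async def functions. Tenacity detects the coroutine and awaits between attempts instead of blocking the event loop: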

import httpx
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential_jitter,
    retry_if_exception_type,
)


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30, jitter=2),
    retry=retry_if_exception_type(httpx.TransportError),
)
async def async_fetch(client: httpx.AsyncClient, path: str) -> dict:
    resp = await client.get(path, timeout=10.0)
    resp.raise_for_status()
    return resp.json()

The wait_exponential_jitter strategy combines exponential backoff with built-in jitter. The jitter parameter controls the maximum random addition to each wait.

Combining stop and wait strategies

Stop and retry conditions combine with | (either) and & (both); wait strategies combine with +:

from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
)
import httpx


@retry(
    stop=(stop_after_attempt(6) | stop_after_delay(120)),
    wait=wait_exponential(multiplier=1, min=2, max=30) + wait_random(0, 2),
    retry=retry_if_exception_type((httpx.TransportError, httpx.TimeoutException)),
)
def resilient_call(url: str) -> dict:
    resp = httpx.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()

This stops after 6 attempts or 120 seconds total (whichever first). Wait time is exponential plus random jitter between 0-2 seconds. The + operator on wait strategies adds their results together.

Callbacks for observability

Production systems need visibility into retries. Tenacity provides three callback hooks: before, after, and before_sleep. The example below wires up the last two:

import logging
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    before_sleep_log,
    after_log,
    retry_if_exception_type,
)
import httpx

logger = logging.getLogger("api_client")


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception_type(httpx.TransportError),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    after=after_log(logger, logging.DEBUG),
)
def monitored_call(url: str) -> dict:
    resp = httpx.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()

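before_sleep fires before each wait period, which makes it a good place to emit retry metrics to Prometheus or Datadog. after fires after each attempt that fails and qualifies for a retry, including the final one before tenacity gives up; it does not fire after a successful attempt.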

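For custom metrics, pass your own callable as before_sleep (or after):

import logging

from tenacity import RetryCallState

logger = logging.getLogger("api_client")


def emit_retry_metric(retry_state: RetryCallState) -> None:
    attempt = retry_state.attempt_number
    # fn is typed as optional, so fall back to a placeholder name
    fn_name = retry_state.fn.__name__ if retry_state.fn else "unknown"
    # outcome is only set once an attempt has finished
    error = retry_state.outcome.exception() if retry_state.outcome else None
    # Send to your metrics system
    logger.warning(
        "Retrying %s, attempt %d, last error: %s",
        fn_name,
        attempt,
        error,
    )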

The retry amplification problem

Consider a call chain: Service A → Service B → Service C. If each caller makes up to 3 attempts and Service C is failing:

One call to Service C: 1 failed attempt
Service B makes 3 attempts at C per incoming request: 3 calls to C
Service A makes 3 attempts at B, each triggering B’s 3 attempts: 3 × 3 = 9 calls to C

Total: up to 9 calls to Service C from one user request. With n attempts per layer and d retrying layers the worst case is n^d, so deeper chains grow exponentially. Mitigation strategies:

Retry budget — share a total retry budget across the chain via request headers
Retry at one layer only — typically the outermost caller
Deadline propagation — pass a deadline timestamp; stop retrying when it expires
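A retry budget can be sketched without any library. The header name X-Retry-Budget and the helper spend_retry_budget are hypothetical; the idea is only that each hop spends from a shared counter and forwards the remainder downstream.

```python
from typing import Mapping

BUDGET_HEADER = "X-Retry-Budget"  # hypothetical header name, not a standard


def spend_retry_budget(
    incoming: Mapping[str, str], default_budget: int = 3
) -> tuple[bool, dict[str, str]]:
    """Decide whether this hop may retry and build headers for the next hop.

    Returns (may_retry, outgoing_headers). A missing or malformed header
    is treated as a fresh budget of default_budget.
    """
    try:
        remaining = int(incoming.get(BUDGET_HEADER, default_budget))
    except ValueError:
        remaining = default_budget
    if remaining <= 0:
        # Budget exhausted: fail fast and tell downstream not to retry either.
        return False, {BUDGET_HEADER: "0"}
    # Spend one unit and pass the rest along.
    return True, {BUDGET_HEADER: str(remaining - 1)}
```

A service would call this once per retry it wants to make and attach the returned headers to its outgoing request, so the total number of retries across the chain stays bounded by the initial budget.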

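Deadline propagation can be enforced with a custom stop condition that reads the deadline from the call’s keyword arguments:

import time

import httpx
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)


def stop_at_deadline(retry_state) -> bool:
    """Stop retrying once the caller's deadline has passed.

    Expects the wrapped function to receive deadline as a keyword argument.
    """
    deadline = retry_state.kwargs.get("deadline")
    return deadline is not None and time.time() >= deadline


@retry(
    stop=stop_after_attempt(3) | stop_at_deadline,
    wait=wait_exponential(min=1, max=10),
    # Retry transport failures only, so the TimeoutError below propagates at once.
    retry=retry_if_exception_type(httpx.TransportError),
)
def call_with_deadline(url: str, *, deadline: float) -> dict:
    remaining = deadline - time.time()
    if remaining <= 0:
        raise TimeoutError("Deadline exceeded, skipping request")
    resp = httpx.get(
        url,
        timeout=min(10.0, remaining),
        headers={"X-Deadline": str(deadline)},
    )
    resp.raise_for_status()
    return resp.json()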
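To put numbers on the amplification described at the start of this section, the worst case is simply the per-layer attempt count raised to the number of layers that retry. The helper worst_case_calls below is an illustrative name, not a library function.

```python
def worst_case_calls(attempts_per_layer: int, retrying_layers: int) -> int:
    """Upper bound on calls reaching the deepest service for one user request."""
    return attempts_per_layer ** retrying_layers


# A -> B -> C: two layers retry (A calling B, and B calling C), 3 attempts each.
three_service_chain = worst_case_calls(attempts_per_layer=3, retrying_layers=2)
# One more hop multiplies the bound by 3 again.
four_service_chain = worst_case_calls(attempts_per_layer=3, retrying_layers=3)
# Retrying only at the outermost layer keeps the bound at the attempt count.
outermost_only = worst_case_calls(attempts_per_layer=3, retrying_layers=1)
```

This is why retrying at a single layer is so effective: it turns exponential growth into a constant.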

Tenacity vs. manual retry loops

Feature	tenacity	Manual loop
Exponential backoff	Built-in	Must implement
Jitter	Built-in	Must implement
Async support	Native	Must handle event loop
Composable conditions	Yes (|, &, +)	Nested if/else
Callbacks/logging	Hook system	Inline logging
Testing	Mock the decorated function	Mock the loop
Code readability	One decorator line	10-20 lines per retry site
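For comparison, here is roughly what the manual-loop column looks like in practice: a stdlib-only sketch of exponential backoff with jitter. The function call_with_retries and its parameters are hypothetical, and the sleep argument exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the listed exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, plus random jitter.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay + random.uniform(0, jitter))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Every retry site needs this logic or a shared helper like it, and features such as logging hooks, deadlines, or async support each add more branches.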

Configuration patterns for teams

Centralize retry policies instead of configuring per-call:

from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential_jitter,
    retry_if_exception_type,
)
import httpx

# Shared retry policies
STANDARD_RETRY = {
    "stop": stop_after_attempt(4) | stop_after_delay(60),
    "wait": wait_exponential_jitter(initial=1, max=30, jitter=2),
    "retry": retry_if_exception_type(
        (httpx.TransportError, httpx.TimeoutException)
    ),
}

AGGRESSIVE_RETRY = {
    "stop": stop_after_attempt(8) | stop_after_delay(300),
    "wait": wait_exponential_jitter(initial=2, max=60, jitter=5),
    "retry": retry_if_exception_type(
        (httpx.TransportError, httpx.TimeoutException)
    ),
}


@retry(**STANDARD_RETRY)
def normal_api_call(url: str) -> dict:
    resp = httpx.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()


@retry(**AGGRESSIVE_RETRY)
def critical_payment_call(url: str, payload: dict) -> dict:
    resp = httpx.post(url, json=payload, timeout=30.0)
    resp.raise_for_status()
    return resp.json()

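Teams define retry policies in a shared module, and individual functions reference a policy by name, which keeps behavior consistent across the codebase. One caution about the payment example: retrying a POST is only safe when the endpoint is idempotent (for example, through an idempotency key), because a request that timed out may already have been processed.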

The one thing to remember: Tenacity’s real value isn’t just retrying — it’s composable stop/wait/retry conditions, built-in jitter, async support, and observability hooks that turn ad-hoc retry loops into a consistent, production-grade resilience layer.
