Retry Libraries & Tenacity — Deep Dive
Tenacity’s composable architecture
Tenacity’s power comes from its composable design. Instead of a single retry(attempts=3, backoff=2) function, it provides small, combinable pieces that handle complex real-world scenarios.
Basic patterns
The decorator approach is the most common:
```python
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
import httpx


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception_type((httpx.TransportError, httpx.TimeoutException)),
)
def fetch_user(user_id: int) -> dict:
    resp = httpx.get(
        f"https://api.example.com/users/{user_id}",
        timeout=10.0,
    )
    resp.raise_for_status()
    return resp.json()
```
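This makes up to 4 attempts (the first call plus 3 retries), waiting roughly 1s, 2s, and 4s between them (each wait is clamped to the 1-30s range), but only for transport and timeout errors. httpx.TimeoutException is a subclass of httpx.TransportError, so listing both is redundant but harmless. A 400 Bad Request makes raise_for_status() raise httpx.HTTPStatusError, which doesn't match the retry condition, so it propagates immediately. If all 4 attempts fail, Tenacity raises RetryError wrapping the last exception unless you pass reraise=True.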
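To make the delay arithmetic concrete, here is a small standalone sketch that reproduces the schedule without importing Tenacity. It assumes the wait is computed as multiplier * 2 ** (attempt - 1) and then clamped to [min, max], which matches recent Tenacity releases as far as I know; older releases may differ. The function name exponential_delays is made up for this illustration.

```python
def exponential_delays(
    attempts: int,
    multiplier: float = 1,
    min_wait: float = 1,
    max_wait: float = 30,
) -> list:
    """Return the waits between attempts (one fewer than the attempt count).

    Assumed formula: multiplier * 2 ** (attempt_number - 1), clamped to
    [min_wait, max_wait]. This is an illustration, not Tenacity's own code.
    """
    delays = []
    for attempt_number in range(1, attempts):
        raw = multiplier * 2 ** (attempt_number - 1)
        delays.append(max(min_wait, min(raw, max_wait)))
    return delays


print(exponential_delays(4))  # [1, 2, 4]
print(exponential_delays(8))  # [1, 2, 4, 8, 16, 30, 30]
```

With 8 attempts the cap becomes visible: the sixth and seventh waits would be 32s and 64s, but both are held at 30s.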
Retrying on specific HTTP status codes
Often you need to retry based on the response, not an exception. Use retry_if_result:
```python
import httpx
from tenacity import retry, retry_if_result, stop_after_attempt, wait_exponential


def is_retriable_status(response: httpx.Response) -> bool:
    return response.status_code in {429, 502, 503, 504}


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_result(is_retriable_status),
)
def call_api(path: str) -> httpx.Response:
    return httpx.get(f"https://api.example.com{path}", timeout=10.0)
```
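The function returns the response without raising. Tenacity inspects the return value and retries if is_retriable_status returns True. If the final attempt still returns a retriable status, Tenacity raises RetryError instead of handing back that response, so callers need to handle it.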
Respecting Retry-After headers
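Rate-limited APIs often send a Retry-After header telling you how long to wait before trying again, either as a number of seconds or as an HTTP date. Ignoring it wastes your time and keeps hitting a server that has already asked you to back off: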
```python
from tenacity import (
    retry,
    stop_after_attempt,
    retry_if_exception_type,
)
import httpx


class RateLimitedError(Exception):
    def __init__(self, retry_after: float):
        self.retry_after = retry_after
        super().__init__(f"Rate limited, retry after {retry_after}s")


def wait_for_rate_limit(retry_state) -> float:
    exc = retry_state.outcome.exception()
    if isinstance(exc, RateLimitedError):
        return exc.retry_after
    # Fallback to exponential backoff, capped at 60s
    return min(2 ** retry_state.attempt_number, 60)


@retry(
    stop=stop_after_attempt(5),
    wait=wait_for_rate_limit,
    retry=retry_if_exception_type((RateLimitedError, httpx.TransportError)),
)
def robust_api_call(method: str, url: str, **kwargs) -> dict:
    resp = httpx.request(method, url, timeout=15.0, **kwargs)
    if resp.status_code == 429:
        try:
            retry_after = float(resp.headers.get("Retry-After", 5.0))
        except ValueError:
            # Retry-After may be an HTTP date rather than seconds; use 5s then
            retry_after = 5.0
        raise RateLimitedError(retry_after)
    resp.raise_for_status()
    return resp.json()
```
The custom wait function extracts the delay from the exception itself, so the retry respects the server’s guidance.
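Retry-After can carry either a number of seconds or an HTTP date, so a fuller parser has to handle both. The sketch below uses only the standard library; parse_retry_after, its default of 5 seconds, and the now parameter (there to make it testable) are assumptions for this illustration, not part of Tenacity or httpx.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 5.0,
    now: Optional[datetime] = None,
) -> float:
    """Return the number of seconds to wait, never negative."""
    if value is None:
        return default
    value = value.strip()
    # Form 1: delay in seconds, e.g. "120"
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(parse_retry_after("120"))  # 120.0
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))  # 60.0
```

A date in the past yields 0.0, which means "retry right away"; in practice you may want a small floor so a skewed clock doesn't cause a tight loop.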
Async retries
Tenacity works natively with async functions:
```python
import httpx
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential_jitter,
    retry_if_exception_type,
)


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=30, jitter=2),
    retry=retry_if_exception_type(httpx.TransportError),
)
async def async_fetch(client: httpx.AsyncClient, path: str) -> dict:
    resp = await client.get(path, timeout=10.0)
    resp.raise_for_status()
    return resp.json()
```
The wait_exponential_jitter strategy combines exponential backoff with built-in jitter. The jitter parameter controls the maximum random addition to each wait.
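To see why jitter matters, the following standalone sketch simulates three clients that all fail at the same moment. It assumes the delay is initial * 2 ** (attempt - 1) plus a uniform random value in [0, jitter], capped at max, which is how I understand wait_exponential_jitter to behave; jittered_delay itself is a hypothetical helper, not a Tenacity function.

```python
import random


def jittered_delay(
    attempt_number: int,
    initial: float = 1,
    max_wait: float = 30,
    jitter: float = 2,
    rng=random,
) -> float:
    """Exponential backoff plus up to `jitter` seconds of random delay."""
    raw = initial * 2 ** (attempt_number - 1) + rng.uniform(0, jitter)
    return max(0, min(raw, max_wait))


rng = random.Random(42)  # seeded so repeated runs print the same numbers
for client in ("client-a", "client-b", "client-c"):
    delays = [round(jittered_delay(n, rng=rng), 2) for n in (1, 2, 3)]
    print(client, delays)
```

Each client gets a slightly different schedule, so their retries no longer arrive at the server in synchronized bursts.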
Combining stop and wait strategies
Tenacity’s strategies compose with operators: | and & combine stop conditions (and retry conditions), while + adds wait strategies together:
```python
from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
)
import httpx


@retry(
    stop=(stop_after_attempt(6) | stop_after_delay(120)),
    wait=wait_exponential(multiplier=1, min=2, max=30) + wait_random(0, 2),
    retry=retry_if_exception_type((httpx.TransportError, httpx.TimeoutException)),
)
def resilient_call(url: str) -> dict:
    resp = httpx.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()
```
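This stops after 6 attempts or 120 seconds total, whichever comes first. Tenacity checks the stop condition between attempts, so the 120-second limit won't interrupt a call that is already in flight. Wait time is exponential plus random jitter of 0-2 seconds. The + operator on wait strategies adds their results together.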
Callbacks for observability
Production systems need visibility into retries. Tenacity provides callback hooks for this, including before, after, and before_sleep. The example below wires up the latter two:
```python
import logging
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    before_sleep_log,
    after_log,
    retry_if_exception_type,
)
import httpx

logger = logging.getLogger("api_client")


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception_type(httpx.TransportError),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    after=after_log(logger, logging.DEBUG),
)
def monitored_call(url: str) -> dict:
    resp = httpx.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()
```
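before_sleep fires before each wait period, so it runs once per retry, which makes it a good place to emit retry metrics to Prometheus or Datadog. after fires after each attempt that Tenacity judges as needing a retry, including the last one before it gives up; it is not called after an attempt that succeeds.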
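For custom metrics, pass your own function as the before_sleep (or after) hook:
```python
import logging

from tenacity import RetryCallState

logger = logging.getLogger("api_client")


def emit_retry_metric(retry_state: RetryCallState) -> None:
    """Hook this up with @retry(..., before_sleep=emit_retry_metric)."""
    attempt = retry_state.attempt_number
    fn_name = retry_state.fn.__name__  # type: ignore
    # Send to your metrics system
    logger.warning(
        "Retrying %s, attempt %d, last error: %s",
        fn_name,
        attempt,
        retry_state.outcome.exception(),
    )
```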
The retry amplification problem
Consider a call chain: Service A → Service B → Service C. If each caller makes up to 3 attempts, a persistent failure at Service C causes:
- With no retries anywhere: 1 call reaches C
- Service B makes up to 3 attempts at C for every request it receives
- Service A makes up to 3 attempts at B, and each one triggers B's 3 attempts at C: 3 × 3 = 9 attempts at C
Total: up to 9 calls to Service C from one user request. With n retrying layers the worst case is 3^n calls, so the load grows exponentially with chain depth. Mitigation strategies:
- Retry budget — share a total retry budget across the chain via request headers
- Retry at one layer only — typically the outermost caller
- Deadline propagation — pass a deadline timestamp; stop retrying when it expires
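```python
import time

import httpx
from tenacity import Retrying, retry_if_exception_type, stop_after_attempt, wait_exponential


def with_deadline(deadline: float):
    """Stop retrying if deadline has passed."""
    def should_stop(retry_state) -> bool:
        return time.time() >= deadline
    return should_stop


def call_with_deadline(url: str, deadline: float) -> dict:
    def attempt() -> dict:
        remaining = deadline - time.time()
        if remaining <= 0:
            # Not an httpx.TransportError, so it is raised without a retry
            raise TimeoutError("Deadline exceeded, skipping call")
        resp = httpx.get(
            url, timeout=min(10.0, remaining), headers={"X-Deadline": str(deadline)}
        )
        resp.raise_for_status()
        return resp.json()

    # The deadline differs per request, so build the retry policy per call
    retryer = Retrying(
        stop=stop_after_attempt(3) | with_deadline(deadline),
        wait=wait_exponential(min=1, max=10),
        retry=retry_if_exception_type(httpx.TransportError),
        reraise=True,
    )
    return retryer(attempt)
```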
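The retry budget idea from the list above can be sketched in a few lines. This is one possible shape, not an established protocol: the X-Retry-Budget header name, the default of 3, and the RetryBudget class are all assumptions made for the illustration. Each service reads the remaining budget from the incoming request, spends one unit per retry, and forwards what is left downstream.

```python
BUDGET_HEADER = "X-Retry-Budget"  # hypothetical header name


def remaining_budget(incoming_headers: dict, default: int = 3) -> int:
    """Read the budget passed by the caller, or start a fresh one at the edge."""
    raw = incoming_headers.get(BUDGET_HEADER)
    if raw is None:
        return default
    try:
        return max(0, int(raw))
    except ValueError:
        return default  # malformed header: fall back to the default


class RetryBudget:
    """Tracks how many retries this request may still spend."""

    def __init__(self, remaining: int):
        self.remaining = remaining

    def try_spend(self) -> bool:
        """Return True and consume one unit if a retry is still allowed."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True

    def outgoing_headers(self) -> dict:
        """Headers to attach to downstream calls."""
        return {BUDGET_HEADER: str(self.remaining)}


budget = RetryBudget(remaining_budget({BUDGET_HEADER: "1"}))
print(budget.try_spend())         # True
print(budget.try_spend())         # False
print(budget.outgoing_headers())  # {'X-Retry-Budget': '0'}
```

This version is deliberately simplified: retries spent by a downstream service are not reported back to the caller, so a real design would need a response header or shared store to keep the totals honest.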
Tenacity vs. manual retry loops
| Feature | tenacity | Manual loop |
|---|---|---|
| Exponential backoff | Built-in | Must implement |
| Jitter | Built-in | Must implement |
| Async support | Native | Must handle event loop |
| Composable conditions | Yes (\|, &, +) | Nested if/else |
| Callbacks/logging | Hook system | Inline logging |
| Testing | Mock the decorated function | Mock the loop |
| Code readability | One decorator line | 10-20 lines per retry site |
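For comparison, here is roughly what the "Manual loop" column looks like in practice. This is a standard-library sketch with invented names (call_with_retries and its parameters); the sleep function is injectable so the loop can be tested without real waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hand-rolled retry loop: exponential backoff, jitter, exception filter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt_number in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt_number == attempts:
                raise  # out of attempts: surface the last error
            delay = min(base_delay * 2 ** (attempt_number - 1), max_delay)
            sleep(delay + random.uniform(0, jitter))


# Usage: a function that fails twice, with sleeps recorded instead of performed
state = {"failures": 2}
recorded_sleeps = []


def flaky() -> str:
    if state["failures"] > 0:
        state["failures"] -= 1
        raise ConnectionError("simulated failure")
    return "ok"


print(call_with_retries(flaky, sleep=recorded_sleeps.append))  # ok
print(len(recorded_sleeps))  # 2
```

Even this small version has no logging hooks, no total time limit, and no async variant, which is the gap the table is pointing at.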
Configuration patterns for teams
Centralize retry policies instead of configuring per-call:
```python
from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential_jitter,
    retry_if_exception_type,
)
import httpx

# Shared retry policies
STANDARD_RETRY = {
    "stop": stop_after_attempt(4) | stop_after_delay(60),
    "wait": wait_exponential_jitter(initial=1, max=30, jitter=2),
    "retry": retry_if_exception_type(
        (httpx.TransportError, httpx.TimeoutException)
    ),
}

AGGRESSIVE_RETRY = {
    "stop": stop_after_attempt(8) | stop_after_delay(300),
    "wait": wait_exponential_jitter(initial=2, max=60, jitter=5),
    "retry": retry_if_exception_type(
        (httpx.TransportError, httpx.TimeoutException)
    ),
}


@retry(**STANDARD_RETRY)
def normal_api_call(url: str) -> dict:
    resp = httpx.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()


@retry(**AGGRESSIVE_RETRY)
def critical_payment_call(url: str, payload: dict) -> dict:
    resp = httpx.post(url, json=payload, timeout=30.0)
    resp.raise_for_status()
    return resp.json()
```
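Teams define retry policies in a shared module. Individual functions reference the policy by name, ensuring consistency across the codebase. Because AGGRESSIVE_RETRY here wraps a POST, it is only safe if the endpoint is idempotent (for example, through an idempotency key); otherwise a retry after a timeout can submit the same payment twice.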
The one thing to remember: Tenacity’s real value isn’t just retrying — it’s composable stop/wait/retry conditions, built-in jitter, async support, and observability hooks that turn ad-hoc retry loops into a consistent, production-grade resilience layer.