Python Timeout Patterns — Core Concepts

Why Timeouts Matter

Every network call in your program is a promise: “I’ll get back to you soon.” But “soon” isn’t guaranteed. DNS lookups stall, servers crash mid-response, firewalls silently drop packets. Without timeouts, a single unresponsive dependency can freeze your entire application.

The 2017 Amazon S3 outage is a commonly cited example: services without proper timeouts on their S3 calls hung indefinitely, spreading the failure far beyond storage.

The Five Timeout Types

1. Connection Timeout

How long you wait to establish a TCP connection. If the remote server isn’t accepting connections at all, this saves you from waiting for the OS default, which on Linux with default settings is roughly two minutes.

Typical value: 3-5 seconds. If you can’t connect in 5 seconds, the server is probably down or unreachable.
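A minimal sketch of a connection timeout using the standard library. The helper name connect_with_timeout and the 3-second default are illustrative choices, not part of any library.

```python
import socket

def connect_with_timeout(host: str, port: int, timeout_s: float = 3.0) -> socket.socket:
    """Open a TCP connection, giving up if it cannot be established in time."""
    # The timeout bounds each connect attempt and stays set on the
    # returned socket, so later send/recv calls inherit it too.
    return socket.create_connection((host, port), timeout=timeout_s)
```

If the connection cannot be established in time, the call raises an exception (a timeout error or another OSError) instead of blocking for the OS default.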

2. Read Timeout (Socket Timeout)

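How long you wait for data after the connection is established. The server accepted your request but hasn’t sent anything back yet. In most libraries this bounds each individual wait for data, not the total time to receive the whole response.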

Typical value: 10-30 seconds, depending on the operation. A simple API lookup should be fast; a report generation endpoint might need more time.
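As a rough illustration, a read timeout on a plain socket can be set with settimeout(). The helper recv_or_none is a hypothetical name; it returns None when nothing arrives in time rather than letting the exception propagate.

```python
import socket
from typing import Optional

def recv_or_none(sock: socket.socket, read_timeout_s: float, nbytes: int = 4096) -> Optional[bytes]:
    """Wait up to read_timeout_s for data; return None if nothing arrives."""
    sock.settimeout(read_timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        return None
```

Whether to swallow the timeout like this or let it propagate depends on the caller; in most production code the exception is the more useful signal.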

3. Total (Wall Clock) Timeout

The maximum time for the entire operation — connect, send, receive, and any redirects. This is your safety net for operations that involve multiple round trips.

Typical value: 30-60 seconds for user-facing requests; longer for background jobs.
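One way to sketch a total timeout in async code is to wrap the whole operation in asyncio.wait_for(). Here fetch_report is a hypothetical stand-in for a multi-step operation, and with_total_timeout is an illustrative helper name.

```python
import asyncio

async def fetch_report() -> str:
    """Hypothetical multi-step operation (connect, send, receive)."""
    await asyncio.sleep(0.05)  # stand-in for connecting
    await asyncio.sleep(0.05)  # stand-in for sending and receiving
    return "report"

async def with_total_timeout(coro, total_s: float):
    """Bound the whole operation, however many steps it takes."""
    # wait_for cancels the coroutine and raises asyncio.TimeoutError
    # if it has not finished within total_s seconds.
    return await asyncio.wait_for(coro, timeout=total_s)
```

Calling asyncio.run(with_total_timeout(fetch_report(), 30.0)) would apply a 30-second ceiling to everything fetch_report does.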

4. Per-Retry Timeout

When you retry failed requests, each attempt needs its own timeout. Without a per-attempt limit, a single slow attempt can consume the entire total timeout, leaving no time for any retry.
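A minimal sketch of per-attempt timeouts, assuming the call is a coroutine. The names call_with_retries and make_call are hypothetical; make_call is any zero-argument function that returns a fresh coroutine for each attempt.

```python
import asyncio

async def call_with_retries(make_call, attempts: int = 3, per_attempt_s: float = 10.0):
    """Try make_call() up to `attempts` times, bounding each attempt separately."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # Each attempt gets its own timer, so one slow attempt
            # cannot use up the time reserved for the others.
            return await asyncio.wait_for(make_call(), timeout=per_attempt_s)
        except asyncio.TimeoutError as exc:
            last_error = exc  # this attempt was too slow; try again
    raise last_error
```

This sketch retries only on timeouts and has no backoff; real retry logic usually adds both a delay between attempts and a check that the operation is safe to repeat.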

5. Cascading Deadline

In microservice architectures, you propagate a deadline through the call chain. If Service A has 10 seconds total and spends 3 seconds calling Service B, it passes a 7-second deadline to Service C. This prevents downstream services from working on requests that the upstream has already abandoned.
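Deadline propagation can be sketched with a small object that remembers an absolute expiry time and reports how much is left. The Deadline class below is illustrative, not a library API; the injectable clock parameter is an assumption added to make it easy to test.

```python
import time

class Deadline:
    """An absolute point in time by which the whole request must finish."""

    def __init__(self, budget_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0
```

In the example above, Service A would create Deadline(10.0), use deadline.remaining() as the timeout for the call to Service B, and after that call returns pass the new deadline.remaining() (about 7 seconds) on to Service C.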

How Timeouts Interact

Total timeout: 30 seconds
├── Connection timeout: 5 seconds
├── Read timeout: 15 seconds
└── Retry budget: 3 attempts
    └── Per-retry timeout: 10 seconds each

The total timeout acts as a ceiling. Three attempts at 10 seconds each add up to exactly 30 seconds, so the budget only fits if nothing else takes time. Any backoff delay between attempts, or a fourth attempt, pushes past the ceiling, and the last attempt is cut short or never starts.
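To make the arithmetic concrete, here is a small worst-case calculation using the numbers from the diagram. The function attempt_timeouts and its backoff_s parameter are hypothetical; it assumes every attempt runs for its full timeout.

```python
from typing import List

def attempt_timeouts(total_s: float, per_attempt_s: float,
                     attempts: int, backoff_s: float = 0.0) -> List[float]:
    """Worst-case timeout each attempt can get under a total budget."""
    remaining = total_s
    plan = []
    for i in range(attempts):
        if i > 0:
            remaining -= backoff_s  # waiting between attempts also costs time
        if remaining <= 0:
            break  # the budget is gone; this attempt never starts
        timeout = min(per_attempt_s, remaining)
        plan.append(timeout)
        remaining -= timeout
    return plan
```

With a 30-second total and 10 seconds per attempt, three attempts fit only when there is no backoff. A 2-second backoff shrinks the third attempt to 6 seconds, and a fourth attempt never gets to run.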

Common Misconception

“Setting a timeout means my request will be cancelled after that time.” Not exactly. Setting a timeout on a network call means your client stops waiting. The server might still be processing the request. This matters for non-idempotent operations: if you time out on a payment request and retry, you could charge the user twice. Pair timeouts with idempotency keys for write operations.
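The idea behind an idempotency key can be sketched with an in-memory stand-in for the server. PaymentServer and its charge method are hypothetical; a real service would persist keys and handle concurrent requests.

```python
import uuid

class PaymentServer:
    """Hypothetical in-memory server that deduplicates by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored result
        self.charges = 0    # how many times money actually moved

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Same key seen before: replay the stored result, no new charge.
            return self._results[idempotency_key]
        self.charges += 1
        result = {"status": "charged", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result

server = PaymentServer()
key = str(uuid.uuid4())          # generated once per logical payment
first = server.charge(key, 500)  # suppose the client timed out waiting for this
retry = server.charge(key, 500)  # the retry reuses the same key
```

The important design choice is that the client generates the key once per logical operation and reuses it on every retry, so the server can recognize the repeat even though the client never saw the first response.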

Choosing Timeout Values

Scenario                 Connection  Read   Total
User-facing API call     3s          10s    15s
Background job API call  5s          30s    60s
Database query           2s          5s     10s
Health check             1s          2s     3s
File upload              5s          120s   180s

Start with these as baselines, then adjust based on your P99 latency in production. The timeout should be slightly above your P99 — tight enough to catch real problems, loose enough to avoid false positives.
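One possible way to derive a timeout from measured latencies, assuming you have a list of samples in seconds. The function suggest_timeout and the default headroom factor of 1.5 are illustrative assumptions; pick a factor that matches your own tolerance for false positives.

```python
import statistics

def suggest_timeout(latencies_s, headroom: float = 1.5) -> float:
    """Return the P99 of the samples multiplied by a headroom factor."""
    # quantiles(n=100) returns 99 cut points; the last one is the
    # 99th percentile of the samples.
    p99 = statistics.quantiles(latencies_s, n=100)[-1]
    return p99 * headroom
```

The estimate is only as good as the sample: a few hundred recent production measurements give a far more useful P99 than a handful of local test runs.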

The No-Timeout Anti-Pattern

Libraries often default to no timeout at all. Python’s socket module, urllib, and many database drivers will wait indefinitely unless you explicitly set a timeout. This is the single most common cause of “my app just hung for no reason” incidents.

Rule: Every network call in production code must have an explicit timeout. If a library doesn’t let you set one, wrap the call with asyncio.wait_for() (for coroutines) or signal.alarm() (Unix only, main thread only).
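For blocking code with no timeout parameter, a signal.alarm() wrapper might look like the sketch below. The context manager name hard_timeout is hypothetical, and the approach assumes a Unix system and that the code runs in the main thread.

```python
import signal
from contextlib import contextmanager

@contextmanager
def hard_timeout(seconds: int):
    """Raise TimeoutError if the block runs longer than `seconds`."""
    def _on_alarm(signum, frame):
        raise TimeoutError(f"operation exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # ask the OS to deliver SIGALRM after `seconds`
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Used as "with hard_timeout(5): blocking_call()", where blocking_call is whatever function lacks a timeout option. Note that signal.alarm() takes whole seconds, and the exception interrupts the code wherever it happens to be, so the wrapped call should be safe to abandon midway.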

One thing to remember: Timeouts protect your application from the unpredictable — network partitions, overloaded servers, and silent failures. Set them explicitly on every external call, because the default is almost always “wait forever.”
