Python Health Check Patterns — Core Concepts
Beyond “200 OK”
The simplest health check — an endpoint that always returns 200 — is barely better than nothing. It confirms the process is running and the HTTP server is accepting connections, but tells you nothing about whether the app can actually do its job. A real health check strategy uses multiple layers.
Types of Health Checks
Shallow (Liveness) Checks
These answer: “Is the process alive and responsive?” They should be fast (under 10ms), have no external dependencies, and never fail unless the process itself is broken.
Use case: container orchestrators checking if a process needs to be restarted.
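A minimal sketch of a liveness endpoint, using only the standard library. The `/livez` path, the port, and the function names are illustrative assumptions, not something the article prescribes. The point is that the handler does no I/O and touches no dependency.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def liveness_payload() -> tuple[int, bytes]:
    """Return a constant response: no database, no cache, no locks."""
    return 200, json.dumps({"status": "alive"}).encode("utf-8")


class LivenessHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/livez":  # hypothetical path, pick your own
            code, body = liveness_payload()
        else:
            code, body = 404, b'{"error": "not found"}'
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        # Keep probe traffic out of the logs.
        pass


def serve(port: int = 8080) -> None:
    """Blocking helper; call this from your own entry point."""
    HTTPServer(("127.0.0.1", port), LivenessHandler).serve_forever()
```

Calling `serve()` starts the server. If the process is deadlocked or the event loop is stuck, the request times out and the orchestrator can restart it.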
Deep (Readiness) Checks
These answer: “Can this instance serve real traffic?” They verify connections to databases, caches, message brokers, and downstream services.
Use case: load balancers deciding whether to route traffic to this instance.
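One way to sketch a readiness check is to run a set of named dependency checks and return 503 if any of them fails. The `readiness` function and the convention that a check raises on failure are assumptions made for this example.

```python
from typing import Callable, Dict, Tuple

# Assumed convention: a check returns None on success and raises on failure.
Check = Callable[[], None]


def readiness(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every dependency check and report per-component results."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    ready = all(value == "ok" for value in results.values())
    # 503 tells the load balancer to stop routing traffic to this instance.
    return (200 if ready else 503), {"ready": ready, "checks": results}
```

In a real service each entry in `checks` would wrap a call to the database, cache, or broker client the app already uses.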
Startup Checks
These answer: “Has the app finished initializing?” Some apps need to load large models, warm caches, or run migrations before they’re ready.
Use case: preventing premature traffic during cold starts.
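A startup check can be as small as a flag that initialization code sets when it finishes. The sketch below uses `threading.Event`; the `StartupGate` class name and its methods are hypothetical.

```python
import threading


class StartupGate:
    """Reports 503 until initialization explicitly marks itself done."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def mark_started(self) -> None:
        # Call this after models are loaded, caches warmed, migrations run.
        self._done.set()

    def status(self) -> tuple[int, dict]:
        if self._done.is_set():
            return 200, {"started": True}
        return 503, {"started": False}
```

The startup endpoint returns `gate.status()`, and the initialization routine calls `gate.mark_started()` as its last step. An `Event` is used so the flag can be set safely from a background thread.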
Anatomy of a Good Health Check
A well-designed check returns structured information:
| Field | Purpose |
|---|---|
| status | healthy, degraded, or unhealthy |
| checks | Individual component results |
| duration | How long the check took |
| version | App version for debugging |
The degraded state is important — it means “I can work, but something’s wrong.” Maybe the cache is down but the database is fine. The app can still serve requests, just slower.
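The fields in the table can be assembled as follows. This is a sketch under one assumption the article does not spell out: each component is marked critical or non-critical, a failed critical component yields `unhealthy`, and a failed non-critical one yields `degraded`. The `Component` and `health_report` names and the `duration_ms` key are illustrative.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str
    check: Callable[[], bool]  # returns True when the component is fine
    critical: bool = True      # assumption: non-critical failures only degrade


def health_report(components: List[Component], version: str = "0.0.0") -> dict:
    start = time.perf_counter()
    results = {}
    status = "healthy"
    for component in components:
        try:
            ok = bool(component.check())
        except Exception:
            ok = False
        results[component.name] = "ok" if ok else "failed"
        if not ok:
            if component.critical:
                status = "unhealthy"
            elif status == "healthy":
                status = "degraded"
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    return {
        "status": status,
        "checks": results,
        "duration_ms": duration_ms,
        "version": version,
    }
```

With the database registered as critical and the cache as non-critical, a cache outage produces `degraded` while the instance keeps serving traffic.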
What to Check (and What Not To)
Good checks:
- Database: execute `SELECT 1` with a short timeout
- Redis/cache: `PING` command
- Disk space: is usage below 90%?
- Memory: is RSS within expected bounds?
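Two of these checks can be sketched with the standard library alone. Here `sqlite3` stands in for whatever database driver the app really uses, and the function names and default thresholds are assumptions for illustration.

```python
import shutil
import sqlite3


def check_database(path: str, timeout_s: float = 1.0) -> bool:
    """Run SELECT 1 against the database and confirm the answer."""
    try:
        # For sqlite3, timeout is the lock-wait timeout. A network database
        # driver would use its own connect and statement timeout settings.
        conn = sqlite3.connect(path, timeout=timeout_s)
        try:
            return conn.execute("SELECT 1").fetchone() == (1,)
        finally:
            conn.close()
    except sqlite3.Error:
        return False


def check_disk_space(path: str = "/", max_used_fraction: float = 0.90) -> bool:
    """Return True when disk usage at `path` is below the threshold."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return False
    if usage.total == 0:
        return False
    return usage.used / usage.total < max_used_fraction
```

Both functions return a plain boolean and swallow their own errors, so a failing dependency shows up as a failed check rather than an exception in the health endpoint.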
Bad checks:
- Calling external third-party APIs (their downtime shouldn’t mark you as unhealthy)
- Running expensive queries that affect production traffic
- Checks without timeouts (a stuck database connection blocks the health endpoint)
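To keep a stuck dependency from blocking the health endpoint, each check can be given a deadline. The sketch below runs the check in a shared thread pool and stops waiting after `timeout_s`; `run_with_timeout` and the pool size are assumptions made for this example.

```python
import concurrent.futures
from typing import Callable

# Shared pool so a timed-out check does not block the caller on cleanup.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_with_timeout(check: Callable[[], bool], timeout_s: float = 0.5) -> bool:
    """Return the check's result, or False if it is slow or raises."""
    future = _executor.submit(check)
    try:
        return bool(future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # We stop waiting, but Python cannot kill the worker thread.
        return False
    except Exception:
        return False
```

Because the worker thread keeps running after the deadline, this wrapper is a backstop, not a substitute for timeouts in the client itself. Setting the driver's own connect and read timeouts keeps hung checks from piling up in the pool.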
The Cascade Problem
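If Service A health-checks Service B, and Service B health-checks Service C, a single failure in C marks B unhealthy, which in turn marks A unhealthy. Once load balancers and orchestrators act on those results, the cascade can take down your entire system, including instances that could still serve some traffic.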
The rule: only check direct dependencies. If your app talks to a database and a cache, check those. Don’t check the services that they depend on.
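A toy simulation makes the difference concrete. The services, the dependency map, and both functions are hypothetical. `PROCESS_UP` models whether each service is reachable at all, which is all a direct check looks at.

```python
# Hypothetical topology: A calls B, B calls C, and C is down.
DEPENDS_ON = {"A": ["B"], "B": ["C"], "C": []}
PROCESS_UP = {"A": True, "B": True, "C": False}


def transitive_health(service: str) -> bool:
    """Healthy only if every dependency's own health check also passes."""
    return PROCESS_UP[service] and all(
        transitive_health(dep) for dep in DEPENDS_ON[service]
    )


def direct_health(service: str) -> bool:
    """Healthy if this service is up and each direct dependency is reachable."""
    return PROCESS_UP[service] and all(
        PROCESS_UP[dep] for dep in DEPENDS_ON[service]
    )


cascading = {name: transitive_health(name) for name in DEPENDS_ON}
contained = {name: direct_health(name) for name in DEPENDS_ON}
```

With transitive checks, the failure in C marks B and A unhealthy as well. With direct checks, only B, which really does depend on C, reports a problem, and A stays in rotation.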
Common Misconception
“If the health check passes, the app is healthy.” Health checks only verify what they test. If your check doesn’t test disk I/O and the disk is failing, the check still passes. Design your checks around the specific failure modes you’ve seen in production.
One thing to remember: Good health checks are layered — a fast liveness check for orchestrators, a thorough readiness check for load balancers, and each check tests only direct dependencies with strict timeouts.