Prometheus Metrics in Python — Deep Dive
Prometheus instrumentation in Python has subtleties that do not appear in tutorials. Multiprocess deployments break the default in-memory metric storage. High-cardinality labels silently degrade Prometheus performance. And the difference between a histogram and a summary matters more than most teams realize until they try to aggregate percentiles across replicas.
Multiprocess mode
By default, prometheus_client stores metrics in process memory. This works for single-process services but breaks with Gunicorn (pre-fork model), where each worker is a separate process with its own counters, so a scrape only sees the numbers of whichever worker happens to answer it.
The solution is multiprocess mode, which uses memory-mapped files:
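import os

# Must be set before prometheus_client is imported; in production, prefer
# exporting it in the environment that launches Gunicorn. The directory must exist.
os.environ["PROMETHEUS_MULTIPROC_DIR"] = "/tmp/prometheus_multiproc"

from prometheus_client import (
    CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest, multiprocess,
)

def metrics_app(environ, start_response):
    # Build a fresh registry per scrape and aggregate every worker's files into it.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    start_response("200 OK", [("Content-Type", CONTENT_TYPE_LATEST)])
    return [data]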
Each worker writes metrics to shared files. The metrics endpoint reads all files and aggregates them. Critical details:
- Clean the directory on startup. Stale files from previous runs cause phantom metrics.
- Gauges need aggregation modes. Set the multiprocess_mode parameter: "all" (report per-pid), "liveall" (only living pids), "livesum", "max", "min".
- The Python client's Summary exposes only a count and a sum, with no quantiles, in multiprocess mode as elsewhere. Use histograms when you need percentiles.
from prometheus_client import Gauge
ACTIVE_REQUESTS = Gauge(
    "active_requests", "Currently active requests",
    multiprocess_mode="livesum"
)
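To make the gauge aggregation modes concrete, here is a rough sketch of what each mode listed above reports. It illustrates the semantics only and is not the library's implementation; aggregate_gauge, values_by_pid, and live_pids are hypothetical names introduced for this example.

```python
def aggregate_gauge(values_by_pid, live_pids, mode):
    """Illustrate multiprocess gauge modes.

    values_by_pid: {pid: last gauge value written by that worker}
    live_pids:     pids of workers that are still running
    Returns {pid: value} for per-pid modes, or {None: value} for aggregated modes.
    """
    live = {pid: v for pid, v in values_by_pid.items() if pid in live_pids}
    if mode == "all":        # one series per pid, dead workers included
        return dict(values_by_pid)
    if mode == "liveall":    # one series per living pid
        return live
    if mode == "livesum":    # single series: sum over living pids
        return {None: sum(live.values())}
    if mode == "max":        # single series: maximum over all pids
        return {None: max(values_by_pid.values())}
    if mode == "min":        # single series: minimum over all pids
        return {None: min(values_by_pid.values())}
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")
```

For an in-flight request gauge, "livesum" is usually the natural choice: a dead worker's last value would otherwise keep inflating the total.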
Gunicorn child exit hook
Clean up dead worker files:
# gunicorn.conf.py
from prometheus_client import multiprocess
def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
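The per-worker hook handles workers that die while the service is running; the startup cleanup recommended earlier still has to happen once, before any worker is forked. A minimal sketch, assuming the metric files carry the .db suffix and that the cleanup runs from Gunicorn's on_starting hook (reset_multiproc_dir is a hypothetical helper name):

```python
import os
from pathlib import Path

def reset_multiproc_dir(path):
    """Create the directory if needed and delete leftover *.db metric files.

    Returns the number of files removed. Call this once in the master
    process, before workers start writing.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    removed = 0
    for db_file in directory.glob("*.db"):
        db_file.unlink()
        removed += 1
    return removed

# gunicorn.conf.py (assumed placement): runs in the master before forking.
def on_starting(server):
    reset_multiproc_dir(os.environ["PROMETHEUS_MULTIPROC_DIR"])
```

Only files matching *.db are touched, so pointing the helper at the wrong directory by mistake does limited damage.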
Custom collectors
For metrics that are expensive to compute or come from external sources, custom collectors avoid continuous computation:
from prometheus_client.core import GaugeMetricFamily, REGISTRY
class DatabasePoolCollector:
    def collect(self):
        # get_pool_stats() stands in for your application's own pool accessor.
        pool_stats = get_pool_stats()  # Only called during scrape
        gauge = GaugeMetricFamily(
            "db_pool_connections",
            "Database connection pool stats",
            labels=["state"],
        )
        gauge.add_metric(["active"], pool_stats.active)
        gauge.add_metric(["idle"], pool_stats.idle)
        gauge.add_metric(["waiting"], pool_stats.waiting)
        yield gauge

REGISTRY.register(DatabasePoolCollector())
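Custom collectors are invoked only when the registry is collected, so the expensive computation runs once per scrape instead of continuously. In multiprocess mode, a collector registered on the default REGISTRY is not exposed by the MultiProcessCollector registry shown earlier, and any collector runs only in the worker that serves the scrape.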
Histogram bucket design
Bucket boundaries determine the accuracy of quantile calculations. Poor bucket choices produce misleading percentiles.
Strategy: SLA-driven buckets
Define buckets around your SLA thresholds:
from prometheus_client import Histogram

# If SLA is "99% of requests under 500ms"
REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "Request duration",
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 10.0],
)
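A bucket boundary placed exactly at the SLA threshold (0.5 s) lets you read off the share of requests within the SLA without interpolation, and denser buckets around that threshold give better quantile resolution where it matters.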
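The effect of putting a boundary on the threshold can be sketched in plain Python. The snippet below mimics how a histogram counts observations into cumulative "less than or equal" buckets and then reads SLA compliance straight from the 0.5 bucket. cumulative_counts and fraction_within are hypothetical helpers, not part of prometheus_client.

```python
from bisect import bisect_left

BUCKETS = [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 10.0]

def cumulative_counts(observations, buckets=BUCKETS):
    """Return cumulative counts; entry i counts observations <= buckets[i].

    The final entry is the implicit +Inf bucket, i.e. the total count.
    """
    per_bucket = [0] * (len(buckets) + 1)
    for value in observations:
        # Index of the first boundary that is >= value (+Inf slot if none).
        per_bucket[bisect_left(buckets, value)] += 1
    cumulative, running = [], 0
    for count in per_bucket:
        running += count
        cumulative.append(running)
    return cumulative

def fraction_within(threshold, observations, buckets=BUCKETS):
    """Share of observations <= threshold; threshold must be a bucket boundary."""
    index = buckets.index(threshold)  # ValueError if not an exact boundary
    counts = cumulative_counts(observations, buckets)
    total = counts[-1]
    return counts[index] / total if total else None
```

If the threshold were not a boundary (say 0.4 with these buckets), the same question could only be answered by interpolating inside the 0.25 to 0.5 bucket.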
Strategy: Exponential buckets
For metrics that span several orders of magnitude:
from prometheus_client import Histogram

# exponential_buckets() is not part of prometheus_client; build the list directly.
# Generates: 0.01, 0.02, 0.04, 0.08, ..., 10.24
PROCESS_TIME = Histogram(
    "batch_process_seconds",
    "Batch processing time",
    buckets=[0.01 * 2**i for i in range(11)],
)
Histogram vs Summary tradeoff
Histograms allow server-side quantile calculation via histogram_quantile(). This means you can aggregate across instances:
# P99 across all replicas — works with histograms
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
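Summaries compute quantiles client-side, and precomputed quantiles cannot be aggregated: a p99 from instance A and a p99 from instance B cannot be combined into a meaningful p99. (The Python client's Summary does not compute quantiles at all; it exposes only a count and a sum.) Use histograms unless you have a specific reason not to.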
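Why summing buckets works where averaging percentiles does not can be shown with a small model. The sketch below approximates the linear interpolation that histogram_quantile() performs inside a bucket; it is a simplified illustration, not Prometheus's code, and the replica data in the usage note is made up.

```python
import math

def merge_buckets(*replicas):
    """Sum cumulative bucket counts from several replicas, keyed by upper bound."""
    merged = {}
    for buckets in replicas:
        for upper_bound, count in buckets.items():
            merged[upper_bound] = merged.get(upper_bound, 0) + count
    return merged

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets {upper_bound: count}.

    Assumes 0 <= q <= 1 and that `buckets` contains a math.inf bucket
    holding the total count.
    """
    bounds = sorted(buckets)
    rank = q * buckets[bounds[-1]]
    prev_bound, prev_count = 0.0, 0
    for upper_bound in bounds:
        count = buckets[upper_bound]
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # No upper edge to interpolate towards: report the lower edge.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (upper_bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = upper_bound, count
    raise ValueError("q must be between 0 and 1")
```

With a fast replica {0.1: 90, 0.5: 99, 1.0: 100, inf: 100} and a slow one {0.1: 10, 0.5: 50, 1.0: 100, inf: 100}, the p99 of the merged buckets lands well above the average of the two per-replica p99 values, because the slow replica contributes most of the tail.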
Exemplars for trace correlation
Exemplars attach trace IDs to metric samples, bridging metrics and traces:
from prometheus_client import Histogram
from opentelemetry import trace
REQUEST_DURATION = Histogram("http_request_duration_seconds", "Request duration")
span = trace.get_current_span()
trace_id = format(span.get_span_context().trace_id, "032x")
REQUEST_DURATION.observe(0.25, exemplar={"traceID": trace_id})
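Exemplars are only exposed in the OpenMetrics exposition format, so the metrics endpoint must serve that format and Prometheus must have exemplar storage enabled. In Grafana, clicking on a histogram bucket sample shows the associated trace ID, letting you jump from “p99 latency spiked” to the specific slow trace.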
Cardinality management
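Each unique combination of metric name and label values creates a time series. Memory use and query cost grow with the number of active series, and a single Prometheus server typically starts to struggle somewhere above 1-2 million active series, depending on hardware.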
Cardinality estimation
series = metric_count × label1_cardinality × label2_cardinality × ...
A metric with 3 labels of cardinality (5, 200, 3) produces up to 5 × 200 × 3 = 3,000 series. Adding a user_id label with 100K users raises that to 300 million series, which will kill Prometheus.
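The estimate is easy to automate, for example as a check during code review. A minimal sketch, where estimate_series is a hypothetical helper; the result is a worst-case bound, since not every label combination necessarily occurs in practice.

```python
from math import prod

def estimate_series(label_cardinalities, metric_count=1):
    """Worst-case series count: every label combination occurs for every metric."""
    return metric_count * prod(label_cardinalities)

# The worked example from the text: three labels, then the same plus user_id.
base = estimate_series([5, 200, 3])
with_user_id = estimate_series([5, 200, 3, 100_000])
```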
Defensive patterns
- Validate label values before applying them:
ALLOWED_ENDPOINTS = {"/api/orders", "/api/users", "/api/health", "/api/products"}
def safe_endpoint(path):
    return path if path in ALLOWED_ENDPOINTS else "other"
- Use le buckets wisely: each histogram bucket is a separate series. 10 buckets × 50 label combinations = 500 bucket series per histogram metric, plus the _sum and _count series for each label combination.
- Monitor cardinality with Prometheus itself:
# Top 10 metrics by series count
topk(10, count by (__name__)({__name__=~".+"}))
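When the set of valid label values is not known up front, an allow-list like the one above is not possible. One possible complement, sketched here as an assumption rather than an established prometheus_client feature, is a limiter that admits the first N distinct values and maps everything after that to a fallback. LabelLimiter is a hypothetical class, and this version is not thread-safe.

```python
class LabelLimiter:
    """Admit at most max_values distinct label values; map the rest to a fallback."""

    def __init__(self, max_values, fallback="other"):
        self._max_values = max_values
        self._fallback = fallback
        self._seen = set()

    def __call__(self, value):
        if value in self._seen:
            return value              # already admitted earlier
        if len(self._seen) < self._max_values:
            self._seen.add(value)     # still under the cap: admit it
            return value
        return self._fallback         # cap reached: collapse into one series
```

Which values get admitted depends on arrival order, so a fixed allow-list remains the more predictable option whenever the valid values are known.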
Alerting rules
Define alerting rules in a Prometheus rule file; Alertmanager then handles routing, grouping, and notification:
groups:
  - name: python-service
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for 5 minutes"
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
          > 1.0
        for: 10m
        labels:
          severity: warning
The for clause prevents flapping — the condition must persist for the specified duration before firing.
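The pending-then-firing behaviour can be modelled in a few lines. This is a simplified sketch of the for semantics, assuming a fixed evaluation interval; alert_states is a hypothetical function, not a Prometheus API.

```python
def alert_states(condition_results, interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_results: one boolean per evaluation (did the expression match?)
    interval_s:        seconds between evaluations
    for_s:             the rule's `for` duration in seconds
    """
    states = []
    active_since = None
    for index, matched in enumerate(condition_results):
        now = index * interval_s
        if not matched:
            active_since = None       # any miss resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now        # first matching evaluation
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states
```

With a 60-second interval and for: 5m, a single non-matching evaluation in the middle of an incident sends the alert back to inactive and restarts the five-minute wait.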
Testing metrics
Verify instrumentation in tests:
from prometheus_client import REGISTRY

LABELS = {"method": "GET", "endpoint": "/api/orders", "status": "200"}

def test_request_counter_increments(client):
    # `client` is assumed to be a test-client fixture for the application under test.
    # get_sample_value returns None until the labelled series exists, hence `or 0`.
    before = REGISTRY.get_sample_value("http_requests_total", LABELS) or 0
    client.get("/api/orders")
    after = REGISTRY.get_sample_value("http_requests_total", LABELS)
    assert after == before + 1
For integration tests, scrape the /metrics endpoint and parse the output.
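For a quick integration check, the scraped text can be turned into a lookup table. The parser below is deliberately minimal and only an illustration: it ignores escaped quotes inside label values and trailing timestamps, and parse_samples is a hypothetical helper. For anything beyond a smoke test, prometheus_client.parser.text_string_to_metric_families is the more robust choice.

```python
import re

# metric_name{label="value",...} value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

def parse_samples(text):
    """Map (metric_name, frozenset of (label, value) pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = frozenset(_LABEL.findall(raw_labels or ""))
        samples[(name, labels)] = float(value)
    return samples
```

A test can then fetch /metrics with the application's test client, pass the response body to parse_samples, and assert on individual series.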
Push gateway for batch jobs
Short-lived batch jobs may terminate before Prometheus scrapes. The Pushgateway accepts pushed metrics:
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway
registry = CollectorRegistry()
duration = Gauge("batch_duration_seconds", "Batch job duration", registry=registry)
with duration.time():
    run_batch()
push_to_gateway("localhost:9091", job="nightly_etl", registry=registry)
Use pushgateway sparingly — it is designed for batch jobs, not as a general replacement for the pull model.
One thing to remember: Production Prometheus in Python demands multiprocess-aware metric storage, cardinality-conscious label design, and histogram buckets aligned to your SLA thresholds — these operational details determine whether your monitoring helps or hinders.