Prometheus Metrics in Python — Core Concepts

Instrument Python services with counters, gauges, histograms, and summaries using prometheus_client for production monitoring.

Prometheus is a time-series monitoring system that scrapes metrics from your applications at regular intervals. Unlike push-based systems where apps send data to a collector, Prometheus pulls data — your Python service exposes an HTTP endpoint, and Prometheus fetches it. This pull model simplifies configuration and makes it easy to monitor services without modifying their outbound network rules.

Metric types

The prometheus_client library provides four core metric types, each suited to different measurements.

Counter

A number that only goes up. Use it for totals: requests served, errors encountered, bytes processed.

from prometheus_client import Counter

REQUEST_COUNT = Counter("http_requests_total", "Total HTTP requests", ["method", "endpoint", "status"])

# In your request handler
REQUEST_COUNT.labels(method="GET", endpoint="/api/orders", status="200").inc()

Counters reset to zero when the process restarts. Prometheus handles this gracefully — rate() and increase() functions in PromQL detect resets and calculate correctly.
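To see why a restart does not corrupt totals, here is a simplified sketch of reset-aware counting. It is not Prometheus's actual implementation (which also extrapolates to the edges of the query window); the function name increase_with_resets and the sample values are illustrative assumptions.

```python
def increase_with_resets(samples):
    """Total increase across scraped counter samples, tolerating resets.

    A drop between consecutive samples is treated as a process restart:
    the counter is assumed to have restarted from zero, so the whole
    post-reset value counts as new increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current < previous:
            total += current  # reset detected
        else:
            total += current - previous
    return total


# Hypothetical scrape values; the process restarted between 15 and 3.
scraped = [10, 15, 3, 8]
total_increase = increase_with_resets(scraped)
print(total_increase)  # 5 before the restart, then 3 + 5 after it: 13.0
```

A naive last-minus-first subtraction on the same samples would give -2, which is why raw counter values are rarely graphed directly.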

Gauge

A number that goes up and down. Use it for current state: active connections, queue depth, temperature, memory usage.

from prometheus_client import Gauge

ACTIVE_CONNECTIONS = Gauge("active_connections", "Current active connections")

ACTIVE_CONNECTIONS.inc()   # Connection opened
ACTIVE_CONNECTIONS.dec()   # Connection closed
ACTIVE_CONNECTIONS.set(42) # Set to exact value
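Gauges also come with helpers for two common patterns: tracking work in progress and recording a timestamp. The sketch below uses a private CollectorRegistry so it does not touch the global default; the metric names and the run_job function are made up for illustration.

```python
from prometheus_client import CollectorRegistry, Gauge

# A private registry keeps this sketch isolated from the global default.
registry = CollectorRegistry()

IN_PROGRESS = Gauge(
    "jobs_in_progress", "Jobs currently running", registry=registry
)
LAST_SUCCESS = Gauge(
    "job_last_success_unixtime", "Time the last job finished", registry=registry
)


def run_job():
    # Incremented on entry and decremented on exit, even if the body raises.
    with IN_PROGRESS.track_inprogress():
        during = registry.get_sample_value("jobs_in_progress")
    # Record when the job last completed, as a Unix timestamp.
    LAST_SUCCESS.set_to_current_time()
    return during


during_job = run_job()
after_job = registry.get_sample_value("jobs_in_progress")
print(during_job, after_job)  # 1.0 while running, 0.0 afterwards
```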

Histogram

Measures the distribution of values, typically request durations or response sizes. It buckets observations and provides count, sum, and per-bucket counts.

from prometheus_client import Histogram

REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "Request duration in seconds",
    ["endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)

# Time a request (process_request() is a placeholder for your handler logic)
with REQUEST_DURATION.labels(endpoint="/api/orders").time():
    process_request()

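The buckets define the precision of percentile calculations: a quantile is estimated by interpolating inside the bucket it falls into, so the estimate can be off by up to that bucket's width. Choose bucket boundaries that include your SLA thresholds.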
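The effect of bucket choice is easier to see with numbers. The following is a simplified sketch of quantile estimation from cumulative buckets using linear interpolation, the same general approach histogram_quantile takes. The function name estimate_quantile and the bucket counts are assumptions for illustration, and edge cases are handled more loosely than in Prometheus itself.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Beyond the last finite bucket nothing more is known,
                # so report the highest finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative counts: 50 requests took <= 0.1s, 90 took <= 0.5s, ...
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, buckets)
print(round(p95, 3))  # roughly 0.75
```

The true 95th percentile here could be anywhere between 0.5 and 1.0 seconds. If 0.75 seconds were an SLA threshold, adding a bucket boundary at that value would make the answer exact at the point that matters.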

Summary

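Similar to a histogram, but in the Prometheus data model a summary calculates quantiles on the client side. Less commonly used because client-side quantiles cannot be aggregated across instances. Note that the Python client's Summary does not compute quantiles at all: it exposes only the count and sum of observations.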
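In the Python client, a Summary records a running count and sum of observations rather than quantiles, which is still enough to derive an average. A minimal sketch, with a made-up metric name and a private registry:

```python
from prometheus_client import CollectorRegistry, Summary

registry = CollectorRegistry()  # private registry for the sketch

PAYLOAD_SECONDS = Summary(
    "payload_build_seconds", "Time spent building payloads", registry=registry
)

# Record two hypothetical durations.
PAYLOAD_SECONDS.observe(0.25)
PAYLOAD_SECONDS.observe(0.5)

# A summary exposes <name>_count and <name>_sum samples.
count = registry.get_sample_value("payload_build_seconds_count")
total = registry.get_sample_value("payload_build_seconds_sum")
mean_seconds = total / count
print(count, total, mean_seconds)  # 2.0 0.75 0.375
```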

Exposing metrics

Your Python app needs an HTTP endpoint that Prometheus can scrape:

from prometheus_client import start_http_server

start_http_server(8000)  # Metrics available at http://localhost:8000/metrics
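What the endpoint serves is plain text, one line per time series. To inspect that output without starting a server, generate_latest can render a registry directly. The counter name and label below are invented for the example.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

registry = CollectorRegistry()  # private registry for the sketch
JOBS = Counter(
    "demo_jobs_total", "Jobs processed", ["queue"], registry=registry
)
JOBS.labels(queue="email").inc(3)

# generate_latest returns bytes in the Prometheus text exposition format.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

The printed text includes HELP and TYPE comment lines followed by samples such as demo_jobs_total{queue="email"} 3.0.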

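For Flask apps, mount the metrics WSGI app next to your application instead of opening a second port (ASGI frameworks such as FastAPI can mount make_asgi_app() in a similar way):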

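# Flask
from flask import Flask
from prometheus_client import make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

app = Flask(__name__)

# Serve metrics at /metrics; every other path goes to the Flask app
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {"/metrics": make_wsgi_app()})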

Labels

Labels add dimensions to metrics. A single http_requests_total counter with labels for method, endpoint, and status replaces dozens of separate metrics.

Rules for labels:

Keep cardinality manageable. Do not use user IDs or request IDs as labels — this creates millions of time series and overwhelms Prometheus.
Use labels for dimensions you will filter or group by in dashboards and alerts.
Good labels: HTTP method, status code, service name, region.
Bad labels: user email, session token, request body hash.
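The cardinality rule comes down to multiplication: the worst-case number of time series for one metric is the product of the distinct values of each label. The counts below are hypothetical.

```python
from math import prod


def series_count(distinct_values_per_label):
    """Upper bound on time series: product of distinct values per label."""
    return prod(distinct_values_per_label.values())


# Hypothetical numbers of distinct values for each label.
bounded = {"method": 5, "endpoint": 20, "status": 10}
unbounded = {**bounded, "user_id": 100_000}

print(series_count(bounded))    # 1000
print(series_count(unbounded))  # 100000000
```

Not every combination occurs in practice, so real counts are usually lower, but a single unbounded label is enough to push the total from thousands into millions.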

Querying with PromQL

Prometheus includes a query language for analyzing metrics:

# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# 95th percentile latency per endpoint (and per scraped instance)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
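The error-rate query is a ratio of two sums, and the arithmetic can be reproduced by hand. This sketch assumes you already have per-status request rates (the numbers are invented) and mirrors the fact that PromQL regex matchers are fully anchored, so "5.." matches 500 and 503 but not 1500.

```python
import re


def error_percentage(rates_by_status, pattern="5.."):
    """Share of the total request rate whose status matches the pattern."""
    matcher = re.compile(pattern)
    # fullmatch mirrors PromQL's anchored regex matching.
    errors = sum(
        rate for status, rate in rates_by_status.items()
        if matcher.fullmatch(status)
    )
    total = sum(rates_by_status.values())
    return 100 * errors / total if total else float("nan")


# Hypothetical per-second rates, keyed by the "status" label value.
rates = {"200": 9.0, "404": 0.5, "500": 0.25, "503": 0.25}
print(error_percentage(rates))  # 5.0
```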

These queries power Grafana dashboards and alerting rules.

Common misconception

“Prometheus adds significant overhead to my Python application.” In practice the cost is small. The prometheus_client library guards each metric update with a lock, which keeps updates thread-safe and cheap: incrementing a counter typically takes on the order of a microsecond. Serialization for the scrape endpoint happens only when Prometheus pulls (typically every 15-30 seconds), and for most services the response is a few kilobytes of text.
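Rather than taking the overhead claim on faith, you can measure it on your own hardware. The following is a rough micro-benchmark sketch using timeit; the metric name is invented, and the result will vary by machine, Python version, and whether labels are involved.

```python
import timeit

from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()  # private registry for the sketch
EVENTS = Counter(
    "bench_events_total", "Events counted in the benchmark", registry=registry
)

ITERATIONS = 100_000
# Call EVENTS.inc() repeatedly and measure the total wall-clock time.
elapsed = timeit.timeit(EVENTS.inc, number=ITERATIONS)
per_call_us = elapsed / ITERATIONS * 1e6
counted = registry.get_sample_value("bench_events_total")

print(f"{per_call_us:.3f} microseconds per inc() over {counted:.0f} calls")
```

Compare the per-call figure with the latency of a typical request handler to judge whether the instrumentation cost matters for your service.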

When to use Prometheus

Prometheus excels at operational monitoring: is the service healthy, how fast is it, what is the error rate. It is not designed for event logging (use structured logging), request tracing (use OpenTelemetry), or business analytics (use a data warehouse). The sweet spot is real-time operational visibility with alerting.

One thing to remember: Prometheus metrics in Python follow a simple pattern — define counters, gauges, and histograms in your code, expose them on an endpoint, and let Prometheus scrape, store, and alert on the data.
