Python Grafana Dashboards — Core Concepts
Grafana is an open-source visualization platform that connects to data sources (Prometheus, Loki, Elasticsearch, PostgreSQL) and renders interactive dashboards. For Python developers, the typical pipeline is: Python app → Prometheus metrics → Grafana dashboards.
The data flow
- Your Python service exposes metrics on a /metrics HTTP endpoint using prometheus_client or OpenTelemetry.
- Prometheus scrapes that endpoint at a configured interval (15 seconds is a common setting) and stores the samples as time-series data.
- Grafana queries Prometheus using PromQL (Prometheus Query Language) and renders charts.
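As a rough illustration of the first step, the sketch below uses prometheus_client to define a counter and a histogram whose names match the metrics used later in this article. The port (8000), the bucket boundaries, and the handle_request and main helpers are assumptions made for the example, not part of any particular framework.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, start_http_server

# A dedicated registry keeps the example isolated; many apps use the default one.
registry = CollectorRegistry()

REQUESTS = Counter(
    "http_requests_total",
    "Total HTTP requests handled",
    ["method", "endpoint", "status"],
    registry=registry,
)
LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["endpoint"],
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5),  # assumed bucket boundaries
    registry=registry,
)


def handle_request(method: str, endpoint: str) -> str:
    """Hypothetical handler that records one request and its duration."""
    start = time.perf_counter()
    status = "200"  # a real handler would do work and derive the status
    REQUESTS.labels(method=method, endpoint=endpoint, status=status).inc()
    LATENCY.labels(endpoint=endpoint).observe(time.perf_counter() - start)
    return status


def main() -> None:
    """Serve the registry on port 8000 and generate sample traffic."""
    start_http_server(8000, registry=registry)  # metrics at /metrics
    while True:
        handle_request("GET", "/api/orders")
        time.sleep(1)
```

Calling main() starts the exporter; pointing a Prometheus scrape job at port 8000 would then collect http_requests_total and the http_request_duration_seconds_bucket series.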
Essential panel types
Time series
The default panel. Shows metric values over time as lines or bars. Use for:
- Request rate: rate(http_requests_total[5m])
- Error rate: rate(http_requests_total{status=~"5.."}[5m])
- Latency percentiles: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Stat
A single large number, optionally with color thresholds. Use for:
- Current active users
- Error count in the last hour
- Uptime percentage
Gauge
A semicircular dial showing a value against a range. Use for:
- CPU usage (0-100%)
- Memory usage
- Queue saturation
Table
Rows and columns. Use for:
- Top endpoints by error rate
- Slow queries with details
- Per-instance resource usage
Heatmap
Shows how a distribution changes over time. Use for:
- Request latency distribution (from histogram metrics): each row is a latency bucket, and color intensity shows volume
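In a dashboard's JSON model, each of the panel types above is identified by a type string paired with one or more queries. The sketch below builds heavily trimmed panel definitions as Python dicts. The panel helper is hypothetical, the CPU query assumes the process_cpu_seconds_total metric that prometheus_client exports by default on Linux, and dashboards exported from Grafana carry many more fields than shown here.

```python
import json


def panel(title: str, panel_type: str, expr: str) -> dict:
    """Return a minimal panel definition with a single PromQL target."""
    return {
        "title": title,
        "type": panel_type,
        "targets": [{"refId": "A", "expr": expr}],
    }


PANELS = [
    panel("Request rate", "timeseries", "sum(rate(http_requests_total[5m]))"),
    panel(
        "Errors (last hour)",
        "stat",
        'sum(increase(http_requests_total{status=~"5.."}[1h]))',
    ),
    panel("CPU usage", "gauge", "rate(process_cpu_seconds_total[5m]) * 100"),
    panel(
        "Top endpoints by error rate",
        "table",
        'topk(5, sum by (endpoint) (rate(http_requests_total{status=~"5.."}[5m])))',
    ),
    panel(
        "Request latency distribution",
        "heatmap",
        "sum by (le) (rate(http_request_duration_seconds_bucket[5m]))",
    ),
]

if __name__ == "__main__":
    print(json.dumps(PANELS, indent=2))
```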
PromQL essentials for Python metrics
Assuming your Python app exposes these metrics:
http_requests_total{method="GET", endpoint="/api/orders", status="200"}
http_request_duration_seconds_bucket{endpoint="/api/orders", le="0.1"}
Common queries:
| What you want | PromQL |
|---|---|
| Requests per second | rate(http_requests_total[5m]) |
| Error percentage | sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100 |
| p95 latency | histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) |
| Active connections | http_active_requests (gauge, no rate needed) |
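The [5m] window smooths out spikes. Use [1m] for near-real-time dashboards, but keep the window several scrape intervals wide (with a 15-second scrape interval, [1m] covers about four samples) so that rate() always has enough data points to work with.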
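To see why the p95 from histogram_quantile is an estimate rather than an exact value, it can help to work through the calculation by hand. The sketch below is a simplified Python version of the idea: it finds the bucket containing the target rank and interpolates linearly inside it. It is not Prometheus's implementation (which handles more edge cases), and the bucket counts are made up for the example.

```python
import math


def histogram_quantile(q: float, buckets: dict[float, float]) -> float:
    """Estimate quantile q from cumulative counts keyed by upper bound (le)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound in bounds:
        count = buckets[upper_bound]
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies beyond the last finite bucket boundary.
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Made-up cumulative counts: 50 requests took <= 0.1s, 90 took <= 0.5s, and so on.
example = {0.1: 50, 0.5: 90, 1.0: 100, math.inf: 100}
print(histogram_quantile(0.95, example))  # 0.75
```

The 95th request out of 100 falls halfway through the 0.5s to 1.0s bucket, so the estimate is 0.75s regardless of where the real observations sat inside that bucket. Narrower buckets around your latency targets give tighter estimates.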
Building a Python service dashboard
A well-structured dashboard for a Python web service has four rows:
Row 1: Overview
- Request rate (time series) — are we getting traffic?
- Error rate % (stat with red threshold) — is anything broken?
- p50 / p95 / p99 latency (time series with three lines) — are we fast?
Row 2: Endpoints
- Request rate by endpoint (time series, multi-line) — which endpoints are hot?
- Error rate by endpoint (table, sorted by errors) — which endpoints are failing?
Row 3: Infrastructure
- CPU usage (gauge) — are we compute-bound?
- Memory usage (gauge) — are we leaking?
- Active connections (time series) — connection pool health
Row 4: Dependencies
- Database query latency (heatmap) — is the DB slow?
- External API latency (time series) — are dependencies healthy?
- Cache hit ratio (stat) — is caching working?
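The four-row layout above can be generated rather than dragged into place by hand. The sketch below computes grid positions for each panel, relying on Grafana's 24-column dashboard grid and its row panels. The build_panels helper and the panel height of 8 grid units are assumptions for the example, and queries are left out to keep the focus on layout.

```python
GRID_WIDTH = 24   # Grafana lays panels out on a 24-column grid
PANEL_HEIGHT = 8  # assumed panel height in grid units

LAYOUT = {
    "Overview": ["Request rate", "Error rate %", "p50 / p95 / p99 latency"],
    "Endpoints": ["Request rate by endpoint", "Error rate by endpoint"],
    "Infrastructure": ["CPU usage", "Memory usage", "Active connections"],
    "Dependencies": [
        "Database query latency",
        "External API latency",
        "Cache hit ratio",
    ],
}


def build_panels(layout: dict[str, list[str]]) -> list[dict]:
    """Place each row's panels side by side, splitting the grid width evenly."""
    panels = []
    y = 0
    for row_title, titles in layout.items():
        # A row panel acts as a collapsible section header.
        panels.append({
            "type": "row",
            "title": row_title,
            "gridPos": {"x": 0, "y": y, "w": GRID_WIDTH, "h": 1},
        })
        y += 1
        width = GRID_WIDTH // len(titles)
        for index, title in enumerate(titles):
            panels.append({
                "title": title,
                "gridPos": {"x": index * width, "y": y, "w": width, "h": PANEL_HEIGHT},
            })
        y += PANEL_HEIGHT
    return panels


if __name__ == "__main__":
    for p in build_panels(LAYOUT):
        print(p["title"], p["gridPos"])
```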
Variables and templates
Grafana supports dashboard variables that turn static dashboards into dynamic ones:
- Environment dropdown: Filter all panels by production, staging, or development.
- Service dropdown: Switch between microservices.
- Time range: Built-in; all panels respect the global time selector.
Variables are defined in dashboard settings and used in queries: http_requests_total{environment="$environment"}.
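As a sketch of how this fits together, the snippet below shows a trimmed definition of a custom variable with three fixed values, followed by a small function that mimics the substitution Grafana performs before sending a query to Prometheus. Python's string.Template is only a stand-in for Grafana's own interpolation, the interpolate helper is hypothetical, and a real exported variable contains additional fields.

```python
from string import Template

# Trimmed sketch of a "custom" dashboard variable with three fixed options.
ENVIRONMENT_VARIABLE = {
    "name": "environment",
    "label": "Environment",
    "type": "custom",
    "query": "production,staging,development",
}


def interpolate(query: str, **selected: str) -> str:
    """Replace $name placeholders with the currently selected values."""
    return Template(query).substitute(**selected)


query = 'http_requests_total{environment="$environment"}'
print(interpolate(query, environment="production"))
# http_requests_total{environment="production"}
```

Changing the dropdown selection changes only the substituted value, so every panel that uses $environment updates at once.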
Annotations
Mark deployments, incidents, and config changes on your graphs:
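import httpx

def annotate_deployment(version: str, grafana_url: str, api_key: str) -> None:
    """Post a deployment annotation tagged "deploy" to Grafana."""
    response = httpx.post(
        f"{grafana_url}/api/annotations",
        json={"text": f"Deployed v{version}", "tags": ["deploy"]},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10.0,
    )
    # Fail loudly if Grafana rejects the request (bad token, wrong URL).
    response.raise_for_status()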
Deployment markers on latency graphs immediately show whether a deploy caused a regression.
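Incidents usually span a period rather than a single instant. Grafana's annotation API accepts time and timeEnd fields in epoch milliseconds, and an annotation with both is drawn as a shaded region. The sketch below only builds such a payload; the incident_annotation and to_epoch_ms helpers are hypothetical names, and the result would be posted to the same /api/annotations endpoint used above.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        # Naive datetimes would be interpreted in local time, so reject them.
        raise ValueError("moment must be timezone-aware")
    return int(moment.timestamp() * 1000)


def incident_annotation(text: str, start: datetime, end: datetime) -> dict:
    """Build a region annotation payload covering start through end."""
    return {
        "text": text,
        "tags": ["incident"],
        "time": to_epoch_ms(start),
        "timeEnd": to_epoch_ms(end),
    }


payload = incident_annotation(
    "Database failover",
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 30, tzinfo=timezone.utc),
)
print(payload)
```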
Common misconception
“More panels means a better dashboard.” The best dashboards have 6-10 panels that answer specific questions. A dashboard with 40 panels is a dashboard nobody reads. Start with the four golden signals (latency, traffic, errors, saturation) and add a panel only when it answers a specific operational question.
One thing to remember: A great Grafana dashboard answers “Is my Python service healthy?” in under 5 seconds. If it takes longer, you have too many panels or the wrong queries.
See Also
- Python Alerting Patterns: Alerting is a smoke detector for your code. It wakes you up when something is burning, not when someone is cooking.
- Python Correlation IDs: Correlation IDs are name tags for requests. They let you follow one visitor's journey through a crowded theme park of services.
- Python Log Aggregation ELK: ELK collects scattered log files from all your services into one searchable place, like gathering every sticky note in the office into a single filing cabinet.
- Python Logging Best Practices: Treat logs like a flight recorder so you can understand failures after they happen, not just during development.
- Python Logging Handlers: Think of logging handlers as mailboxes that decide where your app's messages end up: screen, file, or faraway server.