Python Request Tracing — Core Concepts
Request tracing records the path a single request takes through your system, capturing timing and metadata at each step. It answers questions that logs and metrics alone cannot: why was this specific request slow, and which service caused the delay?
Traces and spans
A trace represents one end-to-end request. It contains one or more spans. Each span represents a unit of work — an HTTP handler, a database query, a cache lookup.
Spans have:
- A name (e.g., `POST /orders`)
- A start time and duration
- An optional parent span (only the root span has none), which links spans into a tree
- Attributes (key-value metadata like `http.status_code=200`)
- Events (timestamped annotations like "cache miss")
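To make those fields concrete, the sketch below models a span as a plain Python dataclass. It is a hypothetical illustration, not OpenTelemetry's `Span` class; the class name, field names, and the `cache.key` attribute are assumptions chosen for the example.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Hypothetical span record mirroring the fields above (not the OTel API)."""

    name: str
    start_ns: int
    end_ns: Optional[int] = None
    parent: Optional[Span] = None  # None marks the root span of a trace
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[tuple[int, str]] = field(default_factory=list)  # (timestamp_ns, message)

    def add_event(self, message: str) -> None:
        # Events are timestamped annotations recorded during the span's lifetime.
        self.events.append((time.time_ns(), message))

    def end(self) -> None:
        self.end_ns = time.time_ns()

    @property
    def duration_ms(self) -> float:
        if self.end_ns is None:
            raise ValueError("span has not ended")
        return (self.end_ns - self.start_ns) / 1_000_000


root = Span("POST /orders", start_ns=time.time_ns())
lookup = Span("cache_lookup", start_ns=time.time_ns(), parent=root)
lookup.attributes["cache.key"] = "order:42"
lookup.add_event("cache miss")
lookup.end()
root.attributes["http.status_code"] = 200
root.end()
```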
A typical trace for an e-commerce checkout might look like:
```text
[POST /checkout] ──────────────────────── 320ms
├─ [validate_cart] ──────── 15ms
├─ [charge_payment] ─────────────── 180ms
│  ├─ [stripe_api_call] ──────── 170ms
│  └─ [save_transaction] ── 8ms
└─ [send_confirmation] ────── 45ms
   └─ [email_service_call] ── 40ms
```
You can immediately see that stripe_api_call dominates the latency.
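Reading the diagram this way can be sketched as a walk down the tree that always follows the longest child. This assumes a hypothetical `SpanNode` structure holding the durations from the diagram; it is a greedy heuristic, and when sibling spans run concurrently the longest child is not necessarily the whole story.

```python
from dataclasses import dataclass, field


@dataclass
class SpanNode:
    """Hypothetical, simplified span: a name, a duration, and child spans."""

    name: str
    duration_ms: float
    children: list["SpanNode"] = field(default_factory=list)


def slowest_path(node: SpanNode) -> list[str]:
    """Starting at the root, repeatedly descend into the longest child."""
    path = [node.name]
    while node.children:
        node = max(node.children, key=lambda child: child.duration_ms)
        path.append(node.name)
    return path


checkout = SpanNode("POST /checkout", 320, [
    SpanNode("validate_cart", 15),
    SpanNode("charge_payment", 180, [
        SpanNode("stripe_api_call", 170),
        SpanNode("save_transaction", 8),
    ]),
    SpanNode("send_confirmation", 45, [
        SpanNode("email_service_call", 40),
    ]),
])

print(slowest_path(checkout))  # ['POST /checkout', 'charge_payment', 'stripe_api_call']
```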
Context propagation
For tracing to work across services, each service must pass trace context to the next. The W3C traceparent header is the standard format:
```text
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             │  │                                │                │
             │  trace-id (32 hex)                span-id (16 hex) flags
             version
```
When Service A calls Service B, it includes this header. Service B creates a new span with Service A’s span as its parent. The trace ID stays the same across all services.
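The header format is simple enough to parse by hand, which helps show what Service B does with it. This is a minimal sketch with hypothetical helper names (`parse_traceparent`, `child_traceparent`) that only handles version `00`; in real services, OpenTelemetry's W3C Trace Context propagator reads and writes this header for you.

```python
import re
import secrets
from typing import NamedTuple

# version-traceid-spanid-flags, lowercase hex only (version 00 layout)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    span_id: str
    flags: str


def parse_traceparent(header: str) -> TraceParent:
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    parsed = TraceParent(*match.groups())
    # All-zero trace or span IDs are not valid identifiers.
    if set(parsed.trace_id) == {"0"} or set(parsed.span_id) == {"0"}:
        raise ValueError("all-zero trace-id or span-id")
    return parsed


def child_traceparent(incoming: TraceParent) -> str:
    """Header Service B sends downstream: same trace ID, B's own new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{incoming.version}-{incoming.trace_id}-{new_span_id}-{incoming.flags}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(incoming)
```

The outgoing header keeps the trace ID and flags but carries the ID of Service B's own span, which is what makes the next service's span a child of B's span.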
Setting up tracing in Python
The OpenTelemetry SDK is the standard approach:
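```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Backends group spans by the resource's service.name, not by the tracer name.
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))

# Buffer finished spans and export them in batches over OTLP/gRPC.
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# The tracer name identifies the instrumentation scope, conventionally the module.
tracer = trace.get_tracer(__name__)
```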
Creating spans
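```python
def process_order(order_id: str, total: float):
    # Start a span and make it the current span until the block exits.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.total", total)
        return do_work()
```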
Spans nest automatically: start_as_current_span makes process_order the current span while the block runs, so if do_work() starts its own span, that span becomes a child of process_order.
Auto-instrumentation
OpenTelemetry provides automatic instrumentation for common libraries:
| Library | Package |
|---|---|
| FastAPI | opentelemetry-instrumentation-fastapi |
| Starlette | opentelemetry-instrumentation-starlette |
| Django | opentelemetry-instrumentation-django |
| Flask | opentelemetry-instrumentation-flask |
| requests | opentelemetry-instrumentation-requests |
| httpx | opentelemetry-instrumentation-httpx |
| SQLAlchemy | opentelemetry-instrumentation-sqlalchemy |
| psycopg2 | opentelemetry-instrumentation-psycopg2 |
| Redis | opentelemetry-instrumentation-redis |
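Install the package and call its instrumentor's instrument() method (for example, RequestsInstrumentor().instrument() from opentelemetry.instrumentation.requests), or launch your app with the opentelemetry-instrument CLI to auto-instrument it without code changes.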
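Conceptually, auto-instrumentation patches library functions so that existing calls produce spans without changes to application code. The toy version below wraps a function on a fake module and records timings; `wrap_with_timing` and `fake_http` are hypothetical, and real instrumentation packages do considerably more (create proper spans, propagate context, record attributes).

```python
import functools
import time
import types
from typing import Any, Callable

# (span name, duration in ms) pairs recorded by the wrapper
recorded_spans: list = []


def wrap_with_timing(module: Any, attr: str, span_name: str) -> None:
    """Hypothetical helper: replace module.attr with a timing wrapper."""
    original: Callable[..., Any] = getattr(module, attr)

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the "span" even if the wrapped call raises.
            elapsed_ms = (time.perf_counter() - start) * 1000
            recorded_spans.append((span_name, elapsed_ms))

    setattr(module, attr, wrapper)


# A fake third-party module standing in for an HTTP client library.
fake_http = types.ModuleType("fake_http")
fake_http.get = lambda url: f"200 OK from {url}"

wrap_with_timing(fake_http, "get", "HTTP GET")

# Application code calls the library exactly as before.
response = fake_http.get("https://example.com")
```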
Sampling
Not every request needs tracing. At high traffic, tracing everything generates too much data. Sampling strategies include:
- Head-based sampling: Decide at the start (e.g., trace 10% of requests). Simple, but it can miss rare errors because the decision is made before the outcome is known.
- Tail-based sampling: Collect all spans, then decide after the request completes. Keeps errors and slow requests, drops boring ones. Requires a collector with buffering.
- Priority sampling: Always trace requests with certain headers or from specific users.
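The first two strategies can be sketched as plain decision functions. The head-based check is similar in spirit to OpenTelemetry's TraceIdRatioBased sampler: it compares the low 64 bits of the trace ID against the ratio, so every service that sees the same trace ID reaches the same decision. The function names, the tail-based rule, and the 500 ms threshold are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

_TWO_POW_64 = 2**64


def head_sample(trace_id: str, ratio: float) -> bool:
    """Head-based: decide from the trace ID alone, before any work is done."""
    # Low 64 bits of the 128-bit trace ID, read as an unsigned integer.
    low_bits = int(trace_id[-16:], 16)
    return low_bits < ratio * _TWO_POW_64


@dataclass
class FinishedSpan:
    name: str
    duration_ms: float
    error: bool = False


def tail_keep(spans: list[FinishedSpan], slow_ms: float = 500.0) -> bool:
    """Tail-based: decide once every span of the trace has been buffered."""
    # Keep the trace if any span failed or any span was slow.
    return any(span.error or span.duration_ms >= slow_ms for span in spans)


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
print(head_sample(trace_id, 0.1))  # False
print(tail_keep([FinishedSpan("validate_cart", 15), FinishedSpan("charge_payment", 180, error=True)]))  # True
```

A production tail sampler would typically also keep a small random share of ordinary traces so that normal behavior stays visible for comparison.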
Viewing traces
Traces are visualized in backends like:
- Jaeger — open source, widely used
- Zipkin — simpler, mature
- Grafana Tempo — integrates with Grafana dashboards
- Datadog / New Relic / Honeycomb — commercial with rich UIs
All of them can ingest data from OpenTelemetry: most accept OTLP directly, while Zipkin is typically fed through the OpenTelemetry Collector's Zipkin exporter.
Common misconception
“Tracing replaces logging.” Tracing shows timing and causality across services. Logging captures detailed application state (variable values, business logic decisions). You need both. The best practice is to include the trace ID in your log lines so you can jump from a trace to the relevant logs.
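Adding the trace ID to log lines can be done with a standard logging.Filter. In this minimal sketch the trace ID lives in a hypothetical `current_trace_id` context variable; in a real OpenTelemetry setup you would read it from the active span's context instead (or let OpenTelemetry's logging instrumentation inject it).

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace ID ("-" when no trace is active).
current_trace_id = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record so formatters can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop records, only enrich them


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output in our own handler

current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
logger.info("payment declined")
print(stream.getvalue())
```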
One thing to remember: A trace is a tree of timed operations. Once you can see the tree, every performance question becomes a visual exercise — find the widest bar and optimize it.
See Also
- Python Alerting Patterns: Alerting is a smoke detector for your code; it wakes you up when something is burning, not when someone is cooking.
- Python Correlation IDs: Correlation IDs are name tags for requests; they let you follow one visitor's journey through a crowded theme park of services.
- Python Grafana Dashboards: Python Grafana turns boring numbers from your Python app into colorful, real-time dashboards, like a car's dashboard but for your code.
- Python Log Aggregation (ELK): ELK collects scattered log files from all your services into one searchable place, like gathering every sticky note in the office into a single filing cabinet.
- Python Logging Best Practices: Treat logs like a flight recorder so you can understand failures after they happen, not just during development.