Python Request Tracing — Core Concepts

Understand spans, traces, and context propagation to debug latency and failures across Python microservices.

Request tracing records the path a single request takes through your system, capturing timing and metadata at each step. It answers questions that logs and metrics alone cannot: why was this specific request slow, and which service caused the delay?

Traces and spans

A trace represents one end-to-end request. It contains one or more spans. Each span represents a unit of work — an HTTP handler, a database query, a cache lookup.

Spans have:

A name (e.g., POST /orders)
A start time and duration
A parent span (creating a tree structure)
Attributes (key-value metadata like http.status_code=200)
Events (timestamped annotations like “cache miss”)
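The fields listed above fit in a small stdlib dataclass. This is a toy for illustration only, not OpenTelemetry's Span class. The `Span` name, its fields, and the `finish()` helper are assumptions chosen to mirror the list.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Toy span: one timed unit of work inside a trace (illustrative only)."""
    name: str
    parent: Optional["Span"] = None  # None means this is the root span
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[tuple[float, str]] = field(default_factory=list)
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def add_event(self, message: str) -> None:
        # Events are timestamped annotations recorded inside the span
        self.events.append((time.monotonic(), message))

    def finish(self) -> None:
        self.end = time.monotonic()

    @property
    def duration(self) -> float:
        if self.end is None:
            raise RuntimeError("span not finished")
        return self.end - self.start


root = Span("POST /orders")
child = Span("cache_lookup", parent=root)  # parent link builds the tree
child.add_event("cache miss")
child.finish()
root.attributes["http.status_code"] = 200
root.finish()
```

Because the child starts after and finishes before its parent, its duration can never exceed the parent's.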

A typical trace for an e-commerce checkout might look like:

[POST /checkout]  ──────────────────────── 320ms
  ├─ [validate_cart]  ──────── 15ms
  ├─ [charge_payment]  ─────────────── 180ms
  │    ├─ [stripe_api_call]  ──────── 170ms
  │    └─ [save_transaction]  ── 8ms
  └─ [send_confirmation]  ────── 45ms
       └─ [email_service_call]  ── 40ms

You can immediately see that stripe_api_call dominates: at 170ms it accounts for more than half of the 320ms request.
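The same reading can be done programmatically by computing each span's self time (its duration minus its children's durations). This sketch copies the durations from the checkout trace above into nested tuples; the tuple layout and the `self_times` helper are illustrative assumptions, and the subtraction assumes children run one after another rather than concurrently.

```python
# (name, duration_ms, children), copied from the checkout trace above
checkout = ("POST /checkout", 320, [
    ("validate_cart", 15, []),
    ("charge_payment", 180, [
        ("stripe_api_call", 170, []),
        ("save_transaction", 8, []),
    ]),
    ("send_confirmation", 45, [
        ("email_service_call", 40, []),
    ]),
])


def self_times(span, out=None):
    """Map each span name to its own duration minus its children's durations."""
    if out is None:
        out = {}
    name, duration, children = span
    out[name] = duration - sum(child[1] for child in children)
    for child in children:
        self_times(child, out)
    return out


times = self_times(checkout)
worst = max(times, key=times.get)
print(worst, times[worst])  # stripe_api_call 170
```

The result also shows 80ms of self time on POST /checkout itself, work that no child span covers.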

Context propagation

For tracing to work across services, each service must pass trace context to the next. The W3C traceparent header is the standard format:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
              │  │                                │                  │
              │  trace-id (32 hex)                span-id (16 hex)  flags
              version

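When Service A calls Service B, it includes this header, with the span-id field set to the ID of A's current span (the W3C specification calls this field parent-id). Service B creates a new span with Service A's span as its parent. The trace ID stays the same across all services, and the flags field carries the sampling decision (01 means sampled).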
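OpenTelemetry's propagators handle this header for you; the stdlib sketch below only shows what it carries. `parse_traceparent` and `child_traceparent` are hypothetical helpers, and the sketch accepts only version 00 headers.

```python
import re
import secrets
from typing import NamedTuple

# version-traceid-parentid-flags, lowercase hex only (version 00 layout)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str  # span ID of the caller's span
    flags: str


def parse_traceparent(header: str) -> TraceParent:
    match = TRACEPARENT_RE.match(header.strip())
    if match is None or match.group(1) != "00":
        raise ValueError(f"unsupported traceparent: {header!r}")
    tp = TraceParent(*match.groups())
    # All-zero IDs are invalid per the W3C spec
    if tp.trace_id == "0" * 32 or tp.parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id")
    return tp


def child_traceparent(incoming: TraceParent) -> str:
    """Header Service B sends downstream: same trace ID, new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    return f"00-{incoming.trace_id}-{new_span_id}-{incoming.flags}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(incoming)
```

Only the span-id field changes from hop to hop; the trace ID and the sampling flag travel unchanged.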

Setting up tracing in Python

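The OpenTelemetry SDK is the de facto standard approach. This setup needs the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc packages: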

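from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# service.name is how the tracing backend identifies this service
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
# Buffer finished spans and export them in batches to an OTLP collector over gRPC
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# The tracer name identifies the instrumenting module, not the service
tracer = trace.get_tracer(__name__)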

Creating spans

with tracer.start_as_current_span("process_order") as span:
    # order_id, total and do_work() stand in for your application code
    span.set_attribute("order.id", order_id)
    span.set_attribute("order.total", total)
    result = do_work()

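Spans nest automatically. If do_work() starts its own span with tracer.start_as_current_span(), that span becomes a child of process_order, because process_order is the current span in the active context.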

Auto-instrumentation

OpenTelemetry provides automatic instrumentation for common libraries:

Library	Package
FastAPI	`opentelemetry-instrumentation-fastapi`
Starlette	`opentelemetry-instrumentation-starlette`
Django	`opentelemetry-instrumentation-django`
Flask	`opentelemetry-instrumentation-flask`
requests	`opentelemetry-instrumentation-requests`
httpx	`opentelemetry-instrumentation-httpx`
SQLAlchemy	`opentelemetry-instrumentation-sqlalchemy`
psycopg2	`opentelemetry-instrumentation-psycopg2`
Redis	`opentelemetry-instrumentation-redis`

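Install the package and call its instrumentor's instrument() method (for example, RequestsInstrumentor().instrument()), or launch your app with the opentelemetry-instrument CLI (opentelemetry-instrument python app.py) to instrument it without code changes.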

Sampling

Not every request needs to be traced. At high traffic, recording every span costs too much in storage, network bandwidth, and runtime overhead. Sampling strategies include:

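Head-based sampling: Decide when the trace starts (e.g., keep 10% of traces) and pass the decision downstream in the traceparent flags. Simple and cheap, but because the decision is made before the outcome is known, rare errors and slow requests are dropped at the same rate as everything else.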
Tail-based sampling: Collect all spans, then decide after the request completes. Keeps errors and slow requests, drops boring ones. Requires a collector with buffering.
Priority sampling: Always trace requests with certain headers or from specific users.
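The first two strategies differ mainly in what the decision can see. The stdlib sketch below contrasts them. The head sampler keys on the trace ID so every service reaches the same verdict, similar in spirit to the SDK's TraceIdRatioBased sampler but not its code. The tail sampler's span dictionaries and thresholds are assumptions; in production this logic runs in a buffering collector, not in the application.

```python
import random

TRACE_ID_LIMIT = 2**64


def head_sample(trace_id: int, ratio: float) -> bool:
    """Head-based: decided from the trace ID alone, before any work happens.

    Using the trace ID instead of a fresh random number means every
    service that sees the same trace makes the same decision.
    """
    bound = round(ratio * TRACE_ID_LIMIT)
    return (trace_id & (TRACE_ID_LIMIT - 1)) < bound  # compare low 64 bits


def tail_sample(spans: list[dict], slow_ms: float, ratio: float) -> bool:
    """Tail-based: decided after all spans of the trace have arrived."""
    if any(s["error"] for s in spans):
        return True  # always keep failed requests
    if max(s["duration_ms"] for s in spans) >= slow_ms:
        return True  # always keep slow requests
    return random.random() < ratio  # keep a fraction of the rest


rng = random.Random(42)
ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(head_sample(t, 0.10) for t in ids)
print(kept / len(ids))  # close to 0.10
```

The tail decision needs every span of the trace in one place, which is why it cannot run independently inside each service.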

Viewing traces

Traces are visualized in backends like:

Jaeger — open source, widely used
Zipkin — simpler, mature
Grafana Tempo — integrates with Grafana dashboards
Datadog / New Relic / Honeycomb — commercial with rich UIs

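Most accept OTLP (the OpenTelemetry Protocol) directly. For backends with their own native format, such as Zipkin, the OpenTelemetry Collector can export spans in that format instead.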

Common misconception

“Tracing replaces logging.” Tracing shows timing and causality across services. Logging captures detailed application state (variable values, business logic decisions). You need both. The best practice is to include the trace ID in your log lines so you can jump from a trace to the relevant logs.
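Adding the trace ID to log lines can be done with a standard logging.Filter. In this stdlib sketch the `current_trace_id` context variable is a hypothetical stand-in; with OpenTelemetry you would read trace.get_current_span().get_span_context().trace_id instead. The logger name and format string are also illustrative.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace ID (OpenTelemetry keeps this
# in its own context; see the lead-in)
current_trace_id = contextvars.ContextVar("current_trace_id", default=0)


class TraceIdFilter(logging.Filter):
    """Attach the active trace ID to every record as record.trace_id."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Same 32-hex-digit form as the traceparent trace-id field
        record.trace_id = format(current_trace_id.get(), "032x")
        return True  # annotate records, never drop them


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this demo's output out of the root logger

current_trace_id.set(0x4BF92F3577B34DA6A3CE929D0E0E4736)
logger.info("payment declined")
print(stream.getvalue())
# INFO trace_id=4bf92f3577b34da6a3ce929d0e0e4736 payment declined
```

Searching your log store for that trace ID then returns exactly the lines written while the request was in flight. The opentelemetry-instrumentation-logging package can inject these fields for you.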

One thing to remember: a trace is a tree of timed operations. Once you can see the tree, most latency questions become a visual exercise: find the widest bar that isn't just waiting on its children, and optimize it.
