Python Request Tracing — Deep Dive

Distributed tracing gives you causality — not just “what happened” but “what caused what, and how long each step took.” This deep dive covers the OpenTelemetry Python SDK internals, custom instrumentation, sampling architectures, and production deployment patterns.

OpenTelemetry SDK architecture

The SDK has four key components:

  1. TracerProvider — factory for Tracer instances, holds configuration.
  2. Tracer — creates spans, scoped to an instrumentation library name/version.
  3. SpanProcessor — receives span lifecycle events (start, end). Two built-in types:
    • SimpleSpanProcessor — exports immediately (testing/development).
    • BatchSpanProcessor — buffers spans and exports in batches (production).
  4. SpanExporter — sends span data to a backend (OTLP, Jaeger, Zipkin, console).

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "order-api",
    "service.version": "2.1.0",
    "deployment.environment": "production"
})

provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(
        ConsoleSpanExporter(),
        max_queue_size=2048,
        max_export_batch_size=512,
        schedule_delay_millis=5000
    )
)
trace.set_tracer_provider(provider)

The Resource attaches static metadata to every span — critical for filtering traces by service, version, and environment in your backend.

Span lifecycle and context

Creating spans with full control

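import stripe

tracer = trace.get_tracer("order.service", "2.1.0")

def process_payment(order_id: str, amount: float):
    with tracer.start_as_current_span(
        "process_payment",
        kind=trace.SpanKind.INTERNAL,
        attributes={
            "order.id": order_id,
            "payment.amount": amount,
            "payment.currency": "USD"
        },
        # The exception is recorded and the status set by hand below, so switch off
        # the context manager's automatic handling to avoid a duplicate event.
        record_exception=False,
        set_status_on_exception=False
    ) as span:
        try:
            # stripe_charge is an application helper defined elsewhere
            result = stripe_charge(amount)
            span.set_attribute("payment.stripe_id", result.id)
            span.set_status(trace.StatusCode.OK)
            return result
        except stripe.CardError as e:
            span.set_status(trace.StatusCode.ERROR, str(e))
            span.record_exception(e)
            raise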

SpanKind values:

  • INTERNAL — default, in-process work
  • SERVER — handling an incoming request
  • CLIENT — making an outgoing request
  • PRODUCER — enqueuing a message
  • CONSUMER — processing a queued message

Context propagation internals

OpenTelemetry uses Python’s contextvars module to store the current span. When you call start_as_current_span, the SDK:

  1. Gets the current context (context.get_current()).
  2. Creates a new span with the current span as parent.
  3. Sets the new span as current via context.attach().
  4. On exit, calls context.detach() to restore the previous span.

This is why spans automatically nest — each new span looks up its parent from the context.
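The four steps above can be modelled with nothing but the standard library. This is a minimal sketch of the mechanism, not the SDK's code: ToySpan, _current_span, and this start_as_current_span are hypothetical stand-ins for the SDK's span class, its context slot, and the real tracer method.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the slot where the SDK keeps the current span.
_current_span = contextvars.ContextVar("current_span", default=None)


class ToySpan:
    """A span reduced to the two fields needed to show nesting."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent


@contextmanager
def start_as_current_span(name):
    parent = _current_span.get()         # steps 1-2: read the context, find the parent
    span = ToySpan(name, parent)
    token = _current_span.set(span)      # step 3: make the new span current
    try:
        yield span
    finally:
        _current_span.reset(token)       # step 4: restore the previous span


with start_as_current_span("request") as outer:
    with start_as_current_span("db_query") as inner:
        pass
    after_inner = _current_span.get()    # back to the outer span
after_outer = _current_span.get()        # back to no span at all

print(inner.parent.name, after_inner.name, after_outer)
```

Because the slot is a ContextVar, each asyncio task works on its own copy of the context, so concurrent requests in one event loop do not overwrite each other's current span.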

Manual context propagation

For cases where automatic propagation doesn’t work (thread pools, callback-based code):

from opentelemetry import context

# Capture context in the calling thread
ctx = context.get_current()

# In the worker thread
token = context.attach(ctx)
try:
    with tracer.start_as_current_span("background_work"):
        do_work()
finally:
    context.detach(token)
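The same capture-and-restore idea can be seen with plain contextvars, the storage the SDK builds on. The sketch below is an illustration under assumptions: current_trace_id and read_trace_id are hypothetical names, and contextvars.copy_context() plays the role that context.get_current() and context.attach() play in the snippet above.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the SDK's current-span slot.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def read_trace_id():
    """Return whatever trace ID is visible in the calling context."""
    return current_trace_id.get()


# Set a value in the calling thread, then snapshot the whole context.
current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
snapshot = contextvars.copy_context()

with ThreadPoolExecutor(max_workers=1) as pool:
    # Running the function inside the snapshot makes the caller's value visible
    # in the worker thread, whatever the worker's own context contains.
    propagated = pool.submit(snapshot.run, read_trace_id).result()

print(propagated)
```

Submitting read_trace_id directly, without snapshot.run, leaves the result dependent on the worker thread's own context, which is exactly the situation the manual attach/detach pattern is meant to avoid.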

Custom propagators

The default propagator uses W3C traceparent / tracestate headers. For legacy systems using B3 (Zipkin) format:

from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry import propagate

propagate.set_global_textmap(B3MultiFormat())

For systems using multiple header formats, compose propagators:

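from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator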

set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),  # W3C
    B3MultiFormat(),                   # Zipkin
]))

On extraction, the composite propagator runs each configured propagator in turn, so whichever header format is present gets picked up; on injection, it writes every format.
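To make the two wire formats concrete, here is a stdlib-only sketch of what injecting and extracting both looks like. The header names (traceparent, X-B3-TraceId, X-B3-SpanId, X-B3-Sampled) are the standard ones; every function name is hypothetical, validation is reduced to a regular expression, and extraction simply lets the first match win, which is a simplification of what the real propagators do.

```python
import re

TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def inject_w3c(carrier, trace_id, span_id, sampled):
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def inject_b3_multi(carrier, trace_id, span_id, sampled):
    carrier["X-B3-TraceId"] = f"{trace_id:032x}"
    carrier["X-B3-SpanId"] = f"{span_id:016x}"
    carrier["X-B3-Sampled"] = "1" if sampled else "0"


def inject_all(carrier, trace_id, span_id, sampled):
    """Write every format, as a composite propagator does on injection."""
    inject_w3c(carrier, trace_id, span_id, sampled)
    inject_b3_multi(carrier, trace_id, span_id, sampled)


def extract_w3c(carrier):
    match = TRACEPARENT_RE.match(carrier.get("traceparent", ""))
    if not match:
        return None
    _version, trace_id, span_id, flags = match.groups()
    return int(trace_id, 16), int(span_id, 16), bool(int(flags, 16) & 0x01)


def extract_b3_multi(carrier):
    try:
        trace_id = int(carrier["X-B3-TraceId"], 16)
        span_id = int(carrier["X-B3-SpanId"], 16)
    except (KeyError, ValueError):
        return None
    return trace_id, span_id, carrier.get("X-B3-Sampled") == "1"


def extract_any(carrier):
    """Try each format in order; simplification: the first match wins."""
    for extractor in (extract_w3c, extract_b3_multi):
        result = extractor(carrier)
        if result is not None:
            return result
    return None


headers = {}
inject_all(headers, 0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, True)
print(headers)
```

A service that only understands B3 reads the X-B3-* headers and ignores traceparent, while a W3C-only service does the opposite, which is why writing both formats lets a mixed fleet keep one unbroken trace.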

Auto-instrumentation deep dive

How auto-instrumentation works

opentelemetry-instrumentation-fastapi monkey-patches Starlette’s ASGI handling to:

  1. Extract trace context from incoming request headers.
  2. Create a SERVER span with HTTP attributes (http.method, http.url, http.status_code).
  3. Inject trace context into the response headers when a global response propagator is configured (none is set by default).

from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)

Custom span attributes via hooks

def request_hook(span, scope):
    if scope.get("type") == "http":
        # Add custom attributes from headers
        headers = dict(scope.get("headers", []))
        tenant = headers.get(b"x-tenant-id", b"").decode()
        if tenant:
            span.set_attribute("tenant.id", tenant)

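def response_hook(span, scope, message):
    # Called for every ASGI send event; headers arrive with "http.response.start"
    if span and span.is_recording() and message.get("type") == "http.response.start":
        # ASGI headers are (bytes, bytes) pairs, just like on the request side
        headers = dict(message.get("headers", []))
        span.set_attribute("cache.status", headers.get(b"x-cache", b"miss").decode())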

FastAPIInstrumentor.instrument_app(
    app,
    server_request_hook=request_hook,
    client_response_hook=response_hook
)

Sampling strategies in depth

Probability sampler

from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(0.1)  # sample 10%
provider = TracerProvider(sampler=sampler, resource=resource)

The sampler uses the trace ID’s bits to decide, ensuring all services agree on whether to sample a given trace (consistent sampling).
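The consistency property is easy to see in a few lines. The sketch below assumes the decision compares the low 64 bits of the trace ID with a threshold derived from the rate; the function and variable names are hypothetical, and the exact bit selection in a given SDK version may differ.

```python
import random

TRACE_ID_LOW_BITS = (1 << 64) - 1  # mask selecting the low 64 bits of a 128-bit ID


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministic head-sampling decision derived only from the trace ID."""
    bound = round(rate * (1 << 64))
    return (trace_id & TRACE_ID_LOW_BITS) < bound


rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]

# Two services that evaluate the same IDs independently reach identical decisions,
# because nothing but the trace ID and the configured rate enters the calculation.
service_a = [should_sample(t, 0.1) for t in trace_ids]
service_b = [should_sample(t, 0.1) for t in trace_ids]
sampled_fraction = sum(service_a) / len(trace_ids)

print(service_a == service_b, sampled_fraction)
```

The agreement only holds when every service is configured with the same rate; a service with a lower rate keeps a subset of the traces the others keep.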

Parent-based sampler

from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio

sampler = ParentBasedTraceIdRatio(rate=0.1)

If the incoming request already has a sampling decision (in traceparent flags), respect it. Otherwise, apply the ratio. This prevents broken traces where some services sample and others don’t.
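The rule in the paragraph above fits in one small function. This is a sketch with hypothetical names, reusing the low-64-bit ratio check as an assumed fallback; it is not the SDK's implementation.

```python
from typing import Optional


def sampled_flag(traceparent: str) -> bool:
    """Read the sampled bit (0x01) from the trace-flags field of a traceparent header."""
    flags = traceparent.rsplit("-", 1)[1]
    return bool(int(flags, 16) & 0x01)


def parent_based_should_sample(
    parent_sampled: Optional[bool], trace_id: int, rate: float
) -> bool:
    """Respect the parent's decision if there is one, otherwise apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return (trace_id & ((1 << 64) - 1)) < round(rate * (1 << 64))


# The upstream service decided not to sample (flags == "00").
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"
decision = parent_based_should_sample(
    sampled_flag(incoming), 0x4BF92F3577B34DA6A3CE929D0E0E4736, rate=1.0
)
print(decision)
```

Even with a local rate of 1.0 the span is not sampled, because the upstream decision wins; the ratio only applies to root spans that have no parent.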

Custom sampler for priority traces

from opentelemetry.sdk.trace.sampling import (
    Decision, Sampler, SamplingResult, TraceIdRatioBased
)

class PrioritySampler(Sampler):
    def __init__(self, default_rate=0.1):
        self.default_rate = default_rate
        self._ratio_sampler = TraceIdRatioBased(default_rate)

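    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # Always sample admin endpoints. Errors are not known yet at span start,
        # so a head sampler cannot select on them.
        if attributes and attributes.get("http.target", "").startswith("/admin"):
            # Pass attributes through: the SDK builds the span from the result's attributes
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)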

        return self._ratio_sampler.should_sample(
            parent_context, trace_id, name, kind, attributes, links
        )

    def get_description(self):
        return f"PrioritySampler(default_rate={self.default_rate})"

Tail-based sampling with the OTel Collector

Head-based sampling decides before the request executes — you might miss interesting traces. Tail-based sampling waits until the trace completes:

# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 2000 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }

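The application exports every span to the collector (leave SDK-side head sampling off, or traces are discarded before the policies can see them). The collector buffers spans for decision_wait after a trace's first span arrives, then evaluates the policies; in a configuration like this one, a trace is kept if any policy matches. When several collector instances run side by side, all spans of a trace must be routed to the same instance for the decision to be correct.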
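The effect of the three policies can be simulated in plain Python. This is a sketch of the decision logic only, with hypothetical names (SpanRecord, tail_decision); in particular, the modulo on the trace ID is an assumed stand-in for however the collector hashes IDs for its probabilistic policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanRecord:
    trace_id: int
    status: str      # "OK", "ERROR", or "UNSET"
    start_ms: int
    end_ms: int


def tail_decision(
    spans: List[SpanRecord], threshold_ms: int = 2000, baseline_pct: int = 5
) -> Optional[str]:
    """Return the name of the first matching policy, or None to drop the trace."""
    if any(s.status == "ERROR" for s in spans):
        return "errors"
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > threshold_ms:
        return "slow"
    # Assumption: a simple modulo stands in for the collector's trace-ID hashing.
    if spans[0].trace_id % 100 < baseline_pct:
        return "baseline"
    return None


failed = [SpanRecord(101, "OK", 0, 120), SpanRecord(101, "ERROR", 10, 90)]
slow = [SpanRecord(150, "OK", 0, 2500)]
lucky = [SpanRecord(203, "OK", 0, 50)]
boring = [SpanRecord(250, "OK", 0, 50)]

decisions = [tail_decision(t) for t in (failed, slow, lucky, boring)]
print(decisions)
```

The function needs the complete list of spans before it can answer, which is the reason the collector has to buffer spans in memory and why decision_wait trades memory for completeness.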

Production deployment architecture

┌───────────┐     ┌───────────┐     ┌───────────┐
│ Service A │────▶│ Service B │────▶│ Service C │
└─────┬─────┘     └─────┬─────┘     └─────┬─────┘
      │ OTLP            │ OTLP            │ OTLP
      ▼                 ▼                 ▼
┌───────────────────────────────────────────────┐
│            OpenTelemetry Collector            │
│     (tail sampling, attribute enrichment)     │
└─────────────┬───────────────────┬─────────────┘
              │                   │
         ┌────▼────┐      ┌───────▼───────┐
         │  Tempo  │      │ Elasticsearch │
         │(traces) │      │    (logs)     │
         └────┬────┘      └───────────────┘
              │
        ┌─────▼──────┐
        │  Grafana   │
        │(dashboards)│
        └────────────┘

Key decisions:

  • Sidecar vs. centralized collector: Sidecars (one per pod) reduce network hops but increase resource usage. Centralized collectors are simpler but create a single point of failure.
  • OTLP protocol: Use gRPC for lower overhead, HTTP/protobuf for firewall-friendly environments.
  • Batching: The SDK’s BatchSpanProcessor reduces export overhead. Default batch size (512) and delay (5s) work for most services. High-throughput services may need max_queue_size=8192.

Correlating traces with logs

Include the trace ID in every log line:

import logging
from opentelemetry import trace

class TraceLogFilter(logging.Filter):
    def filter(self, record):
        span = trace.get_current_span()
        ctx = span.get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = "0" * 32
            record.span_id = "0" * 16
        return True

In Grafana, this lets you click a trace span and jump directly to the relevant log lines — and vice versa.
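Wiring the filter into logging takes a handler and a format string that references the two injected fields. The sketch below is self-contained, so it uses StaticTraceFilter, a hypothetical stand-in that injects fixed IDs where TraceLogFilter would read them from the current span; the logger name and format string are likewise illustrative.

```python
import io
import logging


class StaticTraceFilter(logging.Filter):
    """Hypothetical stand-in for TraceLogFilter that injects fixed IDs."""

    def __init__(self, trace_id: int, span_id: int):
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record):
        record.trace_id = format(self.trace_id, "032x")
        record.span_id = format(self.span_id, "016x")
        return True


stream = io.StringIO()  # stands in for stdout or a log file
handler = logging.StreamHandler(stream)
# Attach the filter to the handler so records from child loggers get the fields too.
handler.addFilter(StaticTraceFilter(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7))
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s")
)

logger = logging.getLogger("order.service.demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("payment accepted")
log_line = stream.getvalue().strip()
print(log_line)
```

Putting the filter on the handler rather than on a logger matters: logger-level filters are not applied to records that propagate up from child loggers, so a format string that references trace_id would fail on those records.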

Performance overhead

Measured on Python 3.12 with FastAPI, 1000 requests/sec:

Configuration                              p50 overhead   p99 overhead   Memory
No tracing                                 baseline       baseline       baseline
Auto-instrumentation, no export            +0.2ms         +0.8ms         +12 MB
+ BatchSpanProcessor, OTLP export          +0.3ms         +1.2ms         +25 MB
+ 5 additional library instrumentations    +0.5ms         +2.0ms         +35 MB

The overhead is dominated by attribute serialization and context propagation, not network I/O (which happens in background threads).

One thing to remember: Request tracing is infrastructure, not a feature. Invest in auto-instrumentation, sampling, and collector deployment once, and every future debugging session becomes a visual exercise instead of a log-grep marathon.
