Python Request Tracing — Deep Dive
Distributed tracing gives you causality — not just “what happened” but “what caused what, and how long each step took.” This deep dive covers the OpenTelemetry Python SDK internals, custom instrumentation, sampling architectures, and production deployment patterns.
OpenTelemetry SDK architecture
The SDK has four key components:
- TracerProvider — factory for Tracer instances, holds configuration.
- Tracer — creates spans, scoped to an instrumentation library name/version.
- SpanProcessor — receives span lifecycle events (start, end). Two built-in types:
  - SimpleSpanProcessor: exports each span immediately (testing/development).
  - BatchSpanProcessor: buffers spans and exports in batches (production).
- SpanExporter — sends span data to a backend (OTLP, Jaeger, Zipkin, console).
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.resources import Resource
resource = Resource.create({
    "service.name": "order-api",
    "service.version": "2.1.0",
    "deployment.environment": "production",
})
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(
        ConsoleSpanExporter(),
        max_queue_size=2048,
        max_export_batch_size=512,
        schedule_delay_millis=5000,
    )
)
trace.set_tracer_provider(provider)
The Resource attaches static metadata to every span — critical for filtering traces by service, version, and environment in your backend.
Span lifecycle and context
Creating spans with full control
import stripe  # third-party Stripe client, needed for stripe.CardError
from opentelemetry import trace
tracer = trace.get_tracer("order.service", "2.1.0")
def process_payment(order_id: str, amount: float):
    with tracer.start_as_current_span(
        "process_payment",
        kind=trace.SpanKind.INTERNAL,
        attributes={
            "order.id": order_id,
            "payment.amount": amount,
            "payment.currency": "USD",
        },
    ) as span:
        try:
            # stripe_charge is the application's own helper (not shown here)
            result = stripe_charge(amount)
            span.set_attribute("payment.stripe_id", result.id)
            span.set_status(trace.StatusCode.OK)
            return result
        except stripe.CardError as e:
            # Mark the span as failed and attach the exception as a span event
            span.set_status(trace.StatusCode.ERROR, str(e))
            span.record_exception(e)
            raise
SpanKind values:
- INTERNAL: default, in-process work
- SERVER: handling an incoming request
- CLIENT: making an outgoing request
- PRODUCER: enqueuing a message
- CONSUMER: processing a queued message
Context propagation internals
OpenTelemetry uses Python’s contextvars module to store the current span. When you call start_as_current_span, the SDK:
- Gets the current context (context.get_current()).
- Creates a new span with the current span as parent.
- Sets the new span as current via context.attach().
- On exit, calls context.detach() to restore the previous span.
This is why spans automatically nest — each new span looks up its parent from the context.
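The attach/detach cycle can be modeled with nothing but the standard library. The sketch below is a simplified illustration, not the SDK's code: `_CURRENT_SPAN`, `ToySpan`, and this `start_as_current_span` are hypothetical names invented for the example.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the context slot that holds the active span.
_CURRENT_SPAN = contextvars.ContextVar("current_span", default=None)


@dataclass
class ToySpan:
    name: str
    parent: Optional["ToySpan"] = None


@contextmanager
def start_as_current_span(name: str):
    parent = _CURRENT_SPAN.get()        # 1. read the current context
    span = ToySpan(name, parent)        # 2. new span, parented to the current one
    token = _CURRENT_SPAN.set(span)     # 3. "attach": make the new span current
    try:
        yield span
    finally:
        _CURRENT_SPAN.reset(token)      # 4. "detach": restore the previous span


def demo():
    with start_as_current_span("handle_request") as outer:
        with start_as_current_span("query_db") as inner:
            nested_parent = inner.parent.name
        restored = _CURRENT_SPAN.get() is outer
    return nested_parent, restored, _CURRENT_SPAN.get()
```

Calling `demo()` shows the inner span picking up `handle_request` as its parent, the outer span becoming current again after the inner block exits, and the slot being empty once both blocks are done.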
Manual context propagation
For cases where automatic propagation doesn’t work (thread pools, callback-based code):
from opentelemetry import context
# Capture context in the calling thread
ctx = context.get_current()
# In the worker thread
token = context.attach(ctx)
try:
    with tracer.start_as_current_span("background_work"):
        do_work()
finally:
    context.detach(token)
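The same capture-and-reattach idea can be tried out with plain `contextvars`, which is what the SDK builds on. This is a minimal sketch using only the standard library; `current_span_name` and `read_span_name` are hypothetical names, and `copy_context().run` plays the role that `context.attach(ctx)` plays above.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable standing in for "the current span".
current_span_name = contextvars.ContextVar("current_span_name", default=None)


def read_span_name():
    return current_span_name.get()


def demo():
    token = current_span_name.set("handle_request")
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            # Plain submit: the worker thread normally starts with its own,
            # empty context, so the value set above is not visible there.
            without_ctx = pool.submit(read_span_name).result()
            # Capture the caller's context and run the task inside the copy.
            ctx = contextvars.copy_context()
            with_ctx = pool.submit(ctx.run, read_span_name).result()
    finally:
        current_span_name.reset(token)
    return without_ctx, with_ctx
```

The second submit sees `handle_request` because the task runs inside the captured context. The first one usually sees the default, which is the situation the manual attach/detach pattern is meant to fix.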
Custom propagators
The default propagator uses W3C traceparent / tracestate headers. For legacy systems using B3 (Zipkin) format:
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry import propagate
propagate.set_global_textmap(B3MultiFormat())
For systems using multiple header formats, compose propagators:
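from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),  # W3C
    B3MultiFormat(),  # Zipkin
]))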
The composite propagator tries each format when extracting and writes all formats when injecting.
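To make the extract/inject behavior concrete, here is a small standard-library sketch of the two header formats. It is a simplified model rather than the propagators' real code: the function names are hypothetical, header keys are assumed to be lowercased already, validation is minimal, and letting a later format override an earlier one when both are present is an assumption of this sketch.

```python
import re

# W3C traceparent: version-traceid-spanid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def extract_w3c(headers):
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if not match:
        return None
    _version, trace_id, span_id, flags = match.groups()
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def extract_b3_multi(headers):
    trace_id = headers.get("x-b3-traceid")
    span_id = headers.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None
    return trace_id, span_id, headers.get("x-b3-sampled") == "1"


def extract_any(headers):
    """Try every format; a later format that matches replaces an earlier one."""
    found = None
    for extractor in (extract_w3c, extract_b3_multi):
        found = extractor(headers) or found
    return found


def inject_all(trace_id, span_id, sampled):
    """Write the same trace identity in both header formats."""
    return {
        "traceparent": f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}",
        "x-b3-traceid": trace_id,
        "x-b3-spanid": span_id,
        "x-b3-sampled": "1" if sampled else "0",
    }
```

Writing every format on injection costs a few extra header bytes, but it lets a service sit between W3C-only and B3-only neighbors without breaking the trace.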
Auto-instrumentation deep dive
How auto-instrumentation works
opentelemetry-instrumentation-fastapi monkey-patches Starlette’s ASGI handling to:
- Extract trace context from incoming request headers.
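- Create a SERVER span with HTTP attributes (http.method, http.url, http.status_code).
- Optionally inject trace context into the response headers, if a response propagator is configured. Propagation to downstream services is handled by the client instrumentations on outgoing requests.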
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
app = FastAPI()
FastAPIInstrumentor.instrument_app(app)
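Conceptually, the instrumentation behaves like an ASGI middleware wrapped around the app. The sketch below is a toy model of that idea, not the library's implementation: `ToyTracingMiddleware`, `hello_app`, and the dictionary used as a "span" are all hypothetical, and a plain list stands in for the exporter.

```python
import asyncio


class ToyTracingMiddleware:
    """Hypothetical ASGI middleware that records one SERVER 'span' per HTTP request."""

    def __init__(self, app, finished_spans):
        self.app = app
        self.finished_spans = finished_spans  # plain list used as a fake exporter

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)

        # 1. Extract trace context from the incoming headers (ASGI headers are byte pairs).
        headers = dict(scope.get("headers", []))
        span = {
            "name": f'{scope["method"]} {scope["path"]}',
            "kind": "SERVER",
            "traceparent": headers.get(b"traceparent", b"").decode() or None,
            "http.method": scope["method"],
        }

        async def send_wrapper(message):
            # 2. Record the status code when the response starts.
            if message["type"] == "http.response.start":
                span["http.status_code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            self.finished_spans.append(span)  # 3. "end" the span


async def hello_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def demo():
    spans, sent = [], []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "method": "GET",
        "path": "/orders",
        "headers": [(b"traceparent", b"00-" + b"a" * 32 + b"-" + b"b" * 16 + b"-01")],
    }
    asyncio.run(ToyTracingMiddleware(hello_app, spans)(scope, receive, send))
    return spans, sent
```

Because the wrapper sits outside the application, route handlers need no tracing code of their own, which is the main appeal of auto-instrumentation.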
Custom span attributes via hooks
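def request_hook(span, scope):
    # Called once per request with the SERVER span and the ASGI scope
    if span and span.is_recording() and scope.get("type") == "http":
        # ASGI header names and values are bytes
        headers = dict(scope.get("headers", []))
        tenant = headers.get(b"x-tenant-id", b"").decode()
        if tenant:
            span.set_attribute("tenant.id", tenant)
def response_hook(span, scope, message):
    # Called for each ASGI send event; response headers arrive on "http.response.start"
    if span and span.is_recording() and message.get("type") == "http.response.start":
        headers = dict(message.get("headers", []))
        cache_status = headers.get(b"x-cache", b"miss").decode()
        span.set_attribute("cache.status", cache_status)
FastAPIInstrumentor.instrument_app(
    app,
    server_request_hook=request_hook,
    client_response_hook=response_hook,
)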
Sampling strategies in depth
Probability sampler
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
sampler = TraceIdRatioBased(0.1) # sample 10%
provider = TracerProvider(sampler=sampler, resource=resource)
The sampler uses the trace ID’s bits to decide, ensuring all services agree on whether to sample a given trace (consistent sampling).
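A ratio decision of this kind can be sketched in a few lines. This shows the idea rather than the SDK's source: the 64-bit mask and the rounding are assumptions about the details, and `should_sample` and `observed_rate` are hypothetical helper names.

```python
import random

# Assumption: only the low 64 bits of the 128-bit trace ID take part in the decision.
TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministic: the same trace ID and rate always give the same answer."""
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def observed_rate(rate: float, n: int = 100_000, seed: int = 7) -> float:
    """Fraction of n random trace IDs that would be sampled at the given rate."""
    rng = random.Random(seed)
    hits = sum(should_sample(rng.getrandbits(128), rate) for _ in range(n))
    return hits / n
```

Since the decision depends only on the trace ID and the rate, two services configured with the same rate reach the same verdict for a given trace without talking to each other.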
Parent-based sampler
from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio
sampler = ParentBasedTraceIdRatio(rate=0.1)
If the incoming request already has a sampling decision (in traceparent flags), respect it. Otherwise, apply the ratio. This prevents broken traces where some services sample and others don’t.
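The decision order can be sketched as follows. This is an illustrative model under stated assumptions, not the SDK's sampler: `parent_based_decision` and `ratio_decision` are hypothetical names, the parent is represented by a raw `traceparent` string, and only the "sampled" bit of the flags is consulted.

```python
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1


def ratio_decision(trace_id: int, rate: float) -> bool:
    return (trace_id & TRACE_ID_LIMIT) < round(rate * (TRACE_ID_LIMIT + 1))


def parent_based_decision(traceparent: Optional[str], new_trace_id: int, rate: float) -> bool:
    if traceparent:
        parts = traceparent.split("-")
        if len(parts) == 4:
            try:
                flags = int(parts[3], 16)
            except ValueError:
                flags = None
            if flags is not None:
                # A parent exists: inherit its decision (bit 0 of the flags means "sampled").
                return bool(flags & 0x01)
    # No usable parent: this service starts the trace and applies the ratio.
    return ratio_decision(new_trace_id, rate)
```

Note that a sampled parent wins even when the local rate is 0.0, and an unsampled parent wins even when the local rate is 1.0. That is what keeps a trace either complete or absent across services.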
Custom sampler for priority traces
from opentelemetry.sdk.trace.sampling import (
    Decision,
    Sampler,
    SamplingResult,
    TraceIdRatioBased,
)
class PrioritySampler(Sampler):
    def __init__(self, default_rate=0.1):
        self.default_rate = default_rate
        self._ratio_sampler = TraceIdRatioBased(default_rate)
    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # Always sample admin endpoints. Errors are not known when the span
        # starts, so a head sampler cannot select on them.
        if attributes and str(attributes.get("http.target", "")).startswith("/admin"):
            # Pass the attributes through so the span keeps its initial attributes
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)
        return self._ratio_sampler.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )
    def get_description(self):
        return f"PrioritySampler(default_rate={self.default_rate})"
Tail-based sampling with the OTel Collector
Head-based sampling decides before the request executes — you might miss interesting traces. Tail-based sampling waits until the trace completes:
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 2000 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
The application sends all spans to the collector, which buffers them and applies policies after the trace completes.
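The policy evaluation itself is simple once the whole trace is in hand. The following is a toy model of the three policies above, not the collector's implementation: `FinishedSpan` and `keep_trace` are hypothetical, and trace latency is approximated by the longest single span.

```python
import random
from dataclasses import dataclass


@dataclass
class FinishedSpan:
    trace_id: str
    duration_ms: float
    is_error: bool = False


def keep_trace(spans, threshold_ms=2000, baseline_percentage=5.0, rng=random.random):
    """Return the name of the first policy that keeps the trace, or None to drop it."""
    if not spans:
        return None
    if any(span.is_error for span in spans):
        return "errors"
    # Simplification: use the longest span as a proxy for end-to-end trace latency.
    if max(span.duration_ms for span in spans) > threshold_ms:
        return "slow"
    if rng() * 100 < baseline_percentage:
        return "baseline"
    return None
```

The price of deciding late is memory: every span of every in-flight trace has to be held until `decision_wait` elapses, so the collector needs to be sized for full, unsampled traffic.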
Production deployment architecture
┌───────────┐     ┌───────────┐     ┌───────────┐
│ Service A │────▶│ Service B │────▶│ Service C │
└─────┬─────┘     └─────┬─────┘     └─────┬─────┘
      │ OTLP            │ OTLP            │ OTLP
      ▼                 ▼                 ▼
┌───────────────────────────────────────────────┐
│            OpenTelemetry Collector            │
│     (tail sampling, attribute enrichment)     │
└───────────┬───────────────────────┬───────────┘
            │                       │
      ┌─────▼─────┐         ┌───────▼───────┐
      │   Tempo   │         │ Elasticsearch │
      │ (traces)  │         │    (logs)     │
      └─────┬─────┘         └───────────────┘
            │
     ┌──────▼───────┐
     │   Grafana    │
     │ (dashboards) │
     └──────────────┘
Key decisions:
- Sidecar vs. centralized collector: Sidecars (one per pod) reduce network hops but increase resource usage. Centralized collectors are simpler but create a single point of failure.
- OTLP protocol: Use gRPC for lower overhead, HTTP/protobuf for firewall-friendly environments.
- Batching: The SDK’s BatchSpanProcessor reduces export overhead. Default batch size (512) and delay (5s) work for most services. High-throughput services may need max_queue_size=8192.
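To see how the queue and batch sizes interact, here is a deliberately simplified, single-threaded model of a batching processor. `ToyBatchProcessor` is hypothetical: a real processor flushes from a background thread on a timer, while this sketch exposes a manual `flush_once` instead.

```python
from collections import deque


class ToyBatchProcessor:
    """Hypothetical model: bounded queue in front of a batch export function."""

    def __init__(self, export, max_queue_size=2048, max_export_batch_size=512):
        self._export = export
        self._queue = deque()
        self._max_queue_size = max_queue_size
        self._max_export_batch_size = max_export_batch_size
        self.dropped = 0

    def on_end(self, span):
        if len(self._queue) >= self._max_queue_size:
            self.dropped += 1  # queue full: the span is dropped rather than blocking the request
            return
        self._queue.append(span)

    def flush_once(self):
        """Export at most one batch and return how many spans it contained."""
        size = min(len(self._queue), self._max_export_batch_size)
        batch = [self._queue.popleft() for _ in range(size)]
        if batch:
            self._export(batch)
        return len(batch)
```

If spans arrive faster than batches leave, the queue fills and spans are dropped. Raising `max_queue_size` buys headroom for bursts at the cost of memory.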
Correlating traces with logs
Include the trace ID in every log line:
import logging
from opentelemetry import trace
class TraceLogFilter(logging.Filter):
    def filter(self, record):
        span = trace.get_current_span()
        ctx = span.get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = "0" * 32
            record.span_id = "0" * 16
        return True
In Grafana, this lets you click a trace span and jump directly to the relevant log lines — and vice versa.
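A filter only helps if it is attached and the formatter actually prints the fields. The sketch below shows that wiring with the standard library alone. `StaticTraceFilter` is a hypothetical stand-in that stamps fixed IDs so the example runs without a tracer; the logger name and format string are likewise assumptions.

```python
import io
import logging


class StaticTraceFilter(logging.Filter):
    """Hypothetical stand-in: stamps fixed IDs instead of reading the active span."""

    def __init__(self, trace_id: int, span_id: int):
        super().__init__()
        self._trace_id = format(trace_id, "032x")  # 128-bit ID as 32 hex chars
        self._span_id = format(span_id, "016x")    # 64-bit ID as 16 hex chars

    def filter(self, record):
        record.trace_id = self._trace_id
        record.span_id = self._span_id
        return True


def build_logger(stream, trace_filter):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s")
    )
    # Attach the filter to the handler so every record it emits gets stamped.
    handler.addFilter(trace_filter)
    logger = logging.getLogger("order.service.demo")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo():
    stream = io.StringIO()
    logger = build_logger(stream, StaticTraceFilter(0xABC, 0x1))
    logger.info("charge accepted")
    return stream.getvalue()
```

Putting the filter on the handler rather than on a single logger means records that propagate up from child loggers are stamped as well.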
Performance overhead
Measured on Python 3.12 with FastAPI, 1000 requests/sec:
| Configuration | p50 overhead | p99 overhead | Memory |
|---|---|---|---|
| No tracing | baseline | baseline | baseline |
| Auto-instrumentation, no export | +0.2ms | +0.8ms | +12 MB |
| + BatchSpanProcessor, OTLP export | +0.3ms | +1.2ms | +25 MB |
| + 5 additional library instrumentations | +0.5ms | +2.0ms | +35 MB |
The overhead is dominated by attribute serialization and context propagation, not network I/O (which happens in background threads).
One thing to remember: Request tracing is infrastructure, not a feature. Invest in auto-instrumentation, sampling, and collector deployment once, and every future debugging session becomes a visual exercise instead of a log-grep marathon.