Distributed Tracing with OpenTelemetry in Python — Deep Dive
Getting OpenTelemetry traces working in a demo takes an afternoon. Making them useful in production — where sampling decisions affect costs, context propagation crosses async boundaries, and instrumentation must not degrade latency — takes deliberate engineering.
SDK architecture
The Python OpenTelemetry SDK has a layered design:
- API layer (opentelemetry-api): Defines interfaces. Application code imports only this.
- SDK layer (opentelemetry-sdk): Implements the API. Configures providers, processors, and exporters.
- Instrumentation libraries: Automatically wrap frameworks and libraries.
- Exporters: Send data to backends via OTLP, Jaeger, Zipkin, or custom protocols.
This separation means library authors can instrument their code against the API without forcing SDK dependencies on users.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.semconv.resource import ResourceAttributes

resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "order-service",
    ResourceAttributes.SERVICE_VERSION: "2.4.1",
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: "production",
})

provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://otel-collector:4317"),
        max_queue_size=2048,
        max_export_batch_size=512,
        schedule_delay_millis=5000,
    )
)
trace.set_tracer_provider(provider)
The Resource attaches metadata to every span. The BatchSpanProcessor buffers spans and exports them in batches, reducing network overhead.
Context propagation in depth
W3C Trace Context
The default propagator uses W3C traceparent and tracestate headers:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
version-trace_id-parent_id-flags
The tracestate header carries vendor-specific data. If you use multiple tracing systems during migration, both can coexist via tracestate.
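To make the traceparent layout concrete, here is a minimal standard-library sketch that splits a header into its four fields and reads the sampled flag. The names parse_traceparent and TraceParent are hypothetical helpers for illustration, and the sketch only handles the version 00 layout; in real services the OpenTelemetry propagator does this parsing for you.

```python
import re
from typing import NamedTuple, Optional

# Layout for version 00: version-trace_id-parent_id-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are treated as invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

For the example header, the trailing 01 means the upstream service sampled the trace, which is the signal a ParentBased sampler acts on.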
Propagation across async boundaries
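In asyncio applications, context is stored in contextvars and flows automatically through await chains. Tasks created with asyncio.create_task inherit it too: each task runs in a copy of the context that was current when it was created, so spans started inside the task become children of the span that spawned it without any manual hand-off:

import asyncio
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def background_work():
    # Runs in a copy of the context captured when the task was created
    with tracer.start_as_current_span("background"):
        await do_work()

async def handle_request():
    with tracer.start_as_current_span("handle_request"):
        # create_task snapshots the current context, so "background"
        # becomes a child of "handle_request" without any manual attach
        task = asyncio.create_task(background_work())
        await task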
For thread pools, use opentelemetry.context.attach explicitly or use the opentelemetry-instrumentation-threading package.
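The thread-pool case can be illustrated with the standard library alone. The sketch below uses a plain ContextVar named current_trace_id as a hypothetical stand-in for the active trace context, and a hypothetical helper submit_with_context that runs the submitted function inside a snapshot of the caller's context. It is an illustration of the mechanism under those assumptions, not the instrumentation package's implementation.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the active trace context.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def read_trace_id():
    """Runs on a worker thread and reports which context value it sees."""
    return current_trace_id.get()


def submit_with_context(executor, fn, *args):
    """Submit fn so that it runs inside a snapshot of the caller's context."""
    snapshot = contextvars.copy_context()
    return executor.submit(snapshot.run, fn, *args)


current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Plain submit: the worker thread does not run in the caller's context.
    without_ctx = pool.submit(read_trace_id).result()
    # Wrapped submit: the worker sees the value set by the caller.
    with_ctx = submit_with_context(pool, read_trace_id).result()
```

With OpenTelemetry itself, the equivalent is to capture context.get_current() in the caller, then call context.attach(ctx) at the start of the worker function and context.detach(token) when it finishes.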
Propagation through message queues
When publishing to Kafka, RabbitMQ, or NATS, inject trace context into message headers:
from opentelemetry.propagate import inject

headers = {}
inject(headers)
# headers now contains "traceparent" (and "tracestate" when it is non-empty)
# Include these headers in your message
On the consumer side:
from opentelemetry.propagate import extract

ctx = extract(carrier=message.headers)
with tracer.start_as_current_span("process_message", context=ctx):
    handle(message)
This creates a causal chain from producer to consumer spans, even across different services and languages.
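One practical wrinkle: propagators read and write a dict of strings, while many Kafka clients represent headers as a list of (key, bytes) pairs. A small conversion layer bridges the two. The helper names below are hypothetical and the carrier is hard-coded; on a real producer it would be filled by inject().

```python
from typing import Dict, List, Optional, Tuple


def carrier_to_kafka_headers(carrier: Dict[str, str]) -> List[Tuple[str, bytes]]:
    """Convert a propagation carrier dict into Kafka-style header tuples."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def kafka_headers_to_carrier(
    headers: Optional[List[Tuple[str, Optional[bytes]]]],
) -> Dict[str, str]:
    """Rebuild a string-valued carrier dict from Kafka-style header tuples."""
    carrier: Dict[str, str] = {}
    for key, value in headers or []:
        if value is None:
            continue  # headers with no value carry nothing to extract
        carrier[key] = value.decode("utf-8")
    return carrier


# Hard-coded here; normally produced by inject() on the producer side.
carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
wire_headers = carrier_to_kafka_headers(carrier)
restored = kafka_headers_to_carrier(wire_headers)
```

On the consumer, the restored dict is what would be passed to extract() as the carrier.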
Span links and events
Span links
When a span is causally related to another span but is not a direct child, use links:
# Batch processor that handles messages from multiple traces
link1 = trace.Link(msg1_span_context)
link2 = trace.Link(msg2_span_context)
with tracer.start_as_current_span("process_batch", links=[link1, link2]):
    process([msg1, msg2])
Links are useful for batch operations, fan-in patterns, and retry relationships where the new attempt relates to the original but is not a child.
Span events
Events are timestamped annotations within a span:
with tracer.start_as_current_span("checkout") as span:
    span.add_event("inventory_checked", {"items": 3})
    # ... processing ...
    span.add_event("payment_authorized", {"amount": 42.50})
Events appear as markers on the span timeline. They are lighter than child spans when you want to annotate without creating new timing units.
Baggage
Baggage propagates key-value pairs across all services in a trace without adding them to every span:
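from opentelemetry import baggage, context

# set_baggage returns a new context; attach it to make it current
ctx = baggage.set_baggage("tenant.id", "acme-corp")
token = context.attach(ctx)
# Code in this context, and downstream services, can now read the value
tenant = baggage.get_baggage("tenant.id")
context.detach(token)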
Use baggage sparingly — it adds to every outgoing request header. Good for tenant ID, experiment cohort, or priority level. Bad for large payloads.
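To see why size matters, it helps to look at what goes on the wire. The sketch below serializes entries in the style of the W3C baggage header (comma-separated key=value pairs with percent-encoded values) and measures the result. encode_baggage and decode_baggage are hypothetical helpers covering only the simple case; the real propagator handles the full format.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict) -> str:
    """Serialize entries as comma-separated key=value pairs, values percent-encoded."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in entries.items()
    )


def decode_baggage(header: str) -> dict:
    """Parse a simple baggage header back into a dict, ignoring properties."""
    entries = {}
    for member in header.split(","):
        if "=" not in member:
            continue
        key, _, value = member.partition("=")
        # Drop optional properties after ';' (not handled in this sketch).
        value = value.split(";", 1)[0]
        entries[key.strip()] = unquote(value.strip())
    return entries


header = encode_baggage({"tenant.id": "acme-corp", "cohort": "beta group"})
header_bytes = len(header.encode("utf-8"))
```

Every byte counted in header_bytes is sent on each outgoing request that carries the baggage, which is the cost to weigh before adding another entry.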
Sampling strategies
Head-based sampling
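Head-based sampling makes the decision when the trace is created, before anything is known about how the request will turn out. The TraceIdRatioBased sampler is the simplest: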
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
sampler = TraceIdRatioBased(0.1) # Sample 10% of traces
provider = TracerProvider(sampler=sampler, resource=resource)
The ParentBased sampler respects the parent’s sampling decision, ensuring consistency across services:
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
sampler = ParentBased(root=TraceIdRatioBased(0.1))
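Ratio sampling is deterministic rather than random per service: the decision is derived from the trace ID, so services using the same rate agree on which traces to keep. The sketch below illustrates the idea with a hypothetical should_sample function. It assumes the low 64 bits of the trace ID are compared against rate * 2**64; treat that detail as an assumption, since the SDK's internals may differ.

```python
TRACE_ID_MASK = (1 << 64) - 1  # keep the low 64 bits of the 128-bit trace ID


def should_sample(trace_id: int, rate: float) -> bool:
    """Sample when the low 64 bits of the trace ID fall below rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    bound = round(rate * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


trace_id = int("4bf92f3577b34da6a3ce929d0e0e4736", 16)
# The same trace ID always yields the same decision for a given rate.
decisions = {rate: should_sample(trace_id, rate) for rate in (0.0, 0.1, 0.7, 1.0)}
```

Because a trace kept at a low rate is also kept at every higher rate, raising the rate on one service never removes traces that other services already decided to keep.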
Tail-based sampling
Head-based sampling misses interesting traces (errors, high latency) that happen to fall in the unsampled 90%. Tail-based sampling defers the decision until the trace is complete.
This is implemented in the OpenTelemetry Collector, not in the application:
# Collector config
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-traces
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
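This keeps all error traces, all traces over 1 second, and 5% of everything else. The collector buffers spans until the decision wait expires, so every span of a trace has to reach the same collector instance for the policies to see the whole trace.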
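The decision logic can be sketched in a few lines of Python. This is an illustration of how the three policy types combine (a trace is kept if any one matches), not the collector's implementation; SpanRecord and keep_trace are hypothetical names, and the duration is approximated as the gap between the earliest start and the latest end.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanRecord:
    """Hypothetical, minimal view of a finished span."""
    start_ms: float
    end_ms: float
    is_error: bool = False


def keep_trace(
    spans: List[SpanRecord],
    threshold_ms: float = 1000,
    baseline_pct: float = 5,
    rng: Optional[random.Random] = None,
) -> bool:
    """Keep the trace if any policy matches: error, slow, or baseline sample."""
    if any(span.is_error for span in spans):
        return True
    duration_ms = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration_ms > threshold_ms:
        return True
    source = rng if rng is not None else random
    return source.random() * 100 < baseline_pct


error_trace = [SpanRecord(0, 120), SpanRecord(10, 90, is_error=True)]
slow_trace = [SpanRecord(0, 1500), SpanRecord(100, 1400)]
fast_trace = [SpanRecord(0, 40)]
```

The function needs the full list of spans before it can answer, which is why this kind of decision has to wait until the trace is complete.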
Instrumentation for specific frameworks
FastAPI
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)
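This creates a server span for each request, records HTTP method and status code attributes, and extracts incoming trace context so the request joins the caller's trace. Propagating context to downstream calls additionally requires instrumenting the HTTP client you use (for example requests or httpx).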
SQLAlchemy
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
SQLAlchemyInstrumentor().instrument(engine=engine)
Every SQL query becomes a span with db.statement, db.system, and timing information. Slow queries become immediately visible in traces.
Celery
from opentelemetry.instrumentation.celery import CeleryInstrumentor
CeleryInstrumentor().instrument()
Task enqueue creates a producer span; task execution creates a consumer span. The trace flows from the web request through the task queue to the worker.
Performance considerations
OpenTelemetry overhead in Python is measurable but manageable. Treat the figures below as rough orders of magnitude and benchmark in your own environment:
- Span creation: ~1-5 μs per span (without export)
- Context propagation: ~0.5 μs per inject/extract
- BatchSpanProcessor: Exports asynchronously, minimal impact on request latency
- Memory: Each buffered span uses roughly 1-2 KB
To minimize impact:
- Use BatchSpanProcessor (not SimpleSpanProcessor) in production.
- Set a reasonable max_queue_size: if the queue fills, new spans are dropped.
- Sample aggressively in high-throughput services (1-10%).
- Avoid adding large attributes to spans: they increase memory and export size.
- Use the OTEL_TRACES_SAMPLER environment variable to change sampling without a code deploy (it is read at startup, so a restart is still needed).
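The drop-on-full behavior and the memory ceiling are easy to reason about with a toy model. The class below is a hypothetical, single-threaded stand-in for a batching processor's queue, not the SDK's implementation; it uses the queue and batch sizes from the setup example and the 1-2 KB per span estimate above.

```python
from collections import deque


class BoundedSpanQueue:
    """Drop-on-full buffer, loosely modeled on a batching span processor."""

    def __init__(self, max_queue_size: int = 2048, max_export_batch_size: int = 512):
        self._queue = deque()
        self.max_queue_size = max_queue_size
        self.max_export_batch_size = max_export_batch_size
        self.dropped = 0

    def offer(self, span) -> bool:
        """Enqueue a span, or count it as dropped when the queue is full."""
        if len(self._queue) >= self.max_queue_size:
            self.dropped += 1
            return False
        self._queue.append(span)
        return True

    def next_batch(self) -> list:
        """Remove and return up to max_export_batch_size spans, oldest first."""
        size = min(len(self._queue), self.max_export_batch_size)
        return [self._queue.popleft() for _ in range(size)]

    def __len__(self) -> int:
        return len(self._queue)


queue = BoundedSpanQueue(max_queue_size=2048, max_export_batch_size=512)
for i in range(3000):  # a burst larger than the queue, with no export in between
    queue.offer({"name": f"span-{i}"})

batch = queue.next_batch()
# Upper bound on buffered span memory at 2 KB per span.
worst_case_bytes = queue.max_queue_size * 2 * 1024
```

In this model a burst of 3000 spans loses everything beyond the first 2048, and the buffer never holds more than about 4 MiB, which shows the trade-off when choosing max_queue_size: a larger queue absorbs bigger bursts at the cost of more memory.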
Correlating traces with logs
Inject trace context into log records for trace-log correlation:
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    def filter(self, record):
        # Read the IDs of whichever span is active when the record is logged
        span = trace.get_current_span()
        ctx = span.get_span_context()
        record.trace_id = format(ctx.trace_id, "032x")
        record.span_id = format(ctx.span_id, "016x")
        return True

handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s [trace=%(trace_id)s span=%(span_id)s] %(message)s"
))
With trace IDs in logs, you can jump from a log line in Grafana directly to the full trace in Tempo or Jaeger.
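To show what the resulting log line looks like without a running tracer, the sketch below swaps in a hypothetical StaticTraceFilter that uses fixed IDs instead of reading the active span, and writes to an in-memory stream. The logger name and message are made up for the demo.

```python
import io
import logging


class StaticTraceFilter(logging.Filter):
    """Stand-in filter with fixed IDs; a real one would read the active span."""

    def __init__(self, trace_id: int, span_id: int):
        super().__init__()
        self._trace_id = trace_id
        self._span_id = span_id

    def filter(self, record):
        # Same formatting as the filter above: zero-padded lowercase hex.
        record.trace_id = format(self._trace_id, "032x")
        record.span_id = format(self._span_id, "016x")
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(
    StaticTraceFilter(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
)
handler.setFormatter(
    logging.Formatter("[trace=%(trace_id)s span=%(span_id)s] %(message)s")
)

logger = logging.getLogger("checkout-demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(handler)

logger.info("payment authorized")
line = stream.getvalue().strip()

# With no active span the IDs are zero, which formats as all zeros.
no_span_trace_id = format(0, "032x")
```

The all-zero case is worth handling in log queries or dashboards, since records logged outside any span will carry it.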
One thing to remember: Production-grade OpenTelemetry requires tail-based sampling in the collector, explicit context propagation across async and message-queue boundaries, and trace-log correlation — the auto-instrumentors get you started, but these details make traces genuinely useful for debugging.