NATS Messaging in Python — Deep Dive
NATS occupies a unique position in the messaging landscape. Its core protocol is text-based and fits on a napkin, yet the system scales to handle the internal messaging of companies like Synadia, Netlify, and major telecoms. For Python services, NATS provides a single substrate for request-reply RPCs, event broadcasting, and durable stream processing — all through one connection.
Connection management
The nats-py client uses asyncio and supports connection pooling to NATS clusters:
import nats
nc = await nats.connect(
servers=["nats://node1:4222", "nats://node2:4222", "nats://node3:4222"],
max_reconnect_attempts=60,
reconnect_time_wait=2, # seconds between attempts
error_cb=on_error,
disconnected_cb=on_disconnect,
reconnected_cb=on_reconnect,
)
The client automatically fails over to another server if the current one goes down. During reconnection, publishes are buffered up to pending_size (default 2 MB). If the buffer fills, publishes raise an exception — this is your backpressure signal.
Handling slow consumers
NATS server tracks per-subscription pending message counts. If a subscriber falls behind, the server sends a Slow Consumer notification and may drop messages for that subscription. Handle this explicitly:
async def error_handler(err):
if "slow consumer" in str(err).lower():
logger.warning("Falling behind on message processing")
# Trigger scale-up or shed load
JetStream deep dive
Stream configuration
Streams are created with policies that control retention, replication, and storage:
js = nc.jetstream()
await js.add_stream(
name="ORDERS",
subjects=["orders.>"],
retention="limits", # or "interest" or "workqueue"
max_msgs=1_000_000,
max_bytes=1 * 1024**3, # 1 GB
max_age=86400 * 7, # 7 days in seconds
num_replicas=3,
storage="file", # or "memory"
duplicate_window=120, # dedup window in seconds
)
Retention policies:
- limits: Keep messages until limits (count, bytes, age) are hit, then discard oldest.
- interest: Keep messages only while at least one consumer exists. Once all consumers acknowledge, the message is removed.
- workqueue: Each message is delivered to exactly one consumer and removed after acknowledgement. Acts like a traditional task queue.
Consumer patterns
Pull consumers give the application control over when to fetch messages:
consumer = await js.pull_subscribe("orders.created", durable="order-processor")
while True:
try:
msgs = await consumer.fetch(batch=10, timeout=5.0)
for msg in msgs:
await process_order(msg.data)
await msg.ack()
except nats.errors.TimeoutError:
continue # No messages available
Push consumers deliver messages as they arrive. They are simpler but harder to manage backpressure:
async def handler(msg):
await process_order(msg.data)
await msg.ack()
await js.subscribe("orders.created", durable="order-worker", cb=handler)
Exactly-once semantics
JetStream supports idempotent publishing with message deduplication:
ack = await js.publish(
"orders.created",
data,
headers={"Nats-Msg-Id": f"order-{order_id}"}
)
Within the stream’s duplicate_window, any publish with the same Nats-Msg-Id returns a successful ack without storing a duplicate. Combined with consumer acknowledgement, this achieves effectively-once processing.
Request-reply patterns at scale
Scatter-gather
Publish a request and collect multiple replies within a timeout. Useful for querying multiple services and aggregating results:
inbox = nc.new_inbox()
sub = await nc.subscribe(inbox)
await nc.publish_request("inventory.check", inbox, b'{"sku": "ABC123"}')
responses = []
deadline = asyncio.get_event_loop().time() + 2.0
while asyncio.get_event_loop().time() < deadline:
try:
msg = await sub.next_msg(timeout=0.5)
responses.append(msg.data)
except nats.errors.TimeoutError:
break
Service discovery via subjects
NATS subjects can serve as a lightweight service registry. Services subscribe to api.{service_name}.{method}; clients publish requests to those subjects. Queue groups ensure only one instance handles each request. No Consul or etcd needed for simple topologies.
Cluster and super-cluster topologies
Basic cluster
Three or more NATS servers form a full-mesh cluster. Clients connect to any node; the cluster routes messages internally. JetStream replicates stream data across the configured number of replicas using Raft consensus.
Leaf nodes
Edge locations run leaf node servers that connect to the central cluster. Leaf nodes forward only messages that match local subscriptions, reducing bandwidth. A Python service at an edge site connects to its local leaf node and transparently reaches the full cluster.
Super-clusters with gateways
Multiple clusters connect via gateway connections for multi-region architectures. Interest-based routing ensures messages only cross gateways when a subscriber exists in the remote cluster. This keeps inter-region traffic minimal.
Security model
NATS supports three auth methods:
- Token/password: Simple but static.
- NKeys: Ed25519 keypairs. No shared secrets on the wire.
- JWT-based decentralized auth: Accounts get signed JWTs that define which subjects they can publish/subscribe to. The server validates JWTs without calling an external auth service.
# NKey authentication
nc = await nats.connect(
"nats://secure.example.com:4222",
nkeys_seed="./user.nk"
)
The JWT model is particularly powerful for multi-tenant platforms. Each tenant gets an account JWT with subject permissions. Accounts are isolated — a message published in one account never leaks to another.
Monitoring and observability
NATS exposes a monitoring endpoint on port 8222:
GET /varz → server stats
GET /connz → connection details
GET /subsz → subscription info
GET /jsz → JetStream status
Export these to Prometheus using nats-exporter:
# Useful alerts
- Consumer pending messages > threshold (processing lag)
- Stream bytes approaching max_bytes (retention pressure)
- Slow consumer count increasing (scale-up signal)
- Connection count spikes (possible reconnection storm)
Performance characteristics
Benchmarks on modern hardware (2024, single NATS server, 8 cores):
- Core NATS pub/sub: ~15 million msgs/sec for small messages
- JetStream publish: ~1 million msgs/sec (file storage)
- JetStream publish: ~3 million msgs/sec (memory storage)
- Request-reply round-trip: ~100 μs local, ~1-3 ms cross-region
The Python client adds overhead compared to Go clients (asyncio event loop, GIL). Expect roughly 200K-500K msgs/sec throughput from a single Python process. For higher throughput, run multiple processes with queue group load balancing.
Migration from other systems
Teams moving from RabbitMQ typically map exchanges to NATS subjects and queues to queue groups. The simpler routing model covers 80% of use cases. For the remaining 20% (complex routing rules, priority queues), you implement the logic in application code.
Teams moving from Kafka map topics to JetStream streams and consumer groups to durable consumers. The main adjustment is that NATS does not have partitions — scaling happens through multiple consumers on the same stream, and the server handles distribution.
One thing to remember: NATS with JetStream gives Python services a messaging platform that spans from ephemeral pub/sub to durable streaming, with clustering, security, and exactly-once semantics — all from a single lightweight server binary.
See Also
- Python Adaptive Learning Systems How Python builds learning apps that adjust to each student like a personal tutor who knows exactly what you need next.
- Python Airflow Learn Airflow as a timetable manager that makes sure data tasks run in the right order every day.
- Python Altair Learn Altair through the idea of drawing charts by describing rules, not by hand-placing every visual element.
- Python Automated Grading How Python grades homework and exams automatically, from simple answer keys to understanding written essays.
- Python Batch Vs Stream Processing Batch processing is like doing laundry once a week; stream processing is like a self-cleaning shirt that cleans itself constantly.