NATS Messaging in Python — Deep Dive

Design production NATS architectures in Python with JetStream consumers, exactly-once processing, cluster topologies, and backpressure strategies.

NATS occupies a unique position in the messaging landscape. Its core protocol is text-based and fits on a napkin, yet the system scales to handle the internal messaging of companies like Synadia, Netlify, and major telecoms. For Python services, NATS provides a single substrate for request-reply RPCs, event broadcasting, and durable stream processing — all through one connection.

Connection management

The nats-py client uses asyncio and supports connection pooling to NATS clusters:

import nats

nc = await nats.connect(
    servers=["nats://node1:4222", "nats://node2:4222", "nats://node3:4222"],
    max_reconnect_attempts=60,
    reconnect_time_wait=2,  # seconds between attempts
    error_cb=on_error,
    disconnected_cb=on_disconnect,
    reconnected_cb=on_reconnect,
)

The client automatically fails over to another server if the current one goes down. During reconnection, publishes are buffered up to pending_size (default 2 MB). If the buffer fills, publishes raise an exception — this is your backpressure signal.

Handling slow consumers

NATS server tracks per-subscription pending message counts. If a subscriber falls behind, the server sends a Slow Consumer notification and may drop messages for that subscription. Handle this explicitly:

async def error_handler(err):
    if "slow consumer" in str(err).lower():
        logger.warning("Falling behind on message processing")
        # Trigger scale-up or shed load

JetStream deep dive

Stream configuration

Streams are created with policies that control retention, replication, and storage:

js = nc.jetstream()

await js.add_stream(
    name="ORDERS",
    subjects=["orders.>"],
    retention="limits",       # or "interest" or "workqueue"
    max_msgs=1_000_000,
    max_bytes=1 * 1024**3,    # 1 GB
    max_age=86400 * 7,        # 7 days in seconds
    num_replicas=3,
    storage="file",           # or "memory"
    duplicate_window=120,     # dedup window in seconds
)

Retention policies:

limits: Keep messages until limits (count, bytes, age) are hit, then discard oldest.
interest: Keep messages only while at least one consumer exists. Once all consumers acknowledge, the message is removed.
workqueue: Each message is delivered to exactly one consumer and removed after acknowledgement. Acts like a traditional task queue.

Consumer patterns

Pull consumers give the application control over when to fetch messages:

consumer = await js.pull_subscribe("orders.created", durable="order-processor")

while True:
    try:
        msgs = await consumer.fetch(batch=10, timeout=5.0)
        for msg in msgs:
            await process_order(msg.data)
            await msg.ack()
    except nats.errors.TimeoutError:
        continue  # No messages available

Push consumers deliver messages as they arrive. They are simpler but harder to manage backpressure:

async def handler(msg):
    await process_order(msg.data)
    await msg.ack()

await js.subscribe("orders.created", durable="order-worker", cb=handler)

Exactly-once semantics

JetStream supports idempotent publishing with message deduplication:

ack = await js.publish(
    "orders.created",
    data,
    headers={"Nats-Msg-Id": f"order-{order_id}"}
)

Within the stream’s duplicate_window, any publish with the same Nats-Msg-Id returns a successful ack without storing a duplicate. Combined with consumer acknowledgement, this achieves effectively-once processing.

Request-reply patterns at scale

Scatter-gather

Publish a request and collect multiple replies within a timeout. Useful for querying multiple services and aggregating results:

inbox = nc.new_inbox()
sub = await nc.subscribe(inbox)

await nc.publish_request("inventory.check", inbox, b'{"sku": "ABC123"}')

responses = []
deadline = asyncio.get_event_loop().time() + 2.0
while asyncio.get_event_loop().time() < deadline:
    try:
        msg = await sub.next_msg(timeout=0.5)
        responses.append(msg.data)
    except nats.errors.TimeoutError:
        break

Service discovery via subjects

NATS subjects can serve as a lightweight service registry. Services subscribe to api.{service_name}.{method}; clients publish requests to those subjects. Queue groups ensure only one instance handles each request. No Consul or etcd needed for simple topologies.

Cluster and super-cluster topologies

Basic cluster

Three or more NATS servers form a full-mesh cluster. Clients connect to any node; the cluster routes messages internally. JetStream replicates stream data across the configured number of replicas using Raft consensus.

Leaf nodes

Edge locations run leaf node servers that connect to the central cluster. Leaf nodes forward only messages that match local subscriptions, reducing bandwidth. A Python service at an edge site connects to its local leaf node and transparently reaches the full cluster.

Super-clusters with gateways

Multiple clusters connect via gateway connections for multi-region architectures. Interest-based routing ensures messages only cross gateways when a subscriber exists in the remote cluster. This keeps inter-region traffic minimal.

Security model

NATS supports three auth methods:

Token/password: Simple but static.
NKeys: Ed25519 keypairs. No shared secrets on the wire.
JWT-based decentralized auth: Accounts get signed JWTs that define which subjects they can publish/subscribe to. The server validates JWTs without calling an external auth service.

# NKey authentication
nc = await nats.connect(
    "nats://secure.example.com:4222",
    nkeys_seed="./user.nk"
)

The JWT model is particularly powerful for multi-tenant platforms. Each tenant gets an account JWT with subject permissions. Accounts are isolated — a message published in one account never leaks to another.

Monitoring and observability

NATS exposes a monitoring endpoint on port 8222:

GET /varz     → server stats
GET /connz    → connection details
GET /subsz    → subscription info
GET /jsz      → JetStream status

Export these to Prometheus using nats-exporter:

# Useful alerts
- Consumer pending messages > threshold (processing lag)
- Stream bytes approaching max_bytes (retention pressure)
- Slow consumer count increasing (scale-up signal)
- Connection count spikes (possible reconnection storm)

Performance characteristics

Benchmarks on modern hardware (2024, single NATS server, 8 cores):

Core NATS pub/sub: ~15 million msgs/sec for small messages
JetStream publish: ~1 million msgs/sec (file storage)
JetStream publish: ~3 million msgs/sec (memory storage)
Request-reply round-trip: ~100 μs local, ~1-3 ms cross-region

The Python client adds overhead compared to Go clients (asyncio event loop, GIL). Expect roughly 200K-500K msgs/sec throughput from a single Python process. For higher throughput, run multiple processes with queue group load balancing.

Migration from other systems

Teams moving from RabbitMQ typically map exchanges to NATS subjects and queues to queue groups. The simpler routing model covers 80% of use cases. For the remaining 20% (complex routing rules, priority queues), you implement the logic in application code.

Teams moving from Kafka map topics to JetStream streams and consumer groups to durable consumers. The main adjustment is that NATS does not have partitions — scaling happens through multiple consumers on the same stream, and the server handles distribution.

One thing to remember: NATS with JetStream gives Python services a messaging platform that spans from ephemeral pub/sub to durable streaming, with clustering, security, and exactly-once semantics — all from a single lightweight server binary.

pythonnatsnats-py