Python Pulsar Messaging — Core Concepts
What Makes Pulsar Different
Apache Pulsar is a distributed messaging and streaming platform. Unlike Kafka, which tightly couples brokers and storage, Pulsar separates the serving layer (brokers) from the storage layer (Apache BookKeeper). This separation enables independent scaling: you add brokers to handle more connections, or add BookKeeper nodes to store more data, without affecting the other.
Pulsar also provides native multi-tenancy. A single cluster supports multiple tenants, each with their own namespaces and topics, complete with per-tenant resource quotas and access controls. This makes it attractive for organizations where multiple teams share infrastructure.
The Python Client
The official pulsar-client package is a Python wrapper around Pulsar's C++ client library, so most of the work happens in native code rather than in Python:
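```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")  # 6650: Pulsar's default binary-protocol port
# Topic names are fully qualified: <domain>://<tenant>/<namespace>/<topic>
producer = client.create_producer("persistent://public/default/events")
producer.send(b"hello from python")  # blocks until the broker acknowledges; payload is bytes
client.close()  # also closes producers and consumers created by this client
```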
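The persistent:// prefix means messages are durably stored. Pulsar also supports non-persistent:// topics, whose messages are never written to disk and can be dropped if a broker fails or a consumer falls behind; they suit fire-and-forget use cases where speed matters more than durability.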
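Fully qualified topic names carry both the persistence choice and the tenant/namespace hierarchy described earlier, in the form persistent://tenant/namespace/topic or non-persistent://tenant/namespace/topic. As a minimal sketch, the hypothetical helper parse_topic below (not part of pulsar-client) splits such a name into its parts. It assumes the fully qualified form only; the client also accepts short names such as "events", which it expands to persistent://public/default/events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    persistent: bool
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified Pulsar topic name (hypothetical helper).

    Expects "<domain>://<tenant>/<namespace>/<topic>".
    """
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic domain in {name!r}")
    parts = rest.split("/", 2)  # the topic part may itself contain slashes
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    tenant, namespace, topic = parts
    return TopicName(domain == "persistent", tenant, namespace, topic)


print(parse_topic("persistent://public/default/events"))
# → TopicName(persistent=True, tenant='public', namespace='default', topic='events')
```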
Subscription Modes
Pulsar offers four subscription types, each suited to a different consumption pattern:
- Exclusive: only one consumer can attach (this is the default). Simple and ordered.
- Shared: multiple consumers attach and messages are distributed among them round-robin. Great for scaling worker pools, but message ordering is not preserved.
- Failover: one active consumer with one or more standbys. If the active one disconnects, the next standby takes over.
- Key_Shared: messages with the same key always go to the same consumer, preserving per-key ordering while still distributing load.
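Kafka achieves some of these patterns through consumer groups and partitioning, which ties a group's parallelism to the partition count. Pulsar's model is more flexible because subscription modes are independent of topic partitions: a Shared subscription can spread even a single unpartitioned topic across many consumers.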
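In the Python client the mode is picked per subscription through the consumer_type argument of client.subscribe(), for example pulsar.ConsumerType.Shared or pulsar.ConsumerType.KeyShared. To make the Shared versus Key_Shared difference concrete without a broker, here is a toy dispatcher. It is a simplified model, not Pulsar's implementation: real Shared dispatch also depends on how many messages each consumer has room for, and Key_Shared assigns hash ranges to consumers rather than the plain crc32(key) % n used here.

```python
import itertools
import zlib
from collections import defaultdict


def dispatch_shared(messages, consumers):
    """Toy Shared mode: hand messages to consumers round-robin, ignoring keys."""
    assignment = defaultdict(list)
    for consumer, (key, payload) in zip(itertools.cycle(consumers), messages):
        assignment[consumer].append((key, payload))
    return dict(assignment)


def dispatch_key_shared(messages, consumers):
    """Toy Key_Shared mode: a stable hash of the key picks the consumer,
    so all messages with one key reach one consumer, in publish order."""
    assignment = defaultdict(list)
    for key, payload in messages:
        index = zlib.crc32(key.encode()) % len(consumers)  # stable across runs
        assignment[consumers[index]].append((key, payload))
    return dict(assignment)


messages = [("user-1", "a"), ("user-1", "b"), ("user-2", "c"), ("user-2", "d")]
consumers = ["worker-A", "worker-B"]
print(dispatch_shared(messages, consumers))
# → {'worker-A': [('user-1', 'a'), ('user-2', 'c')], 'worker-B': [('user-1', 'b'), ('user-2', 'd')]}
print(dispatch_key_shared(messages, consumers))
```

The Shared run splits user-1's messages across both workers, which is why it gives no per-key ordering; the Key_Shared run keeps each key on a single worker.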
Message Acknowledgment
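Consumers acknowledge messages either individually or cumulatively. Individual acknowledgment marks specific messages as processed, allowing out-of-order processing. Cumulative acknowledgment marks everything up to and including a given message as done, similar to Kafka's offset commit. Cumulative acknowledgment is not available on Shared or Key_Shared subscriptions, where one subscription's messages are spread across several consumers.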
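A subscription's progress is tracked as a mark-delete position (everything at or before it is done) plus a set of individually acknowledged messages beyond it. The toy tracker below is an illustrative model over integer positions, not client code, but it shows why individual acks can leave gaps while a cumulative ack jumps the position forward. In the Python client the two calls are consumer.acknowledge(msg) and consumer.acknowledge_cumulative(msg).

```python
class AckTracker:
    """Toy model of a subscription cursor over integer message positions."""

    def __init__(self):
        self.mark_delete = -1    # every position <= mark_delete is acknowledged
        self.individual = set()  # acknowledged positions beyond mark_delete

    def _advance(self):
        # Move mark_delete over any contiguous run of individually acked positions.
        while self.mark_delete + 1 in self.individual:
            self.mark_delete += 1
            self.individual.discard(self.mark_delete)

    def ack(self, position):
        """Individual ack: marks one position, leaving earlier gaps open."""
        if position > self.mark_delete:
            self.individual.add(position)
        self._advance()

    def ack_cumulative(self, position):
        """Cumulative ack: everything up to and including position is done."""
        if position > self.mark_delete:
            self.mark_delete = position
            self.individual = {p for p in self.individual if p > position}
        self._advance()

    def unacked(self, upto):
        """Positions in (mark_delete, upto] that still await acknowledgment."""
        return [p for p in range(self.mark_delete + 1, upto + 1)
                if p not in self.individual]


tracker = AckTracker()
for pos in (0, 2, 3):  # message 1 is still being processed
    tracker.ack(pos)
print(tracker.mark_delete, tracker.unacked(upto=4))  # → 0 [1, 4]
tracker.ack_cumulative(1)
print(tracker.mark_delete, tracker.unacked(upto=4))  # → 3 [4]
```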
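Unacknowledged messages are redelivered rather than lost. When a consumer disconnects, its pending messages are dispatched again; a consumer can request redelivery explicitly with a negative acknowledgment; and if you enable an ack timeout (unacked_messages_timeout_ms in the Python client's subscribe(), off by default), messages not acknowledged within that window are redelivered automatically. Together these give at-least-once delivery without manual retry logic, which also means handlers must tolerate duplicates.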
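A typical processing loop acknowledges on success and negatively acknowledges on failure, so a failed message comes back without waiting for an ack timeout. This sketch assumes only that the consumer exposes receive(), acknowledge() and negative_acknowledge(), as the pulsar-client Consumer does; process_batch, FakeMessage and FakeConsumer are hypothetical names so the example runs without a broker. The fake redelivers a nacked message immediately, whereas the real client waits for a configurable delay.

```python
from collections import deque


def process_batch(consumer, handle, max_messages):
    """Receive up to max_messages: ack on success, nack on failure."""
    acked, nacked = 0, 0
    for _ in range(max_messages):
        msg = consumer.receive()
        try:
            handle(msg.data())
        except Exception:
            consumer.negative_acknowledge(msg)  # ask for redelivery
            nacked += 1
        else:
            consumer.acknowledge(msg)  # done; will not be redelivered
            acked += 1
    return acked, nacked


class FakeMessage:
    def __init__(self, payload):
        self._payload = payload

    def data(self):
        return self._payload


class FakeConsumer:
    """Hypothetical in-memory stand-in that redelivers nacked messages at once."""

    def __init__(self, payloads):
        self.queue = deque(FakeMessage(p) for p in payloads)
        self.acked = []

    def receive(self):
        return self.queue.popleft()

    def acknowledge(self, msg):
        self.acked.append(msg.data())

    def negative_acknowledge(self, msg):
        self.queue.append(msg)  # back of the queue, like a later redelivery


attempts = {}


def flaky_handler(payload):
    # Fails the first time it sees b"order-2", succeeds on redelivery.
    attempts[payload] = attempts.get(payload, 0) + 1
    if payload == b"order-2" and attempts[payload] == 1:
        raise RuntimeError("transient failure")


consumer = FakeConsumer([b"order-1", b"order-2", b"order-3"])
print(process_batch(consumer, flaky_handler, max_messages=4))  # → (3, 1)
print(consumer.acked)  # → [b'order-1', b'order-3', b'order-2']
```

With a real broker you would pass the consumer returned by client.subscribe(); that call's negative_ack_redelivery_delay_ms argument controls the redelivery delay (check the documentation for your client version). Because a handler may see the same payload twice, it should be idempotent.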
Tiered Storage
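Once an offload policy is configured for a namespace, Pulsar automatically moves older ledgers (the segments a topic's data is stored in) from BookKeeper to cheaper storage such as S3, GCS, or HDFS. Topics can then retain months or years of data without paying for fast BookKeeper storage for every byte.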
From Python, this is transparent. A consumer seeking to an old timestamp reads seamlessly across tiers — Pulsar handles fetching from the right storage backend.
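Replaying history from Python comes down to seeking a subscription to a publish timestamp. In recent pulsar-client versions Consumer.seek() accepts either a message ID or an integer timestamp in milliseconds since the epoch; treat that as an assumption to verify against your installed version. The hypothetical helper rewind below only computes the target time and calls seek(), so a stub consumer stands in for a broker here.

```python
import time

MS_PER_DAY = 24 * 60 * 60 * 1000


def rewind(consumer, days, now_ms=None):
    """Seek a consumer back `days` days by publish time (hypothetical helper).

    Assumes consumer.seek() accepts an int timestamp in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    target_ms = now_ms - int(days * MS_PER_DAY)
    consumer.seek(target_ms)
    return target_ms


class StubConsumer:
    """Records the seek target instead of talking to a broker."""

    def __init__(self):
        self.seeked_to = None

    def seek(self, target):
        self.seeked_to = target


stub = StubConsumer()
print(rewind(stub, days=30, now_ms=1_700_000_000_000))  # → 1697408000000
```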
Schema Registry
Pulsar includes a built-in schema registry. When creating a producer, you declare the schema:
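```python
import pulsar
from pulsar.schema import AvroSchema, Float, Record, String

class Order(Record):
    user_id = String()
    amount = Float()

client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer(
    "persistent://public/default/orders",
    schema=AvroSchema(Order),  # sent to the broker, which checks it against the topic's schema
)
producer.send(Order(user_id="u-42", amount=19.99))  # send a Record instance, not raw bytes
client.close()
```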
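The broker enforces schema compatibility when the producer connects: if the declared schema is incompatible with the one registered for the topic under the namespace's compatibility strategy, create_producer() fails. On the client side, the schema encoder rejects objects that are not instances of the declared record class. Either way, contract violations surface at the producer instead of as downstream decoding errors.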
Common Misconception
Many assume Pulsar is just “Kafka with extra features.” While both handle streaming, Pulsar’s architecture is fundamentally different. Because storage is decoupled, brokers keep no durable topic data of their own: if a broker crashes, ownership of its topics moves to another broker, which serves them from BookKeeper without losing acknowledged messages and without the data re-replication and partition rebalancing a Kafka cluster goes through when a broker is replaced. The tradeoff is operational complexity: running BookKeeper (plus a metadata store such as ZooKeeper) alongside brokers means more components to manage.
The one thing to remember: Pulsar’s separated compute-and-storage architecture gives Python applications flexible subscription modes and transparent tiered storage — but this flexibility comes with a more complex operational footprint than Kafka.