Python Faust Stream Processing — Deep Dive
Production Application Setup
The actively maintained faust-streaming fork (installed as pip install faust-streaming) tracks modern Python and Kafka versions. A production application typically looks like this:
import faust

app = faust.App(
    "order-processor",
    broker="kafka://broker1:9092;kafka://broker2:9092",
    store="rocksdb://",  # Local state store
    topic_replication_factor=3,
    stream_buffer_maxsize=4096,
    consumer_auto_offset_reset="earliest",
)


class Order(faust.Record, serializer="json"):
    order_id: str
    user_id: str
    amount: float
    currency: str


orders_topic = app.topic("raw-orders", value_type=Order)
enriched_topic = app.topic("enriched-orders", value_type=Order)
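The store="rocksdb://" directive keeps table state in RocksDB on local disk. Unlike the default in-memory store, it lets tables grow beyond available RAM and survives restarts, so a restarted worker replays only the changelog entries it has not yet seen. Install the bindings with pip install python-rocksdb or, with the fork, pip install "faust-streaming[rocksdb]".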
Agent Patterns
Fan-out Processing
A single agent can produce to multiple output topics based on event content:
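# Output topics (names are illustrative)
domestic_topic = app.topic("domestic-orders", value_type=Order)
international_topic = app.topic("international-orders", value_type=Order)


@app.agent(orders_topic)
async def route_orders(orders):
    async for order in orders:
        if order.currency == "USD":
            await domestic_topic.send(value=order)
        else:
            await international_topic.send(value=order)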
Grouped Aggregation with Tables
from datetime import timedelta

order_counts = app.Table(
    "order-counts",
    default=int,
    partitions=8,
).tumbling(3600, expires=timedelta(hours=24))


@app.agent(orders_topic)
async def count_orders(orders):
    async for order in orders:
        order_counts[order.user_id] += 1
The tumbling window creates hourly buckets. The expires parameter tells Faust to discard windows older than 24 hours, preventing unbounded state growth.
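To make the bucketing concrete, here is a minimal standard-library sketch of how a tumbling window might map an event timestamp to its hourly bucket and decide when a bucket has expired. The names window_range and is_expired are hypothetical and are not Faust APIs; Faust's own implementation may differ in detail, for example in how it treats the window's end bound.

```python
from datetime import timedelta

WINDOW_SIZE = 3600.0                            # one-hour tumbling windows, in seconds
EXPIRES = timedelta(hours=24).total_seconds()   # keep windows for 24 hours


def window_range(timestamp: float, size: float = WINDOW_SIZE) -> tuple[float, float]:
    """Return the [start, end) bounds of the tumbling window containing timestamp."""
    start = timestamp - (timestamp % size)
    return start, start + size


def is_expired(window: tuple[float, float], now: float, expires: float = EXPIRES) -> bool:
    """A window is stale once its end lies more than `expires` seconds in the past."""
    return window[1] < now - expires


# Count events per (user, window) pair, the way a windowed table keys its state.
counts: dict[tuple[str, tuple[float, float]], int] = {}
for user_id, ts in [("u1", 10.0), ("u1", 3599.0), ("u1", 3600.0)]:
    key = (user_id, window_range(ts))
    counts[key] = counts.get(key, 0) + 1
```

The first two events fall into the same hourly bucket and the third opens a new one, which is why each window keeps its own counter per key.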
Joins Between Streams and Tables
# Assumed shape of the profile record; only tier is used below
class UserProfile(faust.Record, serializer="json"):
    user_id: str
    tier: str


user_profiles = app.Table("user-profiles", value_type=UserProfile)


@app.agent(orders_topic)
async def enrich_orders(orders):
    async for order in orders:
        profile = user_profiles.get(order.user_id)
        if profile:
            enriched = {**order.asdict(), "tier": profile.tier}
            await enriched_topic.send(value=enriched)
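This pattern joins a stream against a table populated by a separate agent consuming a user-profiles topic. Because each worker holds only the table partitions assigned to it, the lookup finds a profile only if the orders stream and the table are partitioned by the same key (here, user_id) with the same partition count.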
Changelog Recovery and Standby Replicas
Every Faust table is backed by a Kafka changelog topic. When a worker starts (or when partitions are reassigned), Faust replays the changelog to rebuild local state. For large tables, this recovery can take minutes.
Standby replicas mitigate this. With table_standby_replicas=1, another worker maintains a shadow copy of each table partition's state. When a partition moves to that worker, the new owner already has the data.
app = faust.App(
    "order-processor",
    broker="kafka://broker1:9092",
    store="rocksdb://",
    table_standby_replicas=1,
)
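The difference between a cold recovery and a warm standby can be illustrated with a small standard-library model of changelog replay. This is a sketch of the idea only: replay and the in-memory list standing in for the changelog topic are hypothetical, and Faust's actual recovery code is considerably more involved.

```python
from typing import Optional

# Each changelog record is (key, value); a value of None is a tombstone (delete).
Changelog = list[tuple[str, Optional[int]]]


def replay(
    changelog: Changelog,
    state: Optional[dict[str, int]] = None,
    offset: int = 0,
) -> dict[str, int]:
    """Rebuild table state by applying changelog records from `offset` onward."""
    state = {} if state is None else dict(state)
    for key, value in changelog[offset:]:
        if value is None:
            state.pop(key, None)   # tombstone removes the key
        else:
            state[key] = value     # later records overwrite earlier ones
    return state


changelog: Changelog = [("u1", 1), ("u2", 1), ("u1", 2), ("u2", None), ("u3", 5)]

# Cold start: every record has to be replayed.
cold = replay(changelog)

# Warm standby: state was already caught up to offset 3, so only two records remain.
standby = replay(changelog[:3])
warm = replay(changelog, state=standby, offset=3)
```

Both paths end in the same state; the standby simply has far fewer records left to apply when it takes over.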
Custom Partition Assignors
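Faust's default assignor distributes partitions evenly, but rebalances can be disruptive. The assignor is pluggable: the PartitionAssignor setting accepts a custom class, so a subclass can apply a stickier strategy that minimizes partition movement:
from faust.assignor import PartitionAssignor

class StickyAssignor(PartitionAssignor):
    """Hypothetical subclass: override the assignment logic to limit partition movement."""

app = faust.App("order-processor", broker="kafka://broker1:9092", PartitionAssignor=StickyAssignor)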
For production clusters, sticky assignment reduces recovery time because fewer tables need rebuilding after scaling events.
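What "sticky" means can be shown with a small standalone sketch. The function sticky_assign below is hypothetical and is not Faust's assignor: it keeps each partition with its previous owner while that worker is under a per-worker quota, then gives the leftovers to the least-loaded workers.

```python
def sticky_assign(
    partitions: list[int],
    workers: list[str],
    previous: dict[int, str],
) -> dict[int, str]:
    """Assign partitions to workers, keeping previous owners where the quota allows."""
    quota = -(-len(partitions) // len(workers))   # ceiling division: max partitions per worker
    load = {w: 0 for w in workers}
    assignment: dict[int, str] = {}

    # Pass 1: keep a partition with its previous owner if that worker is still
    # in the group and has not reached its quota.
    for p in partitions:
        owner = previous.get(p)
        if owner in load and load[owner] < quota:
            assignment[p] = owner
            load[owner] += 1

    # Pass 2: hand the remaining partitions to the least-loaded workers.
    for p in partitions:
        if p not in assignment:
            target = min(workers, key=lambda w: load[w])
            assignment[p] = target
            load[target] += 1
    return assignment


# Two workers own three partitions each; a third worker joins.
previous = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
after_scale_out = sticky_assign(list(range(6)), ["a", "b", "c"], previous)
moved = [p for p in previous if after_scale_out[p] != previous[p]]
```

Only the partitions needed to give the new worker its share change owner, so only those table partitions have to be recovered.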
Web Views and Monitoring
Faust embeds an aiohttp web server, enabling HTTP endpoints directly in your stream processor:
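@app.page("/api/order-count/{user_id}")
async def get_order_count(web, request, user_id):
    # .now() reads the window for the current wall-clock time;
    # .current() needs a stream event and is not available in a web view.
    count = order_counts[user_id].now()
    return web.json({"user_id": user_id, "count": count})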
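This is useful for exposing table state as an API: dashboards can query the stream processor directly rather than routing through a separate database. Because tables are partitioned across workers, a request has to reach the worker that owns the key; Faust's app.table_route decorator forwards it there.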
Health endpoints (/health is built in) integrate with Kubernetes liveness probes.
Testing Agents
Faust provides a test driver that replaces Kafka with in-memory channels:
import pytest


@pytest.fixture
def test_app(event_loop):
    app.finalize()
    app.conf.store = "memory://"
    return app


@pytest.mark.asyncio
async def test_order_routing(test_app):
    async with route_orders.test_context() as agent:
        order = Order(order_id="1", user_id="u1", amount=50, currency="EUR")
        await agent.put(order)
        # Assert message appeared on international_topic
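The test context feeds the agent through an in-memory channel, so the test runs without a Kafka cluster. Sends to other topics are not captured automatically; mock those topics to assert on what the agent produced.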
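One way to make outgoing sends assertable with only the standard library is to factor the routing decision into a plain coroutine and hand it mocked topics. The sketch below assumes that refactoring: route_order is a hypothetical helper the agent body would call, and SimpleNamespace objects stand in for the Order record and the Faust topics.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def route_order(order, domestic_topic, international_topic):
    """Routing logic factored out of the agent so it can be tested in isolation."""
    if order.currency == "USD":
        await domestic_topic.send(value=order)
    else:
        await international_topic.send(value=order)


async def check_routing():
    # Stand-ins for Faust topics: only the awaited .send() call is mocked.
    domestic = SimpleNamespace(send=AsyncMock())
    international = SimpleNamespace(send=AsyncMock())
    order = SimpleNamespace(order_id="1", user_id="u1", amount=50.0, currency="EUR")

    await route_order(order, domestic, international)

    # The EUR order must reach the international topic and nothing else.
    international.send.assert_awaited_once_with(value=order)
    domestic.send.assert_not_awaited()
    return True


result = asyncio.run(check_routing())
```

Under pytest the body of check_routing would become the test function itself; asyncio.run is used here only so the sketch runs on its own.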
Scaling Strategies
Faust workers scale horizontally. Each worker joins the same consumer group, and Kafka reassigns partitions. Key considerations:
- Match worker count to partition count: more workers than partitions wastes resources, since the extra workers sit idle.
- Pin CPU-bound work to thread executors: Faust agents run on the asyncio loop, so CPU-heavy transforms should use loop.run_in_executor() to avoid blocking the event loop.
- Monitor consumer lag: the gap between the latest offset in a partition and the consumer group's committed offset tells you whether workers keep up with incoming events. Alert when lag exceeds a threshold for more than a few minutes.
- Separate read-heavy and write-heavy agents: if one agent aggregates state and another just routes, deploy them as separate Faust apps so scaling decisions are independent.
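The executor advice in the list above can be sketched with plain asyncio. The function expensive_transform is a hypothetical stand-in for a CPU-heavy step; inside a Faust agent the same run_in_executor call would be awaited within the async for loop.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor


def expensive_transform(payload: bytes) -> str:
    """Stand-in for CPU-heavy work (hypothetical): repeated hashing."""
    digest = payload
    for _ in range(1000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def process(payloads: list[bytes]) -> list[str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each call runs in a worker thread; the event loop stays free to
        # service other coroutines while the results are awaited.
        futures = [loop.run_in_executor(pool, expensive_transform, p) for p in payloads]
        return await asyncio.gather(*futures)


results = asyncio.run(process([b"order-1", b"order-2"]))
```

Threads keep the event loop responsive, but pure-Python CPU-bound code still contends for the GIL; a ProcessPoolExecutor is the usual alternative when that becomes the bottleneck.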
Performance Characteristics
On a 3-broker Kafka cluster with 12 partitions, a single Faust worker processes roughly 15,000–30,000 JSON events per second depending on event size and processing complexity. Switching to Avro serialization and RocksDB state stores improves throughput by approximately 20% due to smaller message sizes and faster disk-backed lookups.
For workloads exceeding 100K events per second, consider whether Faust’s single-threaded asyncio model is sufficient or whether a JVM-based framework provides better raw throughput.
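As a rough back-of-the-envelope check, the figures quoted above can be turned into a worker estimate. The sketch assumes throughput scales linearly with worker count, which real deployments rarely achieve exactly; workers_needed is a hypothetical helper.

```python
import math

PARTITIONS = 12                       # from the benchmark setup above
PER_WORKER_RANGE = (15_000, 30_000)   # events/second per worker, as quoted above


def workers_needed(target_rate: int, per_worker_rate: int) -> int:
    """Workers required for target_rate, assuming throughput scales linearly."""
    return math.ceil(target_rate / per_worker_rate)


target = 100_000
pessimistic = workers_needed(target, PER_WORKER_RANGE[0])   # slow end of the range
optimistic = workers_needed(target, PER_WORKER_RANGE[1])    # fast end of the range

# Workers beyond the partition count would sit idle, so the estimate
# is only usable if it fits within the number of partitions.
fits_in_partitions = pessimistic <= PARTITIONS
```

If the pessimistic estimate approaches the partition count, the topic needs more partitions before adding workers will help.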
Error Handling and Dead Letters
Faust does not have built-in dead-letter topic support. Implement it manually by wrapping agent logic in try/except blocks and forwarding failed events to a dedicated error topic. Track retry counts in a table keyed by event ID to avoid infinite loops. After a configurable number of retries, publish the event to a dead-letter topic and alert the operations team.
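The retry-then-dead-letter flow described above might look like the following standard-library sketch. All names are hypothetical: in a Faust application retry_counts would be a table keyed by event ID, and the two lists would be sends to an error topic and a dead-letter topic.

```python
import asyncio

MAX_RETRIES = 3  # hypothetical threshold; make it configurable in practice


async def handle_with_dead_letter(event, process, retry_counts, retry_queue, dead_letters):
    """Try to process an event; requeue on failure, dead-letter after MAX_RETRIES."""
    try:
        await process(event)
    except Exception as exc:
        attempts = retry_counts.get(event["id"], 0) + 1
        retry_counts[event["id"]] = attempts
        if attempts >= MAX_RETRIES:
            dead_letters.append({"event": event, "error": repr(exc), "attempts": attempts})
            retry_counts.pop(event["id"], None)   # stop tracking once dead-lettered
        else:
            retry_queue.append(event)             # stands in for sending to an error topic
    else:
        retry_counts.pop(event["id"], None)       # success clears any earlier failures


async def always_fails(event):
    raise ValueError("bad payload")


async def demo():
    retry_counts, retry_queue, dead_letters = {}, [], []
    retry_queue.append({"id": "evt-1"})
    while retry_queue:
        event = retry_queue.pop(0)
        await handle_with_dead_letter(event, always_fails, retry_counts, retry_queue, dead_letters)
    return dead_letters


dead = asyncio.run(demo())
```

Counting attempts by event ID is what bounds the loop: an event that can never succeed leaves the retry path after MAX_RETRIES failures instead of cycling forever.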
Production Checklist
- Use the faust-streaming fork, not the archived original faust package.
- Set topic_replication_factor >= 3 for changelog topics.
- Enable table_standby_replicas for fast failover.
- Configure dead-letter topics for events that fail processing after retries.
- Export metrics via the built-in StatsD or Prometheus integrations.
- Run at least two workers per application for high availability.
The one thing to remember: Faust shines when you need Kafka stream processing in Python with stateful tables and windowed aggregations — but production success depends on understanding changelog recovery, partition assignment, and the asyncio concurrency model.