Python Faust Stream Processing — Deep Dive

Build production Faust applications: changelog recovery, partition assignors, web views, testing agents, and scaling strategies for real-time Python pipelines.

Production Application Setup

The actively maintained faust-streaming fork (installed as pip install faust-streaming) tracks modern Python and Kafka versions. A production application typically looks like this:

import faust

app = faust.App(
    "order-processor",
    broker="kafka://broker1:9092;kafka://broker2:9092",
    store="rocksdb://",  # Local state store
    topic_replication_factor=3,
    stream_buffer_maxsize=4096,
    consumer_auto_offset_reset="earliest",
)

class Order(faust.Record, serializer="json"):
    order_id: str
    user_id: str
    amount: float
    currency: str

orders_topic = app.topic("raw-orders", value_type=Order)
enriched_topic = app.topic("enriched-orders", value_type=Order)

The store="rocksdb://" directive uses RocksDB for local table state, which is faster than the default in-memory store for large tables. Install it with pip install python-rocksdb.

Agent Patterns

Fan-out Processing

A single agent can produce to multiple output topics based on event content:

@app.agent(orders_topic)
async def route_orders(orders):
    async for order in orders:
        if order.currency == "USD":
            await domestic_topic.send(value=order)
        else:
            await international_topic.send(value=order)

Grouped Aggregation with Tables

order_counts = app.Table(
    "order-counts",
    default=int,
    partitions=8,
).tumbling(3600, expires=timedelta(hours=24))

@app.agent(orders_topic)
async def count_orders(orders):
    async for order in orders:
        order_counts[order.user_id] += 1

The tumbling window creates hourly buckets. The expires parameter tells Faust to discard windows older than 24 hours, preventing unbounded state growth.

Joins Between Streams and Tables

user_profiles = app.Table("user-profiles", value_type=UserProfile)

@app.agent(orders_topic)
async def enrich_orders(orders):
    async for order in orders:
        profile = user_profiles.get(order.user_id)
        if profile:
            enriched = {**order.asdict(), "tier": profile.tier}
            await enriched_topic.send(value=enriched)

This pattern joins a stream against a table populated by a separate agent consuming a user-profiles topic.

Changelog Recovery and Standby Replicas

Every Faust table is backed by a Kafka changelog topic. When a worker starts (or when partitions are reassigned), Faust replays the changelog to rebuild local state. For large tables, this recovery can take minutes.

Standby replicas mitigate this. By setting table_standby_replicas=1, idle workers maintain a shadow copy of another worker’s table. When a partition moves, the new owner already has the data.

app = faust.App(
    "order-processor",
    broker="kafka://broker1:9092",
    store="rocksdb://",
    table_standby_replicas=1,
)

Custom Partition Assignors

Faust’s default assignor distributes partitions evenly, but rebalances can be disruptive. The ConsumerMemberAssignment protocol supports sticky assignment strategies that minimize partition movement:

from faust.assignor import LeaderAssignor

app.conf.ConsumerScheduler = LeaderAssignor

For production clusters, sticky assignment reduces recovery time because fewer tables need rebuilding after scaling events.

Web Views and Monitoring

Faust embeds an aiohttp web server, enabling HTTP endpoints directly in your stream processor:

@app.page("/api/order-count/{user_id}")
async def get_order_count(web, request, user_id):
    count = order_counts[user_id].current()
    return web.json({"user_id": user_id, "count": count})

This is useful for exposing table state as an API — dashboards can query the stream processor directly rather than routing through a separate database.

Health endpoints (/health is built in) integrate with Kubernetes liveness probes.

Testing Agents

Faust provides a test driver that replaces Kafka with in-memory channels:

import pytest

@pytest.fixture
def test_app(event_loop):
    app.finalize()
    app.conf.store = "memory://"
    return app

@pytest.mark.asyncio
async def test_order_routing(test_app):
    async with route_orders.test_context() as agent:
        order = Order(order_id="1", user_id="u1", amount=50, currency="EUR")
        await agent.put(order)
        # Assert message appeared on international_topic

The test context intercepts all topic sends, allowing assertions without a running Kafka cluster.

Scaling Strategies

Faust workers scale horizontally. Each worker joins the same consumer group, and Kafka reassigns partitions. Key considerations:

Match worker count to partition count — having more workers than partitions wastes resources since extra workers sit idle.
Pin CPU-bound work to thread executors — Faust agents run on the asyncio loop. CPU-heavy transforms should use loop.run_in_executor() to avoid blocking the event loop.
Monitor consumer lag — Buraq, Kafka’s consumer lag metric, tells you whether workers keep up with incoming events. Alert when lag exceeds a threshold for more than a few minutes.
Separate read-heavy and write-heavy agents — if one agent aggregates state and another just routes, deploy them as separate Faust apps so scaling decisions are independent.

Performance Characteristics

On a 3-broker Kafka cluster with 12 partitions, a single Faust worker processes roughly 15,000–30,000 JSON events per second depending on event size and processing complexity. Switching to Avro serialization and RocksDB state stores improves throughput by approximately 20% due to smaller message sizes and faster disk-backed lookups.

For workloads exceeding 100K events per second, consider whether Faust’s single-threaded asyncio model is sufficient or whether a JVM-based framework provides better raw throughput.

Error Handling and Dead Letters

Faust does not have built-in dead-letter topic support. Implement it manually by wrapping agent logic in try/except blocks and forwarding failed events to a dedicated error topic. Track retry counts in a table keyed by event ID to avoid infinite loops. After a configurable number of retries, publish the event to a dead-letter topic and alert the operations team.

Production Checklist

Use faust-streaming fork, not the archived original faust package.
Set topic_replication_factor >= 3 for changelog topics.
Enable table_standby_replicas for fast failover.
Configure dead-letter topics for events that fail processing after retries.
Export metrics via the built-in StatsD or Prometheus integrations.
Run at least two workers per application for high availability.

The one thing to remember: Faust shines when you need Kafka stream processing in Python with stateful tables and windowed aggregations — but production success depends on understanding changelog recovery, partition assignment, and the asyncio concurrency model.

pythonfauststreamingreal-time