Python Saga Pattern — Core Concepts

Why Sagas Exist

In a monolith, you can wrap multiple database operations in a single transaction. If anything fails, the database rolls everything back. Simple.

In microservices, each service owns its database. There’s no cross-service transaction. If the payment service charges the card but the inventory service can’t reserve stock, you need a way to undo the payment. That’s the saga pattern.

A saga is a sequence of local transactions, where each transaction updates one service and publishes an event or triggers the next step. If any step fails, previously completed steps are undone using compensating transactions.

Two Flavors of Sagas

Choreography (Event-Driven)

Each service listens for events and reacts independently. No central coordinator.

OrderService → publishes OrderCreated

PaymentService → hears OrderCreated → charges card → publishes PaymentCompleted

InventoryService → hears PaymentCompleted → reserves stock → publishes StockReserved

ShippingService → hears StockReserved → creates shipment → publishes ShipmentCreated

If inventory reservation fails:

InventoryService → publishes StockReservationFailed

PaymentService → hears StockReservationFailed → refunds card → publishes PaymentRefunded

OrderService → hears PaymentRefunded → marks order as cancelled

Pros: No single point of failure, services stay independent. Cons: Hard to trace the full workflow, compensations can get complex with many services.
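The event chain above can be sketched with a tiny in-process event bus. This is an illustration only: `EventBus` and `run_order_flow` are hypothetical names, the bus is synchronous where a real system would use a message broker, and marking the order "completed" on ShipmentCreated is an assumption the flow above does not spell out.

```python
from collections import defaultdict


class EventBus:
    """Synchronous in-process stand-in for a message broker (illustrative only)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = []  # event types in the order they were published

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, order):
        self.history.append(event_type)
        for handler in self.handlers[event_type]:
            handler(order)


def run_order_flow(stock_available):
    """Wire up the four services and place one order; returns (events, status)."""
    bus = EventBus()
    status = {}  # order_id -> status, owned by the order service

    # PaymentService: charge on OrderCreated, refund on StockReservationFailed.
    bus.subscribe("OrderCreated", lambda o: bus.publish("PaymentCompleted", o))
    bus.subscribe("StockReservationFailed", lambda o: bus.publish("PaymentRefunded", o))

    # InventoryService: reserve stock, or report that it could not.
    def reserve(order):
        event = "StockReserved" if stock_available else "StockReservationFailed"
        bus.publish(event, order)

    bus.subscribe("PaymentCompleted", reserve)

    # ShippingService: create the shipment once stock is reserved.
    bus.subscribe("StockReserved", lambda o: bus.publish("ShipmentCreated", o))

    # OrderService: record the final outcome.
    # Marking the order "completed" on ShipmentCreated is an assumption.
    bus.subscribe("ShipmentCreated", lambda o: status.update({o["id"]: "completed"}))
    bus.subscribe("PaymentRefunded", lambda o: status.update({o["id"]: "cancelled"}))

    order = {"id": "order-1"}
    status[order["id"]] = "pending"
    bus.publish("OrderCreated", order)
    return bus.history, status[order["id"]]
```

Calling `run_order_flow(stock_available=False)` walks the failure path: the refund is triggered purely by the StockReservationFailed event, and no component ever sees the whole workflow. That is the tracing difficulty mentioned in the cons.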

Orchestration (Central Coordinator)

A dedicated orchestrator service controls the saga flow:

import logging


class OrderSagaOrchestrator:
    def __init__(self, payment_client, inventory_client, shipping_client):
        self.payment = payment_client
        self.inventory = inventory_client
        self.shipping = shipping_client

    async def execute(self, order):
        compensations = []

        try:
            # Step 1: Charge payment
            payment = await self.payment.charge(order.customer_id, order.total)
            compensations.append(lambda: self.payment.refund(payment.id))

            # Step 2: Reserve inventory
            reservation = await self.inventory.reserve(order.items)
            compensations.append(lambda: self.inventory.release(reservation.id))

            # Step 3: Create shipment
            shipment = await self.shipping.create(order.id, order.address)
            compensations.append(lambda: self.shipping.cancel(shipment.id))

            return {"status": "completed", "shipment_id": shipment.id}

        except Exception as e:
            # Run compensations in reverse order
            for compensate in reversed(compensations):
                try:
                    await compensate()
                except Exception as comp_error:
                    logging.error(f"Compensation failed: {comp_error}")
                    # Log for manual intervention

            return {"status": "failed", "error": str(e)}

Pros: Clear workflow logic, easy to add new steps, centralized error handling. Cons: Orchestrator is a single point of failure and potential bottleneck.
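One way to make "easy to add new steps" concrete is to describe the saga as a list of (name, action, compensation) triples and let a generic runner handle the rollback. The sketch below is one possible shape, not a definitive implementation; `run_saga`, `make_step`, and `demo` are hypothetical names, and the steps are fakes that only write to a list.

```python
import asyncio
import logging


async def run_saga(steps, context):
    """Run steps in order; if one raises, compensate completed steps in reverse."""
    completed = []
    try:
        for name, action, compensation in steps:
            await action(context)
            completed.append((name, compensation))
    except Exception as exc:
        for name, compensation in reversed(completed):
            try:
                await compensation(context)
            except Exception:
                # A failed compensation is logged for manual intervention.
                logging.exception("Compensation for %s failed", name)
        return {"status": "failed", "error": str(exc)}
    return {"status": "completed"}


def make_step(name, log, fail=False):
    """Build a fake step that records what it did in `log` (illustrative only)."""

    async def action(context):
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"do:{name}")

    async def compensation(context):
        log.append(f"undo:{name}")

    return name, action, compensation


def demo():
    log = []
    steps = [
        make_step("charge", log),
        make_step("reserve", log),
        make_step("ship", log, fail=True),  # the third step blows up
    ]
    result = asyncio.run(run_saga(steps, context={}))
    return result, log
```

In `demo()`, the failed "ship" step never registers a compensation, so only "reserve" and then "charge" are undone, in that order. Adding a fourth step means appending one more triple to the list.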

Compensating Transactions

Compensations aren’t always a simple “undo.” They’re semantic reverses — actions that counteract the effect of the original step:

| Step | Compensation | Notes |
| --- | --- | --- |
| Charge credit card | Issue refund | May take 3-5 business days |
| Reserve inventory | Release reservation | Must handle partial reservations |
| Send confirmation email | Send cancellation email | Can’t unsend the original |
| Create shipping label | Cancel shipment | Only works before pickup |
| Debit loyalty points | Credit points back | Include expiration handling |

Some actions can’t be compensated (you can’t unsend an email). For these, the saga should delay the irreversible step until all preceding steps succeed, or use a “pending” state.
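Delaying an irreversible step can be sketched as a queue of deferred actions that is flushed only after every other step has succeeded. The names here (`DeferredActions`, `place_order`, and the `reserve_stock` and `send_email` callables) are hypothetical, and the code is synchronous to keep the idea visible.

```python
class DeferredActions:
    """Holds irreversible actions until the saga is known to have succeeded."""

    def __init__(self):
        self._pending = []

    def defer(self, fn, *args):
        self._pending.append((fn, args))

    def flush(self):
        # Run the queued actions in the order they were deferred.
        for fn, args in self._pending:
            fn(*args)
        self._pending.clear()

    def discard(self):
        self._pending.clear()


def place_order(order, reserve_stock, send_email):
    """Reserve stock first; only send the (irreversible) email on success."""
    deferred = DeferredActions()
    try:
        # Queue the email instead of sending it right away.
        deferred.defer(send_email, order["email"], "Order confirmed")
        reserve_stock(order["items"])
    except Exception:
        deferred.discard()  # nothing was sent, so there is nothing to undo
        return "failed"
    deferred.flush()
    return "completed"
```

If `reserve_stock` raises, the confirmation email is dropped before it leaves the process, so no cancellation email is needed.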

Saga State Management

The orchestrator needs to track how far the saga has progressed, especially if it crashes mid-execution:

from datetime import datetime, timezone
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    PAYMENT_CHARGED = "payment_charged"
    INVENTORY_RESERVED = "inventory_reserved"
    SHIPMENT_CREATED = "shipment_created"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"


class SagaLog:
    # `db` is assumed to be an async data-access wrapper; the "$push" form
    # used in advance() follows MongoDB-style append semantics.
    def __init__(self, db):
        self.db = db

    async def create(self, saga_id: str, order_data: dict):
        # Persist the saga before running any step so a crash is recoverable.
        await self.db.insert("sagas", {
            "saga_id": saga_id,
            "state": SagaState.STARTED.value,
            "order_data": order_data,
            "steps_completed": [],
            "created_at": datetime.now(timezone.utc),
        })

    async def advance(self, saga_id: str, new_state: SagaState, step_result: dict):
        # Record the new state and append the result of the step just finished.
        await self.db.update("sagas", {"saga_id": saga_id}, {
            "state": new_state.value,
            "steps_completed": {"$push": step_result},
            "updated_at": datetime.now(timezone.utc),
        })

If the orchestrator crashes and restarts, it reads the saga log, sees where it left off, and resumes (or compensates from that point).
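The restart decision can be sketched as a pure function over a saga record. Everything here is an assumption made for illustration: the step names, the `plan_recovery` function, and the idea that each entry in `steps_completed` carries a `"step"` key.

```python
# Forward steps in execution order (hypothetical names).
FORWARD_STEPS = ["charge_payment", "reserve_inventory", "create_shipment"]
TERMINAL_STATES = {"completed", "failed"}


def plan_recovery(record):
    """Return the actions a restarted orchestrator should take for one saga."""
    if record["state"] in TERMINAL_STATES:
        return []  # nothing left to do

    done = [entry["step"] for entry in record["steps_completed"]]

    if record["state"] == "compensating":
        # Undo every logged step, most recent first.
        return [("compensate", step) for step in reversed(done)]

    # Otherwise resume: run whatever has not been logged as completed.
    return [("run", step) for step in FORWARD_STEPS if step not in done]
```

A crash can land between performing a step and logging it, so a resumed saga may repeat a step that already ran. Steps and compensations therefore need to be idempotent, for example by passing the saga ID as an idempotency key to each service.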

Choosing Between Choreography and Orchestration

| Factor | Choreography | Orchestration |
| --- | --- | --- |
| Number of steps | 2-3 steps | 4+ steps |
| Team structure | Autonomous teams | Central platform team |
| Workflow visibility | Hard to trace | Clear in orchestrator code |
| Adding new steps | Add a new subscriber | Modify orchestrator |
| Error handling | Distributed, complex | Centralized |
| Coupling | Very loose | Orchestrator knows all services |

For most Python teams starting with microservices, orchestration is easier to debug and maintain. Switch to choreography when you have strong team autonomy and solid distributed tracing.

Common Misconception

“Sagas provide the same guarantees as database transactions.”

Database transactions are ACID: atomic, consistent, isolated, durable. Sagas provide only eventual consistency and give up isolation. Between steps, the system is in an intermediate state, and other services might see partially completed data. Compensations might fail, requiring manual intervention. Sagas are a pragmatic solution, not a perfect one.

The one thing to remember: Sagas coordinate distributed operations by pairing each step with a compensation — choose choreography for simple flows with independent teams, orchestration for complex workflows where visibility and error handling matter.
