Python Saga Pattern — Core Concepts
Why Sagas Exist
In a monolith, you can wrap multiple database operations in a single transaction. If anything fails, the database rolls everything back. Simple.
In microservices, each service owns its database. There’s no cross-service transaction. If the payment service charges the card but the inventory service can’t reserve stock, you need a way to undo the payment. That’s the saga pattern.
A saga is a sequence of local transactions, where each transaction updates one service and publishes an event or triggers the next step. If any step fails, previously completed steps are undone using compensating transactions.
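The definition above can be sketched as a list of steps, each paired with its compensation. This is a minimal in-process illustration under stated assumptions, not a production saga engine: `Step`, `run_saga`, and the stub coroutines are hypothetical names, and real steps would call remote services.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Step:
    """One local transaction plus the action that semantically undoes it."""
    name: str
    action: Callable[[], Awaitable[None]]
    compensate: Callable[[], Awaitable[None]]


async def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    trace: list[str] = []
    completed: list[Step] = []
    for step in steps:
        try:
            await step.action()
        except Exception:
            trace.append(f"{step.name}: failed")
            for done in reversed(completed):
                await done.compensate()
                trace.append(f"{done.name}: compensated")
            return trace
        completed.append(step)
        trace.append(f"{step.name}: ok")
    return trace


# Stub coroutines standing in for real service calls (assumptions for the demo).
async def succeed() -> None:
    pass


async def out_of_stock() -> None:
    raise RuntimeError("out of stock")


steps = [
    Step("charge_payment", succeed, succeed),
    Step("reserve_stock", out_of_stock, succeed),
]
trace = asyncio.run(run_saga(steps))
print(trace)
```

Here the second step fails, so the trace records the payment being charged, the reservation failing, and the payment being compensated. The failed step itself is not compensated because it never completed.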
Two Flavors of Sagas
Choreography (Event-Driven)
Each service listens for events and reacts independently. No central coordinator.
OrderService → publishes OrderCreated
↓
PaymentService → hears OrderCreated → charges card → publishes PaymentCompleted
↓
InventoryService → hears PaymentCompleted → reserves stock → publishes StockReserved
↓
ShippingService → hears StockReserved → creates shipment → publishes ShipmentCreated
If inventory reservation fails:
InventoryService → publishes StockReservationFailed
↓
PaymentService → hears StockReservationFailed → refunds card → publishes PaymentRefunded
↓
OrderService → hears PaymentRefunded → marks order as cancelled
Pros: No central coordinator to fail or become a bottleneck; services stay loosely coupled.
Cons: The full workflow is hard to trace, and compensation chains get complex as more services join.
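The failure path above can be sketched with a tiny in-memory event bus. The event names come from the flow described here; `EventBus`, `wire_services`, and the handler names are hypothetical, and the synchronous bus is only a stand-in for a real message broker.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Synchronous in-memory stand-in for a message broker (an assumption)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: list[str] = []  # every event type published, in order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, order: dict) -> None:
        self.history.append(event_type)
        for handler in self.handlers[event_type]:
            handler(order)


def wire_services(bus: EventBus, stock_available: bool) -> None:
    """Register one handler per service; no handler knows the whole workflow."""

    def payment_on_order_created(order: dict) -> None:
        order["charged"] = True
        bus.publish("PaymentCompleted", order)

    def inventory_on_payment_completed(order: dict) -> None:
        event = "StockReserved" if stock_available else "StockReservationFailed"
        bus.publish(event, order)

    def payment_on_stock_failed(order: dict) -> None:
        order["charged"] = False  # compensation: refund the card
        bus.publish("PaymentRefunded", order)

    def order_on_payment_refunded(order: dict) -> None:
        order["status"] = "cancelled"

    bus.subscribe("OrderCreated", payment_on_order_created)
    bus.subscribe("PaymentCompleted", inventory_on_payment_completed)
    bus.subscribe("StockReservationFailed", payment_on_stock_failed)
    bus.subscribe("PaymentRefunded", order_on_payment_refunded)


bus = EventBus()
wire_services(bus, stock_available=False)
order = {"id": "o-1", "status": "pending"}
bus.publish("OrderCreated", order)
print(bus.history)
print(order["status"])
```

The only record of the workflow is the event history, which is why tracing a choreographed saga usually depends on correlation IDs or distributed tracing.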
Orchestration (Central Coordinator)
A dedicated orchestrator service controls the saga flow:
```python
import logging


class OrderSagaOrchestrator:
    def __init__(self, payment_client, inventory_client, shipping_client):
        self.payment = payment_client
        self.inventory = inventory_client
        self.shipping = shipping_client

    async def execute(self, order):
        # Each completed step registers the call that undoes it.
        compensations = []
        try:
            # Step 1: Charge payment
            payment = await self.payment.charge(order.customer_id, order.total)
            compensations.append(lambda: self.payment.refund(payment.id))

            # Step 2: Reserve inventory
            reservation = await self.inventory.reserve(order.items)
            compensations.append(lambda: self.inventory.release(reservation.id))

            # Step 3: Create shipment
            shipment = await self.shipping.create(order.id, order.address)
            compensations.append(lambda: self.shipping.cancel(shipment.id))

            return {"status": "completed", "shipment_id": shipment.id}
        except Exception as e:
            # Run compensations in reverse order
            for compensate in reversed(compensations):
                try:
                    await compensate()
                except Exception as comp_error:
                    # Log for manual intervention, then keep compensating
                    logging.error("Compensation failed: %s", comp_error)
            return {"status": "failed", "error": str(e)}
```
Pros: Clear workflow logic, easy to add new steps, centralized error handling.
Cons: Orchestrator is a single point of failure and potential bottleneck.
Compensating Transactions
Compensations aren’t always a simple “undo.” They’re semantic reverses — actions that counteract the effect of the original step:
| Step | Compensation | Notes |
|---|---|---|
| Charge credit card | Issue refund | May take 3-5 business days |
| Reserve inventory | Release reservation | Must handle partial reservations |
| Send confirmation email | Send cancellation email | Can’t unsend the original |
| Create shipping label | Cancel shipment | Only works before pickup |
| Debit loyalty points | Credit points back | Include expiration handling |
Some actions can’t be compensated (you can’t unsend an email). For these, the saga should delay the irreversible step until all preceding steps succeed, or use a “pending” state.
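A minimal sketch of that ordering, assuming synchronous callables for brevity: the compensable steps run first while the order stays "pending", and the irreversible confirmation email is sent only after all of them succeed. `Order`, `place_order`, and the stub functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Order:
    order_id: str
    status: str = "pending"  # what other services see until the saga settles
    emails_sent: list[str] = field(default_factory=list)


def place_order(
    order: Order,
    compensable_steps: list[tuple[Callable[[], None], Callable[[], None]]],
    send_confirmation: Callable[[Order], None],
) -> Order:
    """Run (action, compensation) pairs first; send the email only if all succeed."""
    undo: list[Callable[[], None]] = []
    for action, compensation in compensable_steps:
        try:
            action()
        except Exception:
            for compensate in reversed(undo):
                compensate()
            order.status = "cancelled"
            return order  # the irreversible email was never sent
        undo.append(compensation)
    send_confirmation(order)  # irreversible step, deliberately last
    order.status = "confirmed"
    return order


ledger: list[str] = []


def charge() -> None:
    ledger.append("charged")


def refund() -> None:
    ledger.append("refunded")


def reserve_fails() -> None:
    raise RuntimeError("no stock")


def release() -> None:
    ledger.append("released")


def send_confirmation(order: Order) -> None:
    order.emails_sent.append("confirmation")


failed = place_order(
    Order("o-1"), [(charge, refund), (reserve_fails, release)], send_confirmation
)
print(failed.status, failed.emails_sent, ledger)
```

Because the email is last, a failed reservation leaves nothing that cannot be undone. If the final irreversible step itself fails, the usual choice is to retry it rather than compensate, since everything before it has already succeeded.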
Saga State Management
The orchestrator needs to persist where each saga is, so that it can pick up again if it crashes mid-execution:
```python
from datetime import datetime, timezone
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    PAYMENT_CHARGED = "payment_charged"
    INVENTORY_RESERVED = "inventory_reserved"
    SHIPMENT_CREATED = "shipment_created"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"


class SagaLog:
    def __init__(self, db):
        # `db` is any async document-store wrapper exposing insert() and update()
        self.db = db

    async def create(self, saga_id: str, order_data: dict):
        await self.db.insert("sagas", {
            "saga_id": saga_id,
            "state": SagaState.STARTED.value,
            "order_data": order_data,
            "steps_completed": [],
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            "created_at": datetime.now(timezone.utc),
        })

    async def advance(self, saga_id: str, new_state: SagaState, step_result: dict):
        await self.db.update("sagas", {"saga_id": saga_id}, {
            "state": new_state.value,
            # Append-style update; the exact syntax depends on the store
            "steps_completed": {"$push": step_result},
            "updated_at": datetime.now(timezone.utc),
        })
```
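If the orchestrator crashes and restarts, it reads the saga log, sees where each unfinished saga left off, and either resumes or compensates from that point. Because a crash can land between performing a step and recording it, steps and compensations should be idempotent so that repeating one is harmless.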
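The recovery decision can be sketched as a pure function over a saga record. The state strings are the `SagaState` values; the action names and the assumption that each entry in `steps_completed` carries a `"step"` key are hypothetical, chosen only for illustration.

```python
# Happy-path states mapped to the action that should run next (hypothetical names).
NEXT_ACTION = {
    "started": "charge_payment",
    "payment_charged": "reserve_inventory",
    "inventory_reserved": "create_shipment",
    "shipment_created": "mark_completed",
}

# Completed action mapped to the compensation that undoes it (hypothetical names).
UNDO_ACTION = {
    "charge_payment": "refund_payment",
    "reserve_inventory": "release_inventory",
    "create_shipment": "cancel_shipment",
}


def recovery_plan(record: dict) -> list[str]:
    """Return the actions still to run for a saga found in the log after a restart."""
    state = record["state"]
    if state in ("completed", "failed"):
        return []  # terminal states: nothing left to do
    if state == "compensating":
        # Undo every recorded step, most recent first.
        done = [entry["step"] for entry in record["steps_completed"]]
        return [UNDO_ACTION[step] for step in reversed(done) if step in UNDO_ACTION]
    # Otherwise resume forward from the recorded state.
    order = list(NEXT_ACTION)
    return [NEXT_ACTION[s] for s in order[order.index(state):]]


resume = recovery_plan({
    "state": "payment_charged",
    "steps_completed": [{"step": "charge_payment"}],
})
rollback = recovery_plan({
    "state": "compensating",
    "steps_completed": [{"step": "charge_payment"}, {"step": "reserve_inventory"}],
})
print(resume)
print(rollback)
```

This sketch does not record which compensations already ran before the crash, so it replays all of them, which is safe only if each compensation is idempotent.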
Choosing Between Choreography and Orchestration
| Factor | Choreography | Orchestration |
|---|---|---|
| Number of steps | 2-3 steps | 4+ steps |
| Team structure | Autonomous teams | Central platform team |
| Workflow visibility | Hard to trace | Clear in orchestrator code |
| Adding new steps | Add a new subscriber | Modify orchestrator |
| Error handling | Distributed, complex | Centralized |
| Coupling | Very loose | Orchestrator knows all services |
For most Python teams starting with microservices, orchestration is easier to debug and maintain. Switch to choreography when you have strong team autonomy and solid distributed tracing.
Common Misconception
“Sagas provide the same guarantees as database transactions.”
Database transactions are ACID: atomic, consistent, isolated, durable. Sagas give up isolation and cross-service atomicity and provide only eventual consistency. Between steps, the system is in an intermediate state, so other services might see partially completed data. Compensations might themselves fail, requiring manual intervention. Sagas are a pragmatic solution, not a perfect one.
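One common way to reduce, but not eliminate, manual intervention is to retry a failing compensation a few times before escalating. The sketch below assumes the compensation is idempotent; `compensate_with_retry` and `FlakyRefund` are hypothetical names, and the retry count and delays are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable


async def compensate_with_retry(
    compensate: Callable[[], Awaitable[None]],
    attempts: int = 3,
    base_delay: float = 0.01,
) -> bool:
    """Return True once the compensation succeeds, False if every attempt fails."""
    for attempt in range(attempts):
        try:
            await compensate()
            return True
        except Exception:
            if attempt < attempts - 1:
                # Exponential backoff before the next attempt
                await asyncio.sleep(base_delay * 2 ** attempt)
    return False


class FlakyRefund:
    """Hypothetical refund call that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def __call__(self) -> None:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("payment gateway unavailable")


recovers = FlakyRefund(failures=2)
never_recovers = FlakyRefund(failures=10)
recovered = asyncio.run(compensate_with_retry(recovers))
needs_human = not asyncio.run(compensate_with_retry(never_recovers))
print(recovered, needs_human)
```

A `False` result is the signal to mark the saga for manual intervention instead of silently dropping the failed compensation.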
The one thing to remember: Sagas coordinate distributed operations by pairing each step with a compensation — choose choreography for simple flows with independent teams, orchestration for complex workflows where visibility and error handling matter.