# Saga Pattern Long-running business processes with multiple steps and compensation logic. ## Overview The Saga pattern coordinates distributed transactions across multiple services: - **Multi-Step Workflows** - Orchestrate complex business processes - **Compensation Logic** - Rollback completed steps on failure - **Event-Driven** - React to domain events - **Eventually Consistent** - Maintain consistency without distributed transactions ## What is a Saga? A saga is a sequence of local transactions where each transaction updates data and publishes events. If a step fails, compensating transactions undo completed steps. ### Example: Order Fulfillment ``` 1. Reserve Inventory → Success 2. Process Payment → Success 3. Ship Order → FAILED ↓ Compensate: - Refund Payment - Release Inventory ``` ## Saga Types ### Choreography Services react to events independently: ```csharp // Order Service public async Task HandleOrderPlacedAsync(OrderPlacedEvent @event) { // Publish event for inventory service await PublishAsync(new ReserveInventoryRequested { OrderId = @event.OrderId, Items = @event.Items }); } // Inventory Service (independent) public async Task HandleReserveInventoryRequestedAsync(ReserveInventoryRequested @event) { await ReserveInventoryAsync(@event.OrderId, @event.Items); // Publish event for payment service await PublishAsync(new InventoryReservedEvent { OrderId = @event.OrderId }); } // Payment Service (independent) public async Task HandleInventoryReservedAsync(InventoryReservedEvent @event) { await ProcessPaymentAsync(@event.OrderId); // And so on... } ``` **Pros:** - No central coordinator - Services loosely coupled - Good for simple workflows **Cons:** - Hard to understand flow - Difficult to debug - Complex error handling ### Orchestration Central coordinator manages the workflow: ```csharp public class OrderFulfillmentSaga : ISaga { public async Task HandleAsync(OrderPlacedEvent @event, CancellationToken ct) { var sagaState = new OrderFulfillmentState { OrderId = @event.OrderId, CurrentStep = SagaStep.ReserveInventory }; try { // Step 1: Reserve Inventory await _inventoryService.ReserveAsync(@event.OrderId, @event.Items, ct); sagaState.InventoryReserved = true; sagaState.CurrentStep = SagaStep.ProcessPayment; // Step 2: Process Payment await _paymentService.ChargeAsync(@event.OrderId, @event.TotalAmount, ct); sagaState.PaymentProcessed = true; sagaState.CurrentStep = SagaStep.ShipOrder; // Step 3: Ship Order await _shippingService.ShipAsync(@event.OrderId, ct); sagaState.Shipped = true; sagaState.CurrentStep = SagaStep.Completed; await PublishAsync(new OrderFulfilledEvent { OrderId = @event.OrderId }); } catch (Exception ex) { _logger.LogError(ex, "Saga failed at step {Step}", sagaState.CurrentStep); // Compensate completed steps await CompensateAsync(sagaState, ct); await PublishAsync(new OrderFulfillmentFailedEvent { OrderId = @event.OrderId, FailedStep = sagaState.CurrentStep, Reason = ex.Message }); } } private async Task CompensateAsync(OrderFulfillmentState state, CancellationToken ct) { // Compensate in reverse order if (state.PaymentProcessed) { await _paymentService.RefundAsync(state.OrderId, ct); } if (state.InventoryReserved) { await _inventoryService.ReleaseAsync(state.OrderId, ct); } } } ``` **Pros:** - Clear workflow definition - Easier to debug - Centralized error handling - Better observability **Cons:** - Central coordinator (single point of failure) - More complex implementation - Tighter coupling to coordinator ## Saga State Sagas maintain state across steps: ```csharp public enum SagaStep { ReserveInventory, ProcessPayment, ShipOrder, Completed, Compensating, Failed } public class OrderFulfillmentState { public int OrderId { get; set; } public SagaStep CurrentStep { get; set; } // Completed steps public bool InventoryReserved { get; set; } public bool PaymentProcessed { get; set; } public bool Shipped { get; set; } // Compensation tracking public bool PaymentRefunded { get; set; } public bool InventoryReleased { get; set; } // Metadata public DateTimeOffset StartedAt { get; set; } public DateTimeOffset? CompletedAt { get; set; } public string? FailureReason { get; set; } } ``` ## Compensation Strategies ### Forward Recovery Retry failed steps: ```csharp public async Task HandleAsync(OrderPlacedEvent @event, CancellationToken ct) { var maxRetries = 3; for (int attempt = 1; attempt <= maxRetries; attempt++) { try { await _paymentService.ChargeAsync(@event.OrderId, @event.TotalAmount, ct); return; // Success } catch (TransientException ex) { if (attempt == maxRetries) throw; await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)), ct); } } } ``` ### Backward Recovery Undo completed steps: ```csharp private async Task CompensateAsync(OrderFulfillmentState state, CancellationToken ct) { state.CurrentStep = SagaStep.Compensating; try { // Undo in reverse order if (state.Shipped) { await _shippingService.CancelShipmentAsync(state.OrderId, ct); } if (state.PaymentProcessed) { await _paymentService.RefundAsync(state.OrderId, ct); state.PaymentRefunded = true; } if (state.InventoryReserved) { await _inventoryService.ReleaseAsync(state.OrderId, ct); state.InventoryReleased = true; } state.CurrentStep = SagaStep.Failed; } catch (Exception ex) { _logger.LogError(ex, "Compensation failed for order {OrderId}", state.OrderId); // Manual intervention required await NotifyOpsTeamAsync(state, ex); } } ``` ## Saga Persistence Persist saga state for recovery: ```csharp public class SagaStateStore { public async Task SaveStateAsync(string sagaId, object state, CancellationToken ct) { var json = JsonSerializer.Serialize(state); await _dbContext.SagaStates.AddAsync(new SagaStateEntity { SagaId = sagaId, StateJson = json, UpdatedAt = DateTimeOffset.UtcNow }, ct); await _dbContext.SaveChangesAsync(ct); } public async Task GetStateAsync(string sagaId, CancellationToken ct) { var entity = await _dbContext.SagaStates .FirstOrDefaultAsync(s => s.SagaId == sagaId, ct); return entity == null ? default : JsonSerializer.Deserialize(entity.StateJson); } } ``` ## Idempotency Ensure saga steps are idempotent: ```csharp public async Task ReserveInventoryAsync(int orderId, List items, CancellationToken ct) { // Check if already reserved (idempotency) var existing = await _dbContext.InventoryReservations .FirstOrDefaultAsync(r => r.OrderId == orderId, ct); if (existing != null) { _logger.LogInformation("Inventory already reserved for order {OrderId}", orderId); return; // Already done } // Reserve inventory foreach (var item in items) { var product = await _dbContext.Products.FindAsync(item.ProductId); if (product.Stock < item.Quantity) { throw new InsufficientStockException(item.ProductId); } product.Stock -= item.Quantity; } // Record reservation await _dbContext.InventoryReservations.AddAsync(new InventoryReservation { OrderId = orderId, ReservedAt = DateTimeOffset.UtcNow }, ct); await _dbContext.SaveChangesAsync(ct); } ``` ## Timeouts Handle long-running steps with timeouts: ```csharp public async Task HandleAsync(OrderPlacedEvent @event, CancellationToken ct) { using var stepCts = CancellationTokenSource.CreateLinkedTokenSource(ct); stepCts.CancelAfter(TimeSpan.FromMinutes(5)); // 5-minute timeout try { await _paymentService.ChargeAsync(@event.OrderId, @event.TotalAmount, stepCts.Token); } catch (OperationCanceledException) when (!ct.IsCancellationRequested) { _logger.LogWarning("Payment step timed out for order {OrderId}", @event.OrderId); // Compensate await CompensateAsync(state, ct); throw new SagaTimeoutException("Payment step timed out"); } } ``` ## Common Patterns ### Two-Phase Commit (2PC) ```csharp // Phase 1: Prepare await _inventoryService.PrepareReservationAsync(orderId); await _paymentService.PrepareChargeAsync(orderId); // Phase 2: Commit await _inventoryService.CommitReservationAsync(orderId); await _paymentService.CommitChargeAsync(orderId); ``` ### Reservation Pattern ```csharp // Reserve resources var reservationId = await _inventoryService.ReserveAsync(items, expiresIn: TimeSpan.FromMinutes(10)); try { await _paymentService.ChargeAsync(orderId, amount); // Confirm reservation await _inventoryService.ConfirmReservationAsync(reservationId); } catch { // Cancel reservation (auto-expires after 10 minutes anyway) await _inventoryService.CancelReservationAsync(reservationId); throw; } ``` ## Best Practices ### ✅ DO - Use orchestration for complex workflows - Make saga steps idempotent - Persist saga state - Implement compensation logic - Handle timeouts - Log all saga steps - Monitor saga completion rates - Design for failure - Use unique saga IDs - Version saga definitions ### ❌ DON'T - Don't use distributed transactions (2PC) - Don't assume steps always succeed - Don't forget compensation logic - Don't ignore partial failures - Don't make compensations non-idempotent - Don't skip saga state persistence - Don't use blocking operations - Don't forget timeout handling ## See Also - [Sagas Overview](README.md) - [Creating Sagas](creating-sagas.md) - [Compensation](compensation.md) - [Saga Context](saga-context.md) - [Event Streaming Overview](../README.md)