Add comprehensive event streaming implementation plan

- 6 phases covering workflow abstraction through production monitoring
- Detailed task breakdowns with checkboxes for tracking
- Progressive complexity: simple by default, powerful when needed
- Support for ephemeral & persistent streams
- Cross-service communication via RabbitMQ
- Schema evolution with automatic upcasting
- Exactly-once delivery and read receipts
- ~10+ week timeline with clear success criteria
Mathias Beaulieu-Duncan 2025-12-09 14:57:10 -05:00
parent 4051800934
commit 668f6be0d6
2 changed files with 2195 additions and 0 deletions


@@ -0,0 +1,977 @@
# Event Streaming Implementation Plan
## Executive Summary
Transform the CQRS framework into a complete enterprise event streaming platform that supports:
- **Workflows**: Business process correlation and event emission
- **Multiple Consumer Patterns**: Broadcast, exclusive, consumer groups, read receipts
- **Storage Models**: Ephemeral (message queue) and persistent (event sourcing)
- **Delivery Semantics**: At-most-once, at-least-once, exactly-once
- **Cross-Service Communication**: RabbitMQ, Kafka integration with zero developer friction
- **Schema Evolution**: Event versioning with automatic upcasting
- **Event Replay**: Time-travel queries for persistent streams
**Design Philosophy**: Simple by default, powerful when needed. Progressive complexity.
---
## Architecture Layers
```
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: WORKFLOW (Business Process) │
│ What events belong together logically? │
│ Example: InvitationWorkflow, UserWorkflow │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: EVENT STREAM (Organization & Retention) │
│ How are events stored and organized? │
│ Example: Persistent vs Ephemeral, retention policies │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: SUBSCRIPTION (Consumer Routing) │
│ Who wants to consume what? │
│ Example: Broadcast, Exclusive, ConsumerGroup, ReadReceipt │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: DELIVERY (Transport Mechanism) │
│ How do events reach consumers? │
│ Example: gRPC, RabbitMQ, Kafka │
└─────────────────────────────────────────────────────────────┘
```
---
## Core Enumerations
### StreamType
- `Ephemeral`: Message queue semantics (events deleted after consumption)
- `Persistent`: Event log semantics (events retained for replay)
### DeliverySemantics
- `AtMostOnce`: Fire and forget (fast, might lose messages)
- `AtLeastOnce`: Retry until ack (might see duplicates)
- `ExactlyOnce`: Deduplication (slower, no duplicates)
### SubscriptionMode
- `Broadcast`: All consumers get all events (pub/sub)
- `Exclusive`: Only one consumer gets each event (queue)
- `ConsumerGroup`: Load-balanced across group members (Kafka-style)
- `ReadReceipt`: Requires explicit "user saw this" confirmation
### StreamScope
- `Internal`: Same service only (default)
- `CrossService`: Available to external services via message broker
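
As a rough sketch, the four enumerations above could be declared as plain C# enums; the values come directly from the descriptions in this section, and nothing here is final API:
```csharp
public enum StreamType
{
    Ephemeral,   // Message queue semantics: events deleted after consumption
    Persistent   // Event log semantics: events retained for replay
}

public enum DeliverySemantics
{
    AtMostOnce,  // Fire and forget: fast, may lose messages
    AtLeastOnce, // Retry until acknowledged: may deliver duplicates
    ExactlyOnce  // Deduplicated: slower, no duplicates
}

public enum SubscriptionMode
{
    Broadcast,     // All consumers get all events (pub/sub)
    Exclusive,     // Only one consumer gets each event (queue)
    ConsumerGroup, // Load-balanced across group members (Kafka-style)
    ReadReceipt    // Requires explicit "user saw this" confirmation
}

public enum StreamScope
{
    Internal,     // Same service only (default)
    CrossService  // Available to external services via message broker
}
```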
---
## Phase 1: Core Workflow & Streaming Foundation
**Goal**: Get basic workflow + ephemeral streaming working with in-memory storage
**Duration**: Weeks 1-2
### Phase 1 Tasks
#### 1.1 Workflow Abstraction
- [ ] Create `Workflow` abstract base class (see the sketch after this list)
- [ ] `Id` property (workflow instance identifier)
- [ ] `IsNew` property (started vs continued)
- [ ] `Emit<TEvent>()` protected method
- [ ] Internal `PendingEvents` collection
- [ ] Create `ICommandHandlerWithWorkflow<TCommand, TResult, TWorkflow>` interface
- [ ] Create `ICommandHandlerWithWorkflow<TCommand, TWorkflow>` interface (no result)
- [ ] Update sample: Convert `UserEvent` to `UserWorkflow : Workflow`
- [ ] Update sample: Convert `InvitationEvent` to `InvitationWorkflow : Workflow`
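
A minimal sketch of the `Workflow` base class and handler interfaces from 1.1, using the member names listed above; ID generation, collection types, and accessibility details are assumptions to settle during implementation:
```csharp
public abstract class Workflow
{
    private readonly List<object> _pendingEvents = new();

    /// <summary>Workflow instance identifier; doubles as the correlation ID for emitted events.</summary>
    public Guid Id { get; internal set; } = Guid.NewGuid();

    /// <summary>True when this command started the workflow, false when it continues an existing one.</summary>
    public bool IsNew { get; internal set; } = true;

    /// <summary>Events emitted by the handler, published after the command completes successfully.</summary>
    internal IReadOnlyList<object> PendingEvents => _pendingEvents;

    // The task list describes Emit as protected; it is public here only so the Phase 1
    // success-criteria example (workflow.Emit(...) called from a handler) compiles as written.
    public void Emit<TEvent>(TEvent @event) where TEvent : notnull
        => _pendingEvents.Add(@event);
}

public interface ICommandHandlerWithWorkflow<in TCommand, TResult, in TWorkflow>
    where TWorkflow : Workflow
{
    Task<TResult> HandleAsync(TCommand command, TWorkflow workflow, CancellationToken cancellationToken);
}

public interface ICommandHandlerWithWorkflow<in TCommand, in TWorkflow>
    where TWorkflow : Workflow
{
    Task HandleAsync(TCommand command, TWorkflow workflow, CancellationToken cancellationToken);
}
```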
#### 1.2 Stream Configuration
- [ ] Create `StreamType` enum
- [ ] Create `DeliverySemantics` enum
- [ ] Create `SubscriptionMode` enum
- [ ] Create `StreamScope` enum
- [ ] Create `IStreamConfiguration` interface
- [ ] Create `StreamConfiguration` implementation
- [ ] Create fluent configuration API: `AddEventStreaming()`
#### 1.3 In-Memory Storage (Ephemeral)
- [ ] Create `IEventStreamStore` interface (sketched after this list)
- [ ] `EnqueueAsync()` for ephemeral streams
- [ ] `DequeueAsync()` for ephemeral streams
- [ ] `AcknowledgeAsync()` for message acknowledgment
- [ ] `NackAsync()` for requeue/dead-letter
- [ ] Create `InMemoryEventStreamStore` implementation
- [ ] Concurrent queues per stream
- [ ] Per-consumer visibility tracking
- [ ] Acknowledgment handling
- [ ] Create `ISubscriptionStore` interface
- [ ] `RegisterConsumerAsync()`
- [ ] `UnregisterConsumerAsync()`
- [ ] `GetConsumersAsync()`
- [ ] Create `InMemorySubscriptionStore` implementation
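
A sketch of the two store interfaces from 1.3, using the method names above; parameter shapes and the `PendingEvent` record are assumptions:
```csharp
public interface IEventStreamStore
{
    Task EnqueueAsync(string streamName, object @event, CancellationToken ct = default);

    /// <summary>Returns the next event visible to this consumer, or null when the queue is empty.</summary>
    Task<PendingEvent?> DequeueAsync(string streamName, string consumerId, CancellationToken ct = default);

    Task AcknowledgeAsync(string streamName, string consumerId, Guid eventId, CancellationToken ct = default);

    /// <summary>Negative acknowledgment: requeue the event or route it to a dead-letter queue.</summary>
    Task NackAsync(string streamName, string consumerId, Guid eventId, bool requeue, CancellationToken ct = default);
}

/// <summary>An event plus the bookkeeping needed to ack/nack it.</summary>
public sealed record PendingEvent(Guid EventId, object Event, DateTimeOffset EnqueuedAt);

public interface ISubscriptionStore
{
    Task RegisterConsumerAsync(string subscriptionId, string consumerId, CancellationToken ct = default);
    Task UnregisterConsumerAsync(string subscriptionId, string consumerId, CancellationToken ct = default);
    Task<IReadOnlyList<string>> GetConsumersAsync(string subscriptionId, CancellationToken ct = default);
}
```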
#### 1.4 Subscription System
- [ ] Create `ISubscription` interface
- [ ] Create `Subscription` implementation
- [ ] Create `IEventSubscriptionClient` for consumers
- [ ] Create `EventSubscriptionClient` implementation
- [ ] Implement `Broadcast` mode
- [ ] Implement `Exclusive` mode
- [ ] Create subscription configuration API
#### 1.5 Workflow Decorators
- [ ] Create `WorkflowContext<TWorkflow>` class
- [ ] Create `CommandHandlerWithWorkflowDecorator<TCommand, TResult, TWorkflow>`
- [ ] Create `CommandHandlerWithWorkflowDecoratorNoResult<TCommand, TWorkflow>`
- [ ] Update event emission to use workflow ID as correlation ID
- [ ] Integrate with existing `IEventEmitter`
#### 1.6 Service Registration
- [ ] Create `AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler>()` extension
- [ ] Create `AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler, TValidator>()` extension
- [ ] Create `AddCommandWithWorkflow<TCommand, TWorkflow, THandler>()` extension (no result)
- [ ] Deprecate `AddCommandWithEvents` (keep for backward compatibility)
- [ ] Update `ServiceCollectionExtensions` with workflow registration
#### 1.7 gRPC Streaming (Basic)
- [ ] Create `IEventDeliveryProvider` interface
- [ ] Create `GrpcEventDeliveryProvider` implementation
- [ ] Update gRPC service to support bidirectional streaming
- [ ] Implement consumer registration/unregistration
- [ ] Handle connection lifecycle (connect/disconnect/reconnect)
#### 1.8 Sample Project Updates
- [ ] Refactor `UserEvents.cs` → `UserWorkflow.cs`
- [ ] Refactor `InvitationWorkflow.cs` to use new API
- [ ] Update `Program.cs` with workflow registration
- [ ] Add simple subscription consumer example
- [ ] Add gRPC streaming consumer example
- [ ] Update documentation
#### 1.9 Testing & Validation
- [ ] Build and verify no regressions
- [ ] Test workflow start/continue semantics
- [ ] Test ephemeral stream (message queue behavior)
- [ ] Test broadcast subscription (multiple consumers)
- [ ] Test exclusive subscription (single consumer)
- [ ] Test gRPC streaming connection
- [ ] Verify existing features still work
**Phase 1 Success Criteria:**
```csharp
// ✅ This should work:

// Registration
builder.Services.AddCommandWithWorkflow<InviteUserCommand, string, InvitationWorkflow, InviteUserCommandHandler>();

// Handler
public class InviteUserCommandHandler
    : ICommandHandlerWithWorkflow<InviteUserCommand, string, InvitationWorkflow>
{
    public async Task<string> HandleAsync(
        InviteUserCommand command,
        InvitationWorkflow workflow,
        CancellationToken ct)
    {
        workflow.Emit(new UserInvitedEvent { ... });
        return workflow.Id;
    }
}

// Consumer
await foreach (var @event in client.SubscribeAsync("my-subscription", "consumer-1", ct))
{
    Console.WriteLine($"Received: {@event}");
}
```
---
## Phase 2: Persistence & Event Sourcing
**Goal**: Add persistent streams with replay capability
**Duration**: Weeks 3-4
### Phase 2 Tasks
#### 2.1 Storage Abstractions (Persistent)
- [ ] Extend `IEventStreamStore` with append-only log methods (sketched after this list):
- [ ] `AppendAsync()` for persistent streams
- [ ] `ReadStreamAsync()` for reading event log
- [ ] `GetStreamLengthAsync()` for stream metadata
- [ ] Create `StoredEvent` record (offset, timestamp, event data)
- [ ] Create `StreamMetadata` record (length, retention, oldest event)
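
A sketch of the persistent-stream additions from 2.1. The `StoredEvent` and `StreamMetadata` fields mirror the schema outlined in 2.2; the Phase 1 queue members of `IEventStreamStore` are omitted for brevity, and the exact signatures are assumptions:
```csharp
/// <summary>One entry in a persistent stream's append-only log.</summary>
public sealed record StoredEvent(
    string StreamName,
    long Offset,
    string EventType,
    object EventData,
    Guid CorrelationId,
    DateTimeOffset Timestamp);

/// <summary>Summary information about a persistent stream.</summary>
public sealed record StreamMetadata(
    string StreamName,
    long Length,
    TimeSpan? Retention,
    DateTimeOffset? OldestEventTimestamp);

public interface IEventStreamStore // Phase 2 members only; Enqueue/Dequeue/Ack/Nack unchanged from Phase 1
{
    /// <summary>Appends an event to the log and returns its offset.</summary>
    Task<long> AppendAsync(string streamName, object @event, Guid correlationId, CancellationToken ct = default);

    /// <summary>Reads events in order starting from an offset, without removing them.</summary>
    IAsyncEnumerable<StoredEvent> ReadStreamAsync(string streamName, long fromOffset, CancellationToken ct = default);

    /// <summary>Returns the current length of the stream (other metadata via StreamMetadata).</summary>
    Task<long> GetStreamLengthAsync(string streamName, CancellationToken ct = default);
}
```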
#### 2.2 PostgreSQL Storage Implementation
- [ ] Create `PostgresEventStreamStore : IEventStreamStore`
- [ ] Design event log schema:
- [ ] `events` table (stream_name, offset, event_type, event_data, correlation_id, timestamp)
- [ ] Indexes for efficient queries
- [ ] Partition strategy for large streams
- [ ] Implement append operations with optimistic concurrency
- [ ] Implement read operations with offset-based pagination
- [ ] Implement queue operations for ephemeral streams
#### 2.3 Offset Tracking
- [ ] Create `IConsumerOffsetStore` interface
- [ ] `GetOffsetAsync(subscriptionId, consumerId)`
- [ ] `SetOffsetAsync(subscriptionId, consumerId, offset)`
- [ ] `GetConsumerPositionsAsync(subscriptionId)` (for monitoring)
- [ ] Create `PostgresConsumerOffsetStore` implementation
- [ ] Design offset tracking schema:
- [ ] `consumer_offsets` table (subscription_id, consumer_id, stream_offset, last_updated)
- [ ] Integrate offset tracking with subscription client
#### 2.4 Retention Policies
- [ ] Create `RetentionPolicy` configuration
- [ ] Time-based retention (e.g., 90 days)
- [ ] Size-based retention (e.g., 10GB max)
- [ ] Count-based retention (e.g., 1M events max)
- [ ] Create `IRetentionService` interface
- [ ] Create `RetentionService` background service
- [ ] Implement retention policy enforcement
- [ ] Add configurable cleanup intervals
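
A sketch of the `RetentionPolicy` shape implied by 2.4 — three independent limits, any of which may be unset. Property names are assumptions; the Phase 2 success-criteria snippet assigns a plain `TimeSpan`, which would map to the time-based limit:
```csharp
/// <summary>Retention limits for a persistent stream; null means no limit on that dimension.</summary>
public sealed record RetentionPolicy
{
    /// <summary>Time-based: delete events older than this (e.g., 90 days).</summary>
    public TimeSpan? MaxAge { get; init; }

    /// <summary>Size-based: trim oldest events once the stream exceeds this many bytes (e.g., 10 GB).</summary>
    public long? MaxSizeBytes { get; init; }

    /// <summary>Count-based: trim oldest events once the stream exceeds this many events (e.g., 1M).</summary>
    public long? MaxEventCount { get; init; }

    // Lets `stream.Retention = TimeSpan.FromDays(90)` keep working if Retention is typed as RetentionPolicy.
    public static implicit operator RetentionPolicy(TimeSpan maxAge) => new() { MaxAge = maxAge };
}
```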
#### 2.5 Event Replay API
- [ ] Create `IEventReplayService` interface
- [ ] Create `EventReplayService` implementation
- [ ] Create `ReplayOptions` configuration:
- [ ] `StartPosition` (Beginning, Offset, Timestamp, EventId)
- [ ] `EndPosition` (Latest, Offset, Timestamp, EventId)
- [ ] `Filter` predicate
- [ ] `MaxEvents` limit
- [ ] Implement replay from persistent streams
- [ ] Add replay to new consumer (catch-up subscription)
#### 2.6 Stream Configuration Extensions
- [ ] Extend stream configuration with:
- [ ] `Type = StreamType.Persistent`
- [ ] `Retention` policies
- [ ] `EnableReplay = true/false`
- [ ] Validate configuration (ephemeral can't have replay)
- [ ] Add stream type detection and routing
#### 2.7 Migration & Compatibility
- [ ] Create database migration scripts
- [ ] Add backward compatibility for in-memory implementation
- [ ] Allow mixing persistent and ephemeral streams
- [ ] Support runtime switching (development vs production)
#### 2.8 Testing
- [ ] Test persistent stream append/read
- [ ] Test offset tracking across restarts
- [ ] Test retention policy enforcement
- [ ] Test event replay from various positions
- [ ] Test catch-up subscriptions
- [ ] Stress test with large event volumes
**Phase 2 Success Criteria:**
```csharp
// ✅ This should work:

// Configure persistent stream
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.Retention = TimeSpan.FromDays(90);
        stream.EnableReplay = true;
    });
});

// Use PostgreSQL storage
services.AddSingleton<IEventStreamStore, PostgresEventStreamStore>();

// Replay events
var replay = await replayService.ReplayStreamAsync("user-events", new ReplayOptions
{
    From = new StartPosition.Timestamp(DateTimeOffset.UtcNow.AddDays(-7))
}, ct);

await foreach (var @event in replay)
{
    // Process historical events
}
```
---
## Phase 3: Exactly-Once Delivery & Read Receipts
**Goal**: Add deduplication and explicit user confirmation
**Duration**: Week 5
### Phase 3 Tasks
#### 3.1 Idempotency Store
- [ ] Create `IIdempotencyStore` interface
- [ ] `WasProcessedAsync(consumerId, eventId)`
- [ ] `MarkProcessedAsync(consumerId, eventId, processedAt)`
- [ ] `TryAcquireIdempotencyLockAsync(idempotencyKey, lockDuration)`
- [ ] `ReleaseIdempotencyLockAsync(idempotencyKey)`
- [ ] `CleanupAsync(olderThan)`
- [ ] Create `PostgresIdempotencyStore` implementation
- [ ] Design idempotency schema:
- [ ] `processed_events` table (consumer_id, event_id, processed_at)
- [ ] `idempotency_locks` table (lock_key, acquired_at, expires_at)
- [ ] Add TTL-based cleanup
#### 3.2 Exactly-Once Middleware
- [ ] Create `ExactlyOnceDeliveryDecorator`
- [ ] Implement duplicate detection
- [ ] Implement distributed locking
- [ ] Add automatic retry on lock contention
- [ ] Integrate with subscription pipeline
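
A sketch of the dedup-plus-lock flow the `ExactlyOnceDeliveryDecorator` from 3.2 could follow, built on the `IIdempotencyStore` methods listed in 3.1; the signatures, lock duration, and key format are assumptions, and the retry-on-contention task is noted but not implemented here:
```csharp
public interface IIdempotencyStore
{
    Task<bool> WasProcessedAsync(string consumerId, Guid eventId);
    Task MarkProcessedAsync(string consumerId, Guid eventId, DateTimeOffset processedAt);
    Task<bool> TryAcquireIdempotencyLockAsync(string idempotencyKey, TimeSpan lockDuration);
    Task ReleaseIdempotencyLockAsync(string idempotencyKey);
    Task CleanupAsync(DateTimeOffset olderThan);
}

public sealed class ExactlyOnceDeliveryDecorator
{
    private static readonly TimeSpan LockDuration = TimeSpan.FromSeconds(30);
    private readonly IIdempotencyStore _idempotency;

    public ExactlyOnceDeliveryDecorator(IIdempotencyStore idempotency) => _idempotency = idempotency;

    /// <summary>Runs <paramref name="handler"/> at most once per (consumer, event) pair.</summary>
    public async Task ProcessAsync(string consumerId, Guid eventId, Func<Task> handler)
    {
        // 1. Fast path: drop events this consumer already processed (duplicate redelivery).
        if (await _idempotency.WasProcessedAsync(consumerId, eventId))
            return;

        // 2. Distributed lock so concurrent deliveries of the same event don't race each other.
        var lockKey = $"{consumerId}:{eventId}";
        if (!await _idempotency.TryAcquireIdempotencyLockAsync(lockKey, LockDuration))
            return; // Another worker holds the lock; a production version would retry here.

        try
        {
            // 3. Re-check inside the lock: the other worker may have finished first.
            if (await _idempotency.WasProcessedAsync(consumerId, eventId))
                return;

            await handler();

            // 4. Record completion so future redeliveries are dropped in step 1.
            await _idempotency.MarkProcessedAsync(consumerId, eventId, DateTimeOffset.UtcNow);
        }
        finally
        {
            await _idempotency.ReleaseIdempotencyLockAsync(lockKey);
        }
    }
}
```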
#### 3.3 Read Receipt Store
- [ ] Create `IReadReceiptStore` interface
- [ ] `MarkDeliveredAsync(subscriptionId, consumerId, eventId, deliveredAt)`
- [ ] `MarkReadAsync(subscriptionId, consumerId, eventId, readAt)`
- [ ] `GetUnreadEventsAsync(subscriptionId, consumerId)`
- [ ] `GetExpiredUnreadEventsAsync(timeout)`
- [ ] Create `PostgresReadReceiptStore` implementation
- [ ] Design read receipt schema:
- [ ] `read_receipts` table (subscription_id, consumer_id, event_id, delivered_at, read_at, status)
#### 3.4 Read Receipt API
- [ ] Extend `IEventSubscriptionClient` with:
- [ ] `MarkAsReadAsync(eventId)`
- [ ] `MarkAllAsReadAsync()`
- [ ] `GetUnreadCountAsync()`
- [ ] Create `ReadReceiptEvent` wrapper with `.MarkAsReadAsync()` method
- [ ] Implement unread timeout handling
- [ ] Add dead letter queue for expired unread events
#### 3.5 Configuration
- [ ] Extend stream configuration with:
- [ ] `DeliverySemantics = DeliverySemantics.ExactlyOnce`
- [ ] Extend subscription configuration with:
- [ ] `Mode = SubscriptionMode.ReadReceipt`
- [ ] `OnUnreadTimeout` duration
- [ ] `OnUnreadExpired` policy (Requeue, DeadLetter, Drop)
- [ ] Add validation for configuration combinations
#### 3.6 Monitoring & Cleanup
- [ ] Create background service for unread timeout detection
- [ ] Add metrics for unread events per consumer
- [ ] Add health checks for lagging consumers
- [ ] Implement automatic cleanup of old processed events
#### 3.7 Testing
- [ ] Test duplicate event detection
- [ ] Test concurrent processing with locking
- [ ] Test read receipt lifecycle (delivered → read)
- [ ] Test unread timeout handling
- [ ] Test exactly-once guarantees under failure
**Phase 3 Success Criteria:**
```csharp
// ✅ This should work:

// Exactly-once delivery
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.DeliverySemantics = DeliverySemantics.ExactlyOnce;
    });
});

// Read receipts
streaming.AddSubscription("admin-notifications", subscription =>
{
    subscription.ToStream<UserWorkflow>();
    subscription.Mode = SubscriptionMode.ReadReceipt;
    subscription.OnUnreadTimeout = TimeSpan.FromHours(24);
});

// Consumer
await foreach (var notification in client.SubscribeAsync("admin-notifications", "admin-123", ct))
{
    await ShowNotificationAsync(notification);
    await notification.MarkAsReadAsync(); // Explicit confirmation
}
```
---
## Phase 4: Cross-Service Communication (RabbitMQ)
**Goal**: Enable event streaming across different services via RabbitMQ with zero developer friction
**Duration**: Weeks 6-7
### Phase 4 Tasks
#### 4.1 External Delivery Abstraction
- [ ] Extend `IEventDeliveryProvider` with:
- [ ] `PublishExternalAsync(streamName, event, metadata)`
- [ ] `SubscribeExternalAsync(streamName, subscriptionId, consumerId)`
- [ ] Create `ExternalDeliveryConfiguration`
- [ ] Add provider registration API
#### 4.2 RabbitMQ Provider
- [ ] Create `RabbitMqEventDeliveryProvider : IEventDeliveryProvider`
- [ ] Create `RabbitMqConfiguration`:
- [ ] Connection string
- [ ] Exchange prefix
- [ ] Exchange type (topic, fanout, direct)
- [ ] Routing key strategy
- [ ] Auto-declare topology
- [ ] Implement connection management (connect, reconnect, dispose)
- [ ] Implement publish operations
- [ ] Implement subscribe operations
- [ ] Add NuGet dependency: `RabbitMQ.Client`
#### 4.3 Topology Management
- [ ] Create `IRabbitMqTopologyManager` interface
- [ ] Implement automatic exchange creation:
- [ ] Format: `{prefix}.{stream-name}` (e.g., `myapp.user-events`)
- [ ] Type: topic exchange (default)
- [ ] Implement automatic queue creation:
- [ ] Broadcast: `{prefix}.{subscription-id}.{consumer-id}`
- [ ] Exclusive: `{prefix}.{subscription-id}`
- [ ] ConsumerGroup: `{prefix}.{subscription-id}`
- [ ] Implement automatic binding creation:
- [ ] Routing keys based on event type names
- [ ] Add name validation (no spaces or special characters)
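
As a sketch of the naming conventions above (a hypothetical helper; the real `IRabbitMqTopologyManager` would declare exchanges, queues, and bindings through `RabbitMQ.Client`):
```csharp
public static class RabbitMqNaming
{
    /// <summary>Exchange per stream: "{prefix}.{stream-name}", e.g. "myapp.user-events".</summary>
    public static string ExchangeFor(string prefix, string streamName)
        => Validate($"{prefix}.{streamName}");

    /// <summary>Broadcast gets one queue per consumer; Exclusive and ConsumerGroup share one queue per subscription.</summary>
    public static string QueueFor(string prefix, string subscriptionId, string consumerId, SubscriptionMode mode)
        => Validate(mode == SubscriptionMode.Broadcast
            ? $"{prefix}.{subscriptionId}.{consumerId}"
            : $"{prefix}.{subscriptionId}");

    private static string Validate(string name)
    {
        // Reject empty names and spaces; a fuller check would also reject other special characters.
        if (string.IsNullOrWhiteSpace(name) || name.Contains(' '))
            throw new ArgumentException($"Invalid RabbitMQ name: '{name}'");
        return name;
    }
}
```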
#### 4.4 Remote Stream Configuration
- [ ] Create `IRemoteStreamConfiguration` interface
- [ ] Create fluent API: `AddRemoteStream(name, config)`
- [ ] Implement remote stream subscription
- [ ] Add cross-service event routing
#### 4.5 Message Serialization
- [ ] Create `IEventSerializer` interface
- [ ] Create `JsonEventSerializer` implementation
- [ ] Add event type metadata in message headers:
- [ ] `event-type` (CLR type name)
- [ ] `event-version` (schema version)
- [ ] `correlation-id`
- [ ] `timestamp`
- [ ] Implement deserialization with type resolution
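
A sketch of the serializer and header names from 4.5, using `System.Text.Json`; the interface shape is an assumption:
```csharp
using System.Text.Json;

public interface IEventSerializer
{
    byte[] Serialize(object @event);
    object Deserialize(byte[] body, Type eventType);
}

public sealed class JsonEventSerializer : IEventSerializer
{
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public byte[] Serialize(object @event)
        => JsonSerializer.SerializeToUtf8Bytes(@event, @event.GetType(), Options);

    public object Deserialize(byte[] body, Type eventType)
        => JsonSerializer.Deserialize(body, eventType, Options)
           ?? throw new InvalidOperationException($"Could not deserialize {eventType.Name}");
}

/// <summary>Header names attached to every published message so consumers can resolve type and version.</summary>
public static class EventHeaders
{
    public const string EventType = "event-type";       // CLR type name
    public const string EventVersion = "event-version"; // Schema version (Phase 5)
    public const string CorrelationId = "correlation-id";
    public const string Timestamp = "timestamp";
}
```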
#### 4.6 Acknowledgment & Redelivery
- [ ] Implement manual acknowledgment (ack)
- [ ] Implement negative acknowledgment (nack) with requeue
- [ ] Add dead letter queue configuration
- [ ] Implement retry policies (exponential backoff)
- [ ] Add max retry count
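
A sketch of the redelivery flow in 4.6: retry in-process with exponential backoff while attempts remain, then nack without requeue so the broker dead-letters the message. Policy defaults and delegate shapes are assumptions:
```csharp
public sealed record RetryPolicy(int MaxAttempts = 5, double BaseDelaySeconds = 1, double MaxDelaySeconds = 60)
{
    /// <summary>Delay before retrying the given 1-based attempt: 1s, 2s, 4s, 8s... capped at MaxDelaySeconds.</summary>
    public TimeSpan DelayFor(int attempt)
        => TimeSpan.FromSeconds(Math.Min(MaxDelaySeconds, BaseDelaySeconds * Math.Pow(2, attempt - 1)));
}

public static class RedeliveryHelper
{
    /// <summary>Ack on success; retry with backoff on failure; nack (no requeue → dead-letter) after MaxAttempts.</summary>
    public static async Task HandleWithRetryAsync(
        Func<Task> handler, RetryPolicy policy, Func<Task> ackAsync, Func<bool, Task> nackAsync, CancellationToken ct)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await handler();
                await ackAsync();
                return;
            }
            catch when (attempt < policy.MaxAttempts)
            {
                await Task.Delay(policy.DelayFor(attempt), ct); // exponential backoff, then try again
            }
            catch
            {
                await nackAsync(false); // requeue: false → broker routes to the dead-letter queue
                return;
            }
        }
    }
}
```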
#### 4.7 Connection Resilience
- [ ] Implement automatic reconnection on failure
- [ ] Add connection health checks
- [ ] Implement circuit breaker pattern
- [ ] Add connection pool management
- [ ] Log connection events (connected, disconnected, reconnecting)
#### 4.8 Cross-Service Sample
- [ ] Create second sample project: `Svrnty.Sample.Analytics`
- [ ] Configure Service A to publish to RabbitMQ
- [ ] Configure Service B to consume from RabbitMQ
- [ ] Demonstrate cross-service event flow
- [ ] Add docker-compose with RabbitMQ
#### 4.9 Testing
- [ ] Test exchange/queue creation
- [ ] Test message publishing
- [ ] Test message consumption
- [ ] Test acknowledgment handling
- [ ] Test connection failure recovery
- [ ] Test dead letter queue
- [ ] Integration test across two services
**Phase 4 Success Criteria:**
```csharp
// ✅ This should work:

// Service A: Publish events externally
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.Scope = StreamScope.CrossService;
        stream.ExternalDelivery.UseRabbitMq(rabbitmq =>
        {
            rabbitmq.ConnectionString = "amqp://localhost";
            rabbitmq.ExchangeName = "user-service.events";
        });
    });
});

// Service B: Consume from Service A
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddRemoteStream("user-service.events", remote =>
    {
        remote.UseRabbitMq(rabbitmq =>
        {
            rabbitmq.ConnectionString = "amqp://localhost";
        });
    });
    streaming.AddSubscription("analytics", subscription =>
    {
        subscription.ToRemoteStream("user-service.events");
        subscription.Mode = SubscriptionMode.ConsumerGroup;
    });
});

// Zero RabbitMQ knowledge needed by the developer!
```
---
## Phase 5: Schema Evolution & Versioning
**Goal**: Support event versioning with automatic upcasting
**Duration**: Weeks 8-9
### Phase 5 Tasks
#### 5.1 Schema Registry Abstractions
- [ ] Create `ISchemaRegistry` interface
- [ ] `RegisterSchemaAsync<TEvent>(version, upcastFromType)`
- [ ] `GetSchemaAsync(eventType, version)`
- [ ] `GetSchemaHistoryAsync(eventType)`
- [ ] `UpcastAsync(event, targetVersion)`
- [ ] Create `SchemaInfo` record (version, CLR type, JSON schema, upcast info)
- [ ] Create `ISchemaStore` interface for persistence
#### 5.2 Event Versioning Attributes
- [ ] Create `[EventVersion(int)]` attribute
- [ ] Create `[EventVersionAttribute]` with:
- [ ] `Version` property
- [ ] `UpcastFrom` type property
- [ ] Add compile-time validation (via analyzer if time permits)
#### 5.3 Schema Registry Implementation
- [ ] Create `SchemaRegistry : ISchemaRegistry`
- [ ] Create `PostgresSchemaStore : ISchemaStore`
- [ ] Design schema storage:
- [ ] `event_schemas` table (event_type, version, clr_type, json_schema, upcast_from_type, registered_at)
- [ ] Implement version registration
- [ ] Implement schema lookup with caching
#### 5.4 Upcasting Pipeline
- [ ] Create `IEventUpcaster<TFrom, TTo>` interface (sketched after this list)
- [ ] Create `EventUpcastingMiddleware`
- [ ] Implement automatic upcaster discovery:
- [ ] Via static method: `TTo.UpcastFrom(TFrom)`
- [ ] Via registered `IEventUpcaster<TFrom, TTo>` implementations
- [ ] Implement multi-hop upcasting (V1 → V2 → V3)
- [ ] Add upcasting to subscription pipeline
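
A sketch of the multi-hop upcasting from 5.4: each registered upcaster lifts an event one version, and the pipeline walks the chain (V1 → V2 → V3) until it reaches the requested version. The registry shape is an assumption:
```csharp
public interface IEventUpcaster<in TFrom, out TTo>
{
    TTo Upcast(TFrom oldEvent);
}

/// <summary>Non-generic view of one upcast hop, so chains can be walked without reflection at call time.</summary>
public sealed record UpcastStep(Type From, Type To, Func<object, object> Upcast);

public sealed class EventUpcastingPipeline
{
    // Keyed by source type: each step knows how to lift an event exactly one version.
    private readonly Dictionary<Type, UpcastStep> _steps = new();

    public void Register<TFrom, TTo>(IEventUpcaster<TFrom, TTo> upcaster)
        => _steps[typeof(TFrom)] = new UpcastStep(typeof(TFrom), typeof(TTo), e => upcaster.Upcast((TFrom)e)!);

    /// <summary>Applies upcasters hop by hop until the event is of <paramref name="targetType"/> (chains assumed acyclic).</summary>
    public object UpcastTo(object @event, Type targetType)
    {
        var current = @event;
        while (current.GetType() != targetType)
        {
            if (!_steps.TryGetValue(current.GetType(), out var step))
                throw new InvalidOperationException(
                    $"No upcaster registered from {current.GetType().Name} toward {targetType.Name}");
            current = step.Upcast(current);
        }
        return current;
    }
}
```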
#### 5.5 JSON Schema Generation
- [ ] Create `IJsonSchemaGenerator` interface
- [ ] Create `JsonSchemaGenerator` implementation
- [ ] Generate JSON Schema from CLR types
- [ ] Store schemas in registry for external consumers
- [ ] Add schema validation (optional)
#### 5.6 Configuration
- [ ] Extend stream configuration with:
- [ ] `EnableSchemaEvolution = true/false`
- [ ] `SchemaRegistry` configuration
- [ ] Add fluent API for schema registration:
- [ ] `registry.Register<TEvent>(version)`
- [ ] `registry.Register<TEvent>(version, upcastFrom: typeof(TOldEvent))`
- [ ] Extend subscription configuration:
- [ ] `ReceiveAs<TEventVersion>()` to specify target version
#### 5.7 Backward Compatibility
- [ ] Handle events without version attribute (default to version 1)
- [ ] Support mixed versioned/unversioned events
- [ ] Add migration path for existing events
#### 5.8 Testing
- [ ] Test version registration
- [ ] Test single-hop upcasting (V1 → V2)
- [ ] Test multi-hop upcasting (V1 → V2 → V3)
- [ ] Test new consumers receiving old events (auto-upcast)
- [ ] Test schema storage and retrieval
- [ ] Test JSON schema generation
**Phase 5 Success Criteria:**
```csharp
// ✅ This should work:

// Event V1
[EventVersion(1)]
public sealed record UserAddedEventV1 : UserWorkflow
{
    public required int UserId { get; init; }
    public required string Name { get; init; }
}

// Event V2 with upcaster
[EventVersion(2, UpcastFrom = typeof(UserAddedEventV1))]
public sealed record UserAddedEventV2 : UserWorkflow
{
    public required int UserId { get; init; }
    public required string FirstName { get; init; }
    public required string LastName { get; init; }
    public required string Email { get; init; }

    public static UserAddedEventV2 UpcastFrom(UserAddedEventV1 v1)
    {
        var names = v1.Name.Split(' ', 2);
        return new UserAddedEventV2
        {
            UserId = v1.UserId,
            FirstName = names[0],
            LastName = names.Length > 1 ? names[1] : "",
            Email = $"user{v1.UserId}@unknown.com"
        };
    }
}

// Configuration
streaming.UseSchemaRegistry(registry =>
{
    registry.Register<UserAddedEventV1>(version: 1);
    registry.Register<UserAddedEventV2>(version: 2, upcastFrom: typeof(UserAddedEventV1));
});

// Consumer always receives V2 (framework auto-upcasts V1 → V2)
streaming.AddSubscription("analytics", subscription =>
{
    subscription.ToStream<UserWorkflow>();
    subscription.ReceiveAs<UserAddedEventV2>();
});
```
---
## Phase 6: Management, Monitoring & Observability
**Goal**: Production-ready monitoring, health checks, and management APIs
**Duration**: Week 10+
### Phase 6 Tasks
#### 6.1 Health Checks
- [ ] Create `IStreamHealthCheck` interface
- [ ] Implement stream health checks:
- [ ] Stream exists and is writable
- [ ] Consumer lag detection (offset vs stream length)
- [ ] Stalled consumer detection (no progress for N minutes)
- [ ] Integrate with ASP.NET Core health checks
- [ ] Add health check endpoints
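
A sketch of consumer-lag detection from 6.1 as an ASP.NET Core `IHealthCheck`: lag is the stream length minus the consumer's committed offset. The stream and subscription names are placeholders, and a real check would iterate every registered subscription:
```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class ConsumerLagHealthCheck : IHealthCheck
{
    private readonly IEventStreamStore _streams;     // Phase 2: GetStreamLengthAsync
    private readonly IConsumerOffsetStore _offsets;  // Phase 2: GetOffsetAsync
    private readonly long _maxLag;

    public ConsumerLagHealthCheck(IEventStreamStore streams, IConsumerOffsetStore offsets, long maxLag = 10_000)
        => (_streams, _offsets, _maxLag) = (streams, offsets, maxLag);

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken ct = default)
    {
        // Lag = newest offset in the stream minus the offset this consumer has committed.
        var streamLength = await _streams.GetStreamLengthAsync("user-events", ct);
        var committed = await _offsets.GetOffsetAsync("analytics", "consumer-1");
        var lag = streamLength - committed;

        return lag <= _maxLag
            ? HealthCheckResult.Healthy($"Consumer lag: {lag}")
            : HealthCheckResult.Degraded($"Consumer lag {lag} exceeds threshold {_maxLag}");
    }
}
```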
#### 6.2 Metrics & Telemetry
- [ ] Define key metrics:
- [ ] Events published per stream (rate)
- [ ] Events consumed per subscription (rate)
- [ ] Consumer lag (offset delta)
- [ ] Processing latency (time from publish to ack)
- [ ] Error rate
- [ ] Integrate with OpenTelemetry
- [ ] Add Prometheus endpoint
- [ ] Create Grafana dashboard templates
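
A sketch of the key metrics in 6.2 using `System.Diagnostics.Metrics`, which the OpenTelemetry SDK can export to a Prometheus endpoint; the meter and instrument names are assumptions:
```csharp
using System.Diagnostics.Metrics;

public sealed class EventStreamingMetrics
{
    public const string MeterName = "Svrnty.EventStreaming"; // hypothetical meter name
    private static readonly Meter s_meter = new(MeterName);

    private readonly Counter<long> _published = s_meter.CreateCounter<long>("events_published_total");
    private readonly Counter<long> _consumed = s_meter.CreateCounter<long>("events_consumed_total");
    private readonly Counter<long> _errors = s_meter.CreateCounter<long>("event_processing_errors_total");
    private readonly Histogram<double> _latencyMs = s_meter.CreateHistogram<double>("event_processing_latency_ms");

    public void EventPublished(string stream)
        => _published.Add(1, new KeyValuePair<string, object?>("stream", stream));

    public void EventConsumed(string subscription)
        => _consumed.Add(1, new KeyValuePair<string, object?>("subscription", subscription));

    public void ProcessingFailed(string subscription)
        => _errors.Add(1, new KeyValuePair<string, object?>("subscription", subscription));

    /// <summary>Latency from publish to acknowledgment, in milliseconds.</summary>
    public void Processed(TimeSpan publishToAck)
        => _latencyMs.Record(publishToAck.TotalMilliseconds);
}
```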
#### 6.3 Management API
- [ ] Create REST API for management:
- [ ] `GET /api/streams` - List all streams
- [ ] `GET /api/streams/{name}` - Get stream details
- [ ] `GET /api/streams/{name}/subscriptions` - List subscriptions
- [ ] `GET /api/subscriptions/{id}/consumers` - List consumers
- [ ] `GET /api/subscriptions/{id}/consumers/{consumerId}/offset` - Get consumer position
- [ ] `POST /api/subscriptions/{id}/consumers/{consumerId}/reset-offset` - Reset offset
- [ ] `DELETE /api/subscriptions/{id}/consumers/{consumerId}` - Remove consumer
- [ ] Add authorization (admin only)
- [ ] Add Swagger documentation
#### 6.4 Admin Dashboard (Optional)
- [ ] Create simple web UI for monitoring:
- [ ] Stream list with event counts
- [ ] Subscription list with consumer status
- [ ] Consumer lag visualization
- [ ] Event replay interface
- [ ] Use Blazor or simple HTML/JS
#### 6.5 Logging
- [ ] Add structured logging with Serilog/NLog
- [ ] Log key events:
- [ ] Stream created
- [ ] Consumer registered/unregistered
- [ ] Event published
- [ ] Event consumed
- [ ] Errors and retries
- [ ] Add correlation IDs to all logs
- [ ] Add log levels (Debug, Info, Warning, Error)
#### 6.6 Alerting (Optional)
- [ ] Define alerting rules:
- [ ] Consumer lag exceeds threshold
- [ ] Consumer stalled (no progress)
- [ ] Error rate spike
- [ ] Dead letter queue growth
- [ ] Integration with alerting systems (email, Slack, PagerDuty)
#### 6.7 Documentation
- [ ] Update CLAUDE.md with event streaming documentation
- [ ] Create developer guide
- [ ] Create deployment guide
- [ ] Create troubleshooting guide
- [ ] Add API reference documentation
- [ ] Create architecture diagrams
#### 6.8 Testing
- [ ] Test health check endpoints
- [ ] Test metrics collection
- [ ] Test management API
- [ ] Load testing (throughput, latency)
- [ ] Chaos testing (failure scenarios)
**Phase 6 Success Criteria:**
```csharp
// ✅ Production-ready features:

// Health checks
builder.Services.AddHealthChecks()
    .AddEventStreamHealthCheck();

// Metrics exposed at /metrics
builder.Services.AddEventStreaming(streaming =>
{
    streaming.EnableMetrics();
    streaming.EnableHealthChecks();
});

// Management API available
// GET  /api/streams → List all streams
// GET  /api/streams/user-events/subscriptions → View subscriptions
// POST /api/subscriptions/admin-notifications/consumers/admin-123/reset-offset → Reset lag
```
---
## Optional Future Phases
### Phase 7: Advanced Features (Post-Launch)
- [ ] Kafka provider implementation
- [ ] Azure Service Bus provider
- [ ] AWS SQS/SNS provider
- [ ] Saga orchestration support
- [ ] Event sourcing projections
- [ ] Snapshot support for aggregates
- [ ] CQRS read model synchronization
- [ ] GraphQL subscriptions integration
- [ ] SignalR integration for browser clients
### Phase 8: Performance Optimizations
- [ ] Batch processing support
- [ ] Stream partitioning
- [ ] Parallel consumer processing
- [ ] Event compression
- [ ] Connection pooling
- [ ] Query optimization
---
## Design Decisions & Rationale
### Why Workflows Over Events?
**Decision**: Make workflows the primary abstraction, not events.
**Rationale**:
- Workflows represent business processes (how developers think)
- Events are implementation details of workflows
- Clearer intent: "This command participates in an invitation workflow"
- Solves correlation problem elegantly (workflow ID = correlation ID)
### Why Support Both Ephemeral & Persistent?
**Decision**: Support both message queue (ephemeral) and event sourcing (persistent) patterns.
**Rationale**:
- Different use cases have different needs
- Ephemeral: Simple notifications, no need for history
- Persistent: Audit logs, analytics, replay capability
- Developer chooses based on requirements
- Same API for both (progressive complexity)
### Why Exactly-Once Opt-In?
**Decision**: Make exactly-once delivery optional, default to at-least-once.
**Rationale**:
- Exactly-once has performance cost (deduplication, locking)
- Most scenarios can handle duplicates (idempotent handlers)
- Developer opts in when critical (financial transactions)
- Simpler default behavior
### Why Cross-Service Opt-In?
**Decision**: Streams are internal by default, external requires explicit configuration.
**Rationale**:
- Security: Don't expose events externally by accident
- Performance: Internal delivery (gRPC) is faster
- Simplicity: Most services don't need cross-service events
- Developer explicitly chooses when needed
### Why Schema Evolution?
**Decision**: Support event versioning from the start.
**Rationale**:
- Events are long-lived (years in persistent streams)
- Schema changes are inevitable
- Breaking changes hurt (can't deserialize old events)
- Automatic upcasting prevents data loss
- Essential for persistent streams with replay
---
## Success Metrics
### Phase 1
- ✅ Basic workflow registration works
- ✅ Ephemeral streams work (in-memory)
- ✅ Broadcast and exclusive subscriptions work
- ✅ gRPC streaming works
- ✅ Zero breaking changes to existing features
### Phase 2
- ✅ Persistent streams work (PostgreSQL)
- ✅ Event replay works from any position
- ✅ Retention policies enforced
- ✅ Consumers can resume from last offset
### Phase 3
- ✅ Exactly-once delivery works (no duplicates)
- ✅ Read receipts work (delivered vs read)
- ✅ Unread timeout handling works
### Phase 4
- ✅ Events flow from Service A to Service B via RabbitMQ
- ✅ Zero RabbitMQ code in handlers
- ✅ Automatic topology creation works
- ✅ Connection resilience works
### Phase 5
- ✅ Old events automatically upcast to new version
- ✅ New consumers receive latest version
- ✅ Multi-hop upcasting works (V1→V2→V3)
### Phase 6
- ✅ Health checks detect lagging consumers
- ✅ Metrics exposed for monitoring
- ✅ Management API works
- ✅ Documentation complete
---
## Risk Mitigation
### Risk: Breaking Existing Features
**Mitigation**:
- Keep `AddCommandWithEvents` for backward compatibility
- Run full test suite after each phase
- Feature flags for new functionality
### Risk: Performance Issues
**Mitigation**:
- Start with in-memory (fast)
- Benchmark at each phase
- Add performance tests before Phase 6
- Use profiling tools
### Risk: Complexity Overload
**Mitigation**:
- Progressive disclosure (simple by default)
- Each phase is independently useful
- Clear documentation at each level
- Sample projects for each complexity level
### Risk: Database Schema Changes
**Mitigation**:
- Use migrations from Phase 2 onward
- Backward-compatible schema changes
- Test migration paths
### Risk: External Dependencies (RabbitMQ, etc.)
**Mitigation**:
- Make external delivery optional
- Provide in-memory fallback
- Docker Compose for development
- Clear setup documentation
---
## Development Guidelines
### Coding Standards
- Use C# 14 features (field keyword, extension members)
- Follow existing patterns in codebase
- XML documentation on public APIs
- Async/await throughout
- CancellationToken support on all async methods
### Testing Strategy
- Unit tests for core logic
- Integration tests for storage implementations
- End-to-end tests for full scenarios
- Performance benchmarks for critical paths
### Documentation Requirements
- XML doc comments on all public APIs
- README updates for each phase
- Sample code for new features
- Architecture diagrams
### Code Review Checklist
- [ ] Follows existing code style
- [ ] Has XML documentation
- [ ] Has unit tests
- [ ] No breaking changes (or documented)
- [ ] Performance acceptable
- [ ] Error handling complete
---
## Timeline Summary
| Phase | Duration | Key Deliverable |
|-------|----------|----------------|
| Phase 1 | 2 weeks | Basic workflows + ephemeral streaming |
| Phase 2 | 2 weeks | Persistent streams + replay |
| Phase 3 | 1 week | Exactly-once + read receipts |
| Phase 4 | 2 weeks | RabbitMQ cross-service |
| Phase 5 | 2 weeks | Schema evolution |
| Phase 6 | 1+ week | Management & monitoring |
| **Total** | **10+ weeks** | **Production-ready event streaming platform** |
---
## Next Steps
1. **Review this plan** - Validate approach and priorities
2. **Create feature branch** - `feature/event-streaming`
3. **Start Phase 1.1** - Workflow abstraction
4. **Iterate rapidly** - Small commits, frequent builds
5. **Update this document** - Check off tasks as completed
---
## Notes & Questions
- [ ] Decision: PostgreSQL or pluggable storage from Phase 2?
- [ ] Decision: gRPC-only or add SignalR for browser support?
- [ ] Decision: Create separate NuGet packages per phase or monolithic?
- [ ] Question: Should we support Kafka in Phase 4 or separate phase?
- [ ] Question: Do we need distributed tracing (OpenTelemetry) integration?
---
**Last Updated**: 2025-12-09
**Status**: Planning Phase - Not Started
**Owner**: Mathias Beaulieu-Duncan
