Add comprehensive event streaming implementation plan

- 6 phases covering workflow abstraction through production monitoring
- Detailed task breakdowns with checkboxes for tracking
- Progressive complexity: simple by default, powerful when needed
- Support for ephemeral & persistent streams
- Cross-service communication via RabbitMQ
- Schema evolution with automatic upcasting
- Exactly-once delivery and read receipts
- ~10+ weeks timeline with clear success criteria

# Event Streaming Implementation Plan

## Executive Summary

Transform the CQRS framework into a complete enterprise event streaming platform that supports:

- **Workflows**: Business process correlation and event emission
- **Multiple Consumer Patterns**: Broadcast, exclusive, consumer groups, read receipts
- **Storage Models**: Ephemeral (message queue) and persistent (event sourcing)
- **Delivery Semantics**: At-most-once, at-least-once, exactly-once
- **Cross-Service Communication**: RabbitMQ and Kafka integration with zero developer friction
- **Schema Evolution**: Event versioning with automatic upcasting
- **Event Replay**: Time-travel queries for persistent streams

**Design Philosophy**: Simple by default, powerful when needed. Progressive complexity.

---

## Architecture Layers

```
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: WORKFLOW (Business Process)                        │
│ What events belong together logically?                      │
│ Example: InvitationWorkflow, UserWorkflow                   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: EVENT STREAM (Organization & Retention)            │
│ How are events stored and organized?                        │
│ Example: Persistent vs Ephemeral, retention policies        │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: SUBSCRIPTION (Consumer Routing)                    │
│ Who wants to consume what?                                  │
│ Example: Broadcast, Exclusive, ConsumerGroup, ReadReceipt   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: DELIVERY (Transport Mechanism)                     │
│ How do events reach consumers?                              │
│ Example: gRPC, RabbitMQ, Kafka                              │
└─────────────────────────────────────────────────────────────┘
```

---

## Core Enumerations

### StreamType
- `Ephemeral`: Message queue semantics (events deleted after consumption)
- `Persistent`: Event log semantics (events retained for replay)

### DeliverySemantics
- `AtMostOnce`: Fire and forget (fast, might lose messages)
- `AtLeastOnce`: Retry until ack (might see duplicates)
- `ExactlyOnce`: Deduplication (slower, no duplicates)

### SubscriptionMode
- `Broadcast`: All consumers get all events (pub/sub)
- `Exclusive`: Only one consumer gets each event (queue)
- `ConsumerGroup`: Load-balanced across group members (Kafka-style)
- `ReadReceipt`: Requires explicit "user saw this" confirmation

### StreamScope
- `Internal`: Same service only (default)
- `CrossService`: Available to external services via message broker
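
In code, these four enums are small; a sketch of their expected shape, derived directly from the lists above:

```csharp
// Sketch: member names taken from the descriptions above.
public enum StreamType { Ephemeral, Persistent }

public enum DeliverySemantics { AtMostOnce, AtLeastOnce, ExactlyOnce }

public enum SubscriptionMode { Broadcast, Exclusive, ConsumerGroup, ReadReceipt }

public enum StreamScope { Internal, CrossService }
```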

---

## Phase 1: Core Workflow & Streaming Foundation

**Goal**: Get basic workflow + ephemeral streaming working with in-memory storage

**Duration**: Weeks 1-2

### Phase 1 Tasks

#### 1.1 Workflow Abstraction
- [ ] Create `Workflow` abstract base class (see the sketch after this list)
  - [ ] `Id` property (workflow instance identifier)
  - [ ] `IsNew` property (started vs continued)
  - [ ] `Emit<TEvent>()` protected method
  - [ ] Internal `PendingEvents` collection
- [ ] Create `ICommandHandlerWithWorkflow<TCommand, TResult, TWorkflow>` interface
- [ ] Create `ICommandHandlerWithWorkflow<TCommand, TWorkflow>` interface (no result)
- [ ] Update sample: Convert `UserEvent` to `UserWorkflow : Workflow`
- [ ] Update sample: Convert `InvitationEvent` to `InvitationWorkflow : Workflow`
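
A minimal sketch of that base class; the member names come from the task list above, everything else (types, accessors) is an assumption:

```csharp
// Sketch: property and method names from task 1.1; signatures are assumptions.
public abstract class Workflow
{
    private readonly List<object> _pendingEvents = [];

    /// <summary>Workflow instance identifier; doubles as the correlation ID.</summary>
    public string Id { get; internal set; } = Guid.NewGuid().ToString("N");

    /// <summary>True when this command started the workflow, false when it continues one.</summary>
    public bool IsNew { get; internal set; } = true;

    /// <summary>Events emitted by the handler; drained by the framework after the command succeeds.</summary>
    internal IReadOnlyList<object> PendingEvents => _pendingEvents;

    /// <summary>Queues an event for emission under this workflow's correlation ID.</summary>
    protected void Emit<TEvent>(TEvent @event) where TEvent : notnull
        => _pendingEvents.Add(@event);
}
```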

#### 1.2 Stream Configuration
- [ ] Create `StreamType` enum
- [ ] Create `DeliverySemantics` enum
- [ ] Create `SubscriptionMode` enum
- [ ] Create `StreamScope` enum
- [ ] Create `IStreamConfiguration` interface
- [ ] Create `StreamConfiguration` implementation
- [ ] Create fluent configuration API: `AddEventStreaming()`

#### 1.3 In-Memory Storage (Ephemeral)
- [ ] Create `IEventStreamStore` interface (see the sketch after this list)
  - [ ] `EnqueueAsync()` for ephemeral streams
  - [ ] `DequeueAsync()` for ephemeral streams
  - [ ] `AcknowledgeAsync()` for message acknowledgment
  - [ ] `NackAsync()` for requeue/dead-letter
- [ ] Create `InMemoryEventStreamStore` implementation
  - [ ] Concurrent queues per stream
  - [ ] Per-consumer visibility tracking
  - [ ] Acknowledgment handling
- [ ] Create `ISubscriptionStore` interface
  - [ ] `RegisterConsumerAsync()`
  - [ ] `UnregisterConsumerAsync()`
  - [ ] `GetConsumersAsync()`
- [ ] Create `InMemorySubscriptionStore` implementation
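
A sketch of the ephemeral half of `IEventStreamStore`; method names follow the task list, parameter lists are assumptions:

```csharp
// Sketch: method names from task 1.3; parameters and return types are assumptions.
public interface IEventStreamStore
{
    Task EnqueueAsync(string streamName, object @event, CancellationToken ct = default);

    // Returns null when the queue is empty; a dequeued event stays invisible to
    // other consumers until it is acknowledged or nacked.
    Task<object?> DequeueAsync(string streamName, string consumerId, CancellationToken ct = default);

    Task AcknowledgeAsync(string streamName, string consumerId, Guid eventId, CancellationToken ct = default);

    // requeue: true puts the event back on the queue; false routes it to a dead letter.
    Task NackAsync(string streamName, string consumerId, Guid eventId, bool requeue, CancellationToken ct = default);
}
```

Phase 2 later extends this same interface with append-only log methods for persistent streams.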

#### 1.4 Subscription System
- [ ] Create `ISubscription` interface
- [ ] Create `Subscription` implementation
- [ ] Create `IEventSubscriptionClient` for consumers
- [ ] Create `EventSubscriptionClient` implementation
- [ ] Implement `Broadcast` mode
- [ ] Implement `Exclusive` mode
- [ ] Create subscription configuration API

#### 1.5 Workflow Decorators
- [ ] Create `WorkflowContext<TWorkflow>` class
- [ ] Create `CommandHandlerWithWorkflowDecorator<TCommand, TResult, TWorkflow>`
- [ ] Create `CommandHandlerWithWorkflowDecoratorNoResult<TCommand, TWorkflow>`
- [ ] Update event emission to use workflow ID as correlation ID
- [ ] Integrate with existing `IEventEmitter`

#### 1.6 Service Registration
- [ ] Create `AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler>()` extension
- [ ] Create `AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler, TValidator>()` extension
- [ ] Create `AddCommandWithWorkflow<TCommand, TWorkflow, THandler>()` extension (no result)
- [ ] Deprecate `AddCommandWithEvents` (keep for backward compatibility)
- [ ] Update `ServiceCollectionExtensions` with workflow registration

#### 1.7 gRPC Streaming (Basic)
- [ ] Create `IEventDeliveryProvider` interface
- [ ] Create `GrpcEventDeliveryProvider` implementation
- [ ] Update gRPC service to support bidirectional streaming
- [ ] Implement consumer registration/unregistration
- [ ] Handle connection lifecycle (connect/disconnect/reconnect)

#### 1.8 Sample Project Updates
- [ ] Refactor `UserEvents.cs` → `UserWorkflow.cs`
- [ ] Refactor `InvitationWorkflow.cs` to use new API
- [ ] Update `Program.cs` with workflow registration
- [ ] Add simple subscription consumer example
- [ ] Add gRPC streaming consumer example
- [ ] Update documentation

#### 1.9 Testing & Validation
- [ ] Build and verify no regressions
- [ ] Test workflow start/continue semantics
- [ ] Test ephemeral stream (message queue behavior)
- [ ] Test broadcast subscription (multiple consumers)
- [ ] Test exclusive subscription (single consumer)
- [ ] Test gRPC streaming connection
- [ ] Verify existing features still work

**Phase 1 Success Criteria:**

```csharp
// ✅ This should work:

// Registration
builder.Services.AddCommandWithWorkflow<InviteUserCommand, string, InvitationWorkflow, InviteUserCommandHandler>();

// Handler
public class InviteUserCommandHandler
    : ICommandHandlerWithWorkflow<InviteUserCommand, string, InvitationWorkflow>
{
    public async Task<string> HandleAsync(
        InviteUserCommand command,
        InvitationWorkflow workflow,
        CancellationToken ct)
    {
        workflow.Emit(new UserInvitedEvent { ... });
        return workflow.Id;
    }
}

// Consumer
await foreach (var @event in client.SubscribeAsync("my-subscription", "consumer-1", ct))
{
    Console.WriteLine($"Received: {@event}");
}
```

---

## Phase 2: Persistence & Event Sourcing

**Goal**: Add persistent streams with replay capability

**Duration**: Weeks 3-4

### Phase 2 Tasks

#### 2.1 Storage Abstractions (Persistent)
- [ ] Extend `IEventStreamStore` with append-only log methods:
  - [ ] `AppendAsync()` for persistent streams
  - [ ] `ReadStreamAsync()` for reading the event log
  - [ ] `GetStreamLengthAsync()` for stream metadata
- [ ] Create `StoredEvent` record (offset, timestamp, event data; sketched below)
- [ ] Create `StreamMetadata` record (length, retention, oldest event)
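
A sketch of these two records; the field lists follow the parentheses above, exact types are assumptions:

```csharp
// Sketch: fields from task 2.1 (plus event_type/correlation_id from the 2.2 schema).
public sealed record StoredEvent(
    long Offset,
    DateTimeOffset Timestamp,
    string EventType,
    ReadOnlyMemory<byte> EventData,
    string? CorrelationId);

public sealed record StreamMetadata(
    long Length,
    TimeSpan? Retention,
    DateTimeOffset? OldestEventTimestamp);
```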

#### 2.2 PostgreSQL Storage Implementation
- [ ] Create `PostgresEventStreamStore : IEventStreamStore`
- [ ] Design event log schema:
  - [ ] `events` table (stream_name, offset, event_type, event_data, correlation_id, timestamp)
  - [ ] Indexes for efficient queries
  - [ ] Partition strategy for large streams
- [ ] Implement append operations with optimistic concurrency
- [ ] Implement read operations with offset-based pagination
- [ ] Implement queue operations for ephemeral streams

#### 2.3 Offset Tracking
- [ ] Create `IConsumerOffsetStore` interface (see the sketch after this list)
  - [ ] `GetOffsetAsync(subscriptionId, consumerId)`
  - [ ] `SetOffsetAsync(subscriptionId, consumerId, offset)`
  - [ ] `GetConsumerPositionsAsync(subscriptionId)` (for monitoring)
- [ ] Create `PostgresConsumerOffsetStore` implementation
- [ ] Design offset tracking schema:
  - [ ] `consumer_offsets` table (subscription_id, consumer_id, stream_offset, last_updated)
- [ ] Integrate offset tracking with subscription client
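
A sketch of the offset store; method names come from the list above, the return types are assumptions:

```csharp
// Sketch: task 2.3 names these three methods; types are assumptions.
public interface IConsumerOffsetStore
{
    // Null when the consumer has no committed position yet.
    Task<long?> GetOffsetAsync(string subscriptionId, string consumerId, CancellationToken ct = default);

    Task SetOffsetAsync(string subscriptionId, string consumerId, long offset, CancellationToken ct = default);

    // For monitoring: maps each consumer ID to its committed offset.
    Task<IReadOnlyDictionary<string, long>> GetConsumerPositionsAsync(string subscriptionId, CancellationToken ct = default);
}
```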

#### 2.4 Retention Policies
- [ ] Create `RetentionPolicy` configuration (see the sketch after this list)
  - [ ] Time-based retention (e.g., 90 days)
  - [ ] Size-based retention (e.g., 10GB max)
  - [ ] Count-based retention (e.g., 1M events max)
- [ ] Create `IRetentionService` interface
- [ ] Create `RetentionService` background service
- [ ] Implement retention policy enforcement
- [ ] Add configurable cleanup intervals
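
One possible shape for the policy object, treating each dimension as optional so streams can combine them:

```csharp
// Sketch: the three dimensions come from task 2.4; null means "no limit on this axis".
public sealed record RetentionPolicy
{
    public TimeSpan? MaxAge { get; init; }      // e.g. TimeSpan.FromDays(90)
    public long? MaxSizeBytes { get; init; }    // e.g. 10L * 1024 * 1024 * 1024 (10 GB)
    public long? MaxEventCount { get; init; }   // e.g. 1_000_000
}
```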

#### 2.5 Event Replay API
- [ ] Create `IEventReplayService` interface
- [ ] Create `EventReplayService` implementation
- [ ] Create `ReplayOptions` configuration:
  - [ ] `StartPosition` (Beginning, Offset, Timestamp, EventId)
  - [ ] `EndPosition` (Latest, Offset, Timestamp, EventId)
  - [ ] `Filter` predicate
  - [ ] `MaxEvents` limit
- [ ] Implement replay from persistent streams
- [ ] Add replay to new consumer (catch-up subscription)

#### 2.6 Stream Configuration Extensions
- [ ] Extend stream configuration with:
  - [ ] `Type = StreamType.Persistent`
  - [ ] `Retention` policies
  - [ ] `EnableReplay = true/false`
- [ ] Validate configuration (ephemeral can't have replay)
- [ ] Add stream type detection and routing

#### 2.7 Migration & Compatibility
- [ ] Create database migration scripts
- [ ] Add backward compatibility for in-memory implementation
- [ ] Allow mixing persistent and ephemeral streams
- [ ] Support runtime switching (development vs production)

#### 2.8 Testing
- [ ] Test persistent stream append/read
- [ ] Test offset tracking across restarts
- [ ] Test retention policy enforcement
- [ ] Test event replay from various positions
- [ ] Test catch-up subscriptions
- [ ] Stress test with large event volumes

**Phase 2 Success Criteria:**

```csharp
// ✅ This should work:

// Configure persistent stream
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.Retention = TimeSpan.FromDays(90);
        stream.EnableReplay = true;
    });
});

// Use PostgreSQL storage
services.AddSingleton<IEventStreamStore, PostgresEventStreamStore>();

// Replay events
var replay = await replayService.ReplayStreamAsync("user-events", new ReplayOptions
{
    From = new StartPosition.Timestamp(DateTimeOffset.UtcNow.AddDays(-7))
}, ct);

await foreach (var @event in replay)
{
    // Process historical events
}
```

---

## Phase 3: Exactly-Once Delivery & Read Receipts

**Goal**: Add deduplication and explicit user confirmation

**Duration**: Week 5

### Phase 3 Tasks

#### 3.1 Idempotency Store
- [ ] Create `IIdempotencyStore` interface (see the sketch after this list)
  - [ ] `WasProcessedAsync(consumerId, eventId)`
  - [ ] `MarkProcessedAsync(consumerId, eventId, processedAt)`
  - [ ] `TryAcquireIdempotencyLockAsync(idempotencyKey, lockDuration)`
  - [ ] `ReleaseIdempotencyLockAsync(idempotencyKey)`
  - [ ] `CleanupAsync(olderThan)`
- [ ] Create `PostgresIdempotencyStore` implementation
- [ ] Design idempotency schema:
  - [ ] `processed_events` table (consumer_id, event_id, processed_at)
  - [ ] `idempotency_locks` table (lock_key, acquired_at, expires_at)
- [ ] Add TTL-based cleanup
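
A sketch of the store interface; method names follow the list above, signatures are assumptions:

```csharp
// Sketch: method names from task 3.1; parameter and return types are assumptions.
public interface IIdempotencyStore
{
    Task<bool> WasProcessedAsync(string consumerId, Guid eventId, CancellationToken ct = default);
    Task MarkProcessedAsync(string consumerId, Guid eventId, DateTimeOffset processedAt, CancellationToken ct = default);

    // Returns false when another instance already holds the lock.
    Task<bool> TryAcquireIdempotencyLockAsync(string idempotencyKey, TimeSpan lockDuration, CancellationToken ct = default);
    Task ReleaseIdempotencyLockAsync(string idempotencyKey, CancellationToken ct = default);

    // TTL-based cleanup of old rows in processed_events.
    Task CleanupAsync(DateTimeOffset olderThan, CancellationToken ct = default);
}
```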

#### 3.2 Exactly-Once Middleware
- [ ] Create `ExactlyOnceDeliveryDecorator` (see the sketch after this list)
- [ ] Implement duplicate detection
- [ ] Implement distributed locking
- [ ] Add automatic retry on lock contention
- [ ] Integrate with subscription pipeline
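
The core of the decorator is "check, lock, process, mark". A sketch building on the `IIdempotencyStore` above; the handler delegate shape and lock duration are assumptions, and retry on contention is elided:

```csharp
// Sketch: duplicate detection plus distributed locking around the inner handler.
public sealed class ExactlyOnceDeliveryDecorator(IIdempotencyStore store)
{
    public async Task HandleAsync(
        string consumerId,
        Guid eventId,
        Func<CancellationToken, Task> innerHandler,
        CancellationToken ct)
    {
        // 1. Drop events this consumer already processed (duplicate detection).
        if (await store.WasProcessedAsync(consumerId, eventId, ct))
            return;

        // 2. Distributed lock so concurrent deliveries can't run the handler twice.
        var lockKey = $"{consumerId}:{eventId}";
        if (!await store.TryAcquireIdempotencyLockAsync(lockKey, TimeSpan.FromSeconds(30), ct))
            return; // contended: another instance is processing; redelivery will retry

        try
        {
            await innerHandler(ct);

            // 3. Only mark processed after the handler succeeds.
            await store.MarkProcessedAsync(consumerId, eventId, DateTimeOffset.UtcNow, ct);
        }
        finally
        {
            await store.ReleaseIdempotencyLockAsync(lockKey, ct);
        }
    }
}
```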

#### 3.3 Read Receipt Store
- [ ] Create `IReadReceiptStore` interface (sketched after 3.4 below)
  - [ ] `MarkDeliveredAsync(subscriptionId, consumerId, eventId, deliveredAt)`
  - [ ] `MarkReadAsync(subscriptionId, consumerId, eventId, readAt)`
  - [ ] `GetUnreadEventsAsync(subscriptionId, consumerId)`
  - [ ] `GetExpiredUnreadEventsAsync(timeout)`
- [ ] Create `PostgresReadReceiptStore` implementation
- [ ] Design read receipt schema:
  - [ ] `read_receipts` table (subscription_id, consumer_id, event_id, delivered_at, read_at, status)

#### 3.4 Read Receipt API
- [ ] Extend `IEventSubscriptionClient` with:
  - [ ] `MarkAsReadAsync(eventId)`
  - [ ] `MarkAllAsReadAsync()`
  - [ ] `GetUnreadCountAsync()`
- [ ] Create `ReadReceiptEvent` wrapper with `.MarkAsReadAsync()` method
- [ ] Implement unread timeout handling
- [ ] Add dead letter queue for expired unread events
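
A sketch of the store interface (two of its four methods) and the consumer-facing wrapper; the generic parameter and constructor shape are assumptions:

```csharp
// Sketch: names from tasks 3.3/3.4; signatures are assumptions.
public interface IReadReceiptStore
{
    Task MarkDeliveredAsync(string subscriptionId, string consumerId, Guid eventId, DateTimeOffset deliveredAt, CancellationToken ct = default);
    Task MarkReadAsync(string subscriptionId, string consumerId, Guid eventId, DateTimeOffset readAt, CancellationToken ct = default);
}

// Wrapper handed to consumers so "user saw this" is an explicit, awaitable step.
public sealed class ReadReceiptEvent<TEvent>(
    TEvent payload, string subscriptionId, string consumerId, Guid eventId, IReadReceiptStore receipts)
{
    public TEvent Payload { get; } = payload;

    public Task MarkAsReadAsync(CancellationToken ct = default)
        => receipts.MarkReadAsync(subscriptionId, consumerId, eventId, DateTimeOffset.UtcNow, ct);
}
```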

#### 3.5 Configuration
- [ ] Extend stream configuration with:
  - [ ] `DeliverySemantics = DeliverySemantics.ExactlyOnce`
- [ ] Extend subscription configuration with:
  - [ ] `Mode = SubscriptionMode.ReadReceipt`
  - [ ] `OnUnreadTimeout` duration
  - [ ] `OnUnreadExpired` policy (Requeue, DeadLetter, Drop)
- [ ] Add validation for configuration combinations

#### 3.6 Monitoring & Cleanup
- [ ] Create background service for unread timeout detection
- [ ] Add metrics for unread events per consumer
- [ ] Add health checks for lagging consumers
- [ ] Implement automatic cleanup of old processed events

#### 3.7 Testing
- [ ] Test duplicate event detection
- [ ] Test concurrent processing with locking
- [ ] Test read receipt lifecycle (delivered → read)
- [ ] Test unread timeout handling
- [ ] Test exactly-once guarantees under failure

**Phase 3 Success Criteria:**

```csharp
// ✅ This should work:

// Exactly-once delivery
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.DeliverySemantics = DeliverySemantics.ExactlyOnce;
    });
});

// Read receipts
streaming.AddSubscription("admin-notifications", subscription =>
{
    subscription.ToStream<UserWorkflow>();
    subscription.Mode = SubscriptionMode.ReadReceipt;
    subscription.OnUnreadTimeout = TimeSpan.FromHours(24);
});

// Consumer
await foreach (var notification in client.SubscribeAsync("admin-notifications", "admin-123", ct))
{
    await ShowNotificationAsync(notification);
    await notification.MarkAsReadAsync(); // Explicit confirmation
}
```

---

## Phase 4: Cross-Service Communication (RabbitMQ)

**Goal**: Enable event streaming across different services via RabbitMQ with zero developer friction

**Duration**: Weeks 6-7

### Phase 4 Tasks

#### 4.1 External Delivery Abstraction
- [ ] Extend `IEventDeliveryProvider` with:
  - [ ] `PublishExternalAsync(streamName, event, metadata)`
  - [ ] `SubscribeExternalAsync(streamName, subscriptionId, consumerId)`
- [ ] Create `ExternalDeliveryConfiguration`
- [ ] Add provider registration API

#### 4.2 RabbitMQ Provider
- [ ] Create `RabbitMqEventDeliveryProvider : IEventDeliveryProvider`
- [ ] Create `RabbitMqConfiguration`:
  - [ ] Connection string
  - [ ] Exchange prefix
  - [ ] Exchange type (topic, fanout, direct)
  - [ ] Routing key strategy
  - [ ] Auto-declare topology
- [ ] Implement connection management (connect, reconnect, dispose)
- [ ] Implement publish operations
- [ ] Implement subscribe operations
- [ ] Add NuGet dependency: `RabbitMQ.Client`

#### 4.3 Topology Management
- [ ] Create `IRabbitMqTopologyManager` interface
- [ ] Implement automatic exchange creation:
  - [ ] Format: `{prefix}.{stream-name}` (e.g., `myapp.user-events`)
  - [ ] Type: topic exchange (default)
- [ ] Implement automatic queue creation (naming sketched after this list):
  - [ ] Broadcast: `{prefix}.{subscription-id}.{consumer-id}`
  - [ ] Exclusive: `{prefix}.{subscription-id}`
  - [ ] ConsumerGroup: `{prefix}.{subscription-id}`
- [ ] Implement automatic binding creation:
  - [ ] Routing keys based on event type names
- [ ] Add validation for valid names (no spaces, special chars)
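
The naming conventions above, condensed into a helper; the class itself is illustrative, not a planned API:

```csharp
// Sketch: encodes the exchange/queue naming rules from task 4.3.
public static class RabbitMqNaming
{
    public static string ExchangeName(string prefix, string streamName)
        => $"{prefix}.{streamName}"; // e.g. "myapp.user-events"

    public static string QueueName(string prefix, string subscriptionId, string consumerId, SubscriptionMode mode)
        => mode switch
        {
            // Broadcast: one queue per consumer, each bound to the exchange.
            SubscriptionMode.Broadcast => $"{prefix}.{subscriptionId}.{consumerId}",
            // Exclusive and ConsumerGroup: one shared queue, competing consumers.
            SubscriptionMode.Exclusive or SubscriptionMode.ConsumerGroup => $"{prefix}.{subscriptionId}",
            // ReadReceipt queue naming is not specified in this plan.
            _ => throw new ArgumentOutOfRangeException(nameof(mode)),
        };
}
```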

#### 4.4 Remote Stream Configuration
- [ ] Create `IRemoteStreamConfiguration` interface
- [ ] Create fluent API: `AddRemoteStream(name, config)`
- [ ] Implement remote stream subscription
- [ ] Add cross-service event routing

#### 4.5 Message Serialization
- [ ] Create `IEventSerializer` interface (see the sketch after this list)
- [ ] Create `JsonEventSerializer` implementation
- [ ] Add event type metadata in message headers:
  - [ ] `event-type` (CLR type name)
  - [ ] `event-version` (schema version)
  - [ ] `correlation-id`
  - [ ] `timestamp`
- [ ] Implement deserialization with type resolution
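
A sketch of the serializer and the four headers; the header names come from the list above, the interface shape and header plumbing are assumptions:

```csharp
using System.Text.Json;

// Sketch: interface shape is an assumption; header names come from task 4.5.
public interface IEventSerializer
{
    byte[] Serialize(object @event, IDictionary<string, object> headers);
    object Deserialize(byte[] body, IReadOnlyDictionary<string, object> headers);
}

public sealed class JsonEventSerializer : IEventSerializer
{
    public byte[] Serialize(object @event, IDictionary<string, object> headers)
    {
        headers["event-type"] = @event.GetType().AssemblyQualifiedName!; // CLR type name
        headers["event-version"] = 1;                                    // schema version (Phase 5 wires in the registry)
        headers["correlation-id"] = Guid.NewGuid().ToString("N");        // in practice, the workflow ID
        headers["timestamp"] = DateTimeOffset.UtcNow.ToString("O");
        return JsonSerializer.SerializeToUtf8Bytes(@event);
    }

    public object Deserialize(byte[] body, IReadOnlyDictionary<string, object> headers)
    {
        // Type resolution via the event-type header; production code should check
        // the type against an allow-list rather than trusting the header blindly.
        var type = Type.GetType((string)headers["event-type"], throwOnError: true)!;
        return JsonSerializer.Deserialize(body, type)!;
    }
}
```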

#### 4.6 Acknowledgment & Redelivery
- [ ] Implement manual acknowledgment (ack)
- [ ] Implement negative acknowledgment (nack) with requeue
- [ ] Add dead letter queue configuration
- [ ] Implement retry policies (exponential backoff)
- [ ] Add max retry count

#### 4.7 Connection Resilience
- [ ] Implement automatic reconnection on failure
- [ ] Add connection health checks
- [ ] Implement circuit breaker pattern
- [ ] Add connection pool management
- [ ] Log connection events (connected, disconnected, reconnecting)

#### 4.8 Cross-Service Sample
- [ ] Create second sample project: `Svrnty.Sample.Analytics`
- [ ] Configure Service A to publish to RabbitMQ
- [ ] Configure Service B to consume from RabbitMQ
- [ ] Demonstrate cross-service event flow
- [ ] Add docker-compose with RabbitMQ

#### 4.9 Testing
- [ ] Test exchange/queue creation
- [ ] Test message publishing
- [ ] Test message consumption
- [ ] Test acknowledgment handling
- [ ] Test connection failure recovery
- [ ] Test dead letter queue
- [ ] Integration test across two services

**Phase 4 Success Criteria:**

```csharp
// ✅ This should work:

// Service A: Publish events externally
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.Scope = StreamScope.CrossService;
        stream.ExternalDelivery.UseRabbitMq(rabbitmq =>
        {
            rabbitmq.ConnectionString = "amqp://localhost";
            rabbitmq.ExchangeName = "user-service.events";
        });
    });
});

// Service B: Consume from Service A
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddRemoteStream("user-service.events", remote =>
    {
        remote.UseRabbitMq(rabbitmq =>
        {
            rabbitmq.ConnectionString = "amqp://localhost";
        });
    });

    streaming.AddSubscription("analytics", subscription =>
    {
        subscription.ToRemoteStream("user-service.events");
        subscription.Mode = SubscriptionMode.ConsumerGroup;
    });
});

// Zero RabbitMQ knowledge needed by the developer!
```

---

## Phase 5: Schema Evolution & Versioning

**Goal**: Support event versioning with automatic upcasting

**Duration**: Weeks 8-9

### Phase 5 Tasks

#### 5.1 Schema Registry Abstractions
- [ ] Create `ISchemaRegistry` interface
  - [ ] `RegisterSchemaAsync<TEvent>(version, upcastFromType)`
  - [ ] `GetSchemaAsync(eventType, version)`
  - [ ] `GetSchemaHistoryAsync(eventType)`
  - [ ] `UpcastAsync(event, targetVersion)`
- [ ] Create `SchemaInfo` record (version, CLR type, JSON schema, upcast info)
- [ ] Create `ISchemaStore` interface for persistence

#### 5.2 Event Versioning Attributes
- [ ] Create `EventVersionAttribute`, applied as `[EventVersion(int)]`, with: (see the sketch after this list)
  - [ ] `Version` property
  - [ ] `UpcastFrom` type property
- [ ] Add compile-time validation (via an analyzer, if time permits)
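
A sketch of the attribute, matching its usage in the success criteria below; the `AttributeUsage` targets are assumptions:

```csharp
// Sketch: property names from task 5.2.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct)]
public sealed class EventVersionAttribute(int version) : Attribute
{
    public int Version { get; } = version;

    // Optional: the previous event type this version upcasts from.
    public Type? UpcastFrom { get; set; }
}
```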

#### 5.3 Schema Registry Implementation
- [ ] Create `SchemaRegistry : ISchemaRegistry`
- [ ] Create `PostgresSchemaStore : ISchemaStore`
- [ ] Design schema storage:
  - [ ] `event_schemas` table (event_type, version, clr_type, json_schema, upcast_from_type, registered_at)
- [ ] Implement version registration
- [ ] Implement schema lookup with caching

#### 5.4 Upcasting Pipeline
- [ ] Create `IEventUpcaster<TFrom, TTo>` interface
- [ ] Create `EventUpcastingMiddleware` (see the sketch after this list)
- [ ] Implement automatic upcaster discovery:
  - [ ] Via static method: `TTo.UpcastFrom(TFrom)`
  - [ ] Via registered `IEventUpcaster<TFrom, TTo>` implementations
- [ ] Implement multi-hop upcasting (V1 → V2 → V3)
- [ ] Add upcasting to subscription pipeline
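
The multi-hop requirement reduces to a loop that applies single-hop upcasts until the event is at the latest version. A sketch, where `IUpcastStepResolver` is an assumed internal helper rather than part of the 5.1 interface:

```csharp
// Sketch: multi-hop upcasting (V1 → V2 → V3) as repeated single hops.
public interface IUpcastStepResolver
{
    // Returns the next-version event, or null when @event is already the latest version.
    Task<object?> TryUpcastOneStepAsync(object @event, CancellationToken ct);
}

public sealed class EventUpcastingMiddleware(IUpcastStepResolver resolver)
{
    public async Task<object> UpcastToLatestAsync(object @event, CancellationToken ct)
    {
        // Apply V1 → V2, then V2 → V3, and so on, until no newer version exists.
        while (await resolver.TryUpcastOneStepAsync(@event, ct) is { } next)
            @event = next;
        return @event;
    }
}
```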

#### 5.5 JSON Schema Generation
- [ ] Create `IJsonSchemaGenerator` interface
- [ ] Create `JsonSchemaGenerator` implementation
- [ ] Generate JSON Schema from CLR types
- [ ] Store schemas in registry for external consumers
- [ ] Add schema validation (optional)

#### 5.6 Configuration
- [ ] Extend stream configuration with:
  - [ ] `EnableSchemaEvolution = true/false`
  - [ ] `SchemaRegistry` configuration
- [ ] Add fluent API for schema registration:
  - [ ] `registry.Register<TEvent>(version)`
  - [ ] `registry.Register<TEvent>(version, upcastFrom: typeof(TOldEvent))`
- [ ] Extend subscription configuration:
  - [ ] `ReceiveAs<TEventVersion>()` to specify target version

#### 5.7 Backward Compatibility
- [ ] Handle events without version attribute (default to version 1)
- [ ] Support mixed versioned/unversioned events
- [ ] Add migration path for existing events

#### 5.8 Testing
- [ ] Test version registration
- [ ] Test single-hop upcasting (V1 → V2)
- [ ] Test multi-hop upcasting (V1 → V2 → V3)
- [ ] Test new consumers receiving old events (auto-upcast)
- [ ] Test schema storage and retrieval
- [ ] Test JSON schema generation

**Phase 5 Success Criteria:**

```csharp
// ✅ This should work:

// Event V1
[EventVersion(1)]
public sealed record UserAddedEventV1 : UserWorkflow
{
    public required int UserId { get; init; }
    public required string Name { get; init; }
}

// Event V2 with upcaster
[EventVersion(2, UpcastFrom = typeof(UserAddedEventV1))]
public sealed record UserAddedEventV2 : UserWorkflow
{
    public required int UserId { get; init; }
    public required string FirstName { get; init; }
    public required string LastName { get; init; }
    public required string Email { get; init; }

    public static UserAddedEventV2 UpcastFrom(UserAddedEventV1 v1)
    {
        var names = v1.Name.Split(' ', 2);
        return new UserAddedEventV2
        {
            UserId = v1.UserId,
            FirstName = names[0],
            LastName = names.Length > 1 ? names[1] : "",
            Email = $"user{v1.UserId}@unknown.com"
        };
    }
}

// Configuration
streaming.UseSchemaRegistry(registry =>
{
    registry.Register<UserAddedEventV1>(version: 1);
    registry.Register<UserAddedEventV2>(version: 2, upcastFrom: typeof(UserAddedEventV1));
});

// Consumer always receives V2 (framework auto-upcasts V1 → V2)
streaming.AddSubscription("analytics", subscription =>
{
    subscription.ToStream<UserWorkflow>();
    subscription.ReceiveAs<UserAddedEventV2>();
});
```

---

## Phase 6: Management, Monitoring & Observability

**Goal**: Production-ready monitoring, health checks, and management APIs

**Duration**: Week 10+

### Phase 6 Tasks

#### 6.1 Health Checks
- [ ] Create `IStreamHealthCheck` interface
- [ ] Implement stream health checks (lag detection sketched after this list):
  - [ ] Stream exists and is writable
  - [ ] Consumer lag detection (offset vs stream length)
  - [ ] Stalled consumer detection (no progress for N minutes)
- [ ] Integrate with ASP.NET Core health checks
- [ ] Add health check endpoints
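
A sketch of consumer-lag detection wired into ASP.NET Core health checks, reusing the `IConsumerOffsetStore` from 2.3; the threshold, the hardcoded stream/subscription names, and the stream-length delegate are placeholders:

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Sketch: lag = stream length minus each consumer's committed offset.
public sealed class ConsumerLagHealthCheck(
    IConsumerOffsetStore offsets,
    Func<string, CancellationToken, Task<long>> getStreamLength) : IHealthCheck
{
    private const long MaxLag = 10_000; // events; placeholder, should be configurable

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken ct = default)
    {
        var length = await getStreamLength("user-events", ct);                    // placeholder stream
        var positions = await offsets.GetConsumerPositionsAsync("analytics", ct); // placeholder subscription

        var worstLag = positions.Count == 0
            ? 0
            : positions.Values.Max(offset => length - offset);

        return worstLag > MaxLag
            ? HealthCheckResult.Degraded($"Worst consumer lag is {worstLag} events")
            : HealthCheckResult.Healthy();
    }
}
```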

#### 6.2 Metrics & Telemetry
- [ ] Define key metrics (sketched after this list):
  - [ ] Events published per stream (rate)
  - [ ] Events consumed per subscription (rate)
  - [ ] Consumer lag (offset delta)
  - [ ] Processing latency (time from publish to ack)
  - [ ] Error rate
- [ ] Integrate with OpenTelemetry
- [ ] Add Prometheus endpoint
- [ ] Create Grafana dashboard templates
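
The first few metrics as `System.Diagnostics.Metrics` instruments, which OpenTelemetry can export natively; the meter and instrument names are assumptions:

```csharp
using System.Diagnostics.Metrics;

// Sketch: publish/consume rates and processing latency as OTel-exportable instruments.
public sealed class EventStreamingMetrics
{
    private static readonly Meter Meter = new("Svrnty.EventStreaming"); // name is an assumption

    private readonly Counter<long> _published =
        Meter.CreateCounter<long>("events_published_total");
    private readonly Counter<long> _consumed =
        Meter.CreateCounter<long>("events_consumed_total");
    private readonly Histogram<double> _latencyMs =
        Meter.CreateHistogram<double>("event_processing_latency_ms");

    public void EventPublished(string stream)
        => _published.Add(1, new KeyValuePair<string, object?>("stream", stream));

    public void EventConsumed(string subscription, double publishToAckMs)
    {
        _consumed.Add(1, new KeyValuePair<string, object?>("subscription", subscription));
        _latencyMs.Record(publishToAckMs, new KeyValuePair<string, object?>("subscription", subscription));
    }
}
```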

#### 6.3 Management API
- [ ] Create REST API for management (two endpoints sketched after this list):
  - [ ] `GET /api/streams` - List all streams
  - [ ] `GET /api/streams/{name}` - Get stream details
  - [ ] `GET /api/streams/{name}/subscriptions` - List subscriptions
  - [ ] `GET /api/subscriptions/{id}/consumers` - List consumers
  - [ ] `GET /api/subscriptions/{id}/consumers/{consumerId}/offset` - Get consumer position
  - [ ] `POST /api/subscriptions/{id}/consumers/{consumerId}/reset-offset` - Reset offset
  - [ ] `DELETE /api/subscriptions/{id}/consumers/{consumerId}` - Remove consumer
- [ ] Add authorization (admin only)
- [ ] Add Swagger documentation
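
Two of these endpoints as ASP.NET Core minimal APIs; `IStreamCatalog` and the `"admin"` policy name are hypothetical:

```csharp
// Sketch: list streams and reset a consumer offset, admin-only.
app.MapGet("/api/streams",
        (IStreamCatalog catalog) => catalog.ListStreams()) // IStreamCatalog is hypothetical
   .RequireAuthorization("admin");

app.MapPost("/api/subscriptions/{id}/consumers/{consumerId}/reset-offset",
        async (string id, string consumerId, long offset,
               IConsumerOffsetStore offsets, CancellationToken ct) =>
        {
            await offsets.SetOffsetAsync(id, consumerId, offset, ct);
            return Results.NoContent();
        })
   .RequireAuthorization("admin");
```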

#### 6.4 Admin Dashboard (Optional)
- [ ] Create simple web UI for monitoring:
  - [ ] Stream list with event counts
  - [ ] Subscription list with consumer status
  - [ ] Consumer lag visualization
  - [ ] Event replay interface
- [ ] Use Blazor or simple HTML/JS

#### 6.5 Logging
- [ ] Add structured logging with Serilog/NLog
- [ ] Log key events:
  - [ ] Stream created
  - [ ] Consumer registered/unregistered
  - [ ] Event published
  - [ ] Event consumed
  - [ ] Errors and retries
- [ ] Add correlation IDs to all logs
- [ ] Add log levels (Debug, Info, Warning, Error)

#### 6.6 Alerting (Optional)
- [ ] Define alerting rules:
  - [ ] Consumer lag exceeds threshold
  - [ ] Consumer stalled (no progress)
  - [ ] Error rate spike
  - [ ] Dead letter queue growth
- [ ] Integration with alerting systems (email, Slack, PagerDuty)

#### 6.7 Documentation
- [ ] Update CLAUDE.md with event streaming documentation
- [ ] Create developer guide
- [ ] Create deployment guide
- [ ] Create troubleshooting guide
- [ ] Add API reference documentation
- [ ] Create architecture diagrams

#### 6.8 Testing
- [ ] Test health check endpoints
- [ ] Test metrics collection
- [ ] Test management API
- [ ] Load testing (throughput, latency)
- [ ] Chaos testing (failure scenarios)

**Phase 6 Success Criteria:**

```csharp
// ✅ Production-ready features:

// Health checks
builder.Services.AddHealthChecks()
    .AddEventStreamHealthCheck();

// Metrics exposed at /metrics
builder.Services.AddEventStreaming(streaming =>
{
    streaming.EnableMetrics();
    streaming.EnableHealthChecks();
});

// Management API available
// GET /api/streams → List all streams
// GET /api/streams/user-events/subscriptions → View subscriptions
// POST /api/subscriptions/admin-notifications/consumers/admin-123/reset-offset → Reset lag
```

---

## Optional Future Phases

### Phase 7: Advanced Features (Post-Launch)
- [ ] Kafka provider implementation
- [ ] Azure Service Bus provider
- [ ] AWS SQS/SNS provider
- [ ] Saga orchestration support
- [ ] Event sourcing projections
- [ ] Snapshot support for aggregates
- [ ] CQRS read model synchronization
- [ ] GraphQL subscriptions integration
- [ ] SignalR integration for browser clients

### Phase 8: Performance Optimizations
- [ ] Batch processing support
- [ ] Stream partitioning
- [ ] Parallel consumer processing
- [ ] Event compression
- [ ] Connection pooling
- [ ] Query optimization

---

## Design Decisions & Rationale

### Why Workflows Over Events?
**Decision**: Make workflows the primary abstraction, not events.

**Rationale**:
- Workflows represent business processes (how developers think)
- Events are implementation details of workflows
- Clearer intent: "This command participates in an invitation workflow"
- Solves the correlation problem elegantly (workflow ID = correlation ID)

### Why Support Both Ephemeral & Persistent?
**Decision**: Support both message queue (ephemeral) and event sourcing (persistent) patterns.

**Rationale**:
- Different use cases have different needs
- Ephemeral: simple notifications, no need for history
- Persistent: audit logs, analytics, replay capability
- Developer chooses based on requirements
- Same API for both (progressive complexity)

### Why Exactly-Once Opt-In?
**Decision**: Make exactly-once delivery optional; default to at-least-once.

**Rationale**:
- Exactly-once has a performance cost (deduplication, locking)
- Most scenarios can handle duplicates (idempotent handlers)
- Developer opts in when it's critical (financial transactions)
- Simpler default behavior

### Why Cross-Service Opt-In?
**Decision**: Streams are internal by default; external exposure requires explicit configuration.

**Rationale**:
- Security: don't expose events externally by accident
- Performance: internal delivery (gRPC) is faster
- Simplicity: most services don't need cross-service events
- Developer explicitly chooses when needed

### Why Schema Evolution?
**Decision**: Support event versioning from the start.

**Rationale**:
- Events are long-lived (years in persistent streams)
- Schema changes are inevitable
- Breaking changes hurt (old events can no longer be deserialized)
- Automatic upcasting prevents data loss
- Essential for persistent streams with replay

---

## Success Metrics

### Phase 1
- ✅ Basic workflow registration works
- ✅ Ephemeral streams work (in-memory)
- ✅ Broadcast and exclusive subscriptions work
- ✅ gRPC streaming works
- ✅ Zero breaking changes to existing features

### Phase 2
- ✅ Persistent streams work (PostgreSQL)
- ✅ Event replay works from any position
- ✅ Retention policies enforced
- ✅ Consumers can resume from last offset

### Phase 3
- ✅ Exactly-once delivery works (no duplicates)
- ✅ Read receipts work (delivered vs read)
- ✅ Unread timeout handling works

### Phase 4
- ✅ Events flow from Service A to Service B via RabbitMQ
- ✅ Zero RabbitMQ code in handlers
- ✅ Automatic topology creation works
- ✅ Connection resilience works

### Phase 5
- ✅ Old events automatically upcast to new version
- ✅ New consumers receive latest version
- ✅ Multi-hop upcasting works (V1 → V2 → V3)

### Phase 6
- ✅ Health checks detect lagging consumers
- ✅ Metrics exposed for monitoring
- ✅ Management API works
- ✅ Documentation complete

---

## Risk Mitigation

### Risk: Breaking Existing Features
**Mitigation**:
- Keep `AddCommandWithEvents` for backward compatibility
- Run the full test suite after each phase
- Feature flags for new functionality

### Risk: Performance Issues
**Mitigation**:
- Start with in-memory (fast)
- Benchmark at each phase
- Add performance tests before Phase 6
- Use profiling tools

### Risk: Complexity Overload
**Mitigation**:
- Progressive disclosure (simple by default)
- Each phase is independently useful
- Clear documentation at each level
- Sample projects for each complexity level

### Risk: Database Schema Changes
**Mitigation**:
- Use migrations from Phase 2 onward
- Backward-compatible schema changes
- Test migration paths

### Risk: External Dependencies (RabbitMQ, etc.)
**Mitigation**:
- Make external delivery optional
- Provide in-memory fallback
- Docker Compose for development
- Clear setup documentation

---

## Development Guidelines

### Coding Standards
- Use C# 14 features (field keyword, extension members)
- Follow existing patterns in the codebase
- XML documentation on public APIs
- Async/await throughout
- CancellationToken support on all async methods

### Testing Strategy
- Unit tests for core logic
- Integration tests for storage implementations
- End-to-end tests for full scenarios
- Performance benchmarks for critical paths

### Documentation Requirements
- XML doc comments on all public APIs
- README updates for each phase
- Sample code for new features
- Architecture diagrams

### Code Review Checklist
- [ ] Follows existing code style
- [ ] Has XML documentation
- [ ] Has unit tests
- [ ] No breaking changes (or documented)
- [ ] Performance acceptable
- [ ] Error handling complete

---

## Timeline Summary

| Phase | Duration | Key Deliverable |
|-------|----------|-----------------|
| Phase 1 | 2 weeks | Basic workflows + ephemeral streaming |
| Phase 2 | 2 weeks | Persistent streams + replay |
| Phase 3 | 1 week | Exactly-once + read receipts |
| Phase 4 | 2 weeks | RabbitMQ cross-service |
| Phase 5 | 2 weeks | Schema evolution |
| Phase 6 | 1+ week | Management & monitoring |
| **Total** | **10+ weeks** | **Production-ready event streaming platform** |

---

## Next Steps

1. **Review this plan** - Validate approach and priorities
2. **Create feature branch** - `feature/event-streaming`
3. **Start Phase 1.1** - Workflow abstraction
4. **Iterate rapidly** - Small commits, frequent builds
5. **Update this document** - Check off tasks as completed

---

## Notes & Questions

- [ ] Decision: PostgreSQL or pluggable storage from Phase 2?
- [ ] Decision: gRPC-only or add SignalR for browser support?
- [ ] Decision: Create separate NuGet packages per phase, or keep it monolithic?
- [ ] Question: Should we support Kafka in Phase 4 or in a separate phase?
- [ ] Question: Do we need distributed tracing (OpenTelemetry) integration?

---

**Last Updated**: 2025-12-09
**Status**: Planning Phase - Not Started
**Owner**: Mathias Beaulieu-Duncan