Event Streaming Implementation Plan
Executive Summary
Transform the CQRS framework into a complete enterprise event streaming platform that supports:
- Workflows: Business process correlation and event emission
- Multiple Consumer Patterns: Broadcast, exclusive, consumer groups, read receipts
- Storage Models: Ephemeral (message queue) and persistent (event sourcing)
- Delivery Semantics: At-most-once, at-least-once, exactly-once
- Cross-Service Communication: RabbitMQ, Kafka integration with zero developer friction
- Schema Evolution: Event versioning with automatic upcasting
- Event Replay: Time-travel queries for persistent streams
Design Philosophy: Simple by default, powerful when needed. Progressive complexity.
Architecture Layers
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: WORKFLOW (Business Process) │
│ What events belong together logically? │
│ Example: InvitationWorkflow, UserWorkflow │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: EVENT STREAM (Organization & Retention) │
│ How are events stored and organized? │
│ Example: Persistent vs Ephemeral, retention policies │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: SUBSCRIPTION (Consumer Routing) │
│ Who wants to consume what? │
│ Example: Broadcast, Exclusive, ConsumerGroup, ReadReceipt │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: DELIVERY (Transport Mechanism) │
│ How do events reach consumers? │
│ Example: gRPC, RabbitMQ, Kafka │
└─────────────────────────────────────────────────────────────┘
Core Enumerations
StreamType
- `Ephemeral`: Message queue semantics (events deleted after consumption)
- `Persistent`: Event log semantics (events retained for replay)
DeliverySemantics
- `AtMostOnce`: Fire and forget (fast, might lose messages)
- `AtLeastOnce`: Retry until ack (might see duplicates)
- `ExactlyOnce`: Deduplication (slower, no duplicates)
SubscriptionMode
- `Broadcast`: All consumers get all events (pub/sub)
- `Exclusive`: Only one consumer gets each event (queue)
- `ConsumerGroup`: Load-balanced across group members (Kafka-style)
- `ReadReceipt`: Requires explicit "user saw this" confirmation
StreamScope
- `Internal`: Same service only (default)
- `CrossService`: Available to external services via message broker
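A minimal sketch of how these enumerations might be declared in the framework (member names follow this plan; namespaces and ordering are assumptions):

public enum StreamType { Ephemeral, Persistent }

public enum DeliverySemantics { AtMostOnce, AtLeastOnce, ExactlyOnce }

public enum SubscriptionMode { Broadcast, Exclusive, ConsumerGroup, ReadReceipt }

public enum StreamScope { Internal, CrossService }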
Phase 1: Core Workflow & Streaming Foundation
Goal: Get basic workflow + ephemeral streaming working with in-memory storage
Duration: Weeks 1-2
Phase 1 Tasks
1.1 Workflow Abstraction
- Create `Workflow` abstract base class
  - `Id` property (workflow instance identifier)
  - `IsNew` property (started vs continued)
  - `Emit<TEvent>()` protected method
  - Internal `PendingEvents` collection
- Create `ICommandHandlerWithWorkflow<TCommand, TResult, TWorkflow>` interface
- Create `ICommandHandlerWithWorkflow<TCommand, TWorkflow>` interface (no result)
- Update sample: Convert `UserEvent` to `UserWorkflow : Workflow`
- Update sample: Convert `InvitationEvent` to `InvitationWorkflow : Workflow`
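As a rough illustration of the 1.1 abstraction, the `Workflow` base class could look like the sketch below (only `Id`, `IsNew`, `Emit<TEvent>()`, and `PendingEvents` are named by this plan; everything else, including setter accessibility, is an assumption and .NET implicit usings are assumed):

public abstract class Workflow
{
    private readonly List<object> _pendingEvents = new();

    // Workflow instance identifier; doubles as the correlation ID for emitted events.
    public Guid Id { get; internal set; } = Guid.NewGuid();

    // True when this command started the workflow, false when it continued an existing one.
    public bool IsNew { get; internal set; } = true;

    // Events emitted during command handling, drained by the framework after the handler completes.
    internal IReadOnlyList<object> PendingEvents => _pendingEvents;

    protected void Emit<TEvent>(TEvent @event) where TEvent : notnull
        => _pendingEvents.Add(@event);
}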
1.2 Stream Configuration
- Create `StreamType` enum
- Create `DeliverySemantics` enum
- Create `SubscriptionMode` enum
- Create `StreamScope` enum
- Create `IStreamConfiguration` interface
- Create `StreamConfiguration` implementation
- Create fluent configuration API: `AddEventStreaming()`
1.3 In-Memory Storage (Ephemeral)
- Create `IEventStreamStore` interface
  - `EnqueueAsync()` for ephemeral streams
  - `DequeueAsync()` for ephemeral streams
  - `AcknowledgeAsync()` for message acknowledgment
  - `NackAsync()` for requeue/dead-letter
- Create `InMemoryEventStreamStore` implementation
  - Concurrent queues per stream
  - Per-consumer visibility tracking
  - Acknowledgment handling
- Create `ISubscriptionStore` interface
  - `RegisterConsumerAsync()`
  - `UnregisterConsumerAsync()`
  - `GetConsumersAsync()`
- Create `InMemorySubscriptionStore` implementation
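A plausible shape for the ephemeral store interface, inferred from the task list above (parameter names, the event-ID type, and return types are assumptions):

public interface IEventStreamStore
{
    // Ephemeral (message queue) semantics.
    Task EnqueueAsync(string streamName, object @event, CancellationToken ct = default);
    Task<object?> DequeueAsync(string streamName, string consumerId, CancellationToken ct = default);

    // Acknowledgment handling.
    Task AcknowledgeAsync(string streamName, string consumerId, Guid eventId, CancellationToken ct = default);
    Task NackAsync(string streamName, string consumerId, Guid eventId, bool requeue, CancellationToken ct = default);
}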
1.4 Subscription System
- Create `ISubscription` interface
- Create `Subscription` implementation
- Create `IEventSubscriptionClient` for consumers
- Create `EventSubscriptionClient` implementation
- Implement `Broadcast` mode
- Implement `Exclusive` mode
- Create subscription configuration API
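The consumer-facing client could be as small as the sketch below, matching the `await foreach` usage shown in the Phase 1 success criteria (the return type and method shape are assumptions):

public interface IEventSubscriptionClient
{
    // Streams events for one consumer of a subscription until the token is cancelled.
    IAsyncEnumerable<object> SubscribeAsync(
        string subscriptionId,
        string consumerId,
        CancellationToken ct = default);
}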
1.5 Workflow Decorators
- Create `WorkflowContext<TWorkflow>` class
- Create `CommandHandlerWithWorkflowDecorator<TCommand, TResult, TWorkflow>`
- Create `CommandHandlerWithWorkflowDecoratorNoResult<TCommand, TWorkflow>`
- Update event emission to use workflow ID as correlation ID
- Integrate with existing `IEventEmitter`
1.6 Service Registration
- Create `AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler>()` extension
- Create `AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler, TValidator>()` extension
- Create `AddCommandWithWorkflow<TCommand, TWorkflow, THandler>()` extension (no result)
- Deprecate `AddCommandWithEvents` (keep for backward compatibility)
- Update `ServiceCollectionExtensions` with workflow registration
1.7 gRPC Streaming (Basic)
- Create `IEventDeliveryProvider` interface
- Create `GrpcEventDeliveryProvider` implementation
- Update gRPC service to support bidirectional streaming
- Implement consumer registration/unregistration
- Handle connection lifecycle (connect/disconnect/reconnect)
1.8 Sample Project Updates
- Refactor `UserEvents.cs` → `UserWorkflow.cs`
- Refactor `InvitationWorkflow.cs` to use new API
- Update `Program.cs` with workflow registration
- Add gRPC streaming consumer example
- Update documentation
1.9 Testing & Validation
- Build and verify no regressions
- Test workflow start/continue semantics
- Test ephemeral stream (message queue behavior)
- Test broadcast subscription (multiple consumers)
- Test exclusive subscription (single consumer)
- Test gRPC streaming connection
- Verify existing features still work
Phase 1 Success Criteria:
✅ This should work:
// Registration
builder.Services.AddCommandWithWorkflow<InviteUserCommand, string, InvitationWorkflow, InviteUserCommandHandler>();
// Handler
public class InviteUserCommandHandler
: ICommandHandlerWithWorkflow<InviteUserCommand, string, InvitationWorkflow>
{
public async Task<string> HandleAsync(
InviteUserCommand command,
InvitationWorkflow workflow,
CancellationToken ct)
{
workflow.Emit(new UserInvitedEvent { ... });
return workflow.Id;
}
}
// Consumer
await foreach (var @event in client.SubscribeAsync("my-subscription", "consumer-1", ct))
{
Console.WriteLine($"Received: {@event}");
}
Phase 2: Persistence & Event Sourcing
Goal: Add persistent streams with replay capability
Duration: Weeks 3-4
Phase 2 Tasks
2.1 Storage Abstractions (Persistent)
- Extend `IEventStreamStore` with append-only log methods:
  - `AppendAsync()` for persistent streams
  - `ReadStreamAsync()` for reading event log
  - `GetStreamLengthAsync()` for stream metadata
- Create `StoredEvent` record (offset, timestamp, event data)
- Create `StreamMetadata` record (length, retention, oldest event)
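The two records might look like the following sketch (field names beyond those listed above are assumptions; the payload is assumed to be stored serialized):

public sealed record StoredEvent(
    long Offset,
    DateTimeOffset Timestamp,
    string EventType,
    string EventData,        // serialized payload
    string? CorrelationId);

public sealed record StreamMetadata(
    long Length,
    TimeSpan? Retention,
    DateTimeOffset? OldestEventTimestamp);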
2.2 PostgreSQL Storage Implementation
- Create `PostgresEventStreamStore : IEventStreamStore`
- Design event log schema:
  - `events` table (stream_name, offset, event_type, event_data, correlation_id, timestamp)
  - Indexes for efficient queries
  - Partition strategy for large streams
- Implement append operations with optimistic concurrency
- Implement read operations with offset-based pagination
- Implement queue operations for ephemeral streams
2.3 Offset Tracking
- Create `IConsumerOffsetStore` interface
  - `GetOffsetAsync(subscriptionId, consumerId)`
  - `SetOffsetAsync(subscriptionId, consumerId, offset)`
  - `GetConsumerPositionsAsync(subscriptionId)` (for monitoring)
- Create `PostgresConsumerOffsetStore` implementation
- Design offset tracking schema:
  - `consumer_offsets` table (subscription_id, consumer_id, stream_offset, last_updated)
- Integrate offset tracking with subscription client
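A minimal interface sketch for offset tracking based on the members listed above (return types are assumptions; a null offset means the consumer has no committed position yet):

public interface IConsumerOffsetStore
{
    Task<long?> GetOffsetAsync(string subscriptionId, string consumerId, CancellationToken ct = default);
    Task SetOffsetAsync(string subscriptionId, string consumerId, long offset, CancellationToken ct = default);

    // For monitoring: current committed position of every consumer in a subscription.
    Task<IReadOnlyDictionary<string, long>> GetConsumerPositionsAsync(string subscriptionId, CancellationToken ct = default);
}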
2.4 Retention Policies
- Create `RetentionPolicy` configuration
  - Time-based retention (e.g., 90 days)
  - Size-based retention (e.g., 10 GB max)
  - Count-based retention (e.g., 1M events max)
- Create `IRetentionService` interface
- Create `RetentionService` background service
- Implement retention policy enforcement
- Add configurable cleanup intervals
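The retention configuration could be a simple options record covering the three limits above, for example (property names are assumptions; a null value means that limit is not enforced):

public sealed record RetentionPolicy
{
    public TimeSpan? MaxAge { get; init; }       // e.g. TimeSpan.FromDays(90)
    public long? MaxSizeBytes { get; init; }     // e.g. 10 GB
    public long? MaxEventCount { get; init; }    // e.g. 1_000_000
}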
2.5 Event Replay API
- Create `IEventReplayService` interface
- Create `EventReplayService` implementation
- Create `ReplayOptions` configuration:
  - `StartPosition` (Beginning, Offset, Timestamp, EventId)
  - `EndPosition` (Latest, Offset, Timestamp, EventId)
  - `Filter` predicate
  - `MaxEvents` limit
- Implement replay from persistent streams
- Add replay to new consumer (catch-up subscription)
2.6 Stream Configuration Extensions
- Extend stream configuration with:
  - `Type = StreamType.Persistent`
  - `Retention` policies
  - `EnableReplay = true/false`
- Validate configuration (ephemeral can't have replay)
- Add stream type detection and routing
2.7 Migration & Compatibility
- Create database migration scripts
- Add backward compatibility for in-memory implementation
- Allow mixing persistent and ephemeral streams
- Support runtime switching (development vs production)
2.8 Testing
- Test persistent stream append/read
- Test offset tracking across restarts
- Test retention policy enforcement
- Test event replay from various positions
- Test catch-up subscriptions
- Stress test with large event volumes
Phase 2 Success Criteria:
✅ This should work:
// Configure persistent stream
builder.Services.AddEventStreaming(streaming =>
{
streaming.AddStream<UserWorkflow>(stream =>
{
stream.Type = StreamType.Persistent;
stream.Retention = TimeSpan.FromDays(90);
stream.EnableReplay = true;
});
});
// Use PostgreSQL storage
services.AddSingleton<IEventStreamStore, PostgresEventStreamStore>();
// Replay events
var replay = await replayService.ReplayStreamAsync("user-events", new ReplayOptions
{
From = new StartPosition.Timestamp(DateTimeOffset.UtcNow.AddDays(-7))
}, ct);
await foreach (var @event in replay)
{
// Process historical events
}
Phase 3: Exactly-Once Delivery & Read Receipts
Goal: Add deduplication and explicit user confirmation
Duration: Week 5
Phase 3 Tasks
3.1 Idempotency Store
- Create `IIdempotencyStore` interface
  - `WasProcessedAsync(consumerId, eventId)`
  - `MarkProcessedAsync(consumerId, eventId, processedAt)`
  - `TryAcquireIdempotencyLockAsync(idempotencyKey, lockDuration)`
  - `ReleaseIdempotencyLockAsync(idempotencyKey)`
  - `CleanupAsync(olderThan)`
- Create `PostgresIdempotencyStore` implementation
- Design idempotency schema:
  - `processed_events` table (consumer_id, event_id, processed_at)
  - `idempotency_locks` table (lock_key, acquired_at, expires_at)
- Add TTL-based cleanup
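A sketch of the idempotency store based on the members above (parameter and return types are assumptions):

public interface IIdempotencyStore
{
    Task<bool> WasProcessedAsync(string consumerId, Guid eventId, CancellationToken ct = default);
    Task MarkProcessedAsync(string consumerId, Guid eventId, DateTimeOffset processedAt, CancellationToken ct = default);

    // Distributed lock used to serialize concurrent processing of the same event.
    Task<bool> TryAcquireIdempotencyLockAsync(string idempotencyKey, TimeSpan lockDuration, CancellationToken ct = default);
    Task ReleaseIdempotencyLockAsync(string idempotencyKey, CancellationToken ct = default);

    // Remove processed-event records older than the given age (TTL-based cleanup).
    Task CleanupAsync(TimeSpan olderThan, CancellationToken ct = default);
}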
3.2 Exactly-Once Middleware
- Create `ExactlyOnceDeliveryDecorator`
- Implement duplicate detection
- Implement distributed locking
- Add automatic retry on lock contention
- Integrate with subscription pipeline
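The deduplication flow could roughly follow the sketch below: skip already-processed events, hold a short lock while handling, and record success before releasing. This is an illustration of the intended flow rather than the final implementation; the store shape follows the 3.1 sketch and the lock duration is an assumed default.

public sealed class ExactlyOnceDeliveryDecorator
{
    private readonly IIdempotencyStore _store;

    public ExactlyOnceDeliveryDecorator(IIdempotencyStore store) => _store = store;

    public async Task DeliverAsync(
        string consumerId,
        Guid eventId,
        Func<CancellationToken, Task> handler,
        CancellationToken ct)
    {
        // 1. Duplicate detection: skip events this consumer already processed.
        if (await _store.WasProcessedAsync(consumerId, eventId, ct))
            return;

        // 2. Distributed lock so two instances of the same consumer can't race on one event.
        var lockKey = $"{consumerId}:{eventId}";
        if (!await _store.TryAcquireIdempotencyLockAsync(lockKey, TimeSpan.FromSeconds(30), ct))
            return; // another instance holds the lock; redelivery is driven by the subscription pipeline

        try
        {
            await handler(ct);
            await _store.MarkProcessedAsync(consumerId, eventId, DateTimeOffset.UtcNow, ct);
        }
        finally
        {
            await _store.ReleaseIdempotencyLockAsync(lockKey, ct);
        }
    }
}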
3.3 Read Receipt Store
- Create `IReadReceiptStore` interface
  - `MarkDeliveredAsync(subscriptionId, consumerId, eventId, deliveredAt)`
  - `MarkReadAsync(subscriptionId, consumerId, eventId, readAt)`
  - `GetUnreadEventsAsync(subscriptionId, consumerId)`
  - `GetExpiredUnreadEventsAsync(timeout)`
- Create `PostgresReadReceiptStore` implementation
- Design read receipt schema:
  - `read_receipts` table (subscription_id, consumer_id, event_id, delivered_at, read_at, status)
3.4 Read Receipt API
- Extend `IEventSubscriptionClient` with:
  - `MarkAsReadAsync(eventId)`
  - `MarkAllAsReadAsync()`
  - `GetUnreadCountAsync()`
- Create `ReadReceiptEvent` wrapper with `.MarkAsReadAsync()` method
- Implement unread timeout handling
- Add dead letter queue for expired unread events
3.5 Configuration
- Extend stream configuration with:
  - `DeliverySemantics = DeliverySemantics.ExactlyOnce`
- Extend subscription configuration with:
  - `Mode = SubscriptionMode.ReadReceipt`
  - `OnUnreadTimeout` duration
  - `OnUnreadExpired` policy (Requeue, DeadLetter, Drop)
- Add validation for configuration combinations
3.6 Monitoring & Cleanup
- Create background service for unread timeout detection
- Add metrics for unread events per consumer
- Add health checks for lagging consumers
- Implement automatic cleanup of old processed events
3.7 Testing
- Test duplicate event detection
- Test concurrent processing with locking
- Test read receipt lifecycle (delivered → read)
- Test unread timeout handling
- Test exactly-once guarantees under failure
Phase 3 Success Criteria:
✅ This should work:
// Exactly-once delivery
builder.Services.AddEventStreaming(streaming =>
{
streaming.AddStream<UserWorkflow>(stream =>
{
stream.Type = StreamType.Persistent;
stream.DeliverySemantics = DeliverySemantics.ExactlyOnce;
});
});
// Read receipts
streaming.AddSubscription("admin-notifications", subscription =>
{
subscription.ToStream<UserWorkflow>();
subscription.Mode = SubscriptionMode.ReadReceipt;
subscription.OnUnreadTimeout = TimeSpan.FromHours(24);
});
// Consumer
await foreach (var notification in client.SubscribeAsync("admin-notifications", "admin-123", ct))
{
await ShowNotificationAsync(notification);
await notification.MarkAsReadAsync(); // Explicit confirmation
}
Phase 4: Cross-Service Communication (RabbitMQ)
Goal: Enable event streaming across different services via RabbitMQ with zero developer friction
Duration: Weeks 6-7
Phase 4 Tasks
4.1 External Delivery Abstraction
- Extend `IEventDeliveryProvider` with:
  - `PublishExternalAsync(streamName, event, metadata)`
  - `SubscribeExternalAsync(streamName, subscriptionId, consumerId)`
- Create `ExternalDeliveryConfiguration`
- Add provider registration API
4.2 RabbitMQ Provider
- Create `RabbitMqEventDeliveryProvider : IEventDeliveryProvider`
- Create `RabbitMqConfiguration`:
  - Connection string
  - Exchange prefix
  - Exchange type (topic, fanout, direct)
  - Routing key strategy
  - Auto-declare topology
- Implement connection management (connect, reconnect, dispose)
- Implement publish operations
- Implement subscribe operations
- Add NuGet dependency: `RabbitMQ.Client`
4.3 Topology Management
- Create `IRabbitMqTopologyManager` interface
- Implement automatic exchange creation:
  - Format: `{prefix}.{stream-name}` (e.g., `myapp.user-events`)
  - Type: topic exchange (default)
- Implement automatic queue creation:
  - Broadcast: `{prefix}.{subscription-id}.{consumer-id}`
  - Exclusive: `{prefix}.{subscription-id}`
  - ConsumerGroup: `{prefix}.{subscription-id}`
- Implement automatic binding creation:
  - Routing keys based on event type names
- Add validation for names (no spaces, special chars)
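To make the naming rules above concrete, a topology manager might derive names along these lines (pure string illustration; no RabbitMQ.Client calls are shown, and the helper names are assumptions):

public static class RabbitMqNaming
{
    // Exchange per stream: "{prefix}.{stream-name}", e.g. "myapp.user-events".
    public static string ExchangeName(string prefix, string streamName)
        => $"{prefix}.{streamName}";

    // Queue name depends on the subscription mode.
    public static string QueueName(string prefix, string subscriptionId, string consumerId, SubscriptionMode mode)
        => mode switch
        {
            // Broadcast: one queue per consumer so every consumer receives every event.
            SubscriptionMode.Broadcast => $"{prefix}.{subscriptionId}.{consumerId}",
            // Exclusive and ConsumerGroup: one shared queue per subscription.
            _ => $"{prefix}.{subscriptionId}",
        };
}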
4.4 Remote Stream Configuration
- Create `IRemoteStreamConfiguration` interface
- Create fluent API: `AddRemoteStream(name, config)`
- Implement remote stream subscription
- Add cross-service event routing
4.5 Message Serialization
- Create `IEventSerializer` interface
- Create `JsonEventSerializer` implementation
- Add event type metadata in message headers:
  - `event-type` (CLR type name)
  - `event-version` (schema version)
  - `correlation-id`
  - `timestamp`
- Implement deserialization with type resolution
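A minimal serializer sketch using System.Text.Json; the interface shape and header envelope are assumptions based on the headers listed above (the correlation ID would be supplied by the delivery pipeline and the version by the schema registry):

using System.Text.Json;

public interface IEventSerializer
{
    byte[] Serialize(object @event, out IDictionary<string, string> headers);
    object Deserialize(byte[] body, IReadOnlyDictionary<string, string> headers);
}

public sealed class JsonEventSerializer : IEventSerializer
{
    public byte[] Serialize(object @event, out IDictionary<string, string> headers)
    {
        headers = new Dictionary<string, string>
        {
            ["event-type"] = @event.GetType().AssemblyQualifiedName!,
            ["event-version"] = "1",                          // placeholder; real version comes from the registry
            ["timestamp"] = DateTimeOffset.UtcNow.ToString("O"),
        };
        return JsonSerializer.SerializeToUtf8Bytes(@event, @event.GetType());
    }

    public object Deserialize(byte[] body, IReadOnlyDictionary<string, string> headers)
    {
        // Resolve the CLR type from the "event-type" header, then deserialize the payload.
        var type = Type.GetType(headers["event-type"], throwOnError: true)!;
        return JsonSerializer.Deserialize(body, type)!;
    }
}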
4.6 Acknowledgment & Redelivery
- Implement manual acknowledgment (ack)
- Implement negative acknowledgment (nack) with requeue
- Add dead letter queue configuration
- Implement retry policies (exponential backoff)
- Add max retry count
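A sketch of the retry behavior described above, combining exponential backoff with a max attempt count (delay values and the helper name are illustrative assumptions):

public static class RedeliveryPolicy
{
    public static async Task RetryWithBackoffAsync(
        Func<CancellationToken, Task> action, int maxAttempts, CancellationToken ct)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await action(ct);
                return;
            }
            catch when (attempt < maxAttempts)
            {
                // Exponential backoff: 200ms, 400ms, 800ms, ... before the next attempt.
                await Task.Delay(TimeSpan.FromMilliseconds(100 * Math.Pow(2, attempt)), ct);
            }
            // On the final attempt the exception propagates, handing the message to the dead letter queue.
        }
    }
}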
4.7 Connection Resilience
- Implement automatic reconnection on failure
- Add connection health checks
- Implement circuit breaker pattern
- Add connection pool management
- Log connection events (connected, disconnected, reconnecting)
4.8 Cross-Service Sample
- Create second sample project: `Svrnty.Sample.Analytics`
- Configure Service A to publish to RabbitMQ
- Configure Service B to consume from RabbitMQ
- Demonstrate cross-service event flow
- Add docker-compose with RabbitMQ
4.9 Testing
- Test exchange/queue creation
- Test message publishing
- Test message consumption
- Test acknowledgment handling
- Test connection failure recovery
- Test dead letter queue
- Integration test across two services
Phase 4 Success Criteria:
✅ This should work:
// Service A: Publish events externally
builder.Services.AddEventStreaming(streaming =>
{
streaming.AddStream<UserWorkflow>(stream =>
{
stream.Type = StreamType.Persistent;
stream.Scope = StreamScope.CrossService;
stream.ExternalDelivery.UseRabbitMq(rabbitmq =>
{
rabbitmq.ConnectionString = "amqp://localhost";
rabbitmq.ExchangeName = "user-service.events";
});
});
});
// Service B: Consume from Service A
builder.Services.AddEventStreaming(streaming =>
{
streaming.AddRemoteStream("user-service.events", remote =>
{
remote.UseRabbitMq(rabbitmq =>
{
rabbitmq.ConnectionString = "amqp://localhost";
});
});
streaming.AddSubscription("analytics", subscription =>
{
subscription.ToRemoteStream("user-service.events");
subscription.Mode = SubscriptionMode.ConsumerGroup;
});
});
// Zero RabbitMQ knowledge needed by developer!
Phase 5: Schema Evolution & Versioning
Goal: Support event versioning with automatic upcasting
Duration: Weeks 8-9
Phase 5 Tasks
5.1 Schema Registry Abstractions
- Create `ISchemaRegistry` interface
  - `RegisterSchemaAsync<TEvent>(version, upcastFromType)`
  - `GetSchemaAsync(eventType, version)`
  - `GetSchemaHistoryAsync(eventType)`
  - `UpcastAsync(event, targetVersion)`
- Create `SchemaInfo` record (version, CLR type, JSON schema, upcast info)
- Create `ISchemaStore` interface for persistence
5.2 Event Versioning Attributes
- Create `[EventVersion(int)]` attribute
- Create `[EventVersionAttribute]` with:
  - `Version` property
  - `UpcastFrom` type property
- Add compile-time validation (via analyzer if time permits)
5.3 Schema Registry Implementation
- Create `SchemaRegistry : ISchemaRegistry`
- Create `PostgresSchemaStore : ISchemaStore`
- Design schema storage:
  - `event_schemas` table (event_type, version, clr_type, json_schema, upcast_from_type, registered_at)
- Implement version registration
- Implement schema lookup with caching
5.4 Upcasting Pipeline
- Create `IEventUpcaster<TFrom, TTo>` interface
- Create `EventUpcastingMiddleware`
- Implement automatic upcaster discovery:
  - Via static method: `TTo.UpcastFrom(TFrom)`
  - Via registered `IEventUpcaster<TFrom, TTo>` implementations
- Implement multi-hop upcasting (V1 → V2 → V3)
- Add upcasting to subscription pipeline
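As an illustration, the upcaster contract and a multi-hop chain could look like the sketch below (the non-generic view and the chain-walking logic are assumptions about how the middleware might resolve V1 → V2 → V3):

public interface IEventUpcaster<in TFrom, out TTo>
{
    TTo Upcast(TFrom oldEvent);
}

// Non-generic view used by the middleware to walk a chain of registered upcasters.
public interface IEventUpcaster
{
    Type FromType { get; }
    Type ToType { get; }
    object Upcast(object oldEvent);
}

public static class UpcastingChain
{
    // Applies registered upcasters repeatedly until no further hop exists (V1 -> V2 -> V3 ...).
    public static object UpcastToLatest(object @event, IReadOnlyDictionary<Type, IEventUpcaster> upcastersByFromType)
    {
        while (upcastersByFromType.TryGetValue(@event.GetType(), out var upcaster))
            @event = upcaster.Upcast(@event);
        return @event;
    }
}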
5.5 JSON Schema Generation
- Create `IJsonSchemaGenerator` interface
- Create `JsonSchemaGenerator` implementation
- Generate JSON Schema from CLR types
- Store schemas in registry for external consumers
- Add schema validation (optional)
5.6 Configuration
- Extend stream configuration with:
  - `EnableSchemaEvolution = true/false`
  - `SchemaRegistry` configuration
- Add fluent API for schema registration:
  - `registry.Register<TEvent>(version)`
  - `registry.Register<TEvent>(version, upcastFrom: typeof(TOldEvent))`
- Extend subscription configuration:
  - `ReceiveAs<TEventVersion>()` to specify target version
5.7 Backward Compatibility
- Handle events without version attribute (default to version 1)
- Support mixed versioned/unversioned events
- Add migration path for existing events
5.8 Testing
- Test version registration
- Test single-hop upcasting (V1 → V2)
- Test multi-hop upcasting (V1 → V2 → V3)
- Test new consumers receiving old events (auto-upcast)
- Test schema storage and retrieval
- Test JSON schema generation
Phase 5 Success Criteria:
✅ This should work:
// Event V1
[EventVersion(1)]
public sealed record UserAddedEventV1 : UserWorkflow
{
public required int UserId { get; init; }
public required string Name { get; init; }
}
// Event V2 with upcaster
[EventVersion(2, UpcastFrom = typeof(UserAddedEventV1))]
public sealed record UserAddedEventV2 : UserWorkflow
{
public required int UserId { get; init; }
public required string FirstName { get; init; }
public required string LastName { get; init; }
public required string Email { get; init; }
public static UserAddedEventV2 UpcastFrom(UserAddedEventV1 v1)
{
var names = v1.Name.Split(' ', 2);
return new UserAddedEventV2
{
UserId = v1.UserId,
FirstName = names[0],
LastName = names.Length > 1 ? names[1] : "",
Email = $"user{v1.UserId}@unknown.com"
};
}
}
// Configuration
streaming.UseSchemaRegistry(registry =>
{
registry.Register<UserAddedEventV1>(version: 1);
registry.Register<UserAddedEventV2>(version: 2, upcastFrom: typeof(UserAddedEventV1));
});
// Consumer always receives V2 (framework auto-upcasts V1 → V2)
streaming.AddSubscription("analytics", subscription =>
{
subscription.ToStream<UserWorkflow>();
subscription.ReceiveAs<UserAddedEventV2>();
});
Phase 6: Management, Monitoring & Observability
Goal: Production-ready monitoring, health checks, and management APIs
Duration: Week 10+
Phase 6 Tasks
6.1 Health Checks
- Create `IStreamHealthCheck` interface
- Implement stream health checks:
- Stream exists and is writable
- Consumer lag detection (offset vs stream length)
- Stalled consumer detection (no progress for N minutes)
- Integrate with ASP.NET Core health checks
- Add health check endpoints
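A consumer-lag check plugged into ASP.NET Core health checks could look roughly like this sketch; the lag source is abstracted behind a delegate supplied by the streaming runtime, and the threshold default is an assumption:

using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class ConsumerLagHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task<long>> _getMaxLag; // worst lag across all consumers
    private readonly long _threshold;

    public ConsumerLagHealthCheck(Func<CancellationToken, Task<long>> getMaxLag, long threshold = 10_000)
    {
        _getMaxLag = getMaxLag;
        _threshold = threshold;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Lag = stream length minus the slowest consumer's committed offset.
        var lag = await _getMaxLag(cancellationToken);
        return lag <= _threshold
            ? HealthCheckResult.Healthy($"Max consumer lag: {lag}")
            : HealthCheckResult.Degraded($"Consumer lag {lag} exceeds threshold {_threshold}");
    }
}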
6.2 Metrics & Telemetry
- Define key metrics:
- Events published per stream (rate)
- Events consumed per subscription (rate)
- Consumer lag (offset delta)
- Processing latency (time from publish to ack)
- Error rate
- Integrate with OpenTelemetry
- Add Prometheus endpoint
- Create Grafana dashboard templates
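The key metrics above could be exposed through the built-in System.Diagnostics.Metrics API, which OpenTelemetry can export to Prometheus; the meter and instrument names below are assumptions, not the final naming scheme:

using System.Diagnostics.Metrics;

public static class StreamingMetrics
{
    private static readonly Meter Meter = new("Svrnty.EventStreaming");

    public static readonly Counter<long> EventsPublished =
        Meter.CreateCounter<long>("events_published", description: "Events published per stream");

    public static readonly Counter<long> EventsConsumed =
        Meter.CreateCounter<long>("events_consumed", description: "Events consumed per subscription");

    public static readonly Histogram<double> ProcessingLatencyMs =
        Meter.CreateHistogram<double>("processing_latency_ms", description: "Time from publish to ack");
}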
6.3 Management API
- Create REST API for management:
  - `GET /api/streams` - List all streams
  - `GET /api/streams/{name}` - Get stream details
  - `GET /api/streams/{name}/subscriptions` - List subscriptions
  - `GET /api/subscriptions/{id}/consumers` - List consumers
  - `GET /api/subscriptions/{id}/consumers/{consumerId}/offset` - Get consumer position
  - `POST /api/subscriptions/{id}/consumers/{consumerId}/reset-offset` - Reset offset
  - `DELETE /api/subscriptions/{id}/consumers/{consumerId}` - Remove consumer
- Add authorization (admin only)
- Add Swagger documentation
6.4 Admin Dashboard (Optional)
- Create simple web UI for monitoring:
- Stream list with event counts
- Subscription list with consumer status
- Consumer lag visualization
- Event replay interface
- Use Blazor or simple HTML/JS
6.5 Logging
- Add structured logging with Serilog/NLog
- Log key events:
- Stream created
- Consumer registered/unregistered
- Event published
- Event consumed
- Errors and retries
- Add correlation IDs to all logs
- Add log levels (Debug, Info, Warning, Error)
6.6 Alerting (Optional)
- Define alerting rules:
- Consumer lag exceeds threshold
- Consumer stalled (no progress)
- Error rate spike
- Dead letter queue growth
- Integration with alerting systems (email, Slack, PagerDuty)
6.7 Documentation
- Update CLAUDE.md with event streaming documentation
- Create developer guide
- Create deployment guide
- Create troubleshooting guide
- Add API reference documentation
- Create architecture diagrams
6.8 Testing
- Test health check endpoints
- Test metrics collection
- Test management API
- Load testing (throughput, latency)
- Chaos testing (failure scenarios)
Phase 6 Success Criteria:
✅ Production-ready features:
// Health checks
builder.Services.AddHealthChecks()
.AddEventStreamHealthCheck();
// Metrics exposed at /metrics
builder.Services.AddEventStreaming(streaming =>
{
streaming.EnableMetrics();
streaming.EnableHealthChecks();
});
// Management API available
// GET /api/streams → List all streams
// GET /api/streams/user-events/subscriptions → View subscriptions
// POST /api/subscriptions/admin-notifications/consumers/admin-123/reset-offset → Reset lag
Optional Future Phases
Phase 7: Advanced Features (Post-Launch)
- Kafka provider implementation
- Azure Service Bus provider
- AWS SQS/SNS provider
- Saga orchestration support
- Event sourcing projections
- Snapshot support for aggregates
- CQRS read model synchronization
- GraphQL subscriptions integration
- SignalR integration for browser clients
Phase 8: Performance Optimizations
- Batch processing support
- Stream partitioning
- Parallel consumer processing
- Event compression
- Connection pooling
- Query optimization
Design Decisions & Rationale
Why Workflows Over Events?
Decision: Make workflows the primary abstraction, not events.
Rationale:
- Workflows represent business processes (how developers think)
- Events are implementation details of workflows
- Clearer intent: "This command participates in an invitation workflow"
- Solves correlation problem elegantly (workflow ID = correlation ID)
Why Support Both Ephemeral & Persistent?
Decision: Support both message queue (ephemeral) and event sourcing (persistent) patterns.
Rationale:
- Different use cases have different needs
- Ephemeral: Simple notifications, no need for history
- Persistent: Audit logs, analytics, replay capability
- Developer chooses based on requirements
- Same API for both (progressive complexity)
Why Exactly-Once Opt-In?
Decision: Make exactly-once delivery optional, default to at-least-once.
Rationale:
- Exactly-once has performance cost (deduplication, locking)
- Most scenarios can handle duplicates (idempotent handlers)
- Developer opts in when critical (financial transactions)
- Simpler default behavior
Why Cross-Service Opt-In?
Decision: Streams are internal by default, external requires explicit configuration.
Rationale:
- Security: Don't expose events externally by accident
- Performance: Internal delivery (gRPC) is faster
- Simplicity: Most services don't need cross-service events
- Developer explicitly chooses when needed
Why Schema Evolution?
Decision: Support event versioning from the start.
Rationale:
- Events are long-lived (years in persistent streams)
- Schema changes are inevitable
- Breaking changes hurt (can't deserialize old events)
- Automatic upcasting prevents data loss
- Essential for persistent streams with replay
Success Metrics
Phase 1
- ✅ Basic workflow registration works
- ✅ Ephemeral streams work (in-memory)
- ✅ Broadcast and exclusive subscriptions work
- ✅ gRPC streaming works
- ✅ Zero breaking changes to existing features
Phase 2
- ✅ Persistent streams work (PostgreSQL)
- ✅ Event replay works from any position
- ✅ Retention policies enforced
- ✅ Consumers can resume from last offset
Phase 3
- ✅ Exactly-once delivery works (no duplicates)
- ✅ Read receipts work (delivered vs read)
- ✅ Unread timeout handling works
Phase 4
- ✅ Events flow from Service A to Service B via RabbitMQ
- ✅ Zero RabbitMQ code in handlers
- ✅ Automatic topology creation works
- ✅ Connection resilience works
Phase 5
- ✅ Old events automatically upcast to new version
- ✅ New consumers receive latest version
- ✅ Multi-hop upcasting works (V1→V2→V3)
Phase 6
- ✅ Health checks detect lagging consumers
- ✅ Metrics exposed for monitoring
- ✅ Management API works
- ✅ Documentation complete
Risk Mitigation
Risk: Breaking Existing Features
Mitigation:
- Keep `AddCommandWithEvents` for backward compatibility
- Run full test suite after each phase
- Feature flags for new functionality
Risk: Performance Issues
Mitigation:
- Start with in-memory (fast)
- Benchmark at each phase
- Add performance tests before Phase 6
- Use profiling tools
Risk: Complexity Overload
Mitigation:
- Progressive disclosure (simple by default)
- Each phase is independently useful
- Clear documentation at each level
- Sample projects for each complexity level
Risk: Database Schema Changes
Mitigation:
- Use migrations from Phase 2 onward
- Backward-compatible schema changes
- Test migration paths
Risk: External Dependencies (RabbitMQ, etc.)
Mitigation:
- Make external delivery optional
- Provide in-memory fallback
- Docker Compose for development
- Clear setup documentation
Development Guidelines
Coding Standards
- Use C# 14 features (field keyword, extension members)
- Follow existing patterns in codebase
- XML documentation on public APIs
- Async/await throughout
- CancellationToken support on all async methods
Testing Strategy
- Unit tests for core logic
- Integration tests for storage implementations
- End-to-end tests for full scenarios
- Performance benchmarks for critical paths
Documentation Requirements
- XML doc comments on all public APIs
- README updates for each phase
- Sample code for new features
- Architecture diagrams
Code Review Checklist
- Follows existing code style
- Has XML documentation
- Has unit tests
- No breaking changes (or documented)
- Performance acceptable
- Error handling complete
Timeline Summary
| Phase | Duration | Key Deliverable |
|---|---|---|
| Phase 1 | 2 weeks | Basic workflows + ephemeral streaming |
| Phase 2 | 2 weeks | Persistent streams + replay |
| Phase 3 | 1 week | Exactly-once + read receipts |
| Phase 4 | 2 weeks | RabbitMQ cross-service |
| Phase 5 | 2 weeks | Schema evolution |
| Phase 6 | 1+ week | Management & monitoring |
| Total | 10+ weeks | Production-ready event streaming platform |
Next Steps
- Review this plan - Validate approach and priorities
- Create feature branch - `feature/event-streaming`
- Start Phase 1.1 - Workflow abstraction
- Iterate rapidly - Small commits, frequent builds
- Update this document - Check off tasks as completed
Notes & Questions
- Decision: PostgreSQL or pluggable storage from Phase 2?
- Decision: gRPC-only or add SignalR for browser support?
- Decision: Create separate NuGet packages per phase or monolithic?
- Question: Should we support Kafka in Phase 4 or separate phase?
- Question: Do we need distributed tracing (OpenTelemetry) integration?
Last Updated: 2025-12-09
Status: Planning Phase - Not Started
Owner: Mathias Beaulieu-Duncan