Event Streaming Implementation Plan

Executive Summary

Transform the CQRS framework into a complete enterprise event streaming platform that supports:

  • Workflows: Business process correlation and event emission
  • Multiple Consumer Patterns: Broadcast, exclusive, consumer groups, read receipts
  • Storage Models: Ephemeral (message queue) and persistent (event sourcing)
  • Delivery Semantics: At-most-once, at-least-once, exactly-once
  • Cross-Service Communication: RabbitMQ, Kafka integration with zero developer friction
  • Schema Evolution: Event versioning with automatic upcasting
  • Event Replay: Time-travel queries for persistent streams

Design Philosophy: Simple by default, powerful when needed. Progressive complexity.


Architecture Layers

┌─────────────────────────────────────────────────────────────┐
│  Layer 1: WORKFLOW (Business Process)                       │
│  What events belong together logically?                     │
│  Example: InvitationWorkflow, UserWorkflow                  │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  Layer 2: EVENT STREAM (Organization & Retention)           │
│  How are events stored and organized?                       │
│  Example: Persistent vs Ephemeral, retention policies       │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  Layer 3: SUBSCRIPTION (Consumer Routing)                   │
│  Who wants to consume what?                                 │
│  Example: Broadcast, Exclusive, ConsumerGroup, ReadReceipt  │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│  Layer 4: DELIVERY (Transport Mechanism)                    │
│  How do events reach consumers?                             │
│  Example: gRPC, RabbitMQ, Kafka                             │
└─────────────────────────────────────────────────────────────┘

Core Enumerations

StreamType

  • Ephemeral: Message queue semantics (events deleted after consumption)
  • Persistent: Event log semantics (events retained for replay)

DeliverySemantics

  • AtMostOnce: Fire and forget (fast, might lose messages)
  • AtLeastOnce: Retry until ack (might see duplicates)
  • ExactlyOnce: Deduplication (slower, no duplicates)

SubscriptionMode

  • Broadcast: All consumers get all events (pub/sub)
  • Exclusive: Only one consumer gets each event (queue)
  • ConsumerGroup: Load-balanced across group members (Kafka-style)
  • ReadReceipt: Requires explicit "user saw this" confirmation

StreamScope

  • Internal: Same service only (default)
  • CrossService: Available to external services via message broker
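
Transcribed into C# for reference (a direct sketch of the lists above; exact namespaces and member placement are not final):

public enum StreamType { Ephemeral, Persistent }

public enum DeliverySemantics { AtMostOnce, AtLeastOnce, ExactlyOnce }

public enum SubscriptionMode { Broadcast, Exclusive, ConsumerGroup, ReadReceipt }

public enum StreamScope { Internal, CrossService }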

Phase 1: Core Workflow & Streaming Foundation

Goal: Get basic workflow + ephemeral streaming working with in-memory storage

Duration: Weeks 1-2

Phase 1 Tasks

1.1 Workflow Abstraction

  • Create Workflow abstract base class
    • Id property (workflow instance identifier)
    • IsNew property (started vs continued)
    • Emit<TEvent>() protected method
    • Internal PendingEvents collection
  • Create ICommandHandlerWithWorkflow<TCommand, TResult, TWorkflow> interface
  • Create ICommandHandlerWithWorkflow<TCommand, TWorkflow> interface (no result)
  • Update sample: Convert UserEvent to UserWorkflow : Workflow
  • Update sample: Convert InvitationEvent to InvitationWorkflow : Workflow
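
A minimal sketch of the base class this task describes (member names from the list above; the plan lists Emit as protected, but it is shown public here so the handler in the Phase 1 success criteria compiles, and Id is a string to match `return workflow.Id` below):

public abstract class Workflow
{
    private readonly List<object> _pendingEvents = [];

    // Workflow instance identifier; doubles as the correlation ID (see 1.5).
    public string Id { get; internal set; } = Guid.NewGuid().ToString("N");

    // True when this command started the workflow, false when it continues an existing one.
    public bool IsNew { get; internal set; } = true;

    // Events emitted during handling, published after the command completes.
    internal IReadOnlyList<object> PendingEvents => _pendingEvents;

    // Records an event against this workflow instance.
    public void Emit<TEvent>(TEvent @event) where TEvent : notnull => _pendingEvents.Add(@event);
}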

1.2 Stream Configuration

  • Create StreamType enum
  • Create DeliverySemantics enum
  • Create SubscriptionMode enum
  • Create StreamScope enum
  • Create IStreamConfiguration interface
  • Create StreamConfiguration implementation
  • Create fluent configuration API: AddEventStreaming()

1.3 In-Memory Storage (Ephemeral)

  • Create IEventStreamStore interface
    • EnqueueAsync() for ephemeral streams
    • DequeueAsync() for ephemeral streams
    • AcknowledgeAsync() for message acknowledgment
    • NackAsync() for requeue/dead-letter
  • Create InMemoryEventStreamStore implementation
    • Concurrent queues per stream
    • Per-consumer visibility tracking
    • Acknowledgment handling
  • Create ISubscriptionStore interface
    • RegisterConsumerAsync()
    • UnregisterConsumerAsync()
    • GetConsumersAsync()
  • Create InMemorySubscriptionStore implementation
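
The ephemeral half of the store contract, sketched from the method names above. The PendingEvent envelope is hypothetical, introduced here only so acknowledge/nack have an event ID to reference:

// Hypothetical envelope pairing a dequeued event with the ID used for ack/nack.
public sealed record PendingEvent(string EventId, object Event);

public partial interface IEventStreamStore // partial: Phase 2.1 adds append-only log members
{
    Task EnqueueAsync(string streamName, object @event, CancellationToken ct = default);
    Task<PendingEvent?> DequeueAsync(string streamName, string consumerId, CancellationToken ct = default);
    Task AcknowledgeAsync(string streamName, string consumerId, string eventId, CancellationToken ct = default);
    Task NackAsync(string streamName, string consumerId, string eventId, bool requeue, CancellationToken ct = default);
}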

1.4 Subscription System

  • Create ISubscription interface
  • Create Subscription implementation
  • Create IEventSubscriptionClient for consumers
  • Create EventSubscriptionClient implementation
  • Implement Broadcast mode
  • Implement Exclusive mode
  • Create subscription configuration API

1.5 Workflow Decorators

  • Create WorkflowContext<TWorkflow> class
  • Create CommandHandlerWithWorkflowDecorator<TCommand, TResult, TWorkflow>
  • Create CommandHandlerWithWorkflowDecoratorNoResult<TCommand, TWorkflow>
  • Update event emission to use workflow ID as correlation ID
  • Integrate with existing IEventEmitter

1.6 Service Registration

  • Create AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler>() extension
  • Create AddCommandWithWorkflow<TCommand, TResult, TWorkflow, THandler, TValidator>() extension
  • Create AddCommandWithWorkflow<TCommand, TWorkflow, THandler>() extension (no result)
  • Deprecate AddCommandWithEvents (keep for backward compatibility)
  • Update ServiceCollectionExtensions with workflow registration

1.7 gRPC Streaming (Basic)

  • Create IEventDeliveryProvider interface
  • Create GrpcEventDeliveryProvider implementation
  • Update gRPC service to support bidirectional streaming
  • Implement consumer registration/unregistration
  • Handle connection lifecycle (connect/disconnect/reconnect)

1.8 Sample Project Updates

  • Refactor UserEvents.cs → UserWorkflow.cs
  • Refactor InvitationWorkflow.cs to use new API
  • Update Program.cs with workflow registration
  • Add simple subscription consumer example
  • Add gRPC streaming consumer example
  • Update documentation

1.9 Testing & Validation

  • Build and verify no regressions
  • Test workflow start/continue semantics
  • Test ephemeral stream (message queue behavior)
  • Test broadcast subscription (multiple consumers)
  • Test exclusive subscription (single consumer)
  • Test gRPC streaming connection
  • Verify existing features still work

Phase 1 Success Criteria:

 This should work:

// Registration
builder.Services.AddCommandWithWorkflow<InviteUserCommand, string, InvitationWorkflow, InviteUserCommandHandler>();

// Handler
public class InviteUserCommandHandler
    : ICommandHandlerWithWorkflow<InviteUserCommand, string, InvitationWorkflow>
{
    public async Task<string> HandleAsync(
        InviteUserCommand command,
        InvitationWorkflow workflow,
        CancellationToken ct)
    {
        workflow.Emit(new UserInvitedEvent { ... });
        return workflow.Id;
    }
}

// Consumer
await foreach (var @event in client.SubscribeAsync("my-subscription", "consumer-1", ct))
{
    Console.WriteLine($"Received: {@event}");
}

Phase 2: Persistence & Event Sourcing

Goal: Add persistent streams with replay capability

Duration: Weeks 3-4

Phase 2 Tasks

2.1 Storage Abstractions (Persistent)

  • Extend IEventStreamStore with append-only log methods:
    • AppendAsync() for persistent streams
    • ReadStreamAsync() for reading event log
    • GetStreamLengthAsync() for stream metadata
  • Create StoredEvent record (offset, timestamp, event data)
  • Create StreamMetadata record (length, retention, oldest event)
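
Sketched shapes for the records and the appended log methods (field and method names from the bullets above; types are assumptions):

public sealed record StoredEvent(long Offset, DateTimeOffset Timestamp, string EventType, object EventData, string? CorrelationId);

public sealed record StreamMetadata(long Length, TimeSpan? Retention, DateTimeOffset? OldestEventTimestamp);

// Appended to the Phase 1 contract for persistent streams.
public partial interface IEventStreamStore
{
    Task<long> AppendAsync(string streamName, object @event, CancellationToken ct = default); // returns the new offset
    IAsyncEnumerable<StoredEvent> ReadStreamAsync(string streamName, long fromOffset = 0, CancellationToken ct = default);
    Task<long> GetStreamLengthAsync(string streamName, CancellationToken ct = default);
}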

2.2 PostgreSQL Storage Implementation

  • Create PostgresEventStreamStore : IEventStreamStore
  • Design event log schema:
    • events table (stream_name, offset, event_type, event_data, correlation_id, timestamp)
    • Indexes for efficient queries
    • Partition strategy for large streams
  • Implement append operations with optimistic concurrency
  • Implement read operations with offset-based pagination
  • Implement queue operations for ephemeral streams

2.3 Offset Tracking

  • Create IConsumerOffsetStore interface
    • GetOffsetAsync(subscriptionId, consumerId)
    • SetOffsetAsync(subscriptionId, consumerId, offset)
    • GetConsumerPositionsAsync(subscriptionId) (for monitoring)
  • Create PostgresConsumerOffsetStore implementation
  • Design offset tracking schema:
    • consumer_offsets table (subscription_id, consumer_id, stream_offset, last_updated)
  • Integrate offset tracking with subscription client
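
The offset store contract, sketched from the method list above (return types are assumptions):

public interface IConsumerOffsetStore
{
    Task<long?> GetOffsetAsync(string subscriptionId, string consumerId, CancellationToken ct = default);
    Task SetOffsetAsync(string subscriptionId, string consumerId, long offset, CancellationToken ct = default);
    // For monitoring: every consumer's position within a subscription.
    Task<IReadOnlyDictionary<string, long>> GetConsumerPositionsAsync(string subscriptionId, CancellationToken ct = default);
}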

2.4 Retention Policies

  • Create RetentionPolicy configuration
    • Time-based retention (e.g., 90 days)
    • Size-based retention (e.g., 10GB max)
    • Count-based retention (e.g., 1M events max)
  • Create IRetentionService interface
  • Create RetentionService background service
  • Implement retention policy enforcement
  • Add configurable cleanup intervals

2.5 Event Replay API

  • Create IEventReplayService interface
  • Create EventReplayService implementation
  • Create ReplayOptions configuration:
    • StartPosition (Beginning, Offset, Timestamp, EventId)
    • EndPosition (Latest, Offset, Timestamp, EventId)
    • Filter predicate
    • MaxEvents limit
  • Implement replay from persistent streams
  • Support catch-up subscriptions (new consumers replay the stream from the beginning before receiving live events)

2.6 Stream Configuration Extensions

  • Extend stream configuration with:
    • Type = StreamType.Persistent
    • Retention policies
    • EnableReplay = true/false
  • Validate configuration (ephemeral can't have replay)
  • Add stream type detection and routing

2.7 Migration & Compatibility

  • Create database migration scripts
  • Add backward compatibility for in-memory implementation
  • Allow mixing persistent and ephemeral streams
  • Support runtime switching (development vs production)

2.8 Testing

  • Test persistent stream append/read
  • Test offset tracking across restarts
  • Test retention policy enforcement
  • Test event replay from various positions
  • Test catch-up subscriptions
  • Stress test with large event volumes

Phase 2 Success Criteria:

 This should work:

// Configure persistent stream
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.Retention = TimeSpan.FromDays(90);
        stream.EnableReplay = true;
    });
});

// Use PostgreSQL storage
builder.Services.AddSingleton<IEventStreamStore, PostgresEventStreamStore>();

// Replay events
var replay = await replayService.ReplayStreamAsync("user-events", new ReplayOptions
{
    From = new StartPosition.Timestamp(DateTimeOffset.UtcNow.AddDays(-7))
}, ct);

await foreach (var @event in replay)
{
    // Process historical events
}

Phase 3: Exactly-Once Delivery & Read Receipts

Goal: Add deduplication and explicit user confirmation

Duration: Week 5

Phase 3 Tasks

3.1 Idempotency Store

  • Create IIdempotencyStore interface
    • WasProcessedAsync(consumerId, eventId)
    • MarkProcessedAsync(consumerId, eventId, processedAt)
    • TryAcquireIdempotencyLockAsync(idempotencyKey, lockDuration)
    • ReleaseIdempotencyLockAsync(idempotencyKey)
    • CleanupAsync(olderThan)
  • Create PostgresIdempotencyStore implementation
  • Design idempotency schema:
    • processed_events table (consumer_id, event_id, processed_at)
    • idempotency_locks table (lock_key, acquired_at, expires_at)
  • Add TTL-based cleanup

3.2 Exactly-Once Middleware

  • Create ExactlyOnceDeliveryDecorator
  • Implement duplicate detection
  • Implement distributed locking
  • Add automatic retry on lock contention
  • Integrate with subscription pipeline
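
How the pieces of 3.1 and 3.2 are intended to fit together, as a sketch (the lock duration and contention handling shown here are illustrative, not final):

public interface IIdempotencyStore // shape from 3.1
{
    Task<bool> WasProcessedAsync(string consumerId, string eventId);
    Task MarkProcessedAsync(string consumerId, string eventId, DateTimeOffset processedAt);
    Task<bool> TryAcquireIdempotencyLockAsync(string idempotencyKey, TimeSpan lockDuration);
    Task ReleaseIdempotencyLockAsync(string idempotencyKey);
}

public sealed class ExactlyOnceDeliveryDecorator(IIdempotencyStore store)
{
    public async Task DeliverAsync(string consumerId, string eventId, Func<Task> handler)
    {
        // Fast path: drop events this consumer has already processed.
        if (await store.WasProcessedAsync(consumerId, eventId)) return;

        // Distributed lock so concurrent deliveries of the same event can't race.
        var lockKey = $"{consumerId}:{eventId}";
        if (!await store.TryAcquireIdempotencyLockAsync(lockKey, TimeSpan.FromSeconds(30)))
            return; // contended: another instance is processing; the pipeline retries later (3.2)

        try
        {
            // Re-check under the lock, then process and record.
            if (!await store.WasProcessedAsync(consumerId, eventId))
            {
                await handler();
                await store.MarkProcessedAsync(consumerId, eventId, DateTimeOffset.UtcNow);
            }
        }
        finally
        {
            await store.ReleaseIdempotencyLockAsync(lockKey);
        }
    }
}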

3.3 Read Receipt Store

  • Create IReadReceiptStore interface
    • MarkDeliveredAsync(subscriptionId, consumerId, eventId, deliveredAt)
    • MarkReadAsync(subscriptionId, consumerId, eventId, readAt)
    • GetUnreadEventsAsync(subscriptionId, consumerId)
    • GetExpiredUnreadEventsAsync(timeout)
  • Create PostgresReadReceiptStore implementation
  • Design read receipt schema:
    • read_receipts table (subscription_id, consumer_id, event_id, delivered_at, read_at, status)

3.4 Read Receipt API

  • Extend IEventSubscriptionClient with:
    • MarkAsReadAsync(eventId)
    • MarkAllAsReadAsync()
    • GetUnreadCountAsync()
  • Create ReadReceiptEvent wrapper with .MarkAsReadAsync() method
  • Implement unread timeout handling
  • Add dead letter queue for expired unread events

3.5 Configuration

  • Extend stream configuration with:
    • DeliverySemantics = DeliverySemantics.ExactlyOnce
  • Extend subscription configuration with:
    • Mode = SubscriptionMode.ReadReceipt
    • OnUnreadTimeout duration
    • OnUnreadExpired policy (Requeue, DeadLetter, Drop)
  • Add validation for configuration combinations

3.6 Monitoring & Cleanup

  • Create background service for unread timeout detection
  • Add metrics for unread events per consumer
  • Add health checks for lagging consumers
  • Implement automatic cleanup of old processed events

3.7 Testing

  • Test duplicate event detection
  • Test concurrent processing with locking
  • Test read receipt lifecycle (delivered → read)
  • Test unread timeout handling
  • Test exactly-once guarantees under failure

Phase 3 Success Criteria:

 This should work:

// Exactly-once delivery
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.DeliverySemantics = DeliverySemantics.ExactlyOnce;
    });
});

// Read receipts
streaming.AddSubscription("admin-notifications", subscription =>
{
    subscription.ToStream<UserWorkflow>();
    subscription.Mode = SubscriptionMode.ReadReceipt;
    subscription.OnUnreadTimeout = TimeSpan.FromHours(24);
});

// Consumer
await foreach (var notification in client.SubscribeAsync("admin-notifications", "admin-123", ct))
{
    await ShowNotificationAsync(notification);
    await notification.MarkAsReadAsync();  // Explicit confirmation
}

Phase 4: Cross-Service Communication (RabbitMQ)

Goal: Enable event streaming across different services via RabbitMQ with zero developer friction

Duration: Weeks 6-7

Phase 4 Tasks

4.1 External Delivery Abstraction

  • Extend IEventDeliveryProvider with:
    • PublishExternalAsync(streamName, event, metadata)
    • SubscribeExternalAsync(streamName, subscriptionId, consumerId)
  • Create ExternalDeliveryConfiguration
  • Add provider registration API

4.2 RabbitMQ Provider

  • Create RabbitMqEventDeliveryProvider : IEventDeliveryProvider
  • Create RabbitMqConfiguration:
    • Connection string
    • Exchange prefix
    • Exchange type (topic, fanout, direct)
    • Routing key strategy
    • Auto-declare topology
  • Implement connection management (connect, reconnect, dispose)
  • Implement publish operations
  • Implement subscribe operations
  • Add NuGet dependency: RabbitMQ.Client

4.3 Topology Management

  • Create IRabbitMqTopologyManager interface
  • Implement automatic exchange creation:
    • Format: {prefix}.{stream-name} (e.g., myapp.user-events)
    • Type: topic exchange (default)
  • Implement automatic queue creation:
    • Broadcast: {prefix}.{subscription-id}.{consumer-id}
    • Exclusive: {prefix}.{subscription-id}
    • ConsumerGroup: {prefix}.{subscription-id}
  • Implement automatic binding creation:
    • Routing keys based on event type names
  • Add validation for valid names (no spaces, special chars)
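
The naming rules above, collected in one place (the validation pattern is an assumption):

public static class RabbitMqNaming
{
    public static string Exchange(string prefix, string streamName) => $"{prefix}.{streamName}";

    public static string Queue(string prefix, SubscriptionMode mode, string subscriptionId, string consumerId) =>
        mode == SubscriptionMode.Broadcast
            ? $"{prefix}.{subscriptionId}.{consumerId}" // one queue per consumer: everyone sees every event
            : $"{prefix}.{subscriptionId}";             // shared queue: events are load-balanced

    // Conservative check: letters, digits, dots, dashes only.
    public static bool IsValidName(string name) =>
        System.Text.RegularExpressions.Regex.IsMatch(name, "^[A-Za-z0-9.-]+$");
}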

4.4 Remote Stream Configuration

  • Create IRemoteStreamConfiguration interface
  • Create fluent API: AddRemoteStream(name, config)
  • Implement remote stream subscription
  • Add cross-service event routing

4.5 Message Serialization

  • Create IEventSerializer interface
  • Create JsonEventSerializer implementation
  • Add event type metadata in message headers:
    • event-type (CLR type name)
    • event-version (schema version)
    • correlation-id
    • timestamp
  • Implement deserialization with type resolution
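
A sketch of the serializer and header layout using System.Text.Json (header names from the list above; the tuple-returning shape is an assumption):

using System.Text.Json;

public sealed class JsonEventSerializer
{
    public (byte[] Body, Dictionary<string, object> Headers) Serialize(object @event, int version, string correlationId)
    {
        var body = JsonSerializer.SerializeToUtf8Bytes(@event, @event.GetType());
        var headers = new Dictionary<string, object>
        {
            ["event-type"] = @event.GetType().AssemblyQualifiedName!, // CLR type name for resolution
            ["event-version"] = version,
            ["correlation-id"] = correlationId,
            ["timestamp"] = DateTimeOffset.UtcNow.ToString("O"),
        };
        return (body, headers);
    }

    public object? Deserialize(byte[] body, IReadOnlyDictionary<string, object> headers)
    {
        // Resolve the CLR type recorded at publish time, then deserialize to it.
        var type = Type.GetType((string)headers["event-type"], throwOnError: true)!;
        return JsonSerializer.Deserialize(body, type);
    }
}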

4.6 Acknowledgment & Redelivery

  • Implement manual acknowledgment (ack)
  • Implement negative acknowledgment (nack) with requeue
  • Add dead letter queue configuration
  • Implement retry policies (exponential backoff)
  • Add max retry count
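
Exponential backoff as typically computed, for illustration (the base delay, cap, and where the max retry count lives are assumptions):

public static class RetryPolicy
{
    // Doubles the delay on each attempt, capped: 1s, 2s, 4s, … up to 60s.
    public static TimeSpan BackoffDelay(int attempt, double baseSeconds = 1, double maxSeconds = 60) =>
        TimeSpan.FromSeconds(Math.Min(maxSeconds, baseSeconds * Math.Pow(2, attempt)));
}

// Once the max retry count is exceeded, the message is routed to the dead letter queue.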

4.7 Connection Resilience

  • Implement automatic reconnection on failure
  • Add connection health checks
  • Implement circuit breaker pattern
  • Add connection pool management
  • Log connection events (connected, disconnected, reconnecting)

4.8 Cross-Service Sample

  • Create second sample project: Svrnty.Sample.Analytics
  • Configure Service A to publish to RabbitMQ
  • Configure Service B to consume from RabbitMQ
  • Demonstrate cross-service event flow
  • Add docker-compose with RabbitMQ

4.9 Testing

  • Test exchange/queue creation
  • Test message publishing
  • Test message consumption
  • Test acknowledgment handling
  • Test connection failure recovery
  • Test dead letter queue
  • Integration test across two services

Phase 4 Success Criteria:

 This should work:

// Service A: Publish events externally
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddStream<UserWorkflow>(stream =>
    {
        stream.Type = StreamType.Persistent;
        stream.Scope = StreamScope.CrossService;
        stream.ExternalDelivery.UseRabbitMq(rabbitmq =>
        {
            rabbitmq.ConnectionString = "amqp://localhost";
            rabbitmq.ExchangeName = "user-service.events";
        });
    });
});

// Service B: Consume from Service A
builder.Services.AddEventStreaming(streaming =>
{
    streaming.AddRemoteStream("user-service.events", remote =>
    {
        remote.UseRabbitMq(rabbitmq =>
        {
            rabbitmq.ConnectionString = "amqp://localhost";
        });
    });

    streaming.AddSubscription("analytics", subscription =>
    {
        subscription.ToRemoteStream("user-service.events");
        subscription.Mode = SubscriptionMode.ConsumerGroup;
    });
});

// Zero RabbitMQ knowledge needed by developer!

Phase 5: Schema Evolution & Versioning

Goal: Support event versioning with automatic upcasting

Duration: Weeks 8-9

Phase 5 Tasks

5.1 Schema Registry Abstractions

  • Create ISchemaRegistry interface
    • RegisterSchemaAsync<TEvent>(version, upcastFromType)
    • GetSchemaAsync(eventType, version)
    • GetSchemaHistoryAsync(eventType)
    • UpcastAsync(event, targetVersion)
  • Create SchemaInfo record (version, CLR type, JSON schema, upcast info)
  • Create ISchemaStore interface for persistence

5.2 Event Versioning Attributes

  • Create EventVersionAttribute, applied as [EventVersion(n)] (sketched below), with:
    • Version property
    • UpcastFrom type property
  • Add compile-time validation (via analyzer if time permits)
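
The attribute itself is small; a sketch matching the usage shown in the Phase 5 success criteria:

[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class EventVersionAttribute : Attribute
{
    public EventVersionAttribute(int version) => Version = version;

    // Schema version of the annotated event type (events without the attribute default to 1, see 5.7).
    public int Version { get; }

    // Optional previous event type this version can be upcast from.
    public Type? UpcastFrom { get; set; }
}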

5.3 Schema Registry Implementation

  • Create SchemaRegistry : ISchemaRegistry
  • Create PostgresSchemaStore : ISchemaStore
  • Design schema storage:
    • event_schemas table (event_type, version, clr_type, json_schema, upcast_from_type, registered_at)
  • Implement version registration
  • Implement schema lookup with caching

5.4 Upcasting Pipeline

  • Create IEventUpcaster<TFrom, TTo> interface
  • Create EventUpcastingMiddleware
  • Implement automatic upcaster discovery:
    • Via static method: TTo.UpcastFrom(TFrom)
    • Via registered IEventUpcaster<TFrom, TTo> implementations
  • Implement multi-hop upcasting (V1 → V2 → V3)
  • Add upcasting to subscription pipeline
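
Multi-hop upcasting reduces to walking the version chain one hop at a time; a sketch against the 5.1 registry (only the member used here is shown, and its exact signature is an assumption):

public interface ISchemaRegistry // subset of the 5.1 contract
{
    Task<object> UpcastAsync(object @event, int targetVersion);
}

public sealed class EventUpcastingMiddleware(ISchemaRegistry registry)
{
    // Applies registered upcasters hop by hop until the event reaches targetVersion (V1 → V2 → V3).
    public async Task<object> UpcastAsync(object @event, int currentVersion, int targetVersion)
    {
        while (currentVersion < targetVersion)
        {
            @event = await registry.UpcastAsync(@event, currentVersion + 1);
            currentVersion++;
        }
        return @event;
    }
}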

5.5 JSON Schema Generation

  • Create IJsonSchemaGenerator interface
  • Create JsonSchemaGenerator implementation
  • Generate JSON Schema from CLR types
  • Store schemas in registry for external consumers
  • Add schema validation (optional)

5.6 Configuration

  • Extend stream configuration with:
    • EnableSchemaEvolution = true/false
    • SchemaRegistry configuration
  • Add fluent API for schema registration:
    • registry.Register<TEvent>(version)
    • registry.Register<TEvent>(version, upcastFrom: typeof(TOldEvent))
  • Extend subscription configuration:
    • ReceiveAs<TEventVersion>() to specify target version

5.7 Backward Compatibility

  • Handle events without version attribute (default to version 1)
  • Support mixed versioned/unversioned events
  • Add migration path for existing events

5.8 Testing

  • Test version registration
  • Test single-hop upcasting (V1 → V2)
  • Test multi-hop upcasting (V1 → V2 → V3)
  • Test new consumers receiving old events (auto-upcast)
  • Test schema storage and retrieval
  • Test JSON schema generation

Phase 5 Success Criteria:

 This should work:

// Event V1
[EventVersion(1)]
public sealed record UserAddedEventV1 : UserWorkflow
{
    public required int UserId { get; init; }
    public required string Name { get; init; }
}

// Event V2 with upcaster
[EventVersion(2, UpcastFrom = typeof(UserAddedEventV1))]
public sealed record UserAddedEventV2 : UserWorkflow
{
    public required int UserId { get; init; }
    public required string FirstName { get; init; }
    public required string LastName { get; init; }
    public required string Email { get; init; }

    public static UserAddedEventV2 UpcastFrom(UserAddedEventV1 v1)
    {
        var names = v1.Name.Split(' ', 2);
        return new UserAddedEventV2
        {
            UserId = v1.UserId,
            FirstName = names[0],
            LastName = names.Length > 1 ? names[1] : "",
            Email = $"user{v1.UserId}@unknown.com"
        };
    }
}

// Configuration
streaming.UseSchemaRegistry(registry =>
{
    registry.Register<UserAddedEventV1>(version: 1);
    registry.Register<UserAddedEventV2>(version: 2, upcastFrom: typeof(UserAddedEventV1));
});

// Consumer always receives V2 (framework auto-upcasts V1 → V2)
streaming.AddSubscription("analytics", subscription =>
{
    subscription.ToStream<UserWorkflow>();
    subscription.ReceiveAs<UserAddedEventV2>();
});

Phase 6: Management, Monitoring & Observability

Goal: Production-ready monitoring, health checks, and management APIs

Duration: Week 10+

Phase 6 Tasks

6.1 Health Checks

  • Create IStreamHealthCheck interface
  • Implement stream health checks:
    • Stream exists and is writable
    • Consumer lag detection (offset vs stream length)
    • Stalled consumer detection (no progress for N minutes)
  • Integrate with ASP.NET Core health checks
  • Add health check endpoints
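
Lag detection maps directly onto ASP.NET Core's IHealthCheck; a sketch (store shapes follow the earlier sketches, and the stream name, consumer IDs, and threshold are placeholders):

using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class ConsumerLagHealthCheck(IEventStreamStore streams, IConsumerOffsetStore offsets) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken ct = default)
    {
        // Lag = newest offset in the stream minus this consumer's committed offset.
        var length = await streams.GetStreamLengthAsync("user-events", ct);            // placeholder stream
        var offset = await offsets.GetOffsetAsync("analytics", "consumer-1", ct) ?? 0; // placeholder consumer
        var lag = length - offset;

        return lag < 1_000 // placeholder threshold
            ? HealthCheckResult.Healthy($"lag={lag}")
            : HealthCheckResult.Degraded($"consumer lagging by {lag} events");
    }
}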

6.2 Metrics & Telemetry

  • Define key metrics:
    • Events published per stream (rate)
    • Events consumed per subscription (rate)
    • Consumer lag (offset delta)
    • Processing latency (time from publish to ack)
    • Error rate
  • Integrate with OpenTelemetry
  • Add Prometheus endpoint
  • Create Grafana dashboard templates
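
System.Diagnostics.Metrics instruments feed OpenTelemetry (and its Prometheus exporter) directly; a sketch of the instrument set (meter and metric names are assumptions):

using System.Diagnostics.Metrics;

internal static class StreamingMetrics
{
    private static readonly Meter Meter = new("Svrnty.EventStreaming", "1.0");

    public static readonly Counter<long> Published =
        Meter.CreateCounter<long>("events_published_total", description: "Events published, tagged by stream");

    public static readonly Counter<long> Consumed =
        Meter.CreateCounter<long>("events_consumed_total", description: "Events consumed, tagged by subscription");

    public static readonly Histogram<double> Latency =
        Meter.CreateHistogram<double>("event_processing_latency_ms", unit: "ms", description: "Publish-to-ack latency");
}

// At the publish site:
// StreamingMetrics.Published.Add(1, new KeyValuePair<string, object?>("stream", streamName));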

6.3 Management API

  • Create REST API for management:
    • GET /api/streams - List all streams
    • GET /api/streams/{name} - Get stream details
    • GET /api/streams/{name}/subscriptions - List subscriptions
    • GET /api/subscriptions/{id}/consumers - List consumers
    • GET /api/subscriptions/{id}/consumers/{consumerId}/offset - Get consumer position
    • POST /api/subscriptions/{id}/consumers/{consumerId}/reset-offset - Reset offset
    • DELETE /api/subscriptions/{id}/consumers/{consumerId} - Remove consumer
  • Add authorization (admin only)
  • Add Swagger documentation

6.4 Admin Dashboard (Optional)

  • Create simple web UI for monitoring:
    • Stream list with event counts
    • Subscription list with consumer status
    • Consumer lag visualization
    • Event replay interface
  • Use Blazor or simple HTML/JS

6.5 Logging

  • Add structured logging with Serilog/NLog
  • Log key events:
    • Stream created
    • Consumer registered/unregistered
    • Event published
    • Event consumed
    • Errors and retries
  • Add correlation IDs to all logs
  • Add log levels (Debug, Info, Warning, Error)

6.6 Alerting (Optional)

  • Define alerting rules:
    • Consumer lag exceeds threshold
    • Consumer stalled (no progress)
    • Error rate spike
    • Dead letter queue growth
  • Integration with alerting systems (email, Slack, PagerDuty)

6.7 Documentation

  • Update CLAUDE.md with event streaming documentation
  • Create developer guide
  • Create deployment guide
  • Create troubleshooting guide
  • Add API reference documentation
  • Create architecture diagrams

6.8 Testing

  • Test health check endpoints
  • Test metrics collection
  • Test management API
  • Load testing (throughput, latency)
  • Chaos testing (failure scenarios)

Phase 6 Success Criteria:

 Production-ready features:

// Health checks
builder.Services.AddHealthChecks()
    .AddEventStreamHealthCheck();

// Metrics exposed at /metrics
builder.Services.AddEventStreaming(streaming =>
{
    streaming.EnableMetrics();
    streaming.EnableHealthChecks();
});

// Management API available
// GET /api/streams → List all streams
// GET /api/streams/user-events/subscriptions → View subscriptions
// POST /api/subscriptions/admin-notifications/consumers/admin-123/reset-offset → Reset lag

Optional Future Phases

Phase 7: Advanced Features (Post-Launch)

  • Kafka provider implementation
  • Azure Service Bus provider
  • AWS SQS/SNS provider
  • Saga orchestration support
  • Event sourcing projections
  • Snapshot support for aggregates
  • CQRS read model synchronization
  • GraphQL subscriptions integration
  • SignalR integration for browser clients

Phase 8: Performance Optimizations

  • Batch processing support
  • Stream partitioning
  • Parallel consumer processing
  • Event compression
  • Connection pooling
  • Query optimization

Design Decisions & Rationale

Why Workflows Over Events?

Decision: Make workflows the primary abstraction, not events.

Rationale:

  • Workflows represent business processes (how developers think)
  • Events are implementation details of workflows
  • Clearer intent: "This command participates in an invitation workflow"
  • Solves correlation problem elegantly (workflow ID = correlation ID)

Why Support Both Ephemeral & Persistent?

Decision: Support both message queue (ephemeral) and event sourcing (persistent) patterns.

Rationale:

  • Different use cases have different needs
  • Ephemeral: Simple notifications, no need for history
  • Persistent: Audit logs, analytics, replay capability
  • Developer chooses based on requirements
  • Same API for both (progressive complexity)

Why Exactly-Once Opt-In?

Decision: Make exactly-once delivery optional, default to at-least-once.

Rationale:

  • Exactly-once has performance cost (deduplication, locking)
  • Most scenarios can handle duplicates (idempotent handlers)
  • Developer opts in when critical (financial transactions)
  • Simpler default behavior

Why Cross-Service Opt-In?

Decision: Streams are internal by default, external requires explicit configuration.

Rationale:

  • Security: Don't expose events externally by accident
  • Performance: Internal delivery (gRPC) is faster
  • Simplicity: Most services don't need cross-service events
  • Developer explicitly chooses when needed

Why Schema Evolution?

Decision: Support event versioning from the start.

Rationale:

  • Events are long-lived (years in persistent streams)
  • Schema changes are inevitable
  • Breaking changes hurt (can't deserialize old events)
  • Automatic upcasting prevents data loss
  • Essential for persistent streams with replay

Success Metrics

Phase 1

  • Basic workflow registration works
  • Ephemeral streams work (in-memory)
  • Broadcast and exclusive subscriptions work
  • gRPC streaming works
  • Zero breaking changes to existing features

Phase 2

  • Persistent streams work (PostgreSQL)
  • Event replay works from any position
  • Retention policies enforced
  • Consumers can resume from last offset

Phase 3

  • Exactly-once delivery works (no duplicates)
  • Read receipts work (delivered vs read)
  • Unread timeout handling works

Phase 4

  • Events flow from Service A to Service B via RabbitMQ
  • Zero RabbitMQ code in handlers
  • Automatic topology creation works
  • Connection resilience works

Phase 5

  • Old events automatically upcast to new version
  • New consumers receive latest version
  • Multi-hop upcasting works (V1→V2→V3)

Phase 6

  • Health checks detect lagging consumers
  • Metrics exposed for monitoring
  • Management API works
  • Documentation complete

Risk Mitigation

Risk: Breaking Existing Features

Mitigation:

  • Keep AddCommandWithEvents for backward compatibility
  • Run full test suite after each phase
  • Feature flags for new functionality

Risk: Performance Issues

Mitigation:

  • Start with in-memory (fast)
  • Benchmark at each phase
  • Add performance tests before Phase 6
  • Use profiling tools

Risk: Complexity Overload

Mitigation:

  • Progressive disclosure (simple by default)
  • Each phase is independently useful
  • Clear documentation at each level
  • Sample projects for each complexity level

Risk: Database Schema Changes

Mitigation:

  • Use migrations from Phase 2 onward
  • Backward-compatible schema changes
  • Test migration paths

Risk: External Dependencies (RabbitMQ, etc.)

Mitigation:

  • Make external delivery optional
  • Provide in-memory fallback
  • Docker Compose for development
  • Clear setup documentation

Development Guidelines

Coding Standards

  • Use C# 14 features (field keyword, extension members)
  • Follow existing patterns in codebase
  • XML documentation on public APIs
  • Async/await throughout
  • CancellationToken support on all async methods

Testing Strategy

  • Unit tests for core logic
  • Integration tests for storage implementations
  • End-to-end tests for full scenarios
  • Performance benchmarks for critical paths

Documentation Requirements

  • XML doc comments on all public APIs
  • README updates for each phase
  • Sample code for new features
  • Architecture diagrams

Code Review Checklist

  • Follows existing code style
  • Has XML documentation
  • Has unit tests
  • No breaking changes (or documented)
  • Performance acceptable
  • Error handling complete

Timeline Summary

Phase     Duration    Key Deliverable
Phase 1   2 weeks     Basic workflows + ephemeral streaming
Phase 2   2 weeks     Persistent streams + replay
Phase 3   1 week      Exactly-once + read receipts
Phase 4   2 weeks     RabbitMQ cross-service
Phase 5   2 weeks     Schema evolution
Phase 6   1+ week     Management & monitoring
Total     10+ weeks   Production-ready event streaming platform

Next Steps

  1. Review this plan - Validate approach and priorities
  2. Create feature branch - feature/event-streaming
  3. Start Phase 1.1 - Workflow abstraction
  4. Iterate rapidly - Small commits, frequent builds
  5. Update this document - Check off tasks as completed

Notes & Questions

  • Decision: PostgreSQL or pluggable storage from Phase 2?
  • Decision: gRPC-only or add SignalR for browser support?
  • Decision: Create separate NuGet packages per phase or monolithic?
  • Question: Should we support Kafka in Phase 4 or separate phase?
  • Question: Do we need distributed tracing (OpenTelemetry) integration?

Last Updated: 2025-12-09
Status: Planning Phase - Not Started
Owner: Mathias Beaulieu-Duncan