Observability

Comprehensive monitoring, metrics, logging, and management for production deployments.

Overview

Svrnty.CQRS provides production-ready observability features: health monitoring, metrics collection, structured logging, and operational management.

Key Features:

  • Health Checks - Monitor stream and consumer health
  • Metrics - OpenTelemetry-compatible telemetry
  • Structured Logging - High-performance logging with correlation
  • Management API - REST endpoints for operations

Quick Start

using OpenTelemetry.Metrics; // AddMeter and AddPrometheusExporter extension methods
using Svrnty.CQRS.Events;
using Svrnty.CQRS.Events.Logging;

var builder = WebApplication.CreateBuilder(args);

// Register health checks with consumer lag thresholds
builder.Services.AddStreamHealthChecks(options =>
{
    options.DegradedConsumerLagThreshold = 1000;
    options.UnhealthyConsumerLagThreshold = 10000;
});

// Metrics
builder.Services.AddEventStreamMetrics();
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Svrnty.CQRS.Events")
        .AddPrometheusExporter());

// Logging (already configured via appsettings.json)

var app = builder.Build();

// Management API
app.MapEventStreamManagementApi();

// Expose the health check endpoint
app.MapHealthChecks("/health");

// Prometheus metrics
app.MapPrometheusScrapingEndpoint("/metrics");

app.Run();

Features

Health Checks

Monitor stream and consumer health:

  • Stream Health - Detect unhealthy streams
  • Consumer Health - Detect lag and stalled consumers
  • ASP.NET Core Integration - Built-in health check support
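
To make the lag thresholds concrete, here is a minimal sketch of how a lag value could map onto the three ASP.NET Core health states. It is not the library's implementation: `ConsumerLagHealthCheck`, the `readLag` delegate, and the inclusive threshold comparison are assumptions made for illustration. Only `IHealthCheck`, `HealthCheckResult`, and `HealthStatus` come from `Microsoft.Extensions.Diagnostics.HealthChecks`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical example type; not part of Svrnty.CQRS.
public sealed class ConsumerLagHealthCheck : IHealthCheck
{
    // Assumed lag source: returns how many events the consumer is behind.
    private readonly Func<CancellationToken, Task<long>> _readLag;
    private readonly long _degradedThreshold;
    private readonly long _unhealthyThreshold;

    public ConsumerLagHealthCheck(
        Func<CancellationToken, Task<long>> readLag,
        long degradedThreshold = 1000,
        long unhealthyThreshold = 10000)
    {
        if (degradedThreshold > unhealthyThreshold)
            throw new ArgumentException("Degraded threshold must not exceed the unhealthy threshold.");

        _readLag = readLag ?? throw new ArgumentNullException(nameof(readLag));
        _degradedThreshold = degradedThreshold;
        _unhealthyThreshold = unhealthyThreshold;
    }

    // Pure mapping from a lag value to a status (thresholds treated as inclusive here).
    public static HealthStatus Classify(long lag, long degraded, long unhealthy) =>
        lag >= unhealthy ? HealthStatus.Unhealthy
        : lag >= degraded ? HealthStatus.Degraded
        : HealthStatus.Healthy;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        long lag = await _readLag(cancellationToken);
        HealthStatus status = Classify(lag, _degradedThreshold, _unhealthyThreshold);
        return new HealthCheckResult(status, $"Consumer lag is {lag} events.");
    }
}
```

A check like this could be registered with `builder.Services.AddHealthChecks().AddCheck("consumer-lag", new ConsumerLagHealthCheck(readLag))`, where `readLag` is whatever lag source the application has available.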

Metrics

OpenTelemetry-compatible metrics:

  • Event Counters - Published, consumed, errors
  • Processing Metrics - Latency, throughput
  • Consumer Metrics - Lag, active consumers
  • Prometheus Integration - Export to Prometheus/Grafana
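
Application code can publish its own counters and latency histograms next to the library's meter using the standard `System.Diagnostics.Metrics` API. The sketch below illustrates that pattern only; `HandlerMetrics`, the meter name `MyApp.Events`, and the instrument names are hypothetical and do not describe the instruments Svrnty.CQRS itself emits.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical example type; meter and instrument names are illustrative.
public sealed class HandlerMetrics : IDisposable
{
    private readonly Meter _meter;
    private readonly Counter<long> _handled;
    private readonly Counter<long> _errors;
    private readonly Histogram<double> _durationMs;

    public HandlerMetrics(string meterName = "MyApp.Events")
    {
        _meter = new Meter(meterName);
        _handled = _meter.CreateCounter<long>(
            "myapp.events.handled", unit: "{event}", description: "Events handled successfully");
        _errors = _meter.CreateCounter<long>(
            "myapp.events.handler_errors", unit: "{event}", description: "Events whose handler threw");
        _durationMs = _meter.CreateHistogram<double>(
            "myapp.events.handler_duration", unit: "ms", description: "Handler execution time");
    }

    // Runs the handler, counting success or failure and recording the elapsed time.
    public void Record(Action handler)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            handler();
            _handled.Add(1);
        }
        catch
        {
            _errors.Add(1);
            throw; // preserve the original exception for the caller
        }
        finally
        {
            _durationMs.Record(stopwatch.Elapsed.TotalMilliseconds);
        }
    }

    public void Dispose() => _meter.Dispose();
}
```

For the instruments to be exported, the meter name would also need to be added to the OpenTelemetry pipeline, for example with an extra `.AddMeter("MyApp.Events")` call.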

Logging

Structured logging with correlation:

  • Correlation IDs - Tie related log entries together across services
  • Event IDs - Categorized log events
  • High Performance - LoggerMessage source generators
  • Serilog Integration - Structured logging support
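
As an illustration of the pattern these bullets describe, the sketch below defines source-generated log methods with explicit event IDs and opens a logging scope that carries a correlation ID. The `EventLog` and `CorrelatedLogging` types, the event IDs 1001 and 1002, and the `CorrelationId` scope key are assumptions for this example; they are not the library's actual log definitions. It requires a version of `Microsoft.Extensions.Logging.Abstractions` that ships the `LoggerMessage` source generator (6.0 or later).

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical log definitions; event IDs and message templates are illustrative.
public static partial class EventLog
{
    // The source generator emits the method body at compile time,
    // avoiding boxing and template parsing on each call.
    [LoggerMessage(EventId = 1001, Level = LogLevel.Information,
        Message = "Processed event {EventType} from stream {StreamName}")]
    public static partial void EventProcessed(ILogger logger, string eventType, string streamName);

    [LoggerMessage(EventId = 1002, Level = LogLevel.Error,
        Message = "Failed to process event {EventType} from stream {StreamName}")]
    public static partial void EventFailed(
        ILogger logger, Exception exception, string eventType, string streamName);
}

public static class CorrelatedLogging
{
    // Opens a scope so every entry logged inside it carries the correlation ID,
    // provided the logging provider is configured to include scopes.
    public static IDisposable? BeginCorrelationScope(ILogger logger, string correlationId) =>
        logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = correlationId });
}
```

A handler would wrap its work in `using (CorrelatedLogging.BeginCorrelationScope(logger, correlationId)) { ... }` and call `EventLog.EventProcessed(logger, eventType, streamName)` inside the scope.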

Management API

REST endpoints for operations:

  • Stream Operations - List, query streams
  • Subscription Operations - Query subscriptions
  • Consumer Operations - Monitor consumers
  • Offset Management - Reset consumer positions

Monitoring Dashboard

Grafana Dashboard

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
data:
  svrnty-cqrs.json: |
    {
      "panels": [
        {
          "title": "Events Per Second",
          "targets": [
            {
              "expr": "rate(svrnty_cqrs_events_published[1m])"
            }
          ]
        },
        {
          "title": "Consumer Lag",
          "targets": [
            {
              "expr": "svrnty_cqrs_events_consumer_lag"
            }
          ]
        },
        {
          "title": "Processing Latency (P95)",
          "targets": [
            {
              "expr": "histogram_quantile(0.95, sum by (le) (rate(svrnty_cqrs_events_processing_latency_bucket[5m])))"
            }
          ]
        }
      ]
    }

Production Checklist

DO

  • Configure health checks
  • Export metrics to a monitoring system
  • Set up structured logging
  • Monitor consumer lag
  • Set up alerts for unhealthy conditions
  • Use correlation IDs
  • Track error rates
  • Monitor processing latency

DON'T

  • Don't deploy without health checks
  • Don't ignore consumer lag warnings
  • Don't skip structured logging
  • Don't forget to export metrics
  • Don't ignore stalled consumer alerts

See Also