# Observability
Comprehensive monitoring, metrics, logging, and management for production deployments.
## Overview
Svrnty.CQRS provides production-ready observability features for monitoring health, collecting metrics, writing structured logs, and managing operations.
**Key Features:**
- **Health Checks** - Monitor stream and consumer health
- **Metrics** - OpenTelemetry-compatible telemetry
- **Structured Logging** - High-performance logging with correlation
- **Management API** - REST endpoints for operations
## Quick Start
```csharp
using OpenTelemetry.Metrics; // AddPrometheusExporter
using Svrnty.CQRS.Events;
using Svrnty.CQRS.Events.Logging;

var builder = WebApplication.CreateBuilder(args);

// Health checks: consumer lag thresholds for Degraded and Unhealthy status
builder.Services.AddStreamHealthChecks(options =>
{
    options.DegradedConsumerLagThreshold = 1000;
    options.UnhealthyConsumerLagThreshold = 10000;
});

// Metrics: register event stream metrics and export the meter to Prometheus
builder.Services.AddEventStreamMetrics();
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Svrnty.CQRS.Events")
        .AddPrometheusExporter());

// Logging: configured through the standard ASP.NET Core logging providers
// (for example, the "Logging" section of appsettings.json)

var app = builder.Build();

// Management API
app.MapEventStreamManagementApi();

// Health check endpoint
app.MapHealthChecks("/health");

// Prometheus scrape endpoint (OpenTelemetry.Exporter.Prometheus.AspNetCore)
app.MapPrometheusScrapingEndpoint("/metrics");

app.Run();
```
## Features
### [Health Checks](health-checks/)
Monitor stream and consumer health:
- **Stream Health** - Detect unhealthy streams
- **Consumer Health** - Detect lag and stalled consumers
- **ASP.NET Core Integration** - Built-in health check support
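The Quick Start sets `DegradedConsumerLagThreshold = 1000` and `UnhealthyConsumerLagThreshold = 10000`. The sketch below shows how thresholds like these might map a consumer's lag to a health status. It assumes lag is the number of events between the stream head and the consumer's position, and that a lag at or above a threshold trips it. `ConsumerLagEvaluator` and `LagHealth` are hypothetical names for illustration, not part of the Svrnty.CQRS API.

```csharp
using System;

// Hypothetical status enum mirroring the three ASP.NET Core health states.
public enum LagHealth { Healthy, Degraded, Unhealthy }

// Hypothetical evaluator: illustrates threshold logic only, not the library's implementation.
public sealed class ConsumerLagEvaluator
{
    private readonly long _degradedThreshold;
    private readonly long _unhealthyThreshold;

    public ConsumerLagEvaluator(long degradedThreshold, long unhealthyThreshold)
    {
        if (degradedThreshold < 0 || unhealthyThreshold < degradedThreshold)
            throw new ArgumentException("Expected 0 <= degraded <= unhealthy.");
        _degradedThreshold = degradedThreshold;
        _unhealthyThreshold = unhealthyThreshold;
    }

    // Lag = events written to the stream that the consumer has not processed yet.
    public static long ComputeLag(long streamHeadOffset, long consumerOffset) =>
        Math.Max(0, streamHeadOffset - consumerOffset);

    // Assumption: reaching a threshold (>=) is enough to change the status.
    public LagHealth Evaluate(long lag) =>
        lag >= _unhealthyThreshold ? LagHealth.Unhealthy
        : lag >= _degradedThreshold ? LagHealth.Degraded
        : LagHealth.Healthy;
}
```

With the Quick Start values, a lag of 500 is healthy, 2,500 is degraded, and 12,000 is unhealthy. Whether the library treats the boundary value itself as crossing the threshold is an assumption here.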
### [Metrics](metrics/)
OpenTelemetry-compatible metrics:
- **Event Counters** - Published, consumed, errors
- **Processing Metrics** - Latency, throughput
- **Consumer Metrics** - Lag, active consumers
- **Prometheus Integration** - Export to Prometheus/Grafana
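OpenTelemetry's .NET SDK collects metrics from `System.Diagnostics.Metrics` instruments, which is why the Quick Start only needs `AddMeter("Svrnty.CQRS.Events")`. The sketch below shows the kind of instruments that could sit behind the event counter and latency histogram listed above, using only the BCL. The instrument names (`events.published`, `events.processing.latency`) and the `stream` tag are hypothetical; the library's actual instrument names may differ. A `MeterListener` stands in for the OpenTelemetry exporter so the example runs without extra packages.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical instruments on the meter name used in the Quick Start.
public static class EventStreamInstruments
{
    public const string MeterName = "Svrnty.CQRS.Events";
    private static readonly Meter Meter = new(MeterName);

    public static readonly Counter<long> EventsPublished =
        Meter.CreateCounter<long>("events.published", unit: "{event}");

    public static readonly Histogram<double> ProcessingLatency =
        Meter.CreateHistogram<double>("events.processing.latency", unit: "ms");
}

// Collects measurements in-process, subscribing by meter name the way an exporter does.
public sealed class InMemoryMetricsCollector : IDisposable
{
    private readonly MeterListener _listener = new();
    public long PublishedTotal { get; private set; }
    public List<double> LatencySamples { get; } = new();

    public InMemoryMetricsCollector()
    {
        // Only listen to instruments that belong to our meter.
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == EventStreamInstruments.MeterName)
                listener.EnableMeasurementEvents(instrument);
        };
        _listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
        {
            if (instrument.Name == "events.published") PublishedTotal += value;
        });
        _listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
        {
            if (instrument.Name == "events.processing.latency") LatencySamples.Add(value);
        });
        _listener.Start();
    }

    public void Dispose() => _listener.Dispose();
}
```

In a deployed app the OpenTelemetry SDK takes the listener's place: `AddMeter` subscribes to every instrument on the named meter, and the Prometheus exporter serves the aggregated values on the scrape endpoint.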
### [Logging](logging/)
Structured logging with correlation:
- **Correlation IDs** - Distributed tracing
- **Event IDs** - Categorized log events
- **High Performance** - LoggerMessage source generators
- **Serilog Integration** - Structured logging support
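The sketch below shows two of the techniques listed above: a `[LoggerMessage]` source-generated method with a fixed event ID, and a logging scope that attaches a correlation ID to every entry written inside it. It assumes a project that references `Microsoft.Extensions.Logging.Abstractions` (any ASP.NET Core app does). The `EventStreamLog` and `ConsumerLogging` classes, their event IDs, and the `CorrelationId` scope key are hypothetical, not the library's own definitions.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical log definitions: event IDs and messages are illustrative only.
// The LoggerMessage source generator emits the method bodies at compile time.
public static partial class EventStreamLog
{
    [LoggerMessage(EventId = 2001, Level = LogLevel.Information,
        Message = "Consumed {EventType} from stream {StreamName} at offset {Offset}")]
    public static partial void EventConsumed(ILogger logger, string eventType, string streamName, long offset);

    [LoggerMessage(EventId = 2002, Level = LogLevel.Error,
        Message = "Failed to process {EventType} from stream {StreamName}")]
    public static partial void EventProcessingFailed(ILogger logger, Exception exception, string eventType, string streamName);
}

public static class ConsumerLogging
{
    // Entries written inside the scope carry the CorrelationId value,
    // provided the logging provider records scopes.
    public static void LogConsumed(ILogger logger, string correlationId, string eventType, string streamName, long offset)
    {
        using (logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = correlationId }))
        {
            EventStreamLog.EventConsumed(logger, eventType, streamName, offset);
        }
    }
}
```

Whether scope values appear in the output depends on the provider: the console formatter shows them only with `IncludeScopes` enabled, while Serilog's provider turns dictionary scope entries into log event properties.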
### [Management API](management-api/)
REST endpoints for operations:
- **Stream Operations** - List, query streams
- **Subscription Operations** - Query subscriptions
- **Consumer Operations** - Monitor consumers
- **Offset Management** - Reset consumer positions
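The concrete routes are described in the [Management API](management-api/) docs and are not reproduced here. As a sketch of how an operator tool might issue an offset reset, the code below builds (but does not send) the HTTP request. The route `api/streams/{stream}/consumers/{consumer}/offset` and the `{ "offset": ... }` body are placeholders invented for illustration; substitute the real route and payload from the Management API reference.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

// Hypothetical client: the route and JSON body are placeholders, not the documented API.
public static class OffsetResetRequests
{
    public static HttpRequestMessage Build(Uri baseAddress, string streamName, string consumerId, long offset)
    {
        // Escape user-supplied segments so names with '/' or spaces stay in one path segment.
        var path = $"api/streams/{Uri.EscapeDataString(streamName)}/consumers/{Uri.EscapeDataString(consumerId)}/offset";
        return new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, path))
        {
            Content = JsonContent.Create(new { offset })
        };
    }
}
```

Send the request with `HttpClient.SendAsync`. Moving an offset backwards makes the consumer reprocess events, so expect its lag (and any lag-based health status) to rise until it catches up.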
## Monitoring Dashboard
### Grafana Dashboard
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
data:
  svrnty-cqrs.json: |
    {
      "panels": [
        {
          "title": "Events Per Second",
          "targets": [
            {
              "expr": "rate(svrnty_cqrs_events_published[1m])"
            }
          ]
        },
        {
          "title": "Consumer Lag",
          "targets": [
            {
              "expr": "svrnty_cqrs_events_consumer_lag"
            }
          ]
        },
        {
          "title": "Processing Latency (P95)",
          "targets": [
            {
              "expr": "histogram_quantile(0.95, sum by (le) (rate(svrnty_cqrs_events_processing_latency_bucket[5m])))"
            }
          ]
        }
      ]
    }
```
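The P95 panel relies on Prometheus's `histogram_quantile`, which estimates a quantile from cumulative `_bucket` counts by assuming observations are spread linearly inside the bucket that contains the target rank. The simplified C# re-implementation below is for intuition only: it assumes buckets sorted by upper bound `le`, a final `+Inf` bucket, and non-negative observations, and it omits Prometheus's handling of edge cases such as non-monotonic buckets. `HistogramQuantile` is a hypothetical helper, not part of Svrnty.CQRS.

```csharp
using System;
using System.Collections.Generic;

// Simplified form of the linear interpolation behind Prometheus's histogram_quantile.
public static class HistogramQuantile
{
    // buckets: (upper bound "le", cumulative count), sorted by upper bound, ending with +Inf.
    public static double Estimate(double q, IReadOnlyList<(double UpperBound, double Count)> buckets)
    {
        if (q < 0 || q > 1) throw new ArgumentOutOfRangeException(nameof(q));
        if (buckets.Count < 2 || !double.IsPositiveInfinity(buckets[^1].UpperBound))
            return double.NaN;

        double total = buckets[^1].Count;
        if (total == 0) return double.NaN;

        // Find the first bucket whose cumulative count reaches the target rank.
        double rank = q * total;
        int b = 0;
        while (buckets[b].Count < rank) b++;

        // Rank falls in the +Inf bucket: report the highest finite upper bound.
        if (b == buckets.Count - 1) return buckets[^2].UpperBound;

        // Interpolate linearly between the bucket's lower and upper bounds.
        double lower = b == 0 ? 0 : buckets[b - 1].UpperBound;
        double countBelow = b == 0 ? 0 : buckets[b - 1].Count;
        double inBucket = buckets[b].Count - countBelow;
        return lower + (buckets[b].UpperBound - lower) * (rank - countBelow) / inBucket;
    }
}
```

For example, with 50 observations at or below 100 ms, 90 at or below 250 ms, and 100 at or below 500 ms, the estimated P95 is 375 ms: the 95th observation falls halfway through the 250 to 500 ms bucket. The panel's accuracy is therefore bounded by how finely the latency buckets are spaced.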
## Production Checklist
### ✅ DO
- Configure health checks
- Export metrics to monitoring system
- Set up structured logging
- Monitor consumer lag
- Set up alerts for unhealthy conditions
- Use correlation IDs
- Track error rates
- Monitor processing latency
### ❌ DON'T
- Don't deploy without health checks
- Don't ignore consumer lag warnings
- Don't skip structured logging
- Don't forget to export metrics
- Don't ignore stale consumer alerts
## See Also
- [Event Streaming Overview](../event-streaming/README.md)
- [Best Practices](../best-practices/README.md)
- [Troubleshooting](../troubleshooting/README.md)