# Observability

Comprehensive monitoring, metrics, logging, and management for production deployments.

## Overview

Svrnty.CQRS provides production-ready observability features for monitoring health, collecting metrics, structured logging, and operational management.

**Key Features:**

- ✅ **Health Checks** - Monitor stream and consumer health
- ✅ **Metrics** - OpenTelemetry-compatible telemetry
- ✅ **Structured Logging** - High-performance logging with correlation
- ✅ **Management API** - REST endpoints for operations

## Quick Start

```csharp
using OpenTelemetry.Metrics; // AddMeter / AddPrometheusExporter extensions
using Svrnty.CQRS.Events;
using Svrnty.CQRS.Events.Logging;

var builder = WebApplication.CreateBuilder(args);

// Health checks: consumer lag thresholds for the Degraded and Unhealthy states
builder.Services.AddStreamHealthChecks(options =>
{
    options.DegradedConsumerLagThreshold = 1000;
    options.UnhealthyConsumerLagThreshold = 10000;
});

// Metrics
builder.Services.AddEventStreamMetrics();
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Svrnty.CQRS.Events")
        .AddPrometheusExporter());

// Logging (already configured via appsettings.json)

var app = builder.Build();

// Management API
app.MapEventStreamManagementApi();

// Health check endpoint
app.MapHealthChecks("/health");

// Prometheus metrics
app.MapPrometheusScrapingEndpoint("/metrics");

app.Run();
```

## Features

### [Health Checks](health-checks/)

Monitor stream and consumer health:

- **Stream Health** - Detect unhealthy streams
- **Consumer Health** - Detect lag and stalled consumers
- **ASP.NET Core Integration** - Built-in health check support

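
To make the relationship between consumer lag and health status concrete, here is a minimal sketch of a custom ASP.NET Core `IHealthCheck` that maps two lag thresholds onto the Healthy, Degraded, and Unhealthy states. It is not the Svrnty.CQRS implementation: the `ConsumerLagHealthCheck` class, the `getLag` delegate, and the choice to treat a lag equal to a threshold as crossing it are all assumptions made for illustration. The default thresholds simply mirror the Quick Start values.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical health check, NOT part of Svrnty.CQRS. It only illustrates how
// two lag thresholds can map onto the three ASP.NET Core health states.
public sealed class ConsumerLagHealthCheck : IHealthCheck
{
    private readonly Func<long> _getLag; // assumed source of the current consumer lag
    private readonly long _degradedThreshold;
    private readonly long _unhealthyThreshold;

    public ConsumerLagHealthCheck(
        Func<long> getLag,
        long degradedThreshold = 1000,
        long unhealthyThreshold = 10000)
    {
        _getLag = getLag ?? throw new ArgumentNullException(nameof(getLag));
        _degradedThreshold = degradedThreshold;
        _unhealthyThreshold = unhealthyThreshold;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        long lag = _getLag();
        var data = new Dictionary<string, object> { ["lag"] = lag };

        // Assumption: a lag equal to a threshold already counts as crossing it.
        HealthCheckResult result =
            lag >= _unhealthyThreshold
                ? HealthCheckResult.Unhealthy($"Consumer lag {lag} is at or above {_unhealthyThreshold}", data: data)
            : lag >= _degradedThreshold
                ? HealthCheckResult.Degraded($"Consumer lag {lag} is at or above {_degradedThreshold}", data: data)
            : HealthCheckResult.Healthy($"Consumer lag {lag} is within limits", data);

        return Task.FromResult(result);
    }
}
```

A check like this could be registered next to the built-in ones with `builder.Services.AddHealthChecks().AddCheck("consumer-lag", new ConsumerLagHealthCheck(getLag))`, where `getLag` is whatever lag source the application has available.
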
### [Metrics](metrics/)

OpenTelemetry-compatible metrics:

- **Event Counters** - Published, consumed, errors
- **Processing Metrics** - Latency, throughput
- **Consumer Metrics** - Lag, active consumers
- **Prometheus Integration** - Export to Prometheus/Grafana

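
Application code can publish its own counters and histograms through the same pipeline by using the standard `System.Diagnostics.Metrics` API. The sketch below assumes a hypothetical `OrderMetrics` class; the meter name `MyApp.Orders` and both instrument names are placeholders, not names defined by Svrnty.CQRS.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical application-level instruments. The meter and instrument names
// are placeholders chosen for this example, not names defined by Svrnty.CQRS.
public sealed class OrderMetrics : IDisposable
{
    public const string MeterName = "MyApp.Orders";

    private readonly Meter _meter = new(MeterName);
    private readonly Counter<long> _processed;
    private readonly Histogram<double> _latencyMs;

    public OrderMetrics()
    {
        // A counter only ever increases; a histogram records a distribution of values.
        _processed = _meter.CreateCounter<long>(
            "myapp.orders.processed",
            unit: "{order}",
            description: "Number of orders processed");

        _latencyMs = _meter.CreateHistogram<double>(
            "myapp.orders.latency",
            unit: "ms",
            description: "Time spent processing one order");
    }

    // Records one processed order together with its processing time.
    public void RecordProcessed(double elapsedMs)
    {
        _processed.Add(1);
        _latencyMs.Record(elapsedMs);
    }

    public void Dispose() => _meter.Dispose();
}
```

For these instruments to reach the exporter, the meter would also need to be registered in the OpenTelemetry setup, for example with an additional `.AddMeter(OrderMetrics.MeterName)` call.
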
### [Logging](logging/)

Structured logging with correlation:

- **Correlation IDs** - Distributed tracing
- **Event IDs** - Categorized log events
- **High Performance** - LoggerMessage source generators
- **Serilog Integration** - Structured logging support

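
As an illustration of how event IDs, the `LoggerMessage` source generator, and correlation scopes fit together in application code, here is a minimal sketch built only on `Microsoft.Extensions.Logging`. The `OrderLog` class, its event IDs, its message templates, and the `CorrelationId` scope key are hypothetical; the library's own log definitions may look different.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical log definitions. Event IDs, templates, and the scope key are
// placeholders for this example and are not taken from Svrnty.CQRS.
public static partial class OrderLog
{
    public const int OrderProcessedId = 1001;
    public const int OrderFailedId = 1002;

    // The source generator emits the method body at compile time, which avoids
    // boxing and template parsing on every call.
    [LoggerMessage(
        EventId = OrderProcessedId,
        Level = LogLevel.Information,
        Message = "Processed order {OrderId} in {ElapsedMs} ms")]
    public static partial void OrderProcessed(ILogger logger, string orderId, double elapsedMs);

    // An Exception parameter is attached to the log entry rather than to the template.
    [LoggerMessage(
        EventId = OrderFailedId,
        Level = LogLevel.Error,
        Message = "Failed to process order {OrderId}")]
    public static partial void OrderFailed(ILogger logger, Exception exception, string orderId);

    // Opens a logging scope that carries the correlation ID; providers with
    // scope support attach it to every entry written before the scope is disposed.
    public static IDisposable? WithCorrelation(ILogger logger, string correlationId) =>
        logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = correlationId });
}
```

A handler would typically wrap its work in `using (OrderLog.WithCorrelation(logger, correlationId)) { ... }` so that all entries for one request share the same ID. Whether the scope values appear in the output depends on the configured logging provider.
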
### [Management API](management-api/)

REST endpoints for operations:

- **Stream Operations** - List, query streams
- **Subscription Operations** - Query subscriptions
- **Consumer Operations** - Monitor consumers
- **Offset Management** - Reset consumer positions

## Monitoring Dashboard

### Grafana Dashboard

The ConfigMap below defines three panels: publish rate, consumer lag, and P95 processing latency. The latency panel applies `rate()` to the histogram buckets so that the quantile reflects recent traffic rather than the whole lifetime of the process.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
data:
  svrnty-cqrs.json: |
    {
      "panels": [
        {
          "title": "Events Per Second",
          "targets": [
            {
              "expr": "rate(svrnty_cqrs_events_published[1m])"
            }
          ]
        },
        {
          "title": "Consumer Lag",
          "targets": [
            {
              "expr": "svrnty_cqrs_events_consumer_lag"
            }
          ]
        },
        {
          "title": "Processing Latency (P95)",
          "targets": [
            {
              "expr": "histogram_quantile(0.95, sum by (le) (rate(svrnty_cqrs_events_processing_latency_bucket[5m])))"
            }
          ]
        }
      ]
    }
```

## Production Checklist

### ✅ DO

- Configure health checks
- Export metrics to monitoring system
- Set up structured logging
- Monitor consumer lag
- Set up alerts for unhealthy conditions
- Use correlation IDs
- Track error rates
- Monitor processing latency

### ❌ DON'T

- Don't deploy without health checks
- Don't ignore consumer lag warnings
- Don't skip structured logging
- Don't forget to export metrics
- Don't ignore stale consumer alerts

## See Also

- [Event Streaming Overview](../event-streaming/README.md)
- [Best Practices](../best-practices/README.md)
- [Troubleshooting](../troubleshooting/README.md)