## 🆕 Production Enhancements Added

### Rate Limiting

- **Limit**: 100 requests per minute per client
- **Strategy**: Fixed window rate limiter
- **Queue**: Up to 10 requests queued
- **Response**: HTTP 429 with retry-after information

### Prometheus Metrics

- **Endpoint**: http://localhost:6001/metrics
- **Metrics Collected**:
  - HTTP request duration and count
  - HTTP client request duration
  - Custom application metrics
- **Format**: Prometheus exposition format
- **Integration**: Works with Grafana, Prometheus, or any monitoring tool

### How to Monitor

**Option 1: Prometheus + Grafana**

```yaml
# Add to docker-compose.yml
prometheus:
  image: prom/prometheus
  ports:
    - "9090:9090"
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'

grafana:
  image: grafana/grafana
  ports:
    - "3001:3000"
```

**Option 2: Direct Scraping**

```bash
# View raw metrics
curl http://localhost:6001/metrics

# Example metrics you'll see:
# http_server_request_duration_seconds_bucket
# http_server_request_duration_seconds_count
# http_client_request_duration_seconds_bucket
```

### Rate Limiting Examples

```bash
# Test rate limiting: fire 115 parallel requests so the limit is actually hit
# (100 fill the window, 10 more are queued, and the rest are rejected)
for i in {1..115}; do
  curl -X POST http://localhost:6001/api/command/executeAgent \
    -H "Content-Type: application/json" \
    -d '{"prompt":"test"}' &
done
wait

# Once the window and queue are full, rejected requests receive:
# {
#   "error": "Too many requests. Please try again later.",
#   "retryAfter": 60
# }
```

### Monitoring Dashboard Metrics

**Key Metrics to Watch:**

- `http_server_request_duration_seconds` - API latency
- `http_client_request_duration_seconds` - Ollama LLM latency
- Request rate and error rate
- Active connections
- Rate limit rejections
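
### Sample prometheus.yml

The compose snippet above mounts a `./prometheus.yml` that is not shown in this README. Below is a minimal sketch of what it could contain; the service name `api` is an assumption — replace the target with your actual compose service name (or `host.docker.internal:6001` if the API runs on the host rather than in a container):

```yaml
# prometheus.yml - minimal scrape config (sketch; target hostname is an assumption)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "api"
    metrics_path: /metrics
    static_configs:
      # "api" must match the API's service name on the compose network;
      # use "host.docker.internal:6001" when the API runs outside Docker.
      - targets: ["api:6001"]
```

Place the file next to `docker-compose.yml` so the volume mount above picks it up, then verify the target is up under Status → Targets in the Prometheus UI at http://localhost:9090.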