# Production Stack Testing Guide

This guide provides instructions for testing your AI Agent production stack after resolving the Docker build issues.

## Current Status

**Build Status:** ❌ Failed at ~95%
**Issue:** gRPC source generator task (`WriteProtoFileTask`) not found in the .NET 10 preview SDK
**Location:** `Svrnty.CQRS.Grpc.Generators`

## Build Issues to Resolve

### Issue 1: gRPC Generator Compatibility

```
error MSB4036: The "WriteProtoFileTask" task was not found
```

**Possible Solutions:**

1. **Skip gRPC for the Docker build:** Temporarily remove the gRPC dependency from `Svrnty.Sample/Svrnty.Sample.csproj`
2. **Use a different .NET SDK:** Try .NET 9 or stable .NET 8 instead of the .NET 10 preview
3. **Fix the gRPC generator:** Update `Svrnty.CQRS.Grpc.Generators` to work with the .NET 10 preview SDK

### Quick Fix: Disable gRPC for Testing

Edit `Svrnty.Sample/Svrnty.Sample.csproj`, comment out the gRPC dependency (the reference that pulls in `Svrnty.CQRS.Grpc.Generators`), then rebuild:

```bash
docker compose up -d --build
```

## Once Build Succeeds

### Step 1: Start the Stack

```bash
# From the project root
docker compose up -d

# Wait for services to start (2-3 minutes)
docker compose ps
```

### Step 2: Verify Services

```bash
# Check that all services are running
docker compose ps

# Should show:
# api        Up    0.0.0.0:6000-6001->6000-6001/tcp
# postgres   Up    5432/tcp
# ollama     Up    11434/tcp
# langfuse   Up    3000/tcp
```

### Step 3: Pull Ollama Model (One-time)

```bash
docker exec ollama ollama pull qwen2.5-coder:7b
# This downloads ~6.7GB and takes 5-10 minutes
```

### Step 4: Configure Langfuse (One-time)

1. Open http://localhost:3000
2. Create an account (first-time setup)
3. Create a project (e.g., "AI Agent")
4. Go to Settings → API Keys
5. Copy the Public and Secret keys
6. Update `.env`:

   ```bash
   LANGFUSE_PUBLIC_KEY=pk-lf-...
   LANGFUSE_SECRET_KEY=sk-lf-...
   ```

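   Optionally, a quick sanity check before restarting is to confirm that the placeholder values in `.env` were actually replaced:

   ```bash
   # Both keys should show real values, not the pk-lf-... / sk-lf-... placeholders
   grep -E '^LANGFUSE_(PUBLIC|SECRET)_KEY=' .env
   ```
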
7. Restart the API to enable tracing:

   ```bash
   docker compose restart api
   ```

### Step 5: Run Comprehensive Tests

```bash
# Execute the full test suite
./test-production-stack.sh
```

## Test Suite Overview

The `test-production-stack.sh` script runs **six comprehensive test phases**:

### Phase 1: Functional Testing (15 min)

- ✓ Health endpoint checks (API, Langfuse, Ollama, PostgreSQL)
- ✓ Agent math operations (simple and complex)
- ✓ Database queries (revenue, customers)
- ✓ Multi-turn conversations

**Tests:** 9 tests
**What it validates:** Core agent functionality and service connectivity

### Phase 2: Rate Limiting (5 min)

- ✓ Rate limit enforcement (100 req/min)
- ✓ HTTP 429 responses when exceeded
- ✓ Rate limit headers present
- ✓ Queue behavior (10-request queue depth)

**Tests:** 2 tests
**What it validates:** API protection and rate limiter configuration

### Phase 3: Observability (10 min)

- ✓ Langfuse trace generation
- ✓ Prometheus metrics collection
- ✓ HTTP request/response metrics
- ✓ Function call tracking
- ✓ Request counting accuracy

**Tests:** 4 tests
**What it validates:** Monitoring and debugging capabilities

### Phase 4: Load Testing (5 min)

- ✓ Concurrent request handling (20 parallel requests)
- ✓ Sustained load (30 seconds, 2 req/sec)
- ✓ Performance under stress
- ✓ Response time consistency

**Tests:** 2 tests
**What it validates:** Production-level performance and scalability

### Phase 5: Database Persistence (5 min)

- ✓ Conversation storage in PostgreSQL
- ✓ Conversation ID generation
- ✓ Seed data integrity (revenue, customers)
- ✓ Database query accuracy

**Tests:** 4 tests
**What it validates:** Data persistence and reliability

### Phase 6: Error Handling & Recovery (10 min)

- ✓ Invalid request handling (400/422 responses)
- ✓ Service restart recovery
- ✓ Graceful error messages
- ✓ Database connection resilience

**Tests:** 2 tests
**What it validates:** Production readiness and fault tolerance

### Total: ~50 minutes, 23+ tests

## Manual Testing Examples

### Test 1: Simple Math

```bash
curl -X POST http://localhost:6001/api/command/executeAgent \
  -H "Content-Type: application/json" \
  -d '{"prompt":"What is 5 + 3?"}'
```

**Expected Response:**

```json
{
  "conversationId": "uuid-here",
  "success": true,
  "response": "The result of 5 + 3 is 8."
}
```

### Test 2: Database Query

```bash
curl -X POST http://localhost:6001/api/command/executeAgent \
  -H "Content-Type: application/json" \
  -d '{"prompt":"What was our revenue in January 2025?"}'
```

**Expected Response:**

```json
{
  "conversationId": "uuid-here",
  "success": true,
  "response": "The revenue for January 2025 was $245,000."
}
```

### Test 3: Rate Limiting

```bash
# Send 110 requests quickly
for i in {1..110}; do
  curl -X POST http://localhost:6001/api/command/executeAgent \
    -H "Content-Type: application/json" \
    -d '{"prompt":"test"}' &
done
wait

# The first 100 succeed, the next 10 queue, and the remainder get HTTP 429
```

### Test 4: Check Metrics

```bash
curl http://localhost:6001/metrics | grep http_server_request_duration
```

**Expected Output:**

```
http_server_request_duration_seconds_count{...} 150
http_server_request_duration_seconds_sum{...} 45.2
```

### Test 5: View Traces in Langfuse

1. Open http://localhost:3000/traces
2. Click on a trace to see:
   - Agent execution span (root)
   - Tool registration span
   - LLM completion spans
   - Function call spans (Add, DatabaseQuery, etc.)
   - Timing breakdown

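To confirm from the command line that traces are actually leaving the API (without opening the Langfuse UI), you can trigger a request and then search the recent API logs for OTLP/Langfuse activity. This is only a rough sanity check; the exact log wording depends on the exporter configuration, so treat the grep pattern as a starting point.

```bash
# Trigger a traced request
curl -X POST http://localhost:6001/api/command/executeAgent \
  -H "Content-Type: application/json" \
  -d '{"prompt":"What is 2 + 2?"}'

# Then look for exporter activity (or export errors) in the recent API logs
docker compose logs --since 2m api | grep -iE "otlp|langfuse"
```
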
## Test Results Interpretation

### Success Criteria

- **>90% pass rate:** Production ready
- **80-90% pass rate:** Minor issues to address
- **<80% pass rate:** Significant issues, not production ready

### Common Test Failures

#### Failure: "Agent returned error or timeout"

**Cause:** Ollama model not pulled or API not responding
**Fix:**

```bash
docker exec ollama ollama pull qwen2.5-coder:7b
docker compose restart api
```

#### Failure: "Service not running"

**Cause:** Docker container failed to start
**Fix:**

```bash
docker compose logs [service-name]
docker compose up -d [service-name]
```

#### Failure: "No rate limit headers found"

**Cause:** Rate limiter not configured
**Fix:** Check `Svrnty.Sample/Program.cs:92-96` for the rate limiter setup

#### Failure: "Traces not visible in Langfuse"

**Cause:** Langfuse keys not configured in `.env`
**Fix:** Follow Step 4 above to configure the API keys

## Accessing Logs

### API Logs

```bash
docker compose logs -f api
```

### All Services

```bash
docker compose logs -f
```

### Filter for Errors

```bash
docker compose logs | grep -i error
```

## Stopping the Stack

```bash
# Stop all services
docker compose down

# Stop and remove volumes (clean slate)
docker compose down -v
```

## Troubleshooting

### Issue: Ollama Out of Memory

**Symptoms:** Agent responses time out or return errors
**Solution:**

```bash
# Increase the Docker memory limit to 8GB+
# Docker Desktop → Settings → Resources → Memory
docker compose restart ollama
```

### Issue: PostgreSQL Connection Failed

**Symptoms:** Database queries fail
**Solution:**

```bash
docker compose logs postgres
# Check for port conflicts or permission issues
docker compose down -v
docker compose up -d
```

### Issue: Langfuse Not Showing Traces

**Symptoms:** Metrics work but no traces appear in the UI
**Solution:**

1. Verify that the keys in `.env` match the Langfuse UI
2. Check the API logs for OTLP export errors:

   ```bash
   docker compose logs api | grep -i "otlp\|langfuse"
   ```

3. Restart the API after updating the keys:

   ```bash
   docker compose restart api
   ```

### Issue: Port Already in Use

**Symptoms:** `docker compose up` fails with "port already allocated"
**Solution:**

```bash
# Find what's using the port
lsof -i :6001   # API HTTP
lsof -i :6000   # API gRPC
lsof -i :5432   # PostgreSQL
lsof -i :3000   # Langfuse

# Kill the process or change the ports in docker-compose.yml
```

## Performance Expectations

### Response Times

- **Simple math:** 1-2 seconds
- **Database query:** 2-3 seconds
- **Complex multi-step:** 3-5 seconds

### Throughput

- **Rate limit:** 100 requests/minute
- **Queue depth:** 10 requests
- **Concurrent connections:** 20+ supported

### Resource Usage

- **Memory:** ~4GB total (Ollama ~3GB, others ~1GB)
- **CPU:** Variable, depending on query complexity
- **Disk:** ~10GB (Ollama model + Docker images)

## Production Deployment Checklist

Before deploying to production:

- [ ] All tests passing (>90% success rate)
- [ ] Langfuse API keys configured
- [ ] PostgreSQL credentials rotated
- [ ] Rate limits tuned for expected traffic
- [ ] Health checks validated
- [ ] Metrics dashboards created
- [ ] Alert rules configured
- [ ] Backup strategy implemented
- [ ] Secrets in environment variables (not code)
- [ ] Network policies configured
- [ ] TLS certificates installed (for HTTPS)
- [ ] Load balancer configured (if multi-instance)

## Next Steps After Testing

1. **Review test results:** Identify any failures and fix the root causes
2. **Tune rate limits:** Adjust based on expected production traffic
3. **Create dashboards:** Build Grafana dashboards from the Prometheus metrics
4. **Set up alerts:** Configure alerting for:
   - API health check failures
   - High error rates (>5%)
   - High latency (P95 >5s)
   - Database connection failures
5. **Optimize Ollama:** Fine-tune model parameters for your use case
6. **Scale testing:** Test with higher concurrency (50-100 parallel requests); a minimal sketch follows this list
7. **Security audit:** Review authentication, authorization, and input validation

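As a starting point for the scale testing in item 6, here is a minimal sketch that fires 50 parallel requests and tallies the returned HTTP status codes. It assumes the same endpoint and payload as the manual tests above and the default limits from this guide; adjust the request count and payload to match your expected production traffic.

```bash
# Scale-test sketch: 50 parallel requests, then a tally of status codes.
# 50 requests stays under the 100 req/min limit, so expect mostly 200s;
# push past ~110 (as in Test 3) to see HTTP 429s appear.
URL="http://localhost:6001/api/command/executeAgent"

codes=$(
  for i in $(seq 1 50); do
    curl -s -o /dev/null -w "%{http_code}\n" -X POST "$URL" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"test"}' &
  done
  wait
)

# Prints one line per status code with its count
echo "$codes" | sort | uniq -c
```
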
## Support Resources

- **Project README:** [README.md](./README.md)
- **Deployment Guide:** [DEPLOYMENT_README.md](./DEPLOYMENT_README.md)
- **Docker Compose:** [docker-compose.yml](./docker-compose.yml)
- **Test Script:** [test-production-stack.sh](./test-production-stack.sh)

## Getting Help

If tests fail or you encounter issues:

1. Check logs: `docker compose logs -f`
2. Review this guide's troubleshooting section
3. Verify all prerequisites are met
4. Check for port conflicts or resource constraints

---

**Test Script Version:** 1.0
**Last Updated:** 2025-11-08
**Estimated Total Test Time:** ~50 minutes