Steev_code/TESTING_GUIDE.md

# Production Stack Testing Guide

This guide provides instructions for testing your AI Agent production stack after resolving the Docker build issues.

## Current Status

**Build Status:** ❌ Failed at ~95%
**Issue:** gRPC source generator task (`WriteProtoFileTask`) not found in .NET 10 preview SDK
**Location:** `Svrnty.CQRS.Grpc.Generators`

## Build Issues to Resolve

### Issue 1: gRPC Generator Compatibility
```
error MSB4036: The "WriteProtoFileTask" task was not found
```

**Possible Solutions:**
1. **Skip gRPC for Docker build:** Temporarily remove gRPC dependency from `Svrnty.Sample/Svrnty.Sample.csproj`
2. **Use different .NET SDK:** Try .NET 9 or stable .NET 8 instead of .NET 10 preview
3. **Fix the gRPC generator:** Update `Svrnty.CQRS.Grpc.Generators` to work with .NET 10 preview SDK

### Quick Fix: Disable gRPC for Testing

Edit `Svrnty.Sample/Svrnty.Sample.csproj` and comment out:
```xml
<!-- Temporarily disabled for Docker build -->
<!-- <ProjectReference Include="..\Svrnty.CQRS.Grpc\Svrnty.CQRS.Grpc.csproj" /> -->
```

Then rebuild:
```bash
docker compose up -d --build
```

## Once Build Succeeds

### Step 1: Start the Stack
```bash
# From project root
docker compose up -d

# Wait for services to start (2-3 minutes)
docker compose ps
```

### Step 2: Verify Services
```bash
# Check all services are running
docker compose ps

# Should show:
# api       Up      0.0.0.0:6000-6001->6000-6001/tcp
# postgres  Up      5432/tcp
# ollama    Up      11434/tcp
# langfuse  Up      3000/tcp
```

### Step 3: Pull Ollama Model (One-time)
```bash
docker exec ollama ollama pull qwen2.5-coder:7b
# This downloads ~6.7GB, takes 5-10 minutes
```

### Step 4: Configure Langfuse (One-time)
1. Open http://localhost:3000
2. Create account (first-time setup)
3. Create a project (e.g., "AI Agent")
4. Go to Settings → API Keys
5. Copy the Public and Secret keys
6. Update `.env`:
   ```bash
   LANGFUSE_PUBLIC_KEY=pk-lf-...
   LANGFUSE_SECRET_KEY=sk-lf-...
   ```
7. Restart API to enable tracing:
   ```bash
   docker compose restart api
   ```

### Step 5: Run Comprehensive Tests
```bash
# Execute the full test suite
./test-production-stack.sh
```

## Test Suite Overview

The `test-production-stack.sh` script runs **7 comprehensive test phases**:

### Phase 1: Functional Testing (15 min)
- ✓ Health endpoint checks (API, Langfuse, Ollama, PostgreSQL)
- ✓ Agent math operations (simple and complex)
- ✓ Database queries (revenue, customers)
- ✓ Multi-turn conversations

**Tests:** 9 tests
**What it validates:** Core agent functionality and service connectivity

### Phase 2: Rate Limiting (5 min)
- ✓ Rate limit enforcement (100 req/min)
- ✓ HTTP 429 responses when exceeded
- ✓ Rate limit headers present
- ✓ Queue behavior (10 req queue depth)

**Tests:** 2 tests
**What it validates:** API protection and rate limiter configuration

### Phase 3: Observability (10 min)
- ✓ Langfuse trace generation
- ✓ Prometheus metrics collection
- ✓ HTTP request/response metrics
- ✓ Function call tracking
- ✓ Request counting accuracy

**Tests:** 4 tests
**What it validates:** Monitoring and debugging capabilities

### Phase 4: Load Testing (5 min)
- ✓ Concurrent request handling (20 parallel requests)
- ✓ Sustained load (30 seconds, 2 req/sec)
- ✓ Performance under stress
- ✓ Response time consistency

**Tests:** 2 tests
**What it validates:** Production-level performance and scalability

### Phase 5: Database Persistence (5 min)
- ✓ Conversation storage in PostgreSQL
- ✓ Conversation ID generation
- ✓ Seed data integrity (revenue, customers)
- ✓ Database query accuracy

**Tests:** 4 tests
**What it validates:** Data persistence and reliability

### Phase 6: Error Handling & Recovery (10 min)
- ✓ Invalid request handling (400/422 responses)
- ✓ Service restart recovery
- ✓ Graceful error messages
- ✓ Database connection resilience

**Tests:** 2 tests
**What it validates:** Production readiness and fault tolerance

### Total: ~50 minutes, 23+ tests

## Manual Testing Examples

### Test 1: Simple Math
```bash
curl -X POST http://localhost:6001/api/command/executeAgent \
  -H "Content-Type: application/json" \
  -d '{"prompt":"What is 5 + 3?"}'
```

**Expected Response:**
```json
{
  "conversationId": "uuid-here",
  "success": true,
  "response": "The result of 5 + 3 is 8."
}
```

### Test 2: Database Query
```bash
curl -X POST http://localhost:6001/api/command/executeAgent \
  -H "Content-Type: application/json" \
  -d '{"prompt":"What was our revenue in January 2025?"}'
```

**Expected Response:**
```json
{
  "conversationId": "uuid-here",
  "success": true,
  "response": "The revenue for January 2025 was $245,000."
}
```

### Test 3: Rate Limiting
```bash
# Send 110 requests quickly
for i in {1..110}; do
  curl -X POST http://localhost:6001/api/command/executeAgent \
    -H "Content-Type: application/json" \
    -d '{"prompt":"test"}' &
done
wait

# First 100 succeed, next 10 queue, remaining get HTTP 429
```

### Test 4: Check Metrics
```bash
curl http://localhost:6001/metrics | grep http_server_request_duration
```

**Expected Output:**
```
http_server_request_duration_seconds_count{...} 150
http_server_request_duration_seconds_sum{...} 45.2
```

### Test 5: View Traces in Langfuse
1. Open http://localhost:3000/traces
2. Click on a trace to see:
   - Agent execution span (root)
   - Tool registration span
   - LLM completion spans
   - Function call spans (Add, DatabaseQuery, etc.)
   - Timing breakdown

## Test Results Interpretation

### Success Criteria
- **>90% pass rate:** Production ready
- **80-90% pass rate:** Minor issues to address
- **<80% pass rate:** Significant issues, not production ready

### Common Test Failures

#### Failure: "Agent returned error or timeout"
**Cause:** Ollama model not pulled or API not responding
**Fix:**
```bash
docker exec ollama ollama pull qwen2.5-coder:7b
docker compose restart api
```

#### Failure: "Service not running"
**Cause:** Docker container failed to start
**Fix:**
```bash
docker compose logs [service-name]
docker compose up -d [service-name]
```

#### Failure: "No rate limit headers found"
**Cause:** Rate limiter not configured
**Fix:** Check `Program.cs:Svrnty.Sample/Program.cs:92-96` for rate limiter setup

#### Failure: "Traces not visible in Langfuse"
**Cause:** Langfuse keys not configured in `.env`
**Fix:** Follow Step 4 above to configure API keys

## Accessing Logs

### API Logs
```bash
docker compose logs -f api
```

### All Services
```bash
docker compose logs -f
```

### Filter for Errors
```bash
docker compose logs | grep -i error
```

## Stopping the Stack

```bash
# Stop all services
docker compose down

# Stop and remove volumes (clean slate)
docker compose down -v
```

## Troubleshooting

### Issue: Ollama Out of Memory
**Symptoms:** Agent responses timeout or return errors
**Solution:**
```bash
# Increase Docker memory limit to 8GB+
# Docker Desktop → Settings → Resources → Memory
docker compose restart ollama
```

### Issue: PostgreSQL Connection Failed
**Symptoms:** Database queries fail
**Solution:**
```bash
docker compose logs postgres
# Check for port conflicts or permission issues
docker compose down -v
docker compose up -d
```

### Issue: Langfuse Not Showing Traces
**Symptoms:** Metrics work but no traces in UI
**Solution:**
1. Verify keys in `.env` match Langfuse UI
2. Check API logs for OTLP export errors:
   ```bash
   docker compose logs api | grep -i "otlp\|langfuse"
   ```
3. Restart API after updating keys:
   ```bash
   docker compose restart api
   ```

### Issue: Port Already in Use
**Symptoms:** `docker compose up` fails with "port already allocated"
**Solution:**
```bash
# Find what's using the port
lsof -i :6001   # API HTTP
lsof -i :6000   # API gRPC
lsof -i :5432   # PostgreSQL
lsof -i :3000   # Langfuse

# Kill the process or change ports in docker-compose.yml
```

## Performance Expectations

### Response Times
- **Simple Math:** 1-2 seconds
- **Database Query:** 2-3 seconds
- **Complex Multi-step:** 3-5 seconds

### Throughput
- **Rate Limit:** 100 requests/minute
- **Queue Depth:** 10 requests
- **Concurrent Connections:** 20+ supported

### Resource Usage
- **Memory:** ~4GB total (Ollama ~3GB, others ~1GB)
- **CPU:** Variable based on query complexity
- **Disk:** ~10GB (Ollama model + Docker images)

## Production Deployment Checklist

Before deploying to production:

- [ ] All tests passing (>90% success rate)
- [ ] Langfuse API keys configured
- [ ] PostgreSQL credentials rotated
- [ ] Rate limits tuned for expected traffic
- [ ] Health checks validated
- [ ] Metrics dashboards created
- [ ] Alert rules configured
- [ ] Backup strategy implemented
- [ ] Secrets in environment variables (not code)
- [ ] Network policies configured
- [ ] TLS certificates installed (for HTTPS)
- [ ] Load balancer configured (if multi-instance)

## Next Steps After Testing

1. **Review test results:** Identify any failures and fix root causes
2. **Tune rate limits:** Adjust based on expected production traffic
3. **Create dashboards:** Build Grafana dashboards from Prometheus metrics
4. **Set up alerts:** Configure alerting for:
   - API health check failures
   - High error rates (>5%)
   - High latency (P95 >5s)
   - Database connection failures
5. **Optimize Ollama:** Fine-tune model parameters for your use case
6. **Scale testing:** Test with higher concurrency (50-100 parallel)
7. **Security audit:** Review authentication, authorization, input validation

## Support Resources

- **Project README:** [README.md](./README.md)
- **Deployment Guide:** [DEPLOYMENT_README.md](./DEPLOYMENT_README.md)
- **Docker Compose:** [docker-compose.yml](./docker-compose.yml)
- **Test Script:** [test-production-stack.sh](./test-production-stack.sh)

## Getting Help

If tests fail or you encounter issues:
1. Check logs: `docker compose logs -f`
2. Review this guide's troubleshooting section
3. Verify all prerequisites are met
4. Check for port conflicts or resource constraints

---

**Test Script Version:** 1.0
**Last Updated:** 2025-11-08
**Estimated Total Test Time:** ~50 minutes