# OpenHarbor.MCP.Gateway - Module Design
**Document Type:** Architecture Design Document
**Status:** Planned
**Version:** 1.0.0
**Last Updated:** 2025-10-19

---
## Overview
OpenHarbor.MCP.Gateway is a .NET 8 library that provides proxy and routing infrastructure for MCP traffic, enabling centralized management, authentication, monitoring, and load balancing between MCP clients and servers. This document defines the architecture, components, and design decisions.
### Purpose
- **What**: Gateway/proxy library for routing MCP traffic between clients and servers
- **Why**: Enable centralized management, security, and monitoring of MCP infrastructure
- **How**: Clean Architecture with routing strategies, health monitoring, and transport abstraction
---
## Architecture
### Clean Architecture Layers
```
┌───────────────────────────────────────────────┐
│ OpenHarbor.MCP.Gateway.Cli (Executable)       │
│ ┌───────────────────────────────────────────┐ │
│ │ OpenHarbor.MCP.Gateway.AspNetCore (HTTP)  │ │
│ │ ┌───────────────────────────────────────┐ │ │
│ │ │ OpenHarbor.MCP.Gateway.Infrastructure │ │ │
│ │ │ ┌───────────────────────────────────┐ │ │ │
│ │ │ │ OpenHarbor.MCP.Gateway.Core       │ │ │ │
│ │ │ │ - IGatewayRouter                  │ │ │ │
│ │ │ │ - IRoutingStrategy                │ │ │ │
│ │ │ │ - IAuthProvider                   │ │ │ │
│ │ │ │ - ICircuitBreaker                 │ │ │ │
│ │ │ │ - Models (no dependencies)        │ │ │ │
│ │ │ └───────────────────────────────────┘ │ │ │
│ │ └───────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────┘ │
└───────────────────────────────────────────────┘
```
### Layer Responsibilities
| Layer | Purpose | Dependencies |
|-------|---------|----|
| **Core** | Abstractions and models | None |
| **Infrastructure** | Routing, auth, circuit breakers | Core, System.Text.Json |
| **AspNetCore** | HTTP endpoints and DI | Core, Infrastructure, ASP.NET Core |
| **Cli** | Management CLI | All layers |
---
## Core Components
### IGatewayRouter Interface
Primary interface for gateway routing operations:
```csharp
public interface IGatewayRouter
{
    // Request Routing
    Task<McpResponse> RouteRequestAsync(
        McpRequest request,
        RoutingContext context,
        CancellationToken ct = default
    );

    // Server Management
    Task<IEnumerable<ServerInfo>> GetRegisteredServersAsync();
    Task RegisterServerAsync(ServerConfig config);
    Task UnregisterServerAsync(string serverId);

    // Health Monitoring
    Task<ServerHealthStatus> GetServerHealthAsync(string serverId);
    Task<IEnumerable<ServerHealthStatus>> GetAllServerHealthAsync();
}
```
### IRoutingStrategy Interface
Defines server selection logic:
```csharp
public interface IRoutingStrategy
{
    string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> availableServers
    );
}

public class RoutingContext
{
    public string? ToolName { get; set; }
    public string? ClientId { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
    public Dictionary<string, object>? Metadata { get; set; }
}
```
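As an illustration of the contract, here is a minimal sketch of a custom strategy that reads `RoutingContext.Headers`. The `HeaderPinnedStrategy` class, the `X-MCP-Target-Server` header name, and the two-member shape of `ServerInfo` are assumptions made for this example; the design itself only relies on `ServerInfo.Id` and `ServerInfo.IsHealthy`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape: this document only relies on Id and IsHealthy.
public record ServerInfo(string Id, bool IsHealthy);

public class RoutingContext
{
    public string? ToolName { get; set; }
    public string? ClientId { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
    public Dictionary<string, object>? Metadata { get; set; }
}

public interface IRoutingStrategy
{
    string SelectServer(RoutingContext context, IEnumerable<ServerInfo> availableServers);
}

// Hypothetical strategy: honors a pinning header when the target is healthy,
// otherwise falls back to the first healthy server.
public sealed class HeaderPinnedStrategy : IRoutingStrategy
{
    public const string PinHeader = "X-MCP-Target-Server"; // assumed header name

    public string SelectServer(RoutingContext context, IEnumerable<ServerInfo> availableServers)
    {
        var healthy = availableServers.Where(s => s.IsHealthy).ToList();
        if (healthy.Count == 0)
        {
            throw new InvalidOperationException("No healthy servers available");
        }

        if (context.Headers != null &&
            context.Headers.TryGetValue(PinHeader, out var pinned) &&
            healthy.Any(s => s.Id == pinned))
        {
            return pinned;
        }

        return healthy[0].Id;
    }
}
```

A strategy written this way stays free of transport and I/O concerns, which is what allows it to be unit-tested with plain in-memory server lists.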
### IAuthProvider Interface
Authentication and authorization:
```csharp
public interface IAuthProvider
{
    Task<AuthResult> AuthenticateAsync(
        string? apiKey,
        Dictionary<string, string>? headers,
        CancellationToken ct = default
    );

    Task<bool> AuthorizeAsync(
        string clientId,
        string serverId,
        string toolName,
        CancellationToken ct = default
    );
}

public class AuthResult
{
    public bool IsAuthenticated { get; set; }
    public string? ClientId { get; set; }
    public IEnumerable<string> Roles { get; set; } = [];
    public string? ErrorMessage { get; set; }
}
```
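One possible shape for an implementation is an in-memory API key provider, sketched below. The `InMemoryApiKeyAuthProvider` class, its constructor parameters, and the server-level grant model are hypothetical; a real provider would likely load keys from a secret store and also check `toolName`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class AuthResult
{
    public bool IsAuthenticated { get; set; }
    public string? ClientId { get; set; }
    public IEnumerable<string> Roles { get; set; } = Array.Empty<string>();
    public string? ErrorMessage { get; set; }
}

public interface IAuthProvider
{
    Task<AuthResult> AuthenticateAsync(
        string? apiKey, Dictionary<string, string>? headers, CancellationToken ct = default);

    Task<bool> AuthorizeAsync(
        string clientId, string serverId, string toolName, CancellationToken ct = default);
}

// Hypothetical provider backed by in-memory lookups (illustration only).
public sealed class InMemoryApiKeyAuthProvider : IAuthProvider
{
    private readonly IReadOnlyDictionary<string, string> _clientIdsByApiKey;
    private readonly HashSet<(string ClientId, string ServerId)> _grants;

    public InMemoryApiKeyAuthProvider(
        IReadOnlyDictionary<string, string> clientIdsByApiKey,
        IEnumerable<(string ClientId, string ServerId)> grants)
    {
        _clientIdsByApiKey = clientIdsByApiKey;
        _grants = new HashSet<(string ClientId, string ServerId)>(grants);
    }

    public Task<AuthResult> AuthenticateAsync(
        string? apiKey, Dictionary<string, string>? headers, CancellationToken ct = default)
    {
        if (string.IsNullOrEmpty(apiKey))
        {
            return Task.FromResult(new AuthResult { ErrorMessage = "API key missing" });
        }

        if (!_clientIdsByApiKey.TryGetValue(apiKey, out var clientId))
        {
            return Task.FromResult(new AuthResult { ErrorMessage = "API key not recognized" });
        }

        return Task.FromResult(new AuthResult { IsAuthenticated = true, ClientId = clientId });
    }

    public Task<bool> AuthorizeAsync(
        string clientId, string serverId, string toolName, CancellationToken ct = default)
    {
        // This sketch authorizes at server level only; toolName is ignored.
        return Task.FromResult(_grants.Contains((clientId, serverId)));
    }
}
```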
### ICircuitBreaker Interface
Prevent cascading failures:
```csharp
public interface ICircuitBreaker
{
    bool IsOpen(string serverId);
    void RecordSuccess(string serverId);
    void RecordFailure(string serverId);
    void Reset(string serverId);
}
```
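The interface is passive: callers check `IsOpen` before a request and report the outcome afterwards. A minimal sketch of that call pattern follows. The `CircuitBreakerGuard` helper and `CircuitOpenException` type are hypothetical names used only for this example.

```csharp
using System;
using System.Threading.Tasks;

public interface ICircuitBreaker
{
    bool IsOpen(string serverId);
    void RecordSuccess(string serverId);
    void RecordFailure(string serverId);
    void Reset(string serverId);
}

// Hypothetical exception signalling that a call was short-circuited.
public sealed class CircuitOpenException : Exception
{
    public CircuitOpenException(string serverId)
        : base($"Circuit is open for server '{serverId}'") { }
}

// Hypothetical helper wrapping a backend call with breaker bookkeeping.
public static class CircuitBreakerGuard
{
    public static async Task<T> ExecuteAsync<T>(
        ICircuitBreaker breaker, string serverId, Func<Task<T>> call)
    {
        // Fail fast instead of sending traffic to a server that is known to be failing.
        if (breaker.IsOpen(serverId))
        {
            throw new CircuitOpenException(serverId);
        }

        try
        {
            var result = await call();
            breaker.RecordSuccess(serverId);
            return result;
        }
        catch
        {
            // Any exception from the call counts as a failure, then propagates unchanged.
            breaker.RecordFailure(serverId);
            throw;
        }
    }
}
```

Whether every exception should count as a failure (as it does here) or only transport-level errors is a design choice this sketch leaves open.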
---
## Routing Strategies
### Built-In Strategies
#### Round-Robin Strategy
```csharp
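public class RoundRobinStrategy : IRoutingStrategy
{
    // Starts at -1 so the first increment selects index 0
    private int _currentIndex = -1;

    public string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> servers)
    {
        var healthyServers = servers.Where(s => s.IsHealthy).ToList();
        if (healthyServers.Count == 0)
        {
            throw new NoHealthyServersException();
        }

        // Interlocked.Increment wraps to negative values on overflow;
        // converting to uint keeps the computed index non-negative.
        var next = unchecked((uint)Interlocked.Increment(ref _currentIndex));
        var index = (int)(next % (uint)healthyServers.Count);
        return healthyServers[index].Id;
    }
}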
```
#### Tool-Based Strategy
```csharp
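public class ToolBasedStrategy : IRoutingStrategy
{
    private readonly Dictionary<string, string> _toolPrefixMappings;

    public ToolBasedStrategy(Dictionary<string, string> toolPrefixMappings)
    {
        _toolPrefixMappings = toolPrefixMappings;
    }

    public string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> servers)
    {
        if (context.ToolName == null)
        {
            throw new InvalidOperationException("ToolName required for tool-based routing");
        }

        // Dictionary enumeration order is unspecified, so prefixes should not overlap.
        // The mapped server is returned without a health check.
        foreach (var (prefix, serverId) in _toolPrefixMappings)
        {
            if (context.ToolName.StartsWith(prefix, StringComparison.Ordinal))
            {
                return serverId;
            }
        }

        // Default to first healthy server
        var fallbackId = servers.Where(s => s.IsHealthy).Select(s => s.Id).FirstOrDefault();
        return fallbackId ?? throw new NoHealthyServersException();
    }
}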
```
#### Client-Based Strategy
```csharp
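public class ClientBasedStrategy : IRoutingStrategy
{
    private readonly Dictionary<string, string> _clientMappings;

    public ClientBasedStrategy(Dictionary<string, string> clientMappings)
    {
        _clientMappings = clientMappings;
    }

    public string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> servers)
    {
        // The mapped server is returned without a health check
        if (context.ClientId != null &&
            _clientMappings.TryGetValue(context.ClientId, out var serverId))
        {
            return serverId;
        }

        // Default routing: first healthy server
        var fallbackId = servers.Where(s => s.IsHealthy).Select(s => s.Id).FirstOrDefault();
        return fallbackId ?? throw new NoHealthyServersException();
    }
}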
```
---
## Configuration
### Gateway Configuration Model
```csharp
public class GatewayConfig
{
    public string Name { get; set; } = "MCP Gateway";
    public string Version { get; set; } = "1.0.0";
    public string? Description { get; set; }
    public string ListenAddress { get; set; } = "http://localhost:8080";
}

public class ServerConfig
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public TransportConfig Transport { get; set; } = new();
    public bool Enabled { get; set; } = true;
    public Dictionary<string, string>? Metadata { get; set; }
}

public class TransportConfig
{
    public string Type { get; set; } = "Stdio"; // "Stdio" or "Http"
    public string? Command { get; set; }        // Used by the Stdio transport
    public string[]? Args { get; set; }         // Used by the Stdio transport
    public string? BaseUrl { get; set; }        // Used by the Http transport
    public Dictionary<string, string>? Headers { get; set; }
}
```
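To show how these models might be populated, the sketch below deserializes a `ServerConfig` from JSON with `System.Text.Json` (already listed as an Infrastructure dependency) and checks the transport-specific fields. The `ServerConfigLoader` class, the JSON property casing, and the validation rules are assumptions; the design does not yet specify a configuration file format.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class TransportConfig
{
    public string Type { get; set; } = "Stdio"; // "Stdio" or "Http"
    public string? Command { get; set; }
    public string[]? Args { get; set; }
    public string? BaseUrl { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
}

public class ServerConfig
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public TransportConfig Transport { get; set; } = new();
    public bool Enabled { get; set; } = true;
    public Dictionary<string, string>? Metadata { get; set; }
}

// Hypothetical loader: parses one server entry and validates it.
public static class ServerConfigLoader
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true // accept "id" as well as "Id"
    };

    public static ServerConfig Parse(string json)
    {
        var config = JsonSerializer.Deserialize<ServerConfig>(json, Options)
            ?? throw new JsonException("Server configuration was null");

        if (string.IsNullOrWhiteSpace(config.Id))
        {
            throw new ArgumentException("Server Id is required");
        }

        var transport = config.Transport
            ?? throw new ArgumentException("Transport is required");

        // Assumed rule: each transport type needs its own connection detail.
        switch (transport.Type)
        {
            case "Stdio" when string.IsNullOrWhiteSpace(transport.Command):
                throw new ArgumentException("Stdio transport requires Command");
            case "Http" when string.IsNullOrWhiteSpace(transport.BaseUrl):
                throw new ArgumentException("Http transport requires BaseUrl");
            case "Stdio":
            case "Http":
                return config;
            default:
                throw new ArgumentException($"Unknown transport type '{transport.Type}'");
        }
    }
}
```

Properties missing from the JSON keep their initializer values, so an entry that omits `enabled` is treated as enabled.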
### Routing Configuration
```csharp
public class RoutingConfig
{
    public string Strategy { get; set; } = "RoundRobin"; // "RoundRobin", "ToolBased", "ClientBased"
    public TimeSpan HealthCheckInterval { get; set; } = TimeSpan.FromSeconds(30);
    public Dictionary<string, string>? StrategyConfig { get; set; }
}
```
### Security Configuration
```csharp
public class SecurityConfig
{
    public bool EnableAuthentication { get; set; } = false;
    public string? ApiKeyHeader { get; set; } = "X-MCP-API-Key";
    public RateLimitConfig? RateLimit { get; set; }
}

public class RateLimitConfig
{
    public int RequestsPerMinute { get; set; } = 100;
    public int BurstSize { get; set; } = 20;
}
```
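One way to interpret `RequestsPerMinute` and `BurstSize` is as the refill rate and capacity of a token bucket. That reading is an assumption, as is the `TokenBucket` class below; the design does not name a rate limiting algorithm. The clock is passed in explicitly so the sketch stays deterministic.

```csharp
using System;

// Hypothetical token bucket: refills at RequestsPerMinute, holds at most BurstSize tokens.
public sealed class TokenBucket
{
    private readonly object _gate = new();
    private readonly double _refillPerSecond;
    private readonly double _capacity;
    private double _tokens;
    private DateTime _lastRefill;

    public TokenBucket(int requestsPerMinute, int burstSize, DateTime now)
    {
        _refillPerSecond = requestsPerMinute / 60.0;
        _capacity = burstSize;
        _tokens = burstSize; // start full so an initial burst is allowed
        _lastRefill = now;
    }

    // Returns true when the request may proceed, false when it should be rejected.
    public bool TryAcquire(DateTime now)
    {
        lock (_gate)
        {
            var elapsedSeconds = (now - _lastRefill).TotalSeconds;
            if (elapsedSeconds > 0)
            {
                // Add tokens for the elapsed time, capped at the bucket capacity.
                _tokens = Math.Min(_capacity, _tokens + elapsedSeconds * _refillPerSecond);
                _lastRefill = now;
            }

            if (_tokens < 1.0)
            {
                return false;
            }

            _tokens -= 1.0;
            return true;
        }
    }
}
```

For per-client limiting, a gateway could keep one bucket per `ClientId` and throw `RateLimitExceededException` when `TryAcquire` returns false.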
---
## Health Monitoring
### Server Health Check
```csharp
public class ServerHealthStatus
{
    public string ServerId { get; set; } = string.Empty;
    public string ServerName { get; set; } = string.Empty;
    public bool IsHealthy { get; set; }
    public DateTime LastCheck { get; set; }
    public TimeSpan? ResponseTime { get; set; }
    public string? ErrorMessage { get; set; }
}
```
### Health Check Implementation
```csharp
public class McpServerHealthCheck : IHealthCheck
{
    private readonly IGatewayRouter _router;

    public McpServerHealthCheck(IGatewayRouter router)
    {
        _router = router;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        // Materialize once so the sequence is not enumerated twice
        var statuses = (await _router.GetAllServerHealthAsync()).ToList();
        var healthyCount = statuses.Count(s => s.IsHealthy);
        var totalCount = statuses.Count;

        if (totalCount > 0 && healthyCount == totalCount)
        {
            return HealthCheckResult.Healthy(
                $"All {totalCount} servers healthy");
        }
        else if (healthyCount > 0)
        {
            return HealthCheckResult.Degraded(
                $"{healthyCount}/{totalCount} servers healthy");
        }
        else
        {
            // Also covers the case where no servers are registered
            return HealthCheckResult.Unhealthy(
                "No healthy servers available");
        }
    }
}
```
---
## Error Handling
### Exception Hierarchy
```csharp
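public class GatewayException : Exception
{
    public GatewayException() { }
    public GatewayException(string message) : base(message) { }
}
public class NoHealthyServersException : GatewayException { }
public class ServerNotFoundException : GatewayException
{
    public string ServerId { get; }
    public ServerNotFoundException(string serverId)
        : base($"Server '{serverId}' not found") => ServerId = serverId;
}
public class RoutingException : GatewayException
{
    public RoutingContext Context { get; }
    public RoutingException(string message, RoutingContext context)
        : base(message) => Context = context;
}
public class AuthenticationException : GatewayException { }
public class RateLimitExceededException : GatewayException
{
    public string ClientId { get; }
    public int RequestsPerMinute { get; }
    public RateLimitExceededException(string clientId, int requestsPerMinute)
        : base($"Client '{clientId}' exceeded {requestsPerMinute} requests per minute")
    {
        ClientId = clientId;
        RequestsPerMinute = requestsPerMinute;
    }
}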
```
### Circuit Breaker Implementation
```csharp
public class CircuitBreaker : ICircuitBreaker
{
    private readonly ConcurrentDictionary<string, CircuitEntry> _entries = new();
    private readonly int _failureThreshold = 5;
    private readonly TimeSpan _timeout = TimeSpan.FromSeconds(30);

    public bool IsOpen(string serverId)
    {
        if (!_entries.TryGetValue(serverId, out var entry))
        {
            return false;
        }
        lock (entry)
        {
            if (entry.State == CircuitState.Open &&
                DateTime.UtcNow - entry.LastFailure > _timeout)
            {
                // Timeout elapsed: transition to half-open so trial requests can pass
                entry.State = CircuitState.HalfOpen;
            }
            return entry.State == CircuitState.Open;
        }
    }

    public void RecordSuccess(string serverId)
    {
        var entry = _entries.GetOrAdd(serverId, _ => new CircuitEntry());
        lock (entry)
        {
            entry.FailureCount = 0;
            entry.State = CircuitState.Closed;
        }
    }

    public void RecordFailure(string serverId)
    {
        var entry = _entries.GetOrAdd(serverId, _ => new CircuitEntry());
        lock (entry)
        {
            entry.FailureCount++;
            entry.LastFailure = DateTime.UtcNow;
            // A failed trial request in half-open re-opens the circuit immediately
            if (entry.State == CircuitState.HalfOpen ||
                entry.FailureCount >= _failureThreshold)
            {
                entry.State = CircuitState.Open;
            }
        }
    }

    public void Reset(string serverId)
    {
        _entries.TryRemove(serverId, out _);
    }
}

// Per-server mutable state; all access goes through lock (entry) in CircuitBreaker
internal sealed class CircuitEntry
{
    public CircuitState State { get; set; } = CircuitState.Closed;
    public int FailureCount { get; set; }
    public DateTime LastFailure { get; set; }
}

internal enum CircuitState
{
    Closed,
    Open,
    HalfOpen
}
```
---
## Testing Strategy
### Unit Tests
- Test Core abstractions with mocks
- Test routing strategies with mock servers
- Test circuit breaker logic
- Test authentication/authorization
### Integration Tests
- Test actual routing to real MCP servers
- Test health checks
- Test error scenarios (server failures, timeouts)
- Test authentication flows
### Test Coverage Goals
- Core: >90%
- Infrastructure: >80%
- AspNetCore: >70%
---
## Performance Considerations
### Connection Pooling
- Maintain persistent connections to backend servers
- Configurable pool size per server
- Idle connection eviction
### Request Caching
- Cache tool discovery results
- Cache health check results (with TTL)
- Invalidate cache on server changes
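The caching points above could be covered by a small time-to-live cache, sketched here. The `TtlCache` class is hypothetical and not part of the planned API. Expired entries are treated as misses rather than evicted eagerly, and `Invalidate` would be called when a server is registered or unregistered.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical TTL cache for tool discovery or health check results.
public sealed class TtlCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _clock;

    // The clock is injectable so expiry can be tested without waiting.
    public TtlCache(TimeSpan ttl, Func<DateTime>? clock = null)
    {
        _ttl = ttl;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public void Set(TKey key, TValue value)
    {
        _entries[key] = (value, _clock() + _ttl);
    }

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_entries.TryGetValue(key, out var entry) && _clock() < entry.ExpiresAt)
        {
            value = entry.Value;
            return true;
        }

        // Missing or expired entries both count as a miss.
        value = default;
        return false;
    }

    public void Invalidate(TKey key)
    {
        _entries.TryRemove(key, out _);
    }
}
```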
### Monitoring
- Track request latency per server
- Track request success/failure rates
- Track circuit breaker state changes
- OpenTelemetry metrics integration
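The metrics listed above could be recorded through `System.Diagnostics.Metrics`, which the OpenTelemetry .NET SDK can subscribe to by meter name. The sketch below assumes that approach. The `GatewayMetrics` class, the meter name, and the instrument and tag names are all placeholders rather than settled conventions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical metrics facade; all names here are illustrative assumptions.
public static class GatewayMetrics
{
    public const string MeterName = "OpenHarbor.MCP.Gateway";

    private static readonly Meter GatewayMeter = new(MeterName, "1.0.0");

    private static readonly Counter<long> Requests =
        GatewayMeter.CreateCounter<long>("gateway.requests", description: "Routed requests");

    private static readonly Histogram<double> Latency =
        GatewayMeter.CreateHistogram<double>("gateway.request.duration", unit: "ms");

    private static readonly Counter<long> BreakerTransitions =
        GatewayMeter.CreateCounter<long>("gateway.circuit_breaker.transitions");

    // Records one routed request with its outcome and latency, tagged by server.
    public static void RecordRequest(string serverId, bool success, TimeSpan elapsed)
    {
        var server = new KeyValuePair<string, object?>("server.id", serverId);
        var outcome = new KeyValuePair<string, object?>("outcome", success ? "success" : "failure");
        Requests.Add(1, server, outcome);
        Latency.Record(elapsed.TotalMilliseconds, server);
    }

    // Records a circuit breaker state change, e.g. newState = "Open".
    public static void RecordCircuitStateChange(string serverId, string newState)
    {
        BreakerTransitions.Add(
            1,
            new KeyValuePair<string, object?>("server.id", serverId),
            new KeyValuePair<string, object?>("state", newState));
    }
}
```

Measurements are only collected when a listener or exporter subscribes to the meter, so the calls are cheap when monitoring is disabled.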
---
## Security
### Input Validation
- Validate all incoming requests
- Sanitize routing context data
- Validate server configuration
### Authentication
- API key authentication
- JWT token support
- Client identity verification
### Authorization
- Role-based access control
- Server-level permissions
- Tool-level permissions
### Rate Limiting
- Per-client rate limiting
- Per-server rate limiting
- Global rate limiting
- Burst protection
---
## Future Enhancements
- [ ] WebSocket transport support
- [ ] Request/response compression
- [ ] Dynamic server registration/discovery
- [ ] A/B testing support
- [ ] Blue/green deployment routing
- [ ] Multi-region routing
- [ ] Request replay for debugging
- [ ] Distributed tracing integration
---
**Document Version:** 1.0.0
**Status:** Planned
**Next Review:** After Phase 1 implementation