Research conducted on modern AI coding assistants (Cursor, GitHub Copilot, Cline,
Aider, Windsurf, Replit Agent) to understand architecture patterns, context management,
code editing workflows, and tool use protocols.
Key Decision: Pivoted from building full CLI (40-50h) to validation-driven MCP-first
approach (10-15h). Build 5 core CODEX MCP tools that work with ANY coding assistant,
validate adoption over 2-4 weeks, then decide on full CLI if demand proven.
Files:
- research/ai-systems/modern-coding-assistants-architecture.md (comprehensive research)
- research/ai-systems/codex-coding-assistant-implementation-plan.md (original CLI plan, preserved)
- research/ai-systems/codex-mcp-tools-implementation-plan.md (approved MCP-first plan)
- ideas/registry.json (updated with approved MCP tools proposal)
Architech Validation: APPROVED with pivot to MCP-first approach
Human Decision: Approved (pragmatic validation-driven development)
Next: Begin Phase 1 implementation (10-15 hours, 5 core MCP tools)
🤖 Generated with CODEX Research System
Co-Authored-By: The Archivist <archivist@codex.svrnty.io>
Co-Authored-By: The Architech <architech@codex.svrnty.io>
Co-Authored-By: Mathias Beaulieu-Duncan <mat@svrnty.io>
# OpenHarbor.MCP.Gateway - Module Design

**Document Type:** Architecture Design Document
**Status:** Planned
**Version:** 1.0.0
**Last Updated:** 2025-10-19

---

## Overview

OpenHarbor.MCP.Gateway is a .NET 8 library that provides proxy and routing infrastructure for MCP traffic, enabling centralized management, authentication, monitoring, and load balancing between MCP clients and servers. This document defines the architecture, components, and design decisions.

### Purpose

- **What**: Gateway/proxy library for routing MCP traffic between clients and servers
- **Why**: Enable centralized management, security, and monitoring of MCP infrastructure
- **How**: Clean Architecture with routing strategies, health monitoring, and transport abstraction

---

## Architecture

### Clean Architecture Layers

```
┌─────────────────────────────────────────────────┐
│  OpenHarbor.MCP.Gateway.Cli (Executable)        │
│  ┌───────────────────────────────────────────┐  │
│  │  OpenHarbor.MCP.Gateway.AspNetCore (HTTP) │  │
│  │  ┌─────────────────────────────────────┐  │  │
│  │  │OpenHarbor.MCP.Gateway.Infrastructure│  │  │
│  │  │  ┌───────────────────────────────┐  │  │  │
│  │  │  │  OpenHarbor.MCP.Gateway.Core  │  │  │  │
│  │  │  │  - IGatewayRouter             │  │  │  │
│  │  │  │  - IRoutingStrategy           │  │  │  │
│  │  │  │  - IAuthProvider              │  │  │  │
│  │  │  │  - ICircuitBreaker            │  │  │  │
│  │  │  │  - Models (no dependencies)   │  │  │  │
│  │  │  └───────────────────────────────┘  │  │  │
│  │  └─────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
```

### Layer Responsibilities

| Layer | Purpose | Dependencies |
|-------|---------|--------------|
| **Core** | Abstractions and models | None |
| **Infrastructure** | Routing, auth, circuit breakers | Core, System.Text.Json |
| **AspNetCore** | HTTP endpoints and DI | Core, Infrastructure, ASP.NET Core |
| **Cli** | Management CLI | All layers |

---

## Core Components

### IGatewayRouter Interface

Primary interface for gateway routing operations:

```csharp
public interface IGatewayRouter
{
    // Request Routing
    Task<McpResponse> RouteRequestAsync(
        McpRequest request,
        RoutingContext context,
        CancellationToken ct = default
    );

    // Server Management
    Task<IEnumerable<ServerInfo>> GetRegisteredServersAsync();
    Task RegisterServerAsync(ServerConfig config);
    Task UnregisterServerAsync(string serverId);

    // Health Monitoring
    Task<ServerHealthStatus> GetServerHealthAsync(string serverId);
    Task<IEnumerable<ServerHealthStatus>> GetAllServerHealthAsync();
}
```

### IRoutingStrategy Interface

Defines server selection logic:

```csharp
public interface IRoutingStrategy
{
    string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> availableServers
    );
}

public class RoutingContext
{
    public string? ToolName { get; set; }
    public string? ClientId { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
    public Dictionary<string, object>? Metadata { get; set; }
}
```
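
To illustrate how a custom strategy could plug into this interface, here is a minimal sketch. It is not one of the built-in strategies: `HeaderPinnedStrategy` and the `X-MCP-Server` header name are hypothetical, and `ServerInfo` is reduced to the two members (`Id`, `IsHealthy`) that this document shows being used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal ServerInfo: only Id and IsHealthy appear in this document.
public record ServerInfo(string Id, bool IsHealthy);

public class RoutingContext
{
    public string? ToolName { get; set; }
    public string? ClientId { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
    public Dictionary<string, object>? Metadata { get; set; }
}

public interface IRoutingStrategy
{
    string SelectServer(RoutingContext context, IEnumerable<ServerInfo> availableServers);
}

// Illustrative only: honours a server pinned via a request header when that
// server is healthy, otherwise falls back to the first healthy server.
public class HeaderPinnedStrategy : IRoutingStrategy
{
    public const string PinHeader = "X-MCP-Server"; // assumed header name

    public string SelectServer(RoutingContext context, IEnumerable<ServerInfo> availableServers)
    {
        var healthy = availableServers.Where(s => s.IsHealthy).ToList();
        if (healthy.Count == 0)
        {
            throw new InvalidOperationException("No healthy servers available");
        }

        if (context.Headers != null &&
            context.Headers.TryGetValue(PinHeader, out var pinned) &&
            healthy.Any(s => s.Id == pinned))
        {
            return pinned;
        }

        return healthy[0].Id;
    }
}
```

A caller would build a `RoutingContext` from the incoming request (tool name, client ID, headers) and pass it together with the current server list to `SelectServer`.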

### IAuthProvider Interface

Authentication and authorization:

```csharp
public interface IAuthProvider
{
    Task<AuthResult> AuthenticateAsync(
        string? apiKey,
        Dictionary<string, string>? headers,
        CancellationToken ct = default
    );

    Task<bool> AuthorizeAsync(
        string clientId,
        string serverId,
        string toolName,
        CancellationToken ct = default
    );
}

public class AuthResult
{
    public bool IsAuthenticated { get; set; }
    public string? ClientId { get; set; }
    public IEnumerable<string> Roles { get; set; } = [];
    public string? ErrorMessage { get; set; }
}
```
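
As a rough sketch of what an implementation of this interface might look like, the example below keeps API keys and server grants in memory. `InMemoryApiKeyAuthProvider` and its constructor parameters are hypothetical; a real provider would presumably load keys from a secure store and would also consider `toolName` and roles, which this sketch ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class AuthResult
{
    public bool IsAuthenticated { get; set; }
    public string? ClientId { get; set; }
    public IEnumerable<string> Roles { get; set; } = Array.Empty<string>();
    public string? ErrorMessage { get; set; }
}

public interface IAuthProvider
{
    Task<AuthResult> AuthenticateAsync(string? apiKey, Dictionary<string, string>? headers, CancellationToken ct = default);
    Task<bool> AuthorizeAsync(string clientId, string serverId, string toolName, CancellationToken ct = default);
}

// Hypothetical provider for illustration: keys and grants are fixed at construction.
public class InMemoryApiKeyAuthProvider : IAuthProvider
{
    private readonly Dictionary<string, string> _clientIdsByKey;          // apiKey -> clientId
    private readonly HashSet<(string ClientId, string ServerId)> _grants; // allowed client/server pairs

    public InMemoryApiKeyAuthProvider(
        Dictionary<string, string> clientIdsByKey,
        IEnumerable<(string ClientId, string ServerId)> grants)
    {
        _clientIdsByKey = clientIdsByKey;
        _grants = new HashSet<(string, string)>(grants);
    }

    public Task<AuthResult> AuthenticateAsync(string? apiKey, Dictionary<string, string>? headers, CancellationToken ct = default)
    {
        if (apiKey != null && _clientIdsByKey.TryGetValue(apiKey, out var clientId))
        {
            return Task.FromResult(new AuthResult { IsAuthenticated = true, ClientId = clientId });
        }

        return Task.FromResult(new AuthResult
        {
            IsAuthenticated = false,
            ErrorMessage = "Unknown or missing API key"
        });
    }

    // Server-level check only; tool-level permissions are left out of this sketch.
    public Task<bool> AuthorizeAsync(string clientId, string serverId, string toolName, CancellationToken ct = default)
        => Task.FromResult(_grants.Contains((clientId, serverId)));
}
```

Returning an `AuthResult` with `ErrorMessage` set, rather than throwing, lets the caller decide how to surface a failed authentication.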

### ICircuitBreaker Interface

Prevent cascading failures:

```csharp
public interface ICircuitBreaker
{
    bool IsOpen(string serverId);
    void RecordSuccess(string serverId);
    void RecordFailure(string serverId);
    void Reset(string serverId);
}
```
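
The interface implies a call pattern: check `IsOpen` before calling a server, then record the outcome. One way this could be wrapped is sketched below. The `ExecuteAsync` extension method and the `RecordingBreaker` stand-in are hypothetical helpers for illustration, not part of the planned API.

```csharp
using System;
using System.Threading.Tasks;

public interface ICircuitBreaker
{
    bool IsOpen(string serverId);
    void RecordSuccess(string serverId);
    void RecordFailure(string serverId);
    void Reset(string serverId);
}

public static class CircuitBreakerExtensions
{
    // Hypothetical helper: runs `call` only when the circuit for serverId is
    // not open, and reports success or failure back to the breaker.
    public static async Task<T> ExecuteAsync<T>(
        this ICircuitBreaker breaker, string serverId, Func<Task<T>> call)
    {
        if (breaker.IsOpen(serverId))
        {
            throw new InvalidOperationException($"Circuit open for server '{serverId}'");
        }

        try
        {
            var result = await call();
            breaker.RecordSuccess(serverId);
            return result;
        }
        catch
        {
            breaker.RecordFailure(serverId);
            throw;
        }
    }
}

// Trivial stand-in that only counts calls, used to exercise the helper.
public sealed class RecordingBreaker : ICircuitBreaker
{
    public bool Open { get; set; }
    public int Successes { get; private set; }
    public int Failures { get; private set; }

    public bool IsOpen(string serverId) => Open;
    public void RecordSuccess(string serverId) => Successes++;
    public void RecordFailure(string serverId) => Failures++;
    public void Reset(string serverId) { Open = false; Successes = 0; Failures = 0; }
}
```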

---

## Routing Strategies

### Built-In Strategies

#### Round-Robin Strategy

```csharp
public class RoundRobinStrategy : IRoutingStrategy
{
    // Starts at -1 so the first increment selects index 0
    private int _currentIndex = -1;

    public string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> servers)
    {
        var healthyServers = servers.Where(s => s.IsHealthy).ToList();

        if (healthyServers.Count == 0)
        {
            throw new NoHealthyServersException();
        }

        // Interlocked.Increment wraps to negative values on overflow;
        // treating the counter as uint keeps the index non-negative
        var ticket = unchecked((uint)Interlocked.Increment(ref _currentIndex));
        var index = (int)(ticket % (uint)healthyServers.Count);
        return healthyServers[index].Id;
    }
}
```
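
One subtlety with an `int` counter in round-robin selection is overflow: `Interlocked.Increment` wraps to `int.MinValue`, and the C# `%` operator then yields a negative remainder, which is not a valid list index. The sketch below shows one way to keep the index in range; `RoundRobinIndex.Next` is a hypothetical helper name.

```csharp
using System;
using System.Threading;

public static class RoundRobinIndex
{
    // Atomically advances the shared counter and returns a slot in [0, count).
    public static int Next(ref int counter, int count)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        // Reinterpreting the counter as uint keeps the value non-negative
        // after the int counter wraps past int.MaxValue.
        uint ticket = unchecked((uint)Interlocked.Increment(ref counter));
        return (int)(ticket % (uint)count);
    }
}
```

Starting the counter at `-1` makes the first call return slot `0`. Note that the rotation can skip or repeat a slot at the moment of wraparound, or whenever the number of healthy servers changes between calls, which is usually acceptable for load spreading.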

#### Tool-Based Strategy

```csharp
public class ToolBasedStrategy : IRoutingStrategy
{
    private readonly Dictionary<string, string> _toolPrefixMappings;

    public ToolBasedStrategy(Dictionary<string, string> toolPrefixMappings)
    {
        _toolPrefixMappings = toolPrefixMappings;
    }

    public string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> servers)
    {
        if (context.ToolName == null)
        {
            throw new InvalidOperationException("ToolName required for tool-based routing");
        }

        foreach (var (prefix, serverId) in _toolPrefixMappings)
        {
            // Ordinal comparison avoids culture-sensitive prefix matching
            if (context.ToolName.StartsWith(prefix, StringComparison.Ordinal))
            {
                return serverId;
            }
        }

        // Default to first healthy server
        var fallback = servers.FirstOrDefault(s => s.IsHealthy);
        if (fallback == null)
        {
            throw new NoHealthyServersException();
        }

        return fallback.Id;
    }
}
```
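
When prefixes overlap (for example `db_` and `db_admin_`), iterating a `Dictionary` gives no guaranteed order, so which mapping wins is not well defined. A possible refinement, sketched here under that assumption, is to prefer the longest matching prefix. `ToolPrefixMatcher` and the tool and server names in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToolPrefixMatcher
{
    // Returns the server mapped to the longest prefix of toolName,
    // or null when no prefix matches.
    public static string? Match(
        IReadOnlyDictionary<string, string> prefixToServer, string toolName)
    {
        return prefixToServer
            .Where(kv => toolName.StartsWith(kv.Key, StringComparison.Ordinal))
            .OrderByDescending(kv => kv.Key.Length)
            .Select(kv => kv.Value)
            .FirstOrDefault();
    }
}
```

With mappings `db_ -> srv-db` and `db_admin_ -> srv-admin`, a tool named `db_admin_drop` resolves to `srv-admin`, while `db_query` resolves to `srv-db`. A `null` result signals that the caller should fall back to its default routing.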

#### Client-Based Strategy

```csharp
public class ClientBasedStrategy : IRoutingStrategy
{
    private readonly Dictionary<string, string> _clientMappings;

    public ClientBasedStrategy(Dictionary<string, string> clientMappings)
    {
        _clientMappings = clientMappings;
    }

    public string SelectServer(
        RoutingContext context,
        IEnumerable<ServerInfo> servers)
    {
        if (context.ClientId != null &&
            _clientMappings.TryGetValue(context.ClientId, out var serverId))
        {
            return serverId;
        }

        // Default routing: first healthy server
        var fallback = servers.FirstOrDefault(s => s.IsHealthy);
        if (fallback == null)
        {
            throw new NoHealthyServersException();
        }

        return fallback.Id;
    }
}
```

---

## Configuration

### Gateway Configuration Model

```csharp
public class GatewayConfig
{
    public string Name { get; set; } = "MCP Gateway";
    public string Version { get; set; } = "1.0.0";
    public string? Description { get; set; }
    public string ListenAddress { get; set; } = "http://localhost:8080";
}

public class ServerConfig
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public TransportConfig Transport { get; set; } = new();
    public bool Enabled { get; set; } = true;
    public Dictionary<string, string>? Metadata { get; set; }
}

public class TransportConfig
{
    public string Type { get; set; } = "Stdio"; // "Stdio" or "Http"
    public string? Command { get; set; }
    public string[]? Args { get; set; }
    public string? BaseUrl { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
}
```
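
Since the Infrastructure layer lists System.Text.Json as a dependency, these models could plausibly be loaded from JSON. The sketch below assumes a JSON array of server entries with camelCase property names; that file shape, the `ServerConfigLoader` class, and the sample values are assumptions for illustration, not a format this document defines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class TransportConfig
{
    public string Type { get; set; } = "Stdio"; // "Stdio" or "Http"
    public string? Command { get; set; }
    public string[]? Args { get; set; }
    public string? BaseUrl { get; set; }
    public Dictionary<string, string>? Headers { get; set; }
}

public class ServerConfig
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public TransportConfig Transport { get; set; } = new();
    public bool Enabled { get; set; } = true;
    public Dictionary<string, string>? Metadata { get; set; }
}

public static class ServerConfigLoader
{
    // Case-insensitive matching lets camelCase JSON bind to PascalCase properties.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Hypothetical sample input: one stdio server and one disabled HTTP server.
    public const string SampleJson = """
        [
          { "id": "files", "name": "File Server",
            "transport": { "type": "Stdio", "command": "dotnet", "args": ["run", "--no-build"] } },
          { "id": "search", "name": "Search Server", "enabled": false,
            "transport": { "type": "Http", "baseUrl": "http://localhost:5001" } }
        ]
        """;

    public static List<ServerConfig> Parse(string json) =>
        JsonSerializer.Deserialize<List<ServerConfig>>(json, Options) ?? new List<ServerConfig>();
}
```

Properties omitted from the JSON keep their initializer defaults, so the first entry above ends up with `Enabled == true` without stating it.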

### Routing Configuration

```csharp
public class RoutingConfig
{
    public string Strategy { get; set; } = "RoundRobin"; // "RoundRobin", "ToolBased", "ClientBased"
    public TimeSpan HealthCheckInterval { get; set; } = TimeSpan.FromSeconds(30);
    public Dictionary<string, string>? StrategyConfig { get; set; }
}
```
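
Because `Strategy` is a free-form string, a typo in configuration would only surface at routing time unless it is validated up front. A minimal sketch of such a check follows; the `RoutingStrategyKind` enum and `RoutingStrategyParser` class are hypothetical names that mirror the three strategy values listed in the comment above.

```csharp
using System;

// Mirrors the three strategy names this document lists for RoutingConfig.Strategy.
public enum RoutingStrategyKind
{
    RoundRobin,
    ToolBased,
    ClientBased
}

public static class RoutingStrategyParser
{
    public static RoutingStrategyKind Parse(string name)
    {
        // Enum.TryParse also accepts numeric strings such as "7",
        // so confirm the parsed value is a defined member.
        if (Enum.TryParse<RoutingStrategyKind>(name, ignoreCase: true, out var kind) &&
            Enum.IsDefined(typeof(RoutingStrategyKind), kind))
        {
            return kind;
        }

        var expected = string.Join(", ", Enum.GetNames(typeof(RoutingStrategyKind)));
        throw new ArgumentException(
            $"Unknown routing strategy '{name}'. Expected one of: {expected}", nameof(name));
    }
}
```

Calling `Parse` once at startup turns a misconfigured strategy name into an immediate, descriptive failure.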

### Security Configuration

```csharp
public class SecurityConfig
{
    public bool EnableAuthentication { get; set; } = false;
    public string? ApiKeyHeader { get; set; } = "X-MCP-API-Key";
    public RateLimitConfig? RateLimit { get; set; }
}

public class RateLimitConfig
{
    public int RequestsPerMinute { get; set; } = 100;
    public int BurstSize { get; set; } = 20;
}
```
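
This document does not say how `RequestsPerMinute` and `BurstSize` interact. One common interpretation is a token bucket, where `BurstSize` is the bucket capacity and `RequestsPerMinute / 60` is the refill rate in tokens per second. The sketch below follows that assumption; `TokenBucket` is a hypothetical class, and the clock is passed in explicitly so the behavior is deterministic.

```csharp
using System;

// Hypothetical token-bucket limiter: capacity = burstSize,
// refill rate = requestsPerMinute / 60 tokens per second.
public sealed class TokenBucket
{
    private readonly object _gate = new();
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTime _lastRefill;

    public TokenBucket(int requestsPerMinute, int burstSize, DateTime now)
    {
        _capacity = burstSize;
        _refillPerSecond = requestsPerMinute / 60.0;
        _tokens = burstSize; // start full so an initial burst is allowed
        _lastRefill = now;
    }

    // Consumes one token and returns true if a request is allowed at `now`.
    public bool TryAcquire(DateTime now)
    {
        lock (_gate)
        {
            var elapsedSeconds = (now - _lastRefill).TotalSeconds;
            if (elapsedSeconds > 0)
            {
                _tokens = Math.Min(_capacity, _tokens + elapsedSeconds * _refillPerSecond);
                _lastRefill = now;
            }

            if (_tokens < 1)
            {
                return false;
            }

            _tokens -= 1;
            return true;
        }
    }
}
```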

---

## Health Monitoring

### Server Health Check

```csharp
public class ServerHealthStatus
{
    public string ServerId { get; set; } = string.Empty;
    public string ServerName { get; set; } = string.Empty;
    public bool IsHealthy { get; set; }
    public DateTime LastCheck { get; set; }
    public TimeSpan? ResponseTime { get; set; }
    public string? ErrorMessage { get; set; }
}
```

### Health Check Implementation

```csharp
public class McpServerHealthCheck : IHealthCheck
{
    private readonly IGatewayRouter _router;

    public McpServerHealthCheck(IGatewayRouter router)
    {
        _router = router;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        // Materialize once so the sequence is not enumerated twice
        var statuses = (await _router.GetAllServerHealthAsync()).ToList();
        var healthyCount = statuses.Count(s => s.IsHealthy);
        var totalCount = statuses.Count;

        // Require at least one server, so an empty registry is not reported as healthy
        if (totalCount > 0 && healthyCount == totalCount)
        {
            return HealthCheckResult.Healthy(
                $"All {totalCount} servers healthy");
        }
        else if (healthyCount > 0)
        {
            return HealthCheckResult.Degraded(
                $"{healthyCount}/{totalCount} servers healthy");
        }
        else
        {
            return HealthCheckResult.Unhealthy(
                "No healthy servers available");
        }
    }
}
```
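
---

## Error Handling

### Exception Hierarchy

```csharp
public class GatewayException : Exception
{
    public GatewayException() { }
    public GatewayException(string message) : base(message) { }
}

public class NoHealthyServersException : GatewayException { }

public class ServerNotFoundException : GatewayException
{
    public string ServerId { get; }

    public ServerNotFoundException(string serverId)
        : base($"Server '{serverId}' not found")
    {
        ServerId = serverId;
    }
}

public class RoutingException : GatewayException
{
    public RoutingContext Context { get; }

    public RoutingException(string message, RoutingContext context)
        : base(message)
    {
        Context = context;
    }
}

public class AuthenticationException : GatewayException { }

public class RateLimitExceededException : GatewayException
{
    public string ClientId { get; }
    public int RequestsPerMinute { get; }

    public RateLimitExceededException(string clientId, int requestsPerMinute)
        : base($"Client '{clientId}' exceeded {requestsPerMinute} requests per minute")
    {
        ClientId = clientId;
        RequestsPerMinute = requestsPerMinute;
    }
}
```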
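
At the HTTP boundary, a hierarchy like this is typically translated into status codes. This document does not specify that mapping, so the codes below are an assumption chosen for illustration, and `GatewayErrorMapper` is a hypothetical helper. The exception types are re-declared without members to keep the sketch self-contained.

```csharp
using System;

public class GatewayException : Exception { }
public class NoHealthyServersException : GatewayException { }
public class ServerNotFoundException : GatewayException { }
public class AuthenticationException : GatewayException { }
public class RateLimitExceededException : GatewayException { }

public static class GatewayErrorMapper
{
    // Assumed mapping. Specific types are listed before the GatewayException
    // base case, because switch arms are evaluated in order.
    public static int ToStatusCode(Exception ex) => ex switch
    {
        AuthenticationException => 401,    // Unauthorized
        RateLimitExceededException => 429, // Too Many Requests
        ServerNotFoundException => 404,    // Not Found
        NoHealthyServersException => 503,  // Service Unavailable
        GatewayException => 502,           // Bad Gateway for other gateway errors
        _ => 500                           // Internal Server Error
    };
}
```

Keeping the mapping in one place means the AspNetCore layer can stay free of per-exception `catch` blocks.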

### Circuit Breaker Implementation

```csharp
public class CircuitBreaker : ICircuitBreaker
{
    private readonly ConcurrentDictionary<string, CircuitEntry> _entries = new();
    private readonly int _failureThreshold = 5;
    private readonly TimeSpan _timeout = TimeSpan.FromSeconds(30);

    public bool IsOpen(string serverId)
    {
        if (!_entries.TryGetValue(serverId, out var entry))
        {
            return false;
        }

        lock (entry)
        {
            if (entry.State == CircuitState.Open &&
                DateTime.UtcNow - entry.LastFailure > _timeout)
            {
                // Transition to half-open so the next request can probe the server
                entry.State = CircuitState.HalfOpen;
            }

            return entry.State == CircuitState.Open;
        }
    }

    public void RecordSuccess(string serverId)
    {
        var entry = _entries.GetOrAdd(serverId, _ => new CircuitEntry());
        lock (entry)
        {
            entry.FailureCount = 0;
            entry.State = CircuitState.Closed;
        }
    }

    public void RecordFailure(string serverId)
    {
        var entry = _entries.GetOrAdd(serverId, _ => new CircuitEntry());
        lock (entry)
        {
            entry.FailureCount++;
            entry.LastFailure = DateTime.UtcNow;
            if (entry.FailureCount >= _failureThreshold)
            {
                entry.State = CircuitState.Open;
            }
        }
    }

    public void Reset(string serverId)
    {
        _entries.TryRemove(serverId, out _);
    }
}

// Mutable per-server state; every access is guarded by lock (entry) above.
// Kept separate from the CircuitState enum, which only names the three states.
internal sealed class CircuitEntry
{
    public CircuitState State { get; set; } = CircuitState.Closed;
    public int FailureCount { get; set; }
    public DateTime LastFailure { get; set; }
}

internal enum CircuitState
{
    Closed,
    Open,
    HalfOpen
}
```

---

## Testing Strategy

### Unit Tests

- Test Core abstractions with mocks
- Test routing strategies with mock servers
- Test circuit breaker logic
- Test authentication/authorization

### Integration Tests

- Test actual routing to real MCP servers
- Test health checks
- Test error scenarios (server failures, timeouts)
- Test authentication flows

### Test Coverage Goals

- Core: >90%
- Infrastructure: >80%
- AspNetCore: >70%

---

## Performance Considerations

### Connection Pooling

- Maintain persistent connections to backend servers
- Configurable pool size per server
- Idle connection eviction

### Request Caching

- Cache tool discovery results
- Cache health check results (with TTL)
- Invalidate cache on server changes
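
The caching points above could be served by a small TTL cache keyed by server ID. The following is a minimal sketch, not a planned component: `TtlCache<TValue>` is a hypothetical class, and the injectable clock exists only to make expiry testable.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class TtlCache<TValue>
{
    private readonly ConcurrentDictionary<string, (TValue Value, DateTime ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _clock;

    public TtlCache(TimeSpan ttl, Func<DateTime>? clock = null)
    {
        _ttl = ttl;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    // Returns the cached value while it is fresh; otherwise calls the
    // factory and stores the result with a new expiry time.
    public TValue GetOrAdd(string key, Func<string, TValue> factory)
    {
        var now = _clock();
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
        {
            return entry.Value;
        }

        var value = factory(key);
        _entries[key] = (value, now + _ttl);
        return value;
    }

    // Intended to be called when a server is registered, unregistered, or reconfigured.
    public void Invalidate(string key) => _entries.TryRemove(key, out _);
}
```

Under concurrent access two callers may both run the factory for an expired key; for idempotent lookups such as tool discovery or health results that is usually an acceptable trade for avoiding a lock.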

### Monitoring

- Track request latency per server
- Track request success/failure rates
- Track circuit breaker state changes
- OpenTelemetry metrics integration
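
In .NET, metrics of this kind are usually emitted through `System.Diagnostics.Metrics`, which the OpenTelemetry .NET SDK can subscribe to by meter name. The sketch below records a request counter and a latency histogram per server. The `GatewayMetrics` class, the meter name, the instrument names, and the tag keys are all placeholders, not names defined by the project.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class GatewayMetrics : IDisposable
{
    // Placeholder meter and instrument names chosen for this sketch.
    public const string MeterName = "OpenHarbor.MCP.Gateway";

    private readonly Meter _meter = new(MeterName);
    private readonly Counter<long> _requests;
    private readonly Histogram<double> _latencyMs;

    public GatewayMetrics()
    {
        _requests = _meter.CreateCounter<long>("gateway.requests");
        _latencyMs = _meter.CreateHistogram<double>("gateway.request.duration", unit: "ms");
    }

    // Records one routed request with its target server, outcome, and latency.
    public void RecordRequest(string serverId, bool success, TimeSpan elapsed)
    {
        var server = new KeyValuePair<string, object?>("server.id", serverId);
        var outcome = new KeyValuePair<string, object?>("outcome", success ? "success" : "failure");

        _requests.Add(1, server, outcome);
        _latencyMs.Record(elapsed.TotalMilliseconds, server);
    }

    public void Dispose() => _meter.Dispose();
}
```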

---

## Security

### Input Validation

- Validate all incoming requests
- Sanitize routing context data
- Validate server configuration

### Authentication

- API key authentication
- JWT token support
- Client identity verification

### Authorization

- Role-based access control
- Server-level permissions
- Tool-level permissions

### Rate Limiting

- Per-client rate limiting
- Per-server rate limiting
- Global rate limiting
- Burst protection

---

## Future Enhancements

- [ ] WebSocket transport support
- [ ] Request/response compression
- [ ] Dynamic server registration/discovery
- [ ] A/B testing support
- [ ] Blue/green deployment routing
- [ ] Multi-region routing
- [ ] Request replay for debugging
- [ ] Distributed tracing integration

---

**Document Version:** 1.0.0
**Status:** Planned
**Next Review:** After Phase 1 implementation