Apple Intelligence gRPC Server
A Swift-based gRPC server that exposes Apple Intelligence (Foundation Models) over the network, allowing any device on your LAN to send prompts and receive streaming AI responses.
Features
- gRPC API - Standard gRPC interface accessible from any language
- Streaming Support - Real-time token streaming for responsive UX
- Vision Analysis - Analyze images with text extraction, labeling, and descriptions
- Text-to-Speech - Convert text to audio (WAV/MP3) with multiple voices
- Speech-to-Text - Transcribe audio files or stream audio in real-time
- Menu Bar App - Native macOS app with system tray integration
- Built-in Chat UI - Test the AI directly from the app with voice input/output
- API Key Auth - Optional bearer token authentication
- Auto-Start - Launch at login and auto-start server options
Requirements
- macOS 26+ (Tahoe)
- Apple Silicon Mac (M1/M2/M3/M4)
- Apple Intelligence enabled in System Settings
- Swift 6.0+
Installation
Download Release
Download the latest .dmg from the Releases page, open it, and drag the app to Applications.
Build from Source
# Clone the repository
git clone https://github.com/svrnty/apple-intelligence-grpc.git
cd apple-intelligence-grpc
# Build the menu bar app
swift build -c release --product AppleIntelligenceApp
# Or build the CLI server
swift build -c release --product AppleIntelligenceServer
Usage
Menu Bar App
- Launch Apple Intelligence Server from Applications
- Click the brain icon in the menu bar
- Toggle Start Server to begin accepting connections
- Use Chat to test the AI directly (supports voice input/output)
- Configure host, port, and API key in Settings
CLI Server
# Run with defaults (0.0.0.0:50051)
.build/release/AppleIntelligenceServer
# Custom configuration via environment
GRPC_HOST=127.0.0.1 GRPC_PORT=8080 API_KEY=secret .build/release/AppleIntelligenceServer
API
Service Definition
service AppleIntelligenceService {
  // AI Completion
  rpc Health(HealthRequest) returns (HealthResponse);
  rpc Complete(CompletionRequest) returns (CompletionResponse);
  rpc StreamComplete(CompletionRequest) returns (stream CompletionChunk);

  // Text-to-Speech
  rpc TextToSpeech(TextToSpeechRequest) returns (TextToSpeechResponse);
  rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse);

  // Speech-to-Text
  rpc Transcribe(TranscribeRequest) returns (TranscribeResponse);
  rpc StreamTranscribe(stream StreamingTranscribeRequest) returns (stream StreamingTranscribeResponse);
}
Methods
| Method | Type | Description |
|---|---|---|
| Health | Unary | Check server and model availability |
| Complete | Unary | Generate a complete response (supports images) |
| StreamComplete | Server streaming | Stream tokens as they're generated |
| TextToSpeech | Unary | Convert text to audio |
| ListVoices | Unary | List available TTS voices |
| Transcribe | Unary | Transcribe an audio file to text |
| StreamTranscribe | Bidirectional streaming | Real-time audio transcription |
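As a quick orientation, here is a minimal Python client sketch for the unary and server-streaming methods. It assumes stubs generated from Proto/apple_intelligence.proto with grpcio-tools; the module names apple_intelligence_pb2 and apple_intelligence_pb2_grpc are the generator's defaults, not files this repo ships.
# Minimal Python client sketch. Stubs assumed generated with:
#   python -m grpc_tools.protoc -I Proto --python_out=. --grpc_python_out=. Proto/apple_intelligence.proto
import grpc
import apple_intelligence_pb2 as pb
import apple_intelligence_pb2_grpc as pb_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = pb_grpc.AppleIntelligenceServiceStub(channel)

# Unary: check server and model availability
print(stub.Health(pb.HealthRequest()))

# Unary: complete response
print(stub.Complete(pb.CompletionRequest(prompt="What is 2 + 2?")))

# Server streaming: chunks arrive as they are generated
for chunk in stub.StreamComplete(pb.CompletionRequest(prompt="Tell me a short story")):
    print(chunk)  # CompletionChunk fields are defined in the proto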
Vision Support
The Complete and StreamComplete methods support image analysis:
message CompletionRequest {
  string prompt = 1;
  optional float temperature = 2;
  optional int32 max_tokens = 3;
  repeated ImageData images = 4;  // Attach images for analysis
  bool include_analysis = 5;      // Return detailed analysis
}

message ImageData {
  bytes data = 1;
  string filename = 2;
  string mime_type = 3;  // image/png, image/jpeg, etc.
}
Supported Image Formats: PNG, JPEG, GIF, WebP, HEIC
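For example, attaching a local image from Python (continuing the client sketch above; the file name and prompt are illustrative):
# Sketch: send an image along with the prompt (reuses stub/pb from the sketch above)
with open("photo.png", "rb") as f:
    image = pb.ImageData(data=f.read(), filename="photo.png", mime_type="image/png")

request = pb.CompletionRequest(
    prompt="Describe this image and extract any visible text.",
    images=[image],
    include_analysis=True,  # request the detailed analysis
)
print(stub.Complete(request))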
Text-to-Speech
message TextToSpeechRequest {
  string text = 1;
  AudioFormat output_format = 2;  // WAV or MP3
  optional VoiceConfig voice_config = 3;
}

message VoiceConfig {
  string voice_identifier = 1;          // Voice ID from ListVoices
  optional float speaking_rate = 2;     // 0.0-1.0, default 0.5
  optional float pitch_multiplier = 3;  // 0.5-2.0, default 1.0
  optional float volume = 4;            // 0.0-1.0, default 1.0
}
Output Formats: WAV, MP3
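A hedged Python sketch of a TTS call, continuing the client above; the numeric output_format value and the audio_data response field follow the grpcurl example further down:
# Sketch: synthesize speech and write the audio to disk
request = pb.TextToSpeechRequest(
    text="Hello world",
    output_format=1,  # numeric enum value, as in the grpcurl example below
    voice_config=pb.VoiceConfig(speaking_rate=0.5, volume=1.0),
)
response = stub.TextToSpeech(request)
with open("hello.wav", "wb") as f:
    f.write(response.audio_data)  # audio_data per the grpcurl notes below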
Speech-to-Text
File-based Transcription
message TranscribeRequest {
  AudioInput audio = 1;
  optional TranscriptionConfig config = 2;
}

message AudioInput {
  bytes data = 1;
  string mime_type = 2;  // audio/wav, audio/mp3, etc.
  optional int32 sample_rate = 3;
  optional int32 channels = 4;
}

message TranscriptionConfig {
  optional string language_code = 1;  // e.g., "en-US", "fr-CA"
  optional bool enable_punctuation = 2;
  optional bool enable_timestamps = 3;
}
Supported Audio Formats: WAV, MP3, M4A, AAC, FLAC
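For file-based transcription, a Python sketch under the same stub assumptions as above:
# Sketch: transcribe a WAV file in one unary call
with open("audio.wav", "rb") as f:
    audio = pb.AudioInput(data=f.read(), mime_type="audio/wav")

config = pb.TranscriptionConfig(language_code="en-US", enable_punctuation=True)
print(stub.Transcribe(pb.TranscribeRequest(audio=audio, config=config)))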
Streaming Transcription
For real-time transcription, use bidirectional streaming:
- Send TranscriptionConfig first to configure the session
- Send audio_chunk messages with PCM audio data (16-bit, 16kHz, mono)
- Receive StreamingTranscribeResponse with partial and final results
message StreamingTranscribeRequest {
  oneof request {
    TranscriptionConfig config = 1;  // Send first
    bytes audio_chunk = 2;           // Then audio chunks
  }
}

message StreamingTranscribeResponse {
  string partial_text = 1;
  bool is_final = 2;
  string final_text = 3;
  repeated TranscriptionSegment segments = 4;
}
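In Python, the config-then-chunks protocol maps naturally onto a request generator. A sketch, continuing the client above; the 3200-byte chunk size is 100 ms of 16-bit, 16 kHz, mono PCM:
# Sketch: bidirectional streaming transcription
def requests():
    # 1. Send the configuration message first
    yield pb.StreamingTranscribeRequest(config=pb.TranscriptionConfig(language_code="en-US"))
    # 2. Then stream raw PCM chunks (16-bit, 16 kHz, mono)
    with open("audio.pcm", "rb") as f:
        while chunk := f.read(3200):  # 3200 bytes = 100 ms at 16 kHz x 2 bytes/sample
            yield pb.StreamingTranscribeRequest(audio_chunk=chunk)

for resp in stub.StreamTranscribe(requests()):
    if resp.is_final:
        print("final:", resp.final_text)
    else:
        print("partial:", resp.partial_text)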
Quick Test with grpcurl
# Health check
grpcurl -plaintext localhost:50051 appleintelligence.AppleIntelligenceService/Health
# Text completion
grpcurl -plaintext \
-d '{"prompt": "What is 2 + 2?"}' \
localhost:50051 appleintelligence.AppleIntelligenceService/Complete
# Streaming completion
grpcurl -plaintext \
-d '{"prompt": "Tell me a short story"}' \
localhost:50051 appleintelligence.AppleIntelligenceService/StreamComplete
# List TTS voices
grpcurl -plaintext \
-d '{"language_code": "en-US"}' \
localhost:50051 appleintelligence.AppleIntelligenceService/ListVoices
# Text-to-Speech (the audio_data in the response is base64-encoded; decode it to get the audio)
grpcurl -plaintext \
-d '{"text": "Hello world", "output_format": 1}' \
localhost:50051 appleintelligence.AppleIntelligenceService/TextToSpeech
# Transcribe audio file (base64 encode audio data)
grpcurl -plaintext \
-d '{"audio": {"data": "'$(base64 -i audio.wav)'", "mime_type": "audio/wav"}}' \
localhost:50051 appleintelligence.AppleIntelligenceService/Transcribe
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| GRPC_HOST | 0.0.0.0 | Host to bind (use 0.0.0.0 for LAN access) |
| GRPC_PORT | 50051 | Port to listen on |
| API_KEY | (none) | Optional API key for authentication |
Supported Languages
Speech Recognition (STT)
- English (US, CA, GB, AU, IN, IE, ZA)
- French (CA, FR)
- Spanish (ES, MX)
- German, Italian, Portuguese, Japanese, Korean, Chinese
- And many more via macOS Speech framework
Text-to-Speech (TTS)
All voices available in macOS System Settings, including:
- Premium voices (highest quality, requires download)
- Enhanced voices (good quality)
- Default/Compact voices (pre-installed)
Client Libraries
Connect from any language with gRPC support:
- Python: grpcio, grpcio-tools
- Node.js: @grpc/grpc-js, @grpc/proto-loader
- Go: google.golang.org/grpc
- Swift: grpc-swift
- Rust: tonic
See docs/grpc-client-guide.md for detailed examples.
Project Structure
apple-intelligence-grpc/
├── Package.swift
├── Proto/
│   └── apple_intelligence.proto     # gRPC service definition
├── Sources/
│   ├── AppleIntelligenceCore/       # Shared gRPC service code
│   │   ├── Config.swift
│   │   ├── Services/
│   │   │   ├── AppleIntelligenceService.swift
│   │   │   ├── TextToSpeechService.swift
│   │   │   ├── SpeechToTextService.swift
│   │   │   └── VisionAnalysisService.swift
│   │   ├── Providers/
│   │   │   └── AppleIntelligenceProvider.swift
│   │   └── Generated/
│   │       ├── apple_intelligence.pb.swift
│   │       └── apple_intelligence.grpc.swift
│   ├── AppleIntelligenceServer/     # CLI executable
│   │   └── main.swift
│   └── AppleIntelligenceApp/        # Menu bar app
│       ├── App.swift
│       ├── ServerManager.swift
│       ├── Models/
│       ├── Views/
│       └── ViewModels/
├── scripts/
│   ├── build-app.sh                 # Build .app bundle
│   └── create-dmg.sh                # Create DMG installer
└── docs/
    ├── grpc-client-guide.md         # Client connection examples
    ├── macos-runner-setup.md        # CI runner setup
    └── pipeline-configuration.md    # CI/CD configuration
CI/CD
Automated builds are configured with Gitea Actions. When a release is created, the workflow:
- Builds the app bundle
- Signs with Developer ID
- Notarizes with Apple
- Uploads DMG to release
See docs/pipeline-configuration.md for setup instructions.
Security
- Local Network: By default, the server binds to 0.0.0.0, allowing LAN access
- API Key: Enable authentication by setting the API_KEY environment variable
- Firewall: macOS will prompt to allow incoming connections on first run
- Notarized: Release builds are signed and notarized by Apple
Troubleshooting
Model Not Available
- Ensure Apple Intelligence is enabled: System Settings → Apple Intelligence & Siri
- Requires Apple Silicon Mac with macOS 26+
Connection Refused
- Check the server is running (brain icon should be filled)
- Verify firewall allows connections on the configured port
- Try localhost instead of the IP if testing locally
Authentication Failed
- Include the API key in the Authorization header: Bearer YOUR_API_KEY
- Verify the key matches what's configured in Settings
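For instance, with grpcio the bearer token goes in the call metadata (a sketch reusing the stub from the API section above):
# Sketch: authenticate a call by passing the key as call metadata
metadata = [("authorization", "Bearer YOUR_API_KEY")]
print(stub.Health(pb.HealthRequest(), metadata=metadata))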
Speech Recognition Not Working
- Grant microphone permission when prompted
- Check System Settings → Privacy & Security → Speech Recognition
- Ensure the language is supported
TTS Voice Quality
- Download Premium/Enhanced voices from System Settings → Accessibility → Read & Speak
- Premium voices are larger (~150-500MB) but sound more natural
License
MIT