# Apple Intelligence gRPC Server

A Swift-based gRPC server that exposes Apple Intelligence (Foundation Models) over the network, allowing any device on your LAN to send prompts and receive streaming AI responses.

## Features

- **gRPC API** - Standard gRPC interface accessible from any language
- **Streaming Support** - Real-time token streaming for responsive UX
- **Vision Analysis** - Analyze images with text extraction, labeling, and descriptions
- **Text-to-Speech** - Convert text to audio (WAV/MP3) with multiple voices
- **Speech-to-Text** - Transcribe audio files or stream audio in real-time
- **Menu Bar App** - Native macOS app with system tray integration
- **Built-in Chat UI** - Test the AI directly from the app with voice input/output
- **API Key Auth** - Optional bearer token authentication
- **Auto-Start** - Launch at login and auto-start server options

## Requirements

- macOS 26+ (Tahoe)
- Apple Silicon Mac (M1/M2/M3/M4)
- Apple Intelligence enabled in System Settings
- Swift 6.0+

## Installation

### Download Release

Download the latest `.dmg` from the [Releases](../../releases) page, open it, and drag the app to Applications.

### Build from Source

```bash
# Clone the repository
git clone https://github.com/svrnty/apple-intelligence-grpc.git
cd apple-intelligence-grpc

# Build the menu bar app
swift build -c release --product AppleIntelligenceApp

# Or build the CLI server
swift build -c release --product AppleIntelligenceServer
```

## Usage

### Menu Bar App

1. Launch **Apple Intelligence Server** from Applications
2. Click the brain icon in the menu bar
3. Toggle **Start Server** to begin accepting connections
4. Use **Chat** to test the AI directly (supports voice input/output)
5. Configure host, port, and API key in **Settings**

### CLI Server

```bash
# Run with defaults (0.0.0.0:50051)
.build/release/AppleIntelligenceServer

# Custom configuration via environment
GRPC_HOST=127.0.0.1 GRPC_PORT=8080 API_KEY=secret .build/release/AppleIntelligenceServer
```

## API

### Service Definition

```protobuf
service AppleIntelligenceService {
  // AI Completion
  rpc Health(HealthRequest) returns (HealthResponse);
  rpc Complete(CompletionRequest) returns (CompletionResponse);
  rpc StreamComplete(CompletionRequest) returns (stream CompletionChunk);

  // Text-to-Speech
  rpc TextToSpeech(TextToSpeechRequest) returns (TextToSpeechResponse);
  rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse);

  // Speech-to-Text
  rpc Transcribe(TranscribeRequest) returns (TranscribeResponse);
  rpc StreamTranscribe(stream StreamingTranscribeRequest) returns (stream StreamingTranscribeResponse);
}
```

### Methods

| Method | Type | Description |
|--------|------|-------------|
| `Health` | Unary | Check server and model availability |
| `Complete` | Unary | Generate complete response (supports images) |
| `StreamComplete` | Server Streaming | Stream tokens as they're generated |
| `TextToSpeech` | Unary | Convert text to audio |
| `ListVoices` | Unary | List available TTS voices |
| `Transcribe` | Unary | Transcribe audio file to text |
| `StreamTranscribe` | Bidirectional | Real-time audio transcription |

### Vision Support

The `Complete` and `StreamComplete` methods support image analysis:

```protobuf
message CompletionRequest {
  string prompt = 1;
  optional float temperature = 2;
  optional int32 max_tokens = 3;
  repeated ImageData images = 4;  // Attach images for analysis
  bool include_analysis = 5;      // Return detailed analysis
}

message ImageData {
  bytes data = 1;
  string filename = 2;
  string mime_type = 3;  // image/png, image/jpeg, etc.
}
```

**Supported Image Formats:** PNG, JPEG, GIF, WebP, HEIC
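As a concrete example, here is a minimal Python client sketch for `Complete` with an image attached. It assumes stubs generated from `Proto/apple_intelligence.proto` with `grpcio-tools`; the `apple_intelligence_pb2*` module names follow protoc's defaults for that filename. `screenshot.png` and `YOUR_API_KEY` are placeholders, and the response is printed whole because the `CompletionResponse` fields aren't shown above.

```python
# Sketch only. Generate the stubs first, e.g.:
#   python -m grpc_tools.protoc -I Proto --python_out=. --grpc_python_out=. apple_intelligence.proto
import grpc

import apple_intelligence_pb2 as pb            # assumed generated module name
import apple_intelligence_pb2_grpc as pb_grpc  # assumed generated module name

channel = grpc.insecure_channel("localhost:50051")
stub = pb_grpc.AppleIntelligenceServiceStub(channel)

# Attach an image for vision analysis (placeholder file name)
with open("screenshot.png", "rb") as f:
    image = pb.ImageData(data=f.read(), filename="screenshot.png", mime_type="image/png")

request = pb.CompletionRequest(
    prompt="Describe this image and extract any visible text.",
    images=[image],
    include_analysis=True,
)

# Bearer token per the Security section; omit metadata if API_KEY is not set
metadata = [("authorization", "Bearer YOUR_API_KEY")]

response = stub.Complete(request, metadata=metadata)
print(response)  # print the whole message; exact field names depend on the proto
```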
### Text-to-Speech

```protobuf
message TextToSpeechRequest {
  string text = 1;
  AudioFormat output_format = 2;  // WAV or MP3
  optional VoiceConfig voice_config = 3;
}

message VoiceConfig {
  string voice_identifier = 1;          // Voice ID from ListVoices
  optional float speaking_rate = 2;     // 0.0-1.0, default 0.5
  optional float pitch_multiplier = 3;  // 0.5-2.0, default 1.0
  optional float volume = 4;            // 0.0-1.0, default 1.0
}
```

**Output Formats:** WAV, MP3

### Speech-to-Text

#### File-based Transcription

```protobuf
message TranscribeRequest {
  AudioInput audio = 1;
  optional TranscriptionConfig config = 2;
}

message AudioInput {
  bytes data = 1;
  string mime_type = 2;  // audio/wav, audio/mp3, etc.
  optional int32 sample_rate = 3;
  optional int32 channels = 4;
}

message TranscriptionConfig {
  optional string language_code = 1;  // e.g., "en-US", "fr-CA"
  optional bool enable_punctuation = 2;
  optional bool enable_timestamps = 3;
}
```

**Supported Audio Formats:** WAV, MP3, M4A, AAC, FLAC

#### Streaming Transcription

For real-time transcription, use bidirectional streaming (see the sketch below):

1. Send a `TranscriptionConfig` first to configure the session
2. Send `audio_chunk` messages with PCM audio data (16-bit, 16 kHz, mono)
3. Receive `StreamingTranscribeResponse` messages with partial and final results

```protobuf
message StreamingTranscribeRequest {
  oneof request {
    TranscriptionConfig config = 1;  // Send first
    bytes audio_chunk = 2;           // Then audio chunks
  }
}

message StreamingTranscribeResponse {
  string partial_text = 1;
  bool is_final = 2;
  string final_text = 3;
  repeated TranscriptionSegment segments = 4;
}
```
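A minimal streaming sketch, assuming the same generated Python stubs as above. The request generator follows the protocol exactly: one `config` message, then raw `audio_chunk` bytes. `speech.pcm` is a placeholder for a raw 16-bit, 16 kHz, mono capture; feeding chunks from a microphone works the same way.

```python
import grpc

import apple_intelligence_pb2 as pb            # assumed generated module name
import apple_intelligence_pb2_grpc as pb_grpc  # assumed generated module name

def requests():
    # Per the protocol: configuration first, then raw PCM audio chunks
    yield pb.StreamingTranscribeRequest(
        config=pb.TranscriptionConfig(language_code="en-US", enable_punctuation=True)
    )
    with open("speech.pcm", "rb") as f:        # 16-bit, 16 kHz, mono PCM
        while chunk := f.read(3200):           # 3200 bytes ≈ 100 ms of audio
            yield pb.StreamingTranscribeRequest(audio_chunk=chunk)

channel = grpc.insecure_channel("localhost:50051")
stub = pb_grpc.AppleIntelligenceServiceStub(channel)

# Responses arrive while audio is still being sent
for response in stub.StreamTranscribe(requests()):
    if response.is_final:
        print("final:", response.final_text)
    else:
        print("partial:", response.partial_text)
```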
### Quick Test with grpcurl

```bash
# Health check
grpcurl -plaintext localhost:50051 appleintelligence.AppleIntelligenceService/Health

# Text completion
grpcurl -plaintext \
  -d '{"prompt": "What is 2 + 2?"}' \
  localhost:50051 appleintelligence.AppleIntelligenceService/Complete

# Streaming completion
grpcurl -plaintext \
  -d '{"prompt": "Tell me a short story"}' \
  localhost:50051 appleintelligence.AppleIntelligenceService/StreamComplete

# List TTS voices
grpcurl -plaintext \
  -d '{"language_code": "en-US"}' \
  localhost:50051 appleintelligence.AppleIntelligenceService/ListVoices

# Text-to-Speech (the audio_data in the response is base64-encoded; decode it to get the audio)
grpcurl -plaintext \
  -d '{"text": "Hello world", "output_format": 1}' \
  localhost:50051 appleintelligence.AppleIntelligenceService/TextToSpeech

# Transcribe an audio file (base64-encode the audio data in the request)
grpcurl -plaintext \
  -d '{"audio": {"data": "'$(base64 -i audio.wav)'", "mime_type": "audio/wav"}}' \
  localhost:50051 appleintelligence.AppleIntelligenceService/Transcribe
```

## Configuration

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `GRPC_HOST` | `0.0.0.0` | Host to bind (use `0.0.0.0` for LAN access) |
| `GRPC_PORT` | `50051` | Port to listen on |
| `API_KEY` | *none* | Optional API key for authentication |

## Supported Languages

### Speech Recognition (STT)

- English (US, CA, GB, AU, IN, IE, ZA)
- French (CA, FR)
- Spanish (ES, MX)
- German, Italian, Portuguese, Japanese, Korean, Chinese
- And many more via the macOS Speech framework

### Text-to-Speech (TTS)

All voices available in macOS System Settings, including:

- Premium voices (highest quality, require a download)
- Enhanced voices (good quality)
- Default/Compact voices (pre-installed)

## Client Libraries

Connect from any language with gRPC support:

- **Python**: `grpcio`, `grpcio-tools`
- **Node.js**: `@grpc/grpc-js`, `@grpc/proto-loader`
- **Go**: `google.golang.org/grpc`
- **Swift**: `grpc-swift`
- **Rust**: `tonic`

See [docs/grpc-client-guide.md](docs/grpc-client-guide.md) for detailed examples.

## Project Structure

```
apple-intelligence-grpc/
├── Package.swift
├── Proto/
│   └── apple_intelligence.proto       # gRPC service definition
├── Sources/
│   ├── AppleIntelligenceCore/         # Shared gRPC service code
│   │   ├── Config.swift
│   │   ├── Services/
│   │   │   ├── AppleIntelligenceService.swift
│   │   │   ├── TextToSpeechService.swift
│   │   │   ├── SpeechToTextService.swift
│   │   │   └── VisionAnalysisService.swift
│   │   ├── Providers/
│   │   │   └── AppleIntelligenceProvider.swift
│   │   └── Generated/
│   │       ├── apple_intelligence.pb.swift
│   │       └── apple_intelligence.grpc.swift
│   ├── AppleIntelligenceServer/       # CLI executable
│   │   └── main.swift
│   └── AppleIntelligenceApp/          # Menu bar app
│       ├── App.swift
│       ├── ServerManager.swift
│       ├── Models/
│       ├── Views/
│       └── ViewModels/
├── scripts/
│   ├── build-app.sh                   # Build .app bundle
│   └── create-dmg.sh                  # Create DMG installer
└── docs/
    ├── grpc-client-guide.md           # Client connection examples
    ├── macos-runner-setup.md          # CI runner setup
    └── pipeline-configuration.md      # CI/CD configuration
```

## CI/CD

Automated builds are configured with Gitea Actions. When a release is created, the pipeline:

1. Builds the app bundle
2. Signs it with a Developer ID certificate
3. Notarizes it with Apple
4. Uploads the DMG to the release

See [docs/pipeline-configuration.md](docs/pipeline-configuration.md) for setup instructions.

## Security

- **Local Network**: By default, the server binds to `0.0.0.0`, allowing LAN access
- **API Key**: Enable authentication by setting the `API_KEY` environment variable
- **Firewall**: macOS will prompt to allow incoming connections on first run
- **Notarized**: Release builds are signed and notarized by Apple

## Troubleshooting

### Model Not Available

- Ensure Apple Intelligence is enabled: System Settings → Apple Intelligence & Siri
- Requires an Apple Silicon Mac with macOS 26+

### Connection Refused

- Check that the server is running (the brain icon should be filled)
- Verify the firewall allows connections on the configured port
- Try `localhost` instead of the IP when testing locally

### Authentication Failed

- Include the API key in the `Authorization` header: `Bearer YOUR_API_KEY`
- Verify the key matches what's configured in Settings

### Speech Recognition Not Working

- Grant microphone permission when prompted
- Check System Settings → Privacy & Security → Speech Recognition
- Ensure the language is supported

### TTS Voice Quality

- Download Premium/Enhanced voices from System Settings → Accessibility → Read & Speak
- Premium voices are larger (~150-500 MB) but sound more natural

## License

MIT