# Claude Vision Auto

Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

## Overview

Claude Vision Auto automatically detects and responds to approval prompts in Claude Code by:

1. Monitoring terminal output for approval keywords
2. Taking screenshots when idle (waiting for input)
3. Analyzing screenshots with MiniCPM-V vision model via Ollama
4. Automatically submitting appropriate responses

## Features

- **Zero Pattern Matching**: Uses vision AI instead of fragile regex patterns
- **Universal Compatibility**: Works with any Claude Code prompt format
- **Intelligent Detection**: Only activates when approval keywords are present
- **Configurable**: Environment variables for all settings
- **Lightweight**: Minimal dependencies (only `requests`)
- **Debug Mode**: Verbose logging for troubleshooting

## Prerequisites

### Required

1. **Claude Code CLI** - Anthropic's official CLI tool
   ```bash
   npm install -g @anthropic-ai/claude-code
   ```

2. **Ollama with MiniCPM-V** - Vision model server
   ```bash
   docker pull ollama/ollama
   docker run -d -p 11434:11434 --name ollama ollama/ollama
   docker exec ollama ollama pull minicpm-v:latest
   ```

3. **Screenshot Tool** (one of):
   - `scrot` (recommended)
   - `gnome-screenshot`
   - `imagemagick` (import command)
   - `maim`

### Install Screenshot Tool

```bash
# Debian/Ubuntu (recommended)
sudo apt-get install scrot

# Alternative options
sudo apt-get install gnome-screenshot
sudo apt-get install imagemagick
sudo apt-get install maim
```

## Installation

### Quick Install

```bash
cd claude-vision-auto
make deps      # Install system dependencies
make install   # Install the package
```

### Manual Install

```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip

# Install Python package
pip3 install -e .
```

### Verify Installation

```bash
# Check if command is available
which claude-vision

# Test Ollama connection
curl http://localhost:11434/api/tags
```

## Usage

### Basic Usage

Replace `claude` with `claude-vision`:

```bash
# Instead of:
claude

# Use:
claude-vision
```

### With Prompts

```bash
# Pass prompts directly
claude-vision "create a test file in /tmp"

# Interactive session
claude-vision
```

### Configuration

Set environment variables to customize behavior:

```bash
# Ollama URL (default: http://localhost:11434/api/generate)
export OLLAMA_URL="http://custom-host:11434/api/generate"

# Vision model (default: minicpm-v:latest)
export VISION_MODEL="llama3.2-vision:latest"

# Idle threshold in seconds (default: 3.0)
export IDLE_THRESHOLD="5.0"

# Response delay in seconds (default: 1.0)
export RESPONSE_DELAY="2.0"

# Enable debug mode
export DEBUG="true"

# Run with custom settings
claude-vision
```

## How It Works

1. **Launch**: Spawns `claude` as subprocess
2. **Monitor**: Watches output for approval keywords (Yes, No, Approve, etc.)
3. **Detect Idle**: When output stops for `IDLE_THRESHOLD` seconds
4. **Screenshot**: Captures terminal window with `scrot`
5. **Analyze**: Sends to MiniCPM-V via Ollama API
6. **Respond**: Vision model returns "1", "y", or "WAIT"
7. **Submit**: Automatically sends response if not "WAIT"

## Configuration Options

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_URL` | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | `minicpm-v:latest` | Vision model to use |
| `IDLE_THRESHOLD` | `3.0` | Seconds of idle before screenshot |
| `RESPONSE_DELAY` | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | `4096` | Bytes of output to buffer |
| `SCREENSHOT_TIMEOUT` | `5` | Screenshot capture timeout |
| `VISION_TIMEOUT` | `30` | Vision analysis timeout |
| `DEBUG` | `false` | Enable verbose logging |

## Troubleshooting

### Ollama Not Connected

```bash
# Check if Ollama is running
docker ps | grep ollama

# Check if model is available
curl http://localhost:11434/api/tags
```

### Screenshot Fails

```bash
# Test screenshot tool
scrot /tmp/test.png
ls -lh /tmp/test.png

# Install if missing
sudo apt-get install scrot
```

### Debug Mode

```bash
# Run with verbose logging
DEBUG=true claude-vision "test command"
```

### Vision Model Not Found

```bash
# Pull the model
docker exec ollama ollama pull minicpm-v:latest

# Or use alternative vision model
export VISION_MODEL="llava:latest"
claude-vision
```

## Development

### Setup Development Environment

```bash
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto

# Install in development mode
pip3 install -e .

# Run tests
make test
```

### Project Structure

```
claude-vision-auto/
├── README.md                 # This file
├── LICENSE                   # MIT License
├── setup.py                  # Package setup
├── requirements.txt          # Python dependencies
├── Makefile                  # Build automation
├── .gitignore                # Git ignore rules
├── claude_vision_auto/       # Main package
│   ├── __init__.py           # Package initialization
│   ├── main.py               # CLI entry point
│   ├── config.py             # Configuration
│   ├── screenshot.py         # Screenshot capture
│   └── vision_analyzer.py    # Vision analysis
├── bin/
│   └── claude-vision         # CLI wrapper script
├── tests/                    # Test suite
│   └── test_vision.py
├── docs/                     # Documentation
│   ├── INSTALLATION.md
│   └── USAGE.md
└── examples/                 # Usage examples
    └── example_usage.sh
```

## Supported Vision Models

Tested and working:

- **minicpm-v:latest** (Recommended) - Best for structured output
- **llama3.2-vision:latest** - Good alternative
- **llava:latest** - Fallback option

## Performance

- **Startup**: < 1 second
- **Screenshot**: ~100ms
- **Vision Analysis**: 2-5 seconds (depends on model)
- **Total Response Time**: 3-7 seconds per approval

## Limitations

- **X11 Only**: Requires X11 display server (no Wayland support for scrot)
- **Linux Only**: Currently only tested on Debian/Ubuntu
- **Vision Dependency**: Requires Ollama and vision model
- **Screen Required**: Must have GUI session (no headless support)

## Future Enhancements

- [ ] Wayland support (alternative screenshot tools)
- [ ] macOS support
- [ ] Headless mode (API-only, no screenshots)
- [ ] Configurable response patterns
- [ ] Multi-terminal support
- [ ] Session recording and replay

## License

MIT License - See LICENSE file

## Author

**Svrnty**

- Email: jp@svrnty.io
- Repository: https://git.openharbor.io/svrnty/claude-vision-auto

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request

## Acknowledgments

- **Anthropic** - Claude Code CLI
- **MiniCPM-V** - Vision model
- **Ollama** - Model serving infrastructure
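
## Example: Vision Request Sketch

The Analyze, Respond, and Submit steps under "How It Works" can be illustrated with a minimal Python sketch. It is not the project's actual code: `build_payload`, `parse_reply`, and the `PROMPT` text are hypothetical names and wording chosen for illustration. The `model`, `prompt`, `images`, and `stream` fields follow Ollama's `/api/generate` request format, and the defaults are copied from the Configuration Options table.

```python
import base64
import os

# Defaults copied from the Configuration Options table.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434/api/generate")
VISION_MODEL = os.environ.get("VISION_MODEL", "minicpm-v:latest")

# Hypothetical prompt: the wording the project really sends is not documented here.
PROMPT = "Look at the terminal screenshot. Reply with exactly one of: 1, y, WAIT."


def build_payload(image_bytes, model=VISION_MODEL, prompt=PROMPT):
    """Build a JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a token stream.
        "stream": False,
    }


def parse_reply(text):
    """Return "1" or "y" to submit, or None for WAIT or anything unexpected."""
    answer = text.strip().strip('"').strip()
    if answer in ("1", "y"):
        return answer
    return None
```

With the `requests` dependency installed, the payload could be posted with `requests.post(OLLAMA_URL, json=build_payload(png_bytes), timeout=30)`, and the model's text read from the `response` field of the returned JSON before passing it to `parse_reply`. Treating every unexpected answer as "WAIT" is a deliberately conservative choice in this sketch: nothing is typed into the terminal unless the model gives one of the two known answers.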