Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

Features:
- Automatic detection and response to approval prompts
- Screenshot capture and vision analysis via Ollama
- Support for multiple screenshot tools (scrot, gnome-screenshot, etc.)
- Configurable timing and behavior
- Debug mode for troubleshooting
- Comprehensive documentation

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>

# Claude Vision Auto

Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

## Overview

Claude Vision Auto automatically detects and responds to approval prompts in Claude Code by:
1. Monitoring terminal output for approval keywords
2. Taking screenshots when idle (waiting for input)
3. Analyzing screenshots with MiniCPM-V vision model via Ollama
4. Automatically submitting appropriate responses

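
The first two steps above can be sketched as a small gate that decides when a screenshot is worth taking. This is an illustration only, not the project's actual code: the `ApprovalGate` class, the keyword list (taken from the examples "Yes, No, Approve" given in this README), and the whole-word regex match are assumptions. The 3.0 second threshold and 4096 buffer size mirror the documented defaults.

```python
import re

# Assumed keyword set, based on the examples named in this README.
KEYWORD_RE = re.compile(r"\b(yes|no|approve)\b", re.IGNORECASE)


class ApprovalGate:
    """Hypothetical helper: tracks recent output and idle time."""

    def __init__(self, idle_threshold=3.0, buffer_size=4096):
        self.idle_threshold = idle_threshold
        self.buffer_size = buffer_size
        self.buffer = ""
        self.last_output = None  # timestamp of the most recent output

    def feed(self, text, now):
        """Record new terminal output and the time it arrived."""
        # Keep only the most recent buffer_size characters.
        self.buffer = (self.buffer + text)[-self.buffer_size:]
        self.last_output = now

    def has_keyword(self):
        """True if the buffered output contains an approval keyword."""
        return KEYWORD_RE.search(self.buffer) is not None

    def should_capture(self, now):
        """True once output has been quiet long enough and a keyword was seen."""
        if self.last_output is None:
            return False
        idle = now - self.last_output
        return idle >= self.idle_threshold and self.has_keyword()
```

Passing timestamps in explicitly (for example from `time.monotonic()`) keeps the gate easy to test without sleeping.
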
## Features

- **Zero Pattern Matching**: Reads the prompt with vision AI instead of fragile regex patterns
- **Universal Compatibility**: Works with any Claude Code prompt format
- **Intelligent Detection**: Only activates when approval keywords are present
- **Configurable**: Environment variables for all settings
- **Lightweight**: Minimal dependencies (only `requests`)
- **Debug Mode**: Verbose logging for troubleshooting

## Prerequisites

### Required

1. **Claude Code CLI** - Anthropic's official CLI tool
   ```bash
   npm install -g @anthropic-ai/claude-code
   ```

2. **Ollama with MiniCPM-V** - Vision model server
   ```bash
   docker pull ollama/ollama
   docker run -d -p 11434:11434 --name ollama ollama/ollama
   docker exec ollama ollama pull minicpm-v:latest
   ```

3. **Screenshot Tool** (one of):
   - `scrot` (recommended)
   - `gnome-screenshot`
   - `imagemagick` (import command)
   - `maim`

### Install Screenshot Tool

```bash
# Debian/Ubuntu (recommended)
sudo apt-get install scrot

# Alternative options
sudo apt-get install gnome-screenshot
sudo apt-get install imagemagick
sudo apt-get install maim
```

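
Since any one of the four tools is enough, a wrapper could pick the first one found on `PATH`. The sketch below shows one way to do that with `shutil.which`; the function name `build_screenshot_command` and the preference order are assumptions for illustration and may differ from what `screenshot.py` does.

```python
import shutil

# Candidate tools in the order listed above. Each entry maps an executable
# name to the argument list that writes a screenshot to `path`.
SCREENSHOT_COMMANDS = (
    ("scrot", lambda path: ["scrot", path]),
    ("gnome-screenshot", lambda path: ["gnome-screenshot", "-f", path]),
    ("import", lambda path: ["import", "-window", "root", path]),  # ImageMagick
    ("maim", lambda path: ["maim", path]),
)


def build_screenshot_command(path, which=shutil.which):
    """Return the argv for the first installed tool, or None if none is found.

    `which` is injectable so the lookup can be tested without the tools.
    """
    for executable, make_argv in SCREENSHOT_COMMANDS:
        if which(executable):
            return make_argv(path)
    return None
```

The returned list can be handed to `subprocess.run(argv, check=True, timeout=5)`, where the 5 second timeout matches the documented `SCREENSHOT_TIMEOUT` default.
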
## Installation

### Quick Install

```bash
cd claude-vision-auto
make deps      # Install system dependencies
make install   # Install the package
```

### Manual Install

```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip

# Install Python package
pip3 install -e .
```

### Verify Installation

```bash
# Check if command is available
which claude-vision

# Test Ollama connection
curl http://localhost:11434/api/tags
```

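
The `curl` check above only shows that Ollama answers. To also confirm that the vision model has been pulled, the JSON returned by `/api/tags` can be inspected. The following is a minimal sketch using only the standard library; the helper names are made up for this example, and it assumes the response has the shape `{"models": [{"name": ...}, ...]}`.

```python
import json
import urllib.request


def model_names(tags):
    """Extract model names from a parsed /api/tags response."""
    return [entry.get("name", "") for entry in tags.get("models", [])]


def model_available(tags, wanted="minicpm-v:latest"):
    """True if `wanted` appears among the models Ollama reports."""
    return wanted in model_names(tags)


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Fetch and parse the tag list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, `model_available(fetch_tags())` reports whether `minicpm-v:latest` is ready to use.
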
## Usage

### Basic Usage

Replace `claude` with `claude-vision`:

```bash
# Instead of:
claude

# Use:
claude-vision
```

### With Prompts

```bash
# Pass prompts directly
claude-vision "create a test file in /tmp"

# Interactive session
claude-vision
```

### Configuration

Set environment variables to customize behavior:

```bash
# Ollama URL (default: http://localhost:11434/api/generate)
export OLLAMA_URL="http://custom-host:11434/api/generate"

# Vision model (default: minicpm-v:latest)
export VISION_MODEL="llama3.2-vision:latest"

# Idle threshold in seconds (default: 3.0)
export IDLE_THRESHOLD="5.0"

# Response delay in seconds (default: 1.0)
export RESPONSE_DELAY="2.0"

# Enable debug mode
export DEBUG="true"

# Run with custom settings
claude-vision
```

## How It Works

1. **Launch**: Spawns `claude` as a subprocess
2. **Monitor**: Watches output for approval keywords (Yes, No, Approve, etc.)
3. **Detect Idle**: Treats the session as waiting for input once output has stopped for `IDLE_THRESHOLD` seconds
4. **Screenshot**: Captures the terminal window with `scrot` or another supported screenshot tool
5. **Analyze**: Sends the screenshot to MiniCPM-V via the Ollama API
6. **Respond**: The vision model returns "1", "y", or "WAIT"
7. **Submit**: Sends the response automatically unless it is "WAIT"

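
Steps 5 to 7 can be illustrated with two small functions: one that builds the request body for Ollama's `/api/generate` endpoint, and one that reduces the model's reply to one of the three documented answers. This is a sketch under assumptions: the prompt wording, the function names, and the choice to treat any unexpected reply as "WAIT" are not taken from the project's source.

```python
import base64

# Hypothetical prompt; the wording actually used by the tool is not shown here.
PROMPT = (
    "Look at this terminal screenshot. If it shows an approval prompt, reply "
    "with only the key to press (1 or y). Otherwise reply WAIT."
)

ALLOWED_RESPONSES = {"1", "y"}


def build_payload(image_bytes, model="minicpm-v:latest", prompt=PROMPT):
    """Build a JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object, not a stream
    }


def normalize_answer(raw):
    """Map the model's free-form text to "1", "y", or "WAIT".

    Anything that is not exactly an allowed key is treated as "WAIT"
    (an assumed safe default, so nothing is submitted on a doubtful reply).
    """
    answer = raw.strip().strip("\"'.").lower()
    return answer if answer in ALLOWED_RESPONSES else "WAIT"
```

The payload could be posted with `requests.post(url, json=payload, timeout=30)`, and the text to normalize would then be the `response` field of the returned JSON. Falling back to "WAIT" means a confused model causes a delay rather than an unintended approval.
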
## Configuration Options

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_URL` | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | `minicpm-v:latest` | Vision model to use |
| `IDLE_THRESHOLD` | `3.0` | Seconds of idle before screenshot |
| `RESPONSE_DELAY` | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | `4096` | Bytes of output to buffer |
| `SCREENSHOT_TIMEOUT` | `5` | Screenshot capture timeout (seconds) |
| `VISION_TIMEOUT` | `30` | Vision analysis timeout (seconds) |
| `DEBUG` | `false` | Enable verbose logging |

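
As an illustration of how the variables in this table might be read, the sketch below loads them from the environment and falls back to the documented defaults. The `Config` dataclass, the `load_config` function, and the set of strings accepted as true for `DEBUG` are assumptions, not necessarily what `config.py` contains.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Defaults copied from the table above."""

    ollama_url: str = "http://localhost:11434/api/generate"
    vision_model: str = "minicpm-v:latest"
    idle_threshold: float = 3.0
    response_delay: float = 1.0
    output_buffer_size: int = 4096
    screenshot_timeout: int = 5
    vision_timeout: int = 30
    debug: bool = False


def load_config(env=None):
    """Build a Config from environment variables, using defaults when unset."""
    env = os.environ if env is None else env
    d = Config()
    return Config(
        ollama_url=env.get("OLLAMA_URL", d.ollama_url),
        vision_model=env.get("VISION_MODEL", d.vision_model),
        idle_threshold=float(env.get("IDLE_THRESHOLD", d.idle_threshold)),
        response_delay=float(env.get("RESPONSE_DELAY", d.response_delay)),
        output_buffer_size=int(env.get("OUTPUT_BUFFER_SIZE", d.output_buffer_size)),
        screenshot_timeout=int(env.get("SCREENSHOT_TIMEOUT", d.screenshot_timeout)),
        vision_timeout=int(env.get("VISION_TIMEOUT", d.vision_timeout)),
        # Assumed parsing rule: "1", "true", or "yes" (any case) enables debug.
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
    )
```

Accepting a plain dictionary in place of `os.environ` makes the loader straightforward to exercise in tests.
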
## Troubleshooting

### Ollama Not Connected

```bash
# Check if Ollama is running
docker ps | grep ollama

# Check if model is available
curl http://localhost:11434/api/tags
```

### Screenshot Fails

```bash
# Test screenshot tool
scrot /tmp/test.png
ls -lh /tmp/test.png

# Install if missing
sudo apt-get install scrot
```

### Debug Mode

```bash
# Run with verbose logging
DEBUG=true claude-vision "test command"
```

### Vision Model Not Found

```bash
# Pull the model
docker exec ollama ollama pull minicpm-v:latest

# Or use alternative vision model
export VISION_MODEL="llava:latest"
claude-vision
```

## Development

### Setup Development Environment

```bash
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto

# Install in development mode
pip3 install -e .

# Run tests
make test
```

### Project Structure

```
claude-vision-auto/
├── README.md                  # This file
├── LICENSE                    # MIT License
├── setup.py                   # Package setup
├── requirements.txt           # Python dependencies
├── Makefile                   # Build automation
├── .gitignore                 # Git ignore rules
├── claude_vision_auto/        # Main package
│   ├── __init__.py            # Package initialization
│   ├── main.py                # CLI entry point
│   ├── config.py              # Configuration
│   ├── screenshot.py          # Screenshot capture
│   └── vision_analyzer.py     # Vision analysis
├── bin/
│   └── claude-vision          # CLI wrapper script
├── tests/                     # Test suite
│   └── test_vision.py
├── docs/                      # Documentation
│   ├── INSTALLATION.md
│   └── USAGE.md
└── examples/                  # Usage examples
    └── example_usage.sh
```

## Supported Vision Models

Tested and working:

- **minicpm-v:latest** (Recommended) - Best for structured output
- **llama3.2-vision:latest** - Good alternative
- **llava:latest** - Fallback option

## Performance

- **Startup**: < 1 second
- **Screenshot**: ~100ms
- **Vision Analysis**: 2-5 seconds (depends on model)
- **Total Response Time**: 3-7 seconds per approval

## Limitations

- **X11 Only**: Requires an X11 display server (`scrot` does not support Wayland)
- **Linux Only**: Tested only on Debian/Ubuntu so far
- **Vision Dependency**: Requires Ollama and a vision model
- **Screen Required**: Must run in a GUI session (no headless support)

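
The X11 and GUI-session requirements above can be checked before launch. The sketch below inspects the standard `XDG_SESSION_TYPE` and `DISPLAY` environment variables; the function name `session_problem` and its messages are hypothetical, and the tool itself may not perform such a check.

```python
import os


def session_problem(env=None):
    """Return a short description of a display problem, or None if all looks fine."""
    env = os.environ if env is None else env
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    # Check Wayland first: DISPLAY may still be set there through XWayland.
    if session_type == "wayland":
        return "Wayland session detected; scrot needs X11"
    if not env.get("DISPLAY"):
        return "DISPLAY is not set; no X11 GUI session available"
    return None
```

A wrapper could print the returned message and exit early instead of failing later at the screenshot step.
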
## Future Enhancements

- [ ] Wayland support (alternative screenshot tools)
- [ ] macOS support
- [ ] Headless mode (API-only, no screenshots)
- [ ] Configurable response patterns
- [ ] Multi-terminal support
- [ ] Session recording and replay

## License

MIT License - See LICENSE file

## Author

**Svrnty**
- Email: jp@svrnty.io
- Repository: https://git.openharbor.io/svrnty/claude-vision-auto

## Contributing

Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Submit a pull request

## Acknowledgments

- **Anthropic** - Claude Code CLI
- **MiniCPM-V** - Vision model
- **Ollama** - Model serving infrastructure