Claude Vision Auto
A vision-based auto-approval system for the Claude Code CLI, using the MiniCPM-V vision model.
Overview
Claude Vision Auto automatically detects and responds to approval prompts in Claude Code by:
- Monitoring terminal output for approval keywords
- Taking screenshots when idle (waiting for input)
- Analyzing screenshots with MiniCPM-V vision model via Ollama
- Automatically submitting appropriate responses
Features
- Zero Pattern Matching: Uses vision AI instead of fragile regex patterns
- Universal Compatibility: Works with any Claude Code prompt format
- Intelligent Detection: Only activates when approval keywords are present
- Configurable: Environment variables for all settings
- Lightweight: Minimal dependencies (only requests)
- Debug Mode: Verbose logging for troubleshooting
Prerequisites
Required
- Claude Code CLI - Anthropic's official CLI tool
npm install -g @anthropic-ai/claude-code
- Ollama with MiniCPM-V - Vision model server
docker pull ollama/ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull minicpm-v:latest
- Screenshot Tool (one of):
  - scrot (recommended)
  - gnome-screenshot
  - imagemagick (import command)
  - maim
Install Screenshot Tool
# Debian/Ubuntu (recommended)
sudo apt-get install scrot
# Alternative options
sudo apt-get install gnome-screenshot
sudo apt-get install imagemagick
sudo apt-get install maim
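Internally, the package selects whichever of these tools is installed. The sketch below illustrates one way such a fallback can work; the tool preference order matches the list above, but the function names and exact flags are assumptions, not the package's actual code.

```python
import shutil
import subprocess

# Common full-screen capture invocations for each supported tool
# (illustrative flags, not necessarily what claude_vision_auto uses).
TOOLS = {
    "scrot": ["scrot"],
    "gnome-screenshot": ["gnome-screenshot", "-f"],
    "import": ["import", "-window", "root"],  # imagemagick
    "maim": ["maim"],
}

def pick_tool(is_available=shutil.which):
    """Return the first installed tool's command, in preference order."""
    for name in ("scrot", "gnome-screenshot", "import", "maim"):
        if is_available(name):
            return TOOLS[name]
    return None

def capture(path, timeout=5):
    """Capture the full screen to `path` with the first available tool."""
    cmd = pick_tool()
    if cmd is None:
        raise RuntimeError("no screenshot tool installed")
    subprocess.run(cmd + [path], check=True, timeout=timeout)
```

Passing the availability check as a parameter keeps the selection logic testable without any tool installed.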
Installation
Quick Install
cd claude-vision-auto
make deps # Install system dependencies
make install # Install the package
Manual Install
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip
# Install Python package
pip3 install -e .
Verify Installation
# Check if command is available
which claude-vision
# Test Ollama connection
curl http://localhost:11434/api/tags
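The same check can be scripted. This sketch queries Ollama's /api/tags endpoint and verifies the vision model is pulled; the helper names are illustrative, but the response shape ({"models": [{"name": ...}]}) is Ollama's documented format.

```python
import json
import urllib.request

def model_available(tags_json, name):
    """Check whether a model name appears in an Ollama /api/tags response body."""
    models = json.loads(tags_json).get("models", [])
    return any(m.get("name") == name for m in models)

def check_ollama(base="http://localhost:11434", model="minicpm-v:latest"):
    """Return True if Ollama is reachable and the model is pulled."""
    with urllib.request.urlopen(base + "/api/tags", timeout=5) as resp:
        return model_available(resp.read(), model)
```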
Usage
Basic Usage
Replace claude with claude-vision:
# Instead of:
claude
# Use:
claude-vision
With Prompts
# Pass prompts directly
claude-vision "create a test file in /tmp"
# Interactive session
claude-vision
Configuration
Set environment variables to customize behavior:
# Ollama URL (default: http://localhost:11434/api/generate)
export OLLAMA_URL="http://custom-host:11434/api/generate"
# Vision model (default: minicpm-v:latest)
export VISION_MODEL="llama3.2-vision:latest"
# Idle threshold in seconds (default: 3.0)
export IDLE_THRESHOLD="5.0"
# Response delay in seconds (default: 1.0)
export RESPONSE_DELAY="2.0"
# Enable debug mode
export DEBUG="true"
# Run with custom settings
claude-vision
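The variables above map onto simple environment lookups with the documented defaults. A minimal sketch of how such a config loader can work (the Config fields and function name are assumptions, not the package's actual config.py):

```python
import os
from dataclasses import dataclass

@dataclass
class Config:
    ollama_url: str
    vision_model: str
    idle_threshold: float
    response_delay: float
    debug: bool

def load_config(env=os.environ):
    """Read settings from the environment, falling back to the documented defaults."""
    return Config(
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434/api/generate"),
        vision_model=env.get("VISION_MODEL", "minicpm-v:latest"),
        idle_threshold=float(env.get("IDLE_THRESHOLD", "3.0")),
        response_delay=float(env.get("RESPONSE_DELAY", "1.0")),
        debug=env.get("DEBUG", "false").lower() == "true",
    )
```

Accepting the environment as a parameter makes the loader easy to test with a plain dict.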
How It Works
- Launch: Spawns claude as a subprocess
- Monitor: Watches output for approval keywords (Yes, No, Approve, etc.)
- Detect Idle: When output stops for IDLE_THRESHOLD seconds
- Screenshot: Captures the full screen with scrot
- Analyze: Sends to MiniCPM-V via Ollama API
- Respond: Vision model returns "1", "y", or "WAIT"
- Submit: Automatically sends response if not "WAIT"
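The detect-and-analyze steps above can be sketched as follows. The keyword list, prompt text, and function names are illustrative assumptions; the Ollama request shape (base64 images in an "images" list sent to /api/generate) is the documented API.

```python
import base64
import json
import urllib.request

# Illustrative keyword list; the package's actual triggers may differ.
KEYWORDS = ("yes", "no", "approve", "allow", "permission")

def should_screenshot(buffer, last_output, now, idle_threshold=3.0):
    """Screenshot only when output has gone idle AND an approval keyword appeared."""
    idle = (now - last_output) >= idle_threshold
    has_keyword = any(k in buffer.lower() for k in KEYWORDS)
    return idle and has_keyword

def analyze(image_path, url="http://localhost:11434/api/generate",
            model="minicpm-v:latest", timeout=30):
    """Ask the vision model what to answer; it replies '1', 'y', or 'WAIT'."""
    with open(image_path, "rb") as f:
        img = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "prompt": "If this terminal shows an approval prompt, reply with the "
                  "keystroke to approve it ('1' or 'y'); otherwise reply WAIT.",
        "images": [img],
        "stream": False,
    }
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Gating on both idleness and keywords is what keeps the vision model from being called on every pause in output.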
Configuration Options
| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://localhost:11434/api/generate | Ollama API endpoint |
| VISION_MODEL | minicpm-v:latest | Vision model to use |
| IDLE_THRESHOLD | 3.0 | Seconds of idle before screenshot |
| RESPONSE_DELAY | 1.0 | Seconds to wait before responding |
| OUTPUT_BUFFER_SIZE | 4096 | Bytes of output to buffer |
| SCREENSHOT_TIMEOUT | 5 | Screenshot capture timeout (seconds) |
| VISION_TIMEOUT | 30 | Vision analysis timeout (seconds) |
| DEBUG | false | Enable verbose logging |
Troubleshooting
Ollama Not Connected
# Check if Ollama is running
docker ps | grep ollama
# Check if model is available
curl http://localhost:11434/api/tags
Screenshot Fails
# Test screenshot tool
scrot /tmp/test.png
ls -lh /tmp/test.png
# Install if missing
sudo apt-get install scrot
Debug Mode
# Run with verbose logging
DEBUG=true claude-vision "test command"
Vision Model Not Found
# Pull the model
docker exec ollama ollama pull minicpm-v:latest
# Or use alternative vision model
export VISION_MODEL="llava:latest"
claude-vision
Development
Setup Development Environment
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto
# Install in development mode
pip3 install -e .
# Run tests
make test
Project Structure
claude-vision-auto/
├── README.md # This file
├── LICENSE # MIT License
├── setup.py # Package setup
├── requirements.txt # Python dependencies
├── Makefile # Build automation
├── .gitignore # Git ignore rules
├── claude_vision_auto/ # Main package
│ ├── __init__.py # Package initialization
│ ├── main.py # CLI entry point
│ ├── config.py # Configuration
│ ├── screenshot.py # Screenshot capture
│ └── vision_analyzer.py # Vision analysis
├── bin/
│ └── claude-vision # CLI wrapper script
├── tests/ # Test suite
│ └── test_vision.py
├── docs/ # Documentation
│ ├── INSTALLATION.md
│ └── USAGE.md
└── examples/ # Usage examples
└── example_usage.sh
Supported Vision Models
Tested and working:
- minicpm-v:latest (Recommended) - Best for structured output
- llama3.2-vision:latest - Good alternative
- llava:latest - Fallback option
Performance
- Startup: < 1 second
- Screenshot: ~100ms
- Vision Analysis: 2-5 seconds (depends on model)
- Total Response Time: 3-7 seconds per approval
Limitations
- X11 Only: Requires X11 display server (no Wayland support for scrot)
- Linux Only: Currently only tested on Debian/Ubuntu
- Vision Dependency: Requires Ollama and vision model
- Screen Required: Must have GUI session (no headless support)
Future Enhancements
- Wayland support (alternative screenshot tools)
- macOS support
- Headless mode (API-only, no screenshots)
- Configurable response patterns
- Multi-terminal support
- Session recording and replay
License
MIT License - See LICENSE file
Author
Svrnty
- Email: jp@svrnty.io
- Repository: https://git.openharbor.io/svrnty/claude-vision-auto
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
Acknowledgments
- Anthropic - Claude Code CLI
- MiniCPM-V - Vision model
- Ollama - Model serving infrastructure