Claude Vision Auto

A vision-based auto-approval system for the Claude Code CLI, using the MiniCPM-V vision model.

Overview

Claude Vision Auto automatically detects and responds to approval prompts in Claude Code by:

- Monitoring terminal output for approval keywords
- Taking screenshots when idle (waiting for input)
- Analyzing screenshots with the MiniCPM-V vision model via Ollama
- Automatically submitting appropriate responses
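The trigger described above (keywords present, then idle output) can be sketched in Python as follows. This is not the package's actual code. The class name `ApprovalTrigger`, the keyword regex, and buffering by characters rather than bytes are all assumptions made for illustration. The defaults are taken from `IDLE_THRESHOLD` (3.0) and `OUTPUT_BUFFER_SIZE` (4096).

```python
import re
import time
from typing import Callable, Optional

# Hypothetical keyword pattern; the README only names "Yes, No, Approve, etc."
APPROVAL_PATTERN = re.compile(r"\b(yes|no|approve)\b", re.IGNORECASE)


class ApprovalTrigger:
    """Signals when a screenshot is worth taking: keywords seen, then idle output."""

    def __init__(self, idle_threshold: float = 3.0, buffer_size: int = 4096,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_threshold = idle_threshold
        self.buffer_size = buffer_size
        self.clock = clock
        self.buffer = ""
        self.last_output: Optional[float] = None

    def feed(self, chunk: str) -> None:
        """Record new terminal output, keeping only the last buffer_size characters."""
        self.buffer = (self.buffer + chunk)[-self.buffer_size:]
        self.last_output = self.clock()

    def has_keywords(self) -> bool:
        return APPROVAL_PATTERN.search(self.buffer) is not None

    def should_capture(self) -> bool:
        """True once approval keywords are buffered and output has been idle long enough."""
        if self.last_output is None or not self.has_keywords():
            return False
        return self.clock() - self.last_output >= self.idle_threshold
```

In a wrapper, `feed` would be called for every chunk read from the `claude` subprocess, and `should_capture` would be polled periodically. The injectable `clock` keeps the idle logic testable without real sleeps.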

Features

- Zero Pattern Matching: Uses vision AI instead of fragile regex patterns
- Universal Compatibility: Works with any Claude Code prompt format
- Intelligent Detection: Only activates when approval keywords are present
- Configurable: Environment variables for all settings
- Lightweight: Minimal dependencies (only requests)
- Debug Mode: Verbose logging for troubleshooting

Prerequisites

Required

Claude Code CLI - Anthropic's official CLI tool
```
npm install -g @anthropic-ai/claude-code
```

Ollama with MiniCPM-V - Vision model server

```
docker pull ollama/ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull minicpm-v:latest
```

Screenshot Tool (one of):
- scrot (recommended)
- gnome-screenshot
- ImageMagick (`import` command)
- maim
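If you are not sure which of these tools is installed, you can check from Python with `shutil.which`. The preference order below follows the list above. Each command line is that tool's standard full-screen capture form. Whether claude-vision-auto probes tools in this order is an assumption, and `screenshot_command` is a hypothetical helper.

```python
import shutil
from typing import Callable, List, Optional

# (binary to look for, argv template); order follows the README list, scrot first.
SCREENSHOT_COMMANDS = [
    ("scrot", ["scrot", "{path}"]),
    ("gnome-screenshot", ["gnome-screenshot", "-f", "{path}"]),
    ("import", ["import", "-window", "root", "{path}"]),  # ImageMagick
    ("maim", ["maim", "{path}"]),
]


def screenshot_command(path: str,
                       which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the argv for the first installed tool, or raise if none is found."""
    for binary, template in SCREENSHOT_COMMANDS:
        if which(binary):
            return [arg.replace("{path}", path) for arg in template]
    raise RuntimeError("No screenshot tool found; install scrot, gnome-screenshot, "
                       "imagemagick or maim")
```

For example, `subprocess.run(screenshot_command("/tmp/shot.png"), check=True, timeout=5)` takes one capture with the first tool found.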

Install Screenshot Tool

```
# Debian/Ubuntu (recommended)
sudo apt-get install scrot

# Alternative options
sudo apt-get install gnome-screenshot
sudo apt-get install imagemagick
sudo apt-get install maim
```

Installation

Quick Install

```
cd claude-vision-auto
make deps      # Install system dependencies
make install   # Install the package
```

Manual Install

```
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip

# Install Python package
pip3 install -e .
```

Verify Installation

```
# Check if command is available
which claude-vision

# Test Ollama connection
curl http://localhost:11434/api/tags
```
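The same check can be done from Python using only the standard library. This sketch assumes Ollama's default address. `fetch_tags` and `installed_models` are hypothetical helper names, and the parsing relies on the `models[].name` fields that `/api/tags` returns.

```python
import json
import urllib.request
from typing import List


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """GET /api/tags from a running Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags: dict) -> List[str]:
    """Extract model names (e.g. "minicpm-v:latest") from an /api/tags response."""
    return [model["name"] for model in tags.get("models", [])]
```

With Ollama running, `"minicpm-v:latest" in installed_models(fetch_tags())` should be `True` once the model has been pulled.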

Usage

Basic Usage

Replace `claude` with `claude-vision`:

```
# Instead of:
claude

# Use:
claude-vision
```

With Prompts

```
# Pass prompts directly
claude-vision "create a test file in /tmp"

# Interactive session
claude-vision
```

Configuration

Set environment variables to customize behavior:

```
# Ollama URL (default: http://localhost:11434/api/generate)
export OLLAMA_URL="http://custom-host:11434/api/generate"

# Vision model (default: minicpm-v:latest)
export VISION_MODEL="llama3.2-vision:latest"

# Idle threshold in seconds (default: 3.0)
export IDLE_THRESHOLD="5.0"

# Response delay in seconds (default: 1.0)
export RESPONSE_DELAY="2.0"

# Enable debug mode
export DEBUG="true"

# Run with custom settings
claude-vision
```

How It Works

1. Launch: Spawns `claude` as a subprocess
2. Monitor: Watches output for approval keywords (Yes, No, Approve, etc.)
3. Detect Idle: Waits until output has stopped for `IDLE_THRESHOLD` seconds
4. Screenshot: Captures the full screen with scrot (full-screen capture is used instead of the active window because it is more reliable)
5. Analyze: Sends the screenshot to MiniCPM-V via the Ollama API
6. Respond: The vision model returns `1`, `y`, or `WAIT`
7. Submit: Sends the response automatically unless it is `WAIT`
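The Analyze and Respond steps correspond roughly to one non-streaming call to Ollama's `/api/generate` endpoint with a base64-encoded image. The sketch below shows one way such a call could look. It is not the package's `vision_analyzer.py`: the prompt text, the function names, and the rule that any answer other than `1` or `y`/`yes` becomes `WAIT` are all illustrative assumptions.

```python
import base64
import json
import urllib.request

# Hypothetical prompt; the package's real prompt is not shown in this README.
PROMPT = ("This is a terminal screenshot. If it shows an approval prompt, reply with "
          "only the answer to type (1 or y). Otherwise reply WAIT.")


def build_payload(image_bytes: bytes, model: str = "minicpm-v:latest") -> dict:
    """Request body for Ollama's /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def normalize(answer: str) -> str:
    """Map the model's free text onto "1", "y" or "WAIT" (WAIT when unsure)."""
    token = answer.strip().strip(".\"'").lower()
    if token == "1":
        return "1"
    if token in ("y", "yes"):
        return "y"
    return "WAIT"


def analyze(image_bytes: bytes, url: str = "http://localhost:11434/api/generate",
            model: str = "minicpm-v:latest", timeout: float = 30.0) -> str:
    """POST the screenshot to Ollama and return the normalized decision."""
    data = json.dumps(build_payload(image_bytes, model)).encode("utf-8")
    request = urllib.request.Request(url, data=data,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return normalize(json.load(resp).get("response", ""))
```

Falling back to `WAIT` on any unexpected answer keeps the wrapper from typing input that the prompt did not ask for.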

Configuration Options

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | `minicpm-v:latest` | Vision model to use |
| `IDLE_THRESHOLD` | `3.0` | Seconds of idle output before taking a screenshot |
| `RESPONSE_DELAY` | `1.0` | Seconds to wait before sending a response |
| `OUTPUT_BUFFER_SIZE` | `4096` | Bytes of output to buffer |
| `SCREENSHOT_TIMEOUT` | `5` | Screenshot capture timeout (seconds) |
| `VISION_TIMEOUT` | `30` | Vision analysis timeout (seconds) |
| `DEBUG` | `false` | Enable verbose logging |
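Here is a sketch of how these variables could be read into one settings object, using the defaults from the table. `Settings` and `load_settings` are hypothetical names. The package keeps its configuration in `config.py`, whose contents are not shown here. Treating `true`, `1`, or `yes` as enabling `DEBUG` is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_url: str = "http://localhost:11434/api/generate"
    vision_model: str = "minicpm-v:latest"
    idle_threshold: float = 3.0      # seconds
    response_delay: float = 1.0      # seconds
    output_buffer_size: int = 4096   # bytes
    screenshot_timeout: float = 5.0  # seconds
    vision_timeout: float = 30.0     # seconds
    debug: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the table defaults."""
    d = Settings()
    return Settings(
        ollama_url=env.get("OLLAMA_URL", d.ollama_url),
        vision_model=env.get("VISION_MODEL", d.vision_model),
        idle_threshold=float(env.get("IDLE_THRESHOLD", d.idle_threshold)),
        response_delay=float(env.get("RESPONSE_DELAY", d.response_delay)),
        output_buffer_size=int(env.get("OUTPUT_BUFFER_SIZE", d.output_buffer_size)),
        screenshot_timeout=float(env.get("SCREENSHOT_TIMEOUT", d.screenshot_timeout)),
        vision_timeout=float(env.get("VISION_TIMEOUT", d.vision_timeout)),
        # Assumed parsing: "true", "1" or "yes" (any case) enables debug output.
        debug=env.get("DEBUG", "false").strip().lower() in ("true", "1", "yes"),
    )
```

Passing the environment as a parameter, instead of reading `os.environ` inside the function, makes it easy to test different configurations.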

Troubleshooting

Ollama Not Connected

```
# Check if Ollama is running
docker ps | grep ollama

# Check if model is available
curl http://localhost:11434/api/tags
```

Screenshot Fails

```
# Test screenshot tool
scrot /tmp/test.png
ls -lh /tmp/test.png

# Install if missing
sudo apt-get install scrot
```
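Screenshots can also fail when the wrapper process did not inherit `DISPLAY` from the desktop session. The sketch below shows one way to guard against this when calling scrot from Python. It copies the environment, falls back to `:0` if `DISPLAY` is unset, captures the full screen, and logs scrot's stderr on failure. The function names and the `claude_vision_<timestamp>.png` file name are hypothetical.

```python
import logging
import os
import subprocess
import time
from typing import Dict, Mapping, Optional

logger = logging.getLogger(__name__)


def screenshot_env(base: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy the environment and default DISPLAY to :0 when it is unset or empty."""
    env = dict(base)
    if not env.get("DISPLAY"):
        env["DISPLAY"] = ":0"
    return env


def capture_full_screen(directory: str = "/tmp", timeout: float = 5.0) -> Optional[str]:
    """Capture the whole screen with scrot; return the PNG path, or None on failure."""
    # Hypothetical naming scheme: current Unix time in whole seconds.
    path = os.path.join(directory, f"claude_vision_{int(time.time())}.png")
    try:
        result = subprocess.run(["scrot", path], env=screenshot_env(),
                                capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        logger.error("screenshot failed: %s", exc)
        return None
    if result.returncode != 0:
        # Log scrot's own error message instead of failing silently.
        logger.error("scrot exited with %d: %s", result.returncode, result.stderr.strip())
        return None
    return path
```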

Debug Mode

```
# Run with verbose logging
DEBUG=true claude-vision "test command"
```

Vision Model Not Found

```
# Pull the model
docker exec ollama ollama pull minicpm-v:latest

# Or use alternative vision model
export VISION_MODEL="llava:latest"
claude-vision
```

Development

Setup Development Environment

```
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto

# Install in development mode
pip3 install -e .

# Run tests
make test
```

Project Structure

```
claude-vision-auto/
├── README.md                    # This file
├── QUICKSTART.md                # Quick start guide
├── CHANGELOG.md                 # Release history
├── LICENSE                      # MIT License
├── setup.py                     # Package setup
├── MANIFEST.in                  # Source distribution manifest
├── requirements.txt             # Python dependencies
├── requirements-dev.txt         # Development dependencies
├── Makefile                     # Build automation
├── .gitignore                   # Git ignore rules
├── claude_vision_auto/          # Main package
│   ├── __init__.py             # Package initialization
│   ├── main.py                 # CLI entry point
│   ├── config.py               # Configuration
│   ├── screenshot.py           # Screenshot capture
│   └── vision_analyzer.py      # Vision analysis
├── bin/
│   └── claude-vision           # CLI wrapper script
├── tests/                       # Test suite
│   └── test_vision.py
├── docs/                        # Documentation
│   ├── INSTALLATION.md
│   └── USAGE.md
└── examples/                    # Usage examples
    └── example_usage.sh
```

Supported Vision Models

Tested and working:

- minicpm-v:latest (Recommended) - Best for structured output
- llama3.2-vision:latest - Good alternative
- llava:latest - Fallback option
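When more than one of these models has been pulled, choosing a value for `VISION_MODEL` can be sketched as a preference lookup over the installed names, for example the `name` fields from `curl http://localhost:11434/api/tags`. `pick_vision_model` is a hypothetical helper. This README does not say that the package falls back between models automatically.

```python
from typing import Iterable, Optional

# Preference order taken from the list of tested models above.
PREFERRED_MODELS = ("minicpm-v:latest", "llama3.2-vision:latest", "llava:latest")


def pick_vision_model(installed: Iterable[str],
                      preferred: Iterable[str] = PREFERRED_MODELS) -> Optional[str]:
    """Return the first preferred model that is installed, or None if none are."""
    available = set(installed)
    for name in preferred:
        if name in available:
            return name
    return None
```

The result can then be exported as `VISION_MODEL` before running `claude-vision`.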

Performance

- Startup: < 1 second
- Screenshot: ~100ms
- Vision Analysis: 2-5 seconds (depends on model)
- Total Response Time: 3-7 seconds per approval

Limitations

- X11 Only: Requires an X11 display server (scrot does not support Wayland)
- Linux Only: Currently only tested on Debian/Ubuntu
- Vision Dependency: Requires Ollama and a vision model
- Screen Required: Must run inside a graphical session (no headless support)

Future Enhancements

- Wayland support (alternative screenshot tools)
- macOS support
- Headless mode (API-only, no screenshots)
- Configurable response patterns
- Multi-terminal support
- Session recording and replay

License

MIT License - See LICENSE file

Author

Svrnty

Email: jp@svrnty.io
Repository: https://git.openharbor.io/svrnty/claude-vision-auto

Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request

Acknowledgments

- Anthropic - Claude Code CLI
- MiniCPM-V - Vision model
- Ollama - Model serving infrastructure