
Claude Vision Auto

Vision-based auto-approval system for the Claude Code CLI, using the MiniCPM-V vision model.

Overview

Claude Vision Auto automatically detects and responds to approval prompts in Claude Code by:

  1. Monitoring terminal output for approval keywords
  2. Taking screenshots when idle (waiting for input)
  3. Analyzing screenshots with MiniCPM-V vision model via Ollama
  4. Automatically submitting appropriate responses
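Steps 1 and 2 combine two conditions: an approval keyword has appeared, and the output has gone quiet. The Python below is a minimal sketch of that trigger, not the package's own code. The class name `PromptTrigger`, the keyword tuple and the injectable clock are assumptions made for illustration. The buffer also counts characters rather than bytes.

```python
import time
from typing import Callable, Optional

# Illustrative keywords only; the README names "Yes, No, Approve, etc."
APPROVAL_KEYWORDS = ("Yes", "No", "Approve")


class PromptTrigger:
    """Fire once approval keywords were seen AND output has been idle long enough."""

    def __init__(self, idle_threshold: float = 3.0, buffer_size: int = 4096,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_threshold = idle_threshold
        self.buffer_size = buffer_size
        self.clock = clock
        self.buffer = ""
        self.last_output: Optional[float] = None

    def feed(self, text: str) -> None:
        """Record new terminal output and restart the idle timer."""
        # Keep only the newest buffer_size characters (the real setting is in bytes).
        self.buffer = (self.buffer + text)[-self.buffer_size:]
        self.last_output = self.clock()

    def has_keyword(self) -> bool:
        # Deliberately loose substring match: the vision model makes the final
        # call and can still answer WAIT.
        return any(word in self.buffer for word in APPROVAL_KEYWORDS)

    def should_capture(self) -> bool:
        """True when a keyword is buffered and no output arrived for idle_threshold seconds."""
        if self.last_output is None or not self.has_keyword():
            return False
        return self.clock() - self.last_output >= self.idle_threshold

    def reset(self) -> None:
        """Forget the current prompt after a response has been sent."""
        self.buffer = ""
        self.last_output = None
```

A wrapper loop would typically work like this:
- It calls `feed()` for every chunk read from the child process.
- It polls `should_capture()` between reads.

A monotonic clock keeps the idle check unaffected by wall-clock adjustments.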

Features

  • Zero Pattern Matching: Uses vision AI instead of fragile regex patterns
  • Universal Compatibility: Works with any Claude Code prompt format
  • Intelligent Detection: Only activates when approval keywords are present
  • Configurable: Environment variables for all settings
  • Lightweight: Minimal dependencies (only requests)
  • Debug Mode: Verbose logging for troubleshooting

Prerequisites

Required

  1. Claude Code CLI - Anthropic's official CLI tool

    npm install -g @anthropic-ai/claude-code
    
  2. Ollama with MiniCPM-V - Vision model server

    docker pull ollama/ollama
    docker run -d -p 11434:11434 --name ollama ollama/ollama
    docker exec ollama ollama pull minicpm-v:latest
    
  3. Screenshot Tool (one of):

    • scrot (recommended)
    • gnome-screenshot
    • imagemagick (import command)
    • maim

Install Screenshot Tool

# Debian/Ubuntu (recommended)
sudo apt-get install scrot

# Alternative options
sudo apt-get install gnome-screenshot
sudo apt-get install imagemagick
sudo apt-get install maim

Installation

Quick Install

cd claude-vision-auto
make deps      # Install system dependencies
make install   # Install the package

Manual Install

# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip

# Install Python package
pip3 install -e .

Verify Installation

# Check if command is available
which claude-vision

# Test Ollama connection
curl http://localhost:11434/api/tags
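The same Ollama check can be scripted in Python, for example as a start-up sanity check. Below is a minimal sketch that uses only the standard library. The helper names `list_models`, `model_names` and `has_model` are hypothetical. The `{"models": [{"name": ...}]}` response shape is what Ollama's `/api/tags` endpoint returns.

```python
import json
import urllib.request
from typing import List


def model_names(tags: dict) -> List[str]:
    """Extract model names from a parsed /api/tags response."""
    # /api/tags responds with {"models": [{"name": "minicpm-v:latest", ...}, ...]}
    return [m.get("name", "") for m in tags.get("models", [])]


def has_model(tags: dict, model: str) -> bool:
    """True if the given model tag appears in the /api/tags response."""
    return model in model_names(tags)


def list_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> List[str]:
    """Query a running Ollama server; raises OSError (e.g. URLError) if unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Once the model has been pulled, `"minicpm-v:latest" in list_models()` should evaluate to true.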

Usage

Basic Usage

Replace claude with claude-vision:

# Instead of:
claude

# Use:
claude-vision

With Prompts

# Pass prompts directly
claude-vision "create a test file in /tmp"

# Interactive session
claude-vision

Configuration

Set environment variables to customize behavior:

# Ollama URL (default: http://localhost:11434/api/generate)
export OLLAMA_URL="http://custom-host:11434/api/generate"

# Vision model (default: minicpm-v:latest)
export VISION_MODEL="llama3.2-vision:latest"

# Idle threshold in seconds (default: 3.0)
export IDLE_THRESHOLD="5.0"

# Response delay in seconds (default: 1.0)
export RESPONSE_DELAY="2.0"

# Enable debug mode
export DEBUG="true"

# Run with custom settings
claude-vision

How It Works

  1. Launch: Spawns claude as a subprocess
  2. Monitor: Watches output for approval keywords (Yes, No, Approve, etc.)
  3. Detect Idle: Waits until output has stopped for IDLE_THRESHOLD seconds
  4. Screenshot: Captures the full screen with scrot (or another supported tool); DISPLAY defaults to :0 if unset
  5. Analyze: Sends the screenshot to MiniCPM-V via the Ollama API
  6. Respond: The vision model returns "1", "y", or "WAIT"
  7. Submit: After RESPONSE_DELAY seconds, sends the response to claude unless it is "WAIT"
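Steps 5 to 7 amount to one Ollama request plus a strict reading of the reply. Below is a minimal sketch under these assumptions:
- It uses the standard library's `urllib` instead of `requests`, so it is self-contained.
- The prompt text and the helpers `build_payload`, `normalize` and `analyze` are hypothetical.
- The request fields `model`, `prompt`, `images` (base64) and `stream`, and the `response` field of the reply, are Ollama's `/api/generate` API.

```python
import base64
import json
import urllib.request

# Hypothetical prompt; the package's actual prompt is not documented here.
PROMPT = (
    "This is a screenshot of a terminal running Claude Code. "
    "If an approval prompt with numbered options is shown, answer 1. "
    "If a yes/no prompt is shown, answer y. Otherwise answer WAIT. "
    "Reply with only the answer."
)


def build_payload(png_bytes: bytes, model: str = "minicpm-v:latest") -> dict:
    """Build an /api/generate request body carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": PROMPT,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,  # one JSON object instead of a stream of chunks
    }


def normalize(answer: str) -> str:
    """Map free-form model output onto "1", "y", or "WAIT" (WAIT on anything unexpected)."""
    token = answer.strip().strip('."\'').lower()
    if token == "1":
        return "1"
    if token in ("y", "yes"):
        return "y"
    return "WAIT"


def analyze(png_bytes: bytes, url: str = "http://localhost:11434/api/generate",
            model: str = "minicpm-v:latest", timeout: float = 30.0) -> str:
    """Send a screenshot to Ollama and return the normalized answer."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(png_bytes, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return normalize(json.load(resp).get("response", ""))
```

Falling back to "WAIT" on any unrecognized reply is the conservative choice, because sending a wrong approval is worse than missing one.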

Configuration Options

Variable             Default                               Description
OLLAMA_URL           http://localhost:11434/api/generate   Ollama API endpoint
VISION_MODEL         minicpm-v:latest                      Vision model to use
IDLE_THRESHOLD       3.0                                   Seconds of idle output before taking a screenshot
RESPONSE_DELAY       1.0                                   Seconds to wait before sending a response
OUTPUT_BUFFER_SIZE   4096                                  Bytes of recent output to buffer
SCREENSHOT_TIMEOUT   5                                     Screenshot capture timeout (seconds)
VISION_TIMEOUT       30                                    Vision analysis timeout (seconds)
DEBUG                false                                 Enable verbose logging
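The variables in this table map naturally onto a typed settings object. The sketch below shows one way such a loader could look; it is an assumption, not the contents of `config.py`. It has two deliberate limits:
- It reads only environment variables.
- Accepting `1`, `yes` and `on` for `DEBUG`, in addition to `true`, is an illustrative choice.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_url: str = "http://localhost:11434/api/generate"
    vision_model: str = "minicpm-v:latest"
    idle_threshold: float = 3.0       # seconds
    response_delay: float = 1.0       # seconds
    output_buffer_size: int = 4096    # bytes
    screenshot_timeout: float = 5.0   # seconds
    vision_timeout: float = 30.0      # seconds
    debug: bool = False


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        ollama_url=env.get("OLLAMA_URL", d.ollama_url),
        vision_model=env.get("VISION_MODEL", d.vision_model),
        idle_threshold=float(env.get("IDLE_THRESHOLD", d.idle_threshold)),
        response_delay=float(env.get("RESPONSE_DELAY", d.response_delay)),
        output_buffer_size=int(env.get("OUTPUT_BUFFER_SIZE", d.output_buffer_size)),
        screenshot_timeout=float(env.get("SCREENSHOT_TIMEOUT", d.screenshot_timeout)),
        vision_timeout=float(env.get("VISION_TIMEOUT", d.vision_timeout)),
        # Illustrative: treat a few common truthy spellings as "on".
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes", "on"),
    )
```

Passing an explicit mapping, for example `load_settings({"IDLE_THRESHOLD": "5.0"})`, makes the loader easy to test without touching the real environment.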

Troubleshooting

Ollama Not Connected

# Check if Ollama is running
docker ps | grep ollama

# Check if model is available
curl http://localhost:11434/api/tags

Screenshot Fails

# Test screenshot tool
scrot /tmp/test.png
ls -lh /tmp/test.png

# Install if missing
sudo apt-get install scrot
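When the tool works from a shell but fails under claude-vision, the usual suspect is a missing DISPLAY variable in the child process. Below is a minimal sketch of a full-screen capture that guards against this. It has four properties:
- It copies the environment and falls back to DISPLAY=:0.
- It tries the supported tools in the order listed under Prerequisites.
- It logs the tool's stderr on failure.
- It builds the file name from `time.time()`.

The function names and file-name pattern are hypothetical. The tool flags are the standard full-screen invocations of scrot, gnome-screenshot, ImageMagick `import` and maim.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time
from typing import Callable, Dict, List, Mapping, Optional

# Full-screen capture commands, in the order the README lists the tools.
CAPTURE_COMMANDS = [
    ("scrot", lambda path: ["scrot", path]),
    ("gnome-screenshot", lambda path: ["gnome-screenshot", "-f", path]),
    ("import", lambda path: ["import", "-window", "root", path]),  # ImageMagick
    ("maim", lambda path: ["maim", path]),
]


def capture_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Copy the environment and make sure DISPLAY is set, defaulting to :0."""
    env = dict(environ)
    if not env.get("DISPLAY"):
        env["DISPLAY"] = ":0"
    return env


def build_command(path: str,
                  which: Callable[[str], Optional[str]] = shutil.which) -> Optional[List[str]]:
    """Return the capture command for the first installed tool, or None."""
    for tool, make_cmd in CAPTURE_COMMANDS:
        if which(tool):
            return make_cmd(path)
    return None


def capture_screen(timeout: float = 5.0) -> Optional[str]:
    """Capture the whole screen to a PNG and return its path, or None on failure."""
    # Millisecond timestamp from time.time() keeps file names unique.
    path = os.path.join(tempfile.gettempdir(),
                        f"claude_vision_{int(time.time() * 1000)}.png")
    cmd = build_command(path)
    if cmd is None:
        print("No screenshot tool found; install scrot", file=sys.stderr)
        return None
    try:
        result = subprocess.run(cmd, env=capture_env(os.environ),
                                capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"{cmd[0]} timed out after {timeout}s", file=sys.stderr)
        return None
    if result.returncode != 0 or not os.path.exists(path):
        # stderr usually explains X11 problems such as a missing or wrong DISPLAY.
        print(f"{cmd[0]} failed: {result.stderr.strip()}", file=sys.stderr)
        return None
    return path
```

Capturing the full screen rather than the active window avoids depending on which window has focus. The cost is that the vision model sees everything on screen.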

Debug Mode

# Run with verbose logging
DEBUG=true claude-vision "test command"

Vision Model Not Found

# Pull the model
docker exec ollama ollama pull minicpm-v:latest

# Or use alternative vision model
export VISION_MODEL="llava:latest"
claude-vision

Development

Setup Development Environment

# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto

# Install in development mode
pip3 install -e .

# Run tests
make test

Project Structure

claude-vision-auto/
├── README.md                    # This file
├── QUICKSTART.md                # Quick start guide
├── CHANGELOG.md                 # Release history
├── LICENSE                      # MIT License
├── setup.py                     # Package setup
├── MANIFEST.in                  # Source distribution manifest
├── requirements.txt             # Python dependencies
├── requirements-dev.txt         # Development dependencies
├── Makefile                     # Build automation
├── .gitignore                   # Git ignore rules
├── claude_vision_auto/          # Main package
│   ├── __init__.py              # Package initialization
│   ├── main.py                  # CLI entry point
│   ├── config.py                # Configuration
│   ├── screenshot.py            # Screenshot capture
│   └── vision_analyzer.py       # Vision analysis
├── bin/
│   └── claude-vision            # CLI wrapper script
├── tests/                       # Test suite
│   └── test_vision.py
├── docs/                        # Documentation
│   ├── INSTALLATION.md
│   └── USAGE.md
└── examples/                    # Usage examples
    └── example_usage.sh

Supported Vision Models

Tested and working:

  • minicpm-v:latest (Recommended) - Best for structured output
  • llama3.2-vision:latest - Good alternative
  • llava:latest - Fallback option

Performance

  • Startup: < 1 second
  • Screenshot: ~100ms
  • Vision Analysis: 2-5 seconds (depends on model)
  • Total Response Time: 3-7 seconds per approval
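Assume the stages run strictly one after another. Summing the figures above then suggests what the quoted 3-7 second total covers: the screenshot, the vision analysis and RESPONSE_DELAY, but not the IDLE_THRESHOLD wait. That wait would add another 3 seconds at the default. The arithmetic below only checks that reading; it does not measure anything.

```python
# Stage timings taken from the Performance list and the documented defaults.
SCREENSHOT_S = 0.1             # ~100ms
VISION_S = (2.0, 5.0)          # model-dependent range
RESPONSE_DELAY_S = 1.0         # RESPONSE_DELAY default
IDLE_THRESHOLD_S = 3.0         # IDLE_THRESHOLD default


def response_time(include_idle: bool) -> tuple:
    """Return (best, worst) seconds, assuming the stages run back to back."""
    fixed = SCREENSHOT_S + RESPONSE_DELAY_S
    if include_idle:
        fixed += IDLE_THRESHOLD_S
    return (fixed + VISION_S[0], fixed + VISION_S[1])


print([round(t, 1) for t in response_time(False)])  # [3.1, 6.1]
print([round(t, 1) for t in response_time(True)])   # [6.1, 9.1]
```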

Limitations

  • X11 Only: Requires an X11 display server (scrot does not support Wayland)
  • Linux Only: Currently only tested on Debian/Ubuntu
  • Vision Dependency: Requires Ollama and vision model
  • Screen Required: Must have GUI session (no headless support)

Future Enhancements

  • Wayland support (alternative screenshot tools)
  • macOS support
  • Headless mode (API-only, no screenshots)
  • Configurable response patterns
  • Multi-terminal support
  • Session recording and replay

License

MIT License - See LICENSE file

Author

Svrnty

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

Acknowledgments

  • Anthropic - Claude Code CLI
  • MiniCPM-V - Vision model
  • Ollama - Model serving infrastructure