Vision-module-auto/README.md

# Claude Vision Auto

Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

## Overview

Claude Vision Auto automatically detects and responds to approval prompts in Claude Code by:
1. Monitoring terminal output for approval keywords
2. Taking screenshots when idle (waiting for input)
3. Analyzing screenshots with MiniCPM-V vision model via Ollama
4. Automatically submitting appropriate responses

## Features

- **Zero Pattern Matching**: Uses vision AI instead of fragile regex patterns
- **Universal Compatibility**: Works with any Claude Code prompt format
- **Intelligent Detection**: Only activates when approval keywords are present
- **Configurable**: Environment variables for all settings
- **Lightweight**: Minimal dependencies (only `requests`)
- **Debug Mode**: Verbose logging for troubleshooting

## Prerequisites

### Required

1. **Claude Code CLI** - Anthropic's official CLI tool
   ```bash
   npm install -g @anthropic-ai/claude-code
   ```

2. **Ollama with MiniCPM-V** - Vision model server
   ```bash
   docker pull ollama/ollama
   docker run -d -p 11434:11434 --name ollama ollama/ollama
   docker exec ollama ollama pull minicpm-v:latest
   ```

3. **Screenshot Tool** (one of):
   - `scrot` (recommended)
   - `gnome-screenshot`
   - `imagemagick` (import command)
   - `maim`

### Install Screenshot Tool

```bash
# Debian/Ubuntu (recommended)
sudo apt-get install scrot

# Alternative options
sudo apt-get install gnome-screenshot
sudo apt-get install imagemagick
sudo apt-get install maim
```

## Installation

### Quick Install

```bash
cd claude-vision-auto
make deps      # Install system dependencies
make install   # Install the package
```

### Manual Install

```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip

# Install Python package
pip3 install -e .
```

### Verify Installation

```bash
# Check if command is available
which claude-vision

# Test Ollama connection
curl http://localhost:11434/api/tags
```

## Usage

### Basic Usage

Replace `claude` with `claude-vision`:

```bash
# Instead of:
claude

# Use:
claude-vision
```

### With Prompts

```bash
# Pass prompts directly
claude-vision "create a test file in /tmp"

# Interactive session
claude-vision
```

### Configuration

Set environment variables to customize behavior:

```bash
# Ollama URL (default: http://localhost:11434/api/generate)
export OLLAMA_URL="http://custom-host:11434/api/generate"

# Vision model (default: minicpm-v:latest)
export VISION_MODEL="llama3.2-vision:latest"

# Idle threshold in seconds (default: 3.0)
export IDLE_THRESHOLD="5.0"

# Response delay in seconds (default: 1.0)
export RESPONSE_DELAY="2.0"

# Enable debug mode
export DEBUG="true"

# Run with custom settings
claude-vision
```

## How It Works

1. **Launch**: Spawns `claude` as subprocess
2. **Monitor**: Watches output for approval keywords (Yes, No, Approve, etc.)
3. **Detect Idle**: When output stops for `IDLE_THRESHOLD` seconds
4. **Screenshot**: Captures terminal window with `scrot`
5. **Analyze**: Sends to MiniCPM-V via Ollama API
6. **Respond**: Vision model returns "1", "y", or "WAIT"
7. **Submit**: Automatically sends response if not "WAIT"

## Configuration Options

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_URL` | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | `minicpm-v:latest` | Vision model to use |
| `IDLE_THRESHOLD` | `3.0` | Seconds of idle before screenshot |
| `RESPONSE_DELAY` | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | `4096` | Bytes of output to buffer |
| `SCREENSHOT_TIMEOUT` | `5` | Screenshot capture timeout |
| `VISION_TIMEOUT` | `30` | Vision analysis timeout |
| `DEBUG` | `false` | Enable verbose logging |

## Troubleshooting

### Ollama Not Connected

```bash
# Check if Ollama is running
docker ps | grep ollama

# Check if model is available
curl http://localhost:11434/api/tags
```

### Screenshot Fails

```bash
# Test screenshot tool
scrot /tmp/test.png
ls -lh /tmp/test.png

# Install if missing
sudo apt-get install scrot
```

### Debug Mode

```bash
# Run with verbose logging
DEBUG=true claude-vision "test command"
```

### Vision Model Not Found

```bash
# Pull the model
docker exec ollama ollama pull minicpm-v:latest

# Or use alternative vision model
export VISION_MODEL="llava:latest"
claude-vision
```

## Development

### Setup Development Environment

```bash
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto

# Install in development mode
pip3 install -e .

# Run tests
make test
```

### Project Structure

```
claude-vision-auto/
├── README.md                    # This file
├── LICENSE                      # MIT License
├── setup.py                     # Package setup
├── requirements.txt             # Python dependencies
├── Makefile                     # Build automation
├── .gitignore                   # Git ignore rules
├── claude_vision_auto/          # Main package
│   ├── __init__.py             # Package initialization
│   ├── main.py                 # CLI entry point
│   ├── config.py               # Configuration
│   ├── screenshot.py           # Screenshot capture
│   └── vision_analyzer.py      # Vision analysis
├── bin/
│   └── claude-vision           # CLI wrapper script
├── tests/                       # Test suite
│   └── test_vision.py
├── docs/                        # Documentation
│   ├── INSTALLATION.md
│   └── USAGE.md
└── examples/                    # Usage examples
    └── example_usage.sh
```

## Supported Vision Models

Tested and working:

- **minicpm-v:latest** (Recommended) - Best for structured output
- **llama3.2-vision:latest** - Good alternative
- **llava:latest** - Fallback option

## Performance

- **Startup**: < 1 second
- **Screenshot**: ~100ms
- **Vision Analysis**: 2-5 seconds (depends on model)
- **Total Response Time**: 3-7 seconds per approval

## Limitations

- **X11 Only**: Requires X11 display server (no Wayland support for scrot)
- **Linux Only**: Currently only tested on Debian/Ubuntu
- **Vision Dependency**: Requires Ollama and vision model
- **Screen Required**: Must have GUI session (no headless support)

## Future Enhancements

- [ ] Wayland support (alternative screenshot tools)
- [ ] macOS support
- [ ] Headless mode (API-only, no screenshots)
- [ ] Configurable response patterns
- [ ] Multi-terminal support
- [ ] Session recording and replay

## License

MIT License - See LICENSE file

## Author

**Svrnty**
- Email: jp@svrnty.io
- Repository: https://git.openharbor.io/svrnty/claude-vision-auto

## Contributing

Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Submit a pull request

## Acknowledgments

- **Anthropic** - Claude Code CLI
- **MiniCPM-V** - Vision model
- **Ollama** - Model serving infrastructure