Initial release of Claude Vision Auto v1.0.0

Vision-based auto-approval system for the Claude Code CLI, using the MiniCPM-V vision model.

Features:
- Automatic detection and response to approval prompts
- Screenshot capture and vision analysis via Ollama
- Support for multiple screenshot tools (scrot, gnome-screenshot, etc.)
- Configurable timing and behavior
- Debug mode for troubleshooting
- Comprehensive documentation

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>
Svrnty
2025-10-29 10:09:01 -04:00
commit 41cecca0e2
18 changed files with 1916 additions and 0 deletions
@@ -0,0 +1,358 @@
# Installation Guide
Detailed installation instructions for Claude Vision Auto.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [System Dependencies](#system-dependencies)
3. [Ollama Setup](#ollama-setup)
4. [Package Installation](#package-installation)
5. [Verification](#verification)
6. [Troubleshooting](#troubleshooting)
## Prerequisites
### 1. Claude Code CLI
Install Anthropic's official CLI:
```bash
npm install -g @anthropic-ai/claude-code
```
Verify installation:
```bash
claude --version
```
### 2. Python 3.8+
Check Python version:
```bash
python3 --version
```
If not installed:
```bash
sudo apt-get update
sudo apt-get install python3 python3-pip
```
### 3. Docker (for Ollama)
Install Docker:
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
```
Log out and back in for group changes to take effect.
## System Dependencies
### Screenshot Tool
Install `scrot` (recommended):
```bash
sudo apt-get update
sudo apt-get install -y scrot
```
Alternative screenshot tools:
```bash
# GNOME Screenshot
sudo apt-get install -y gnome-screenshot
# ImageMagick
sudo apt-get install -y imagemagick
# Maim
sudo apt-get install -y maim xdotool
```
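Because several screenshot tools are supported, a wrapper has to decide which one to call. The following is a minimal sketch of one way to do that, assuming the first installed tool wins. The function names and the preference order are hypothetical and not taken from the package; the command-line flags are the tools' standard ones for saving a capture to a file.

```python
import shutil
from typing import Callable, Dict, List, Optional


def capture_commands(path: str) -> Dict[str, List[str]]:
    """Map each executable name to a command that saves a capture to `path`.

    The order of the entries is the (assumed) preference order.
    """
    return {
        "scrot": ["scrot", path],
        "gnome-screenshot": ["gnome-screenshot", "-f", path],
        "import": ["import", "-window", "root", path],  # ImageMagick
        "maim": ["maim", path],
    }


def pick_capture_command(
    path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the command for the first installed tool, or None if none is found."""
    for tool, command in capture_commands(path).items():
        if which(tool) is not None:
            return command
    return None
```

The returned list can be passed directly to `subprocess.run`. The `which` parameter exists only so the lookup can be replaced in tests.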
### Additional Dependencies
```bash
sudo apt-get install -y \
  python3-pip \
  git \
  curl
```
## Ollama Setup
### 1. Pull Ollama Docker Image
```bash
docker pull ollama/ollama:latest
```
### 2. Start Ollama Container
```bash
docker run -d \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  ollama/ollama:latest
```
For GPU support (NVIDIA; requires the NVIDIA Container Toolkit on the host):
```bash
docker run -d \
  -p 11434:11434 \
  --name ollama \
  --gpus all \
  --restart unless-stopped \
  ollama/ollama:latest
```
### 3. Pull Vision Model
```bash
# MiniCPM-V (recommended - 5.5GB)
docker exec ollama ollama pull minicpm-v:latest
# Alternative: Llama 3.2 Vision (7.8GB)
docker exec ollama ollama pull llama3.2-vision:latest
# Alternative: LLaVA (4.5GB)
docker exec ollama ollama pull llava:latest
```
### 4. Verify Ollama
```bash
# Check container status
docker ps | grep ollama
# Test API
curl http://localhost:11434/api/tags
# List installed models
curl -s http://localhost:11434/api/tags | python3 -m json.tool
```
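The same check can be done from Python, which is convenient when a script needs to confirm that the vision model is present before starting. This is a sketch, not part of the package: the function names are hypothetical, and it relies only on the `/api/tags` response containing a `models` list whose entries carry a `name` field.

```python
import json
import urllib.request
from typing import List


def model_names(tags_payload: dict) -> List[str]:
    """Extract the model names from a decoded /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def fetch_model_names(
    base_url: str = "http://localhost:11434",
    timeout: float = 5.0,
) -> List[str]:
    """Query a running Ollama server; raises URLError if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

With the container running, `fetch_model_names()` should include `minicpm-v:latest` once that model has been pulled.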
## Package Installation
### Method 1: Using Makefile (Recommended)
```bash
cd claude-vision-auto
# Install system dependencies
make deps
# Install package
make install
```
### Method 2: Manual Installation
```bash
cd claude-vision-auto
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip
# Install Python package
pip3 install -e .
```
### Method 3: From Git
```bash
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto
# Install
pip3 install -e .
```
## Verification
### 1. Check Command Installation
```bash
which claude-vision
```
Expected output for a user-level install (the path may differ on your system): `/home/username/.local/bin/claude-vision`
### 2. Test Ollama Connection
```bash
curl http://localhost:11434/api/tags
```
This should return JSON listing the installed models.
### 3. Test Screenshot
```bash
scrot /tmp/test_screenshot.png
ls -lh /tmp/test_screenshot.png
```
This should create a screenshot file and show its size.
### 4. Run Test
```bash
# Start claude-vision
claude-vision
# You should see:
# [Claude Vision Auto] Testing Ollama connection...
# [Claude Vision Auto] Connected to Ollama
# [Claude Vision Auto] Using model: minicpm-v:latest
```
## Troubleshooting
### "claude-vision: command not found"
Add `~/.local/bin` to your PATH in `~/.bashrc` or `~/.zshrc`:
```bash
export PATH="$HOME/.local/bin:$PATH"
```
Then reload:
```bash
source ~/.bashrc # or source ~/.zshrc
```
### "Cannot connect to Ollama"
Check if Ollama container is running:
```bash
docker ps | grep ollama
# If not running, start it:
docker start ollama
```
Check if port 11434 is open:
```bash
netstat -tulpn | grep 11434
# or
ss -tulpn | grep 11434
```
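If neither `netstat` nor `ss` is available, a plain TCP connection attempt answers the same question. A minimal sketch follows; the helper name is hypothetical, and the host and port defaults simply mirror the Ollama setup described earlier.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A `False` result with the container running usually points at the `-p 11434:11434` port mapping rather than at Ollama itself.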
### "Model not found"
Pull the model:
```bash
docker exec ollama ollama pull minicpm-v:latest
```
List available models:
```bash
docker exec ollama ollama list
```
### "Screenshot failed"
Install scrot:
```bash
sudo apt-get install scrot
```
Test screenshot:
```bash
scrot -u /tmp/test.png
```
If the error persists, select an alternative tool through the environment:
```bash
export SCREENSHOT_TOOL="gnome-screenshot"
claude-vision
```
### Permission Issues
If `pip install` fails with a permissions error:
```bash
# Install for user only
pip3 install --user -e .
# Or use virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -e .
```
### Docker Permission Denied
Add user to docker group:
```bash
sudo usermod -aG docker $USER
```
Log out and back in, then:
```bash
docker ps # Should work without sudo
```
## Uninstallation
### Remove Package
```bash
make uninstall
# or
pip3 uninstall claude-vision-auto
```
### Remove Ollama
```bash
docker stop ollama
docker rm ollama
docker rmi ollama/ollama
```
### Remove System Dependencies
```bash
sudo apt-get remove scrot
```
## Next Steps
After successful installation:
1. Read [USAGE.md](USAGE.md) for usage examples
2. Configure environment variables if needed
3. Test with a simple Claude Code command
## Getting Help
If you encounter issues not covered here:
1. Check the main [README.md](../README.md)
2. Enable debug mode: `DEBUG=true claude-vision`
3. Check logs: `~/.cache/claude-vision-auto/`
4. Report issues: https://git.openharbor.io/svrnty/claude-vision-auto/issues
@@ -0,0 +1,439 @@
# Usage Guide
Comprehensive usage guide for Claude Vision Auto.
## Table of Contents
1. [Basic Usage](#basic-usage)
2. [Configuration](#configuration)
3. [Common Scenarios](#common-scenarios)
4. [Advanced Usage](#advanced-usage)
5. [Tips and Best Practices](#tips-and-best-practices)
## Basic Usage
### Starting Interactive Session
Simply replace `claude` with `claude-vision`:
```bash
claude-vision
```
Expected output:
```
[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s
▐▛███▜▌ Claude Code v2.0.26
▝▜█████▛▘ Sonnet 4.5 · Claude Max
▘▘ ▝▝ /home/username/project
>
```
### With Initial Prompt
```bash
claude-vision "create a test.md file in /tmp"
```
### How Auto-Approval Works
When Claude asks for approval:
```
╭───────────────────────────────────────────╮
│ Create file │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md │ │
│ │ │ │
│ │ # Test File │ │
│ │ │ │
│ │ This is a test. │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md? │
│ 1. Yes │
│ 2. Yes, allow all edits │
│ 3. No │
╰───────────────────────────────────────────╯
```
Claude Vision Auto will:
1. Detect idle state (3 seconds)
2. Take screenshot
3. Analyze with vision model
4. Automatically select option 1
Output:
```
[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
```
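The first step, idle detection, can be pictured as a timer that is reset by every chunk of output. The sketch below is illustrative only: `IdleDetector` and its methods are hypothetical names rather than the package's API, and it assumes "idle" means that no new output has arrived for `IDLE_THRESHOLD` seconds.

```python
import time
from typing import Callable


class IdleDetector:
    """Reports idleness once per quiet period of at least `threshold` seconds."""

    def __init__(
        self,
        threshold: float = 3.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.clock = clock
        self.last_output = clock()
        self.fired = False

    def on_output(self) -> None:
        """Call whenever the wrapped CLI prints something."""
        self.last_output = self.clock()
        self.fired = False

    def should_analyze(self) -> bool:
        """Return True the first time the quiet period reaches the threshold."""
        if self.fired:
            return False
        if self.clock() - self.last_output >= self.threshold:
            self.fired = True
            return True
        return False
```

Firing only once per quiet period keeps a static prompt from being screenshotted and answered repeatedly; new output re-arms the detector.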
## Configuration
### Environment Variables
Set before running `claude-vision`:
```bash
# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"
claude-vision
```
### Persistent Configuration
Add to `~/.bashrc` or `~/.zshrc`:
```bash
# Claude Vision Auto Configuration
export OLLAMA_URL="http://localhost:11434/api/generate"
export VISION_MODEL="minicpm-v:latest"
export IDLE_THRESHOLD="3.0"
export RESPONSE_DELAY="1.0"
# Alias for convenience
alias cv="claude-vision"
```
Reload:
```bash
source ~/.bashrc
```
### Configuration Options Reference
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `OLLAMA_URL` | URL | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | String | `minicpm-v:latest` | Vision model name |
| `IDLE_THRESHOLD` | Float | `3.0` | Seconds to wait before screenshot |
| `RESPONSE_DELAY` | Float | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | Integer | `4096` | Buffer size in bytes |
| `SCREENSHOT_TIMEOUT` | Integer | `5` | Screenshot timeout (seconds) |
| `VISION_TIMEOUT` | Integer | `30` | Vision analysis timeout (seconds) |
| `DEBUG` | Boolean | `false` | Enable debug logging |
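To make the types and defaults in the table concrete, here is a sketch of how such variables could be read. It is not the package's actual configuration code: the `Config` class and `load_config` function are hypothetical, and the set of strings accepted as "true" for `DEBUG` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: which strings count as "true" for boolean options.
_TRUE_VALUES = ("1", "true", "yes")


@dataclass(frozen=True)
class Config:
    ollama_url: str
    vision_model: str
    idle_threshold: float
    response_delay: float
    output_buffer_size: int
    screenshot_timeout: int
    vision_timeout: int
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment-style key/value pairs, using the table's defaults."""
    return Config(
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434/api/generate"),
        vision_model=env.get("VISION_MODEL", "minicpm-v:latest"),
        idle_threshold=float(env.get("IDLE_THRESHOLD", "3.0")),
        response_delay=float(env.get("RESPONSE_DELAY", "1.0")),
        output_buffer_size=int(env.get("OUTPUT_BUFFER_SIZE", "4096")),
        screenshot_timeout=int(env.get("SCREENSHOT_TIMEOUT", "5")),
        vision_timeout=int(env.get("VISION_TIMEOUT", "30")),
        debug=env.get("DEBUG", "false").strip().lower() in _TRUE_VALUES,
    )
```

Note that a malformed value such as `IDLE_THRESHOLD="fast"` raises `ValueError` in this sketch instead of silently falling back to the default.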
## Common Scenarios
### Scenario 1: File Creation
```bash
claude-vision "create a new Python script called hello.py"
```
Auto-approves file creation.
### Scenario 2: File Editing
```bash
claude-vision "add error handling to main.py"
```
Auto-approves file edits.
### Scenario 3: Multiple Operations
```bash
claude-vision "refactor the authentication module and add tests"
```
Auto-approves each operation sequentially.
### Scenario 4: Longer Wait Time
For slower systems or models:
```bash
IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision
```
### Scenario 5: Different Vision Model
```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```
### Scenario 6: Remote Ollama
```bash
OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
```
### Scenario 7: Debug Mode
When troubleshooting:
```bash
DEBUG=true claude-vision "test command"
```
Output will include:
```
[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1
```
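The debug lines above correspond to a single request against Ollama's `/api/generate` endpoint. As a rough sketch of that request: the function names and the prompt text are hypothetical, while the payload fields (`model`, `prompt`, `images` as base64 strings, `stream`) and the `response` field of the reply are part of the Ollama API.

```python
import base64
import json
import urllib.request


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for a non-streaming vision request."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def ask_vision_model(url: str, payload: dict, timeout: float = 30.0) -> str:
    """POST the payload to Ollama and return the model's text answer."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"].strip()
```

Sending a saved screenshot's bytes through `build_generate_payload` and `ask_vision_model` by hand is a quick way to check whether a wrong answer comes from the model or from the capture.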
## Advanced Usage
### Using Different Models
#### MiniCPM-V (Default)
Best for structured responses:
```bash
VISION_MODEL="minicpm-v:latest" claude-vision
```
#### Llama 3.2 Vision
Good alternative:
```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```
#### LLaVA
Lightweight option:
```bash
VISION_MODEL="llava:latest" claude-vision
```
### Shell Aliases
Add to `~/.bashrc`:
```bash
# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision" # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision" # Slower
# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"
```
### Integration with Scripts
```bash
#!/bin/bash
# auto-refactor.sh
export DEBUG="false"
export IDLE_THRESHOLD="3.0"
claude-vision "refactor all JavaScript files to use modern ES6 syntax"
```
### Conditional Auto-Approval
Create a wrapper script:
```bash
#!/bin/bash
# conditional-claude.sh
if [ "$AUTO_APPROVE" = "true" ]; then
  claude-vision "$@"
else
  claude "$@"
fi
```
Usage:
```bash
AUTO_APPROVE=true ./conditional-claude.sh "create files"
```
### Multiple Terminal Support
Each terminal needs its own instance:
```bash
# Terminal 1
claude-vision # Project A
# Terminal 2
claude-vision # Project B
```
## Tips and Best Practices
### 1. Adjust Idle Threshold
- **Fast system**: `IDLE_THRESHOLD=2.0`
- **Slow system**: `IDLE_THRESHOLD=5.0`
- **Remote Ollama**: `IDLE_THRESHOLD=4.0`
### 2. Model Selection
- **Accuracy**: MiniCPM-V > Llama 3.2 Vision > LLaVA
- **Speed**: LLaVA > MiniCPM-V > Llama 3.2 Vision
- **Size**: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)
### 3. Debug When Needed
Enable debug mode if responses are incorrect:
```bash
DEBUG=true claude-vision
```
Check vision model output to see what it's detecting.
### 4. Screenshot Quality
Ensure terminal is visible and not obscured by other windows.
### 5. Performance Optimization
For faster responses:
```bash
# Pre-warm the model
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null
# Then use normally
claude-vision
```
### 6. Clean Up Old Screenshots
Screenshots are cleaned up automatically after 1 hour. To remove them manually:
```bash
rm -f ~/.cache/claude-vision-auto/*.png
```
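An age-based cleanup like the one described can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: the function name is hypothetical, and it treats a screenshot as stale when its modification time is more than an hour old.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_screenshots(
    cache_dir: Path,
    max_age_seconds: float = 3600.0,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete PNG files older than `max_age_seconds`; return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    for png in sorted(cache_dir.glob("*.png")):
        if now - png.stat().st_mtime > max_age_seconds:
            png.unlink()
            removed.append(png)
    return removed
```

For the cache directory used in this guide, the call would be `remove_old_screenshots(Path.home() / ".cache" / "claude-vision-auto")`. Only `*.png` files are touched, so other files in the directory survive.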
### 7. Check Model Status
Before starting long sessions:
```bash
# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool
# Check model is loaded
docker exec ollama ollama ps
```
## Troubleshooting During Use
### Issue: Not Auto-Responding
**Symptoms**: Vision analyzes but doesn't respond
**Solutions**:
1. Check if "WAIT" is returned:
```bash
DEBUG=true claude-vision
# Look for: [DEBUG] Vision model response: WAIT
```
2. Increase idle threshold:
```bash
IDLE_THRESHOLD="5.0" claude-vision
```
3. Try different model:
```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```
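The difference between a `WAIT` reply and an option number can be made explicit with a small parser. The sketch below assumes the vision model is asked to answer with either a single option number or the word `WAIT`, as the debug output suggests; the function name and the choice to treat any other text like `WAIT` are assumptions.

```python
from typing import Optional


def parse_vision_response(text: str, max_option: int = 9) -> Optional[str]:
    """Return the option number to send as a string, or None to keep waiting."""
    answer = text.strip().upper()
    valid_options = {str(n) for n in range(1, max_option + 1)}
    if answer in valid_options:
        return answer
    # "WAIT" and any unexpected text both mean: do not type anything.
    return None
```

Refusing to act on free-form text such as "I think option 1" is deliberately conservative: a missed approval costs a few seconds, while a wrong keystroke may approve something unintended.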
### Issue: Wrong Response
**Symptoms**: Selects wrong option or types wrong answer
**Solutions**:
1. Enable debug to see what vision model sees:
```bash
DEBUG=true claude-vision
```
2. Check screenshot manually:
```bash
# List saved screenshots (they are removed automatically after 1 hour)
ls ~/.cache/claude-vision-auto/
```
3. Adjust response delay:
```bash
RESPONSE_DELAY="2.0" claude-vision
```
### Issue: Slow Response
**Symptoms**: Takes too long to respond
**Solutions**:
1. Use faster model:
```bash
VISION_MODEL="llava:latest" claude-vision
```
2. Reduce the vision timeout so stalled analyses are abandoned sooner:
```bash
VISION_TIMEOUT="15" claude-vision
```
3. Check Ollama performance:
```bash
docker stats ollama
```
### Issue: Too Aggressive
**Symptoms**: Responds to non-approval prompts
**Solutions**:
1. Increase idle threshold:
```bash
IDLE_THRESHOLD="5.0" claude-vision
```
2. Check approval keywords in config
3. Use manual mode for sensitive operations:
```bash
claude # No auto-approval
```
## Getting Help
If issues persist:
1. Check logs with `DEBUG=true`
2. Review documentation
3. Report issues with debug output
4. Include screenshot samples
## Next Steps
- Explore [examples/](../examples/) directory
- Customize configuration for your workflow
- Create shell aliases for common tasks
- Integrate with CI/CD pipelines (if applicable)