Initial release of Claude Vision Auto v1.0.0

Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model. Features: - Automatic detection and response to approval prompts - Screenshot capture and vision analysis via Ollama - Support for multiple screenshot tools (scrot, gnome-screenshot, etc.) - Configurable timing and behavior - Debug mode for troubleshooting - Comprehensive documentation Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>
2025-10-29 10:09:01 -04:00
commit 41cecca0e2
18 changed files with 1916 additions and 0 deletions
@@ -0,0 +1,358 @@
+# Installation Guide
+
+Detailed installation instructions for Claude Vision Auto.
+
+## Table of Contents
+
+1. [Prerequisites](#prerequisites)
+2. [System Dependencies](#system-dependencies)
+3. [Ollama Setup](#ollama-setup)
+4. [Package Installation](#package-installation)
+5. [Verification](#verification)
+6. [Troubleshooting](#troubleshooting)
+
+## Prerequisites
+
+### 1. Claude Code CLI
+
+Install Anthropic's official CLI:
+
+```bash
+npm install -g @anthropic-ai/claude-code
+```
+
+Verify installation:
+
+```bash
+claude --version
+```
+
+### 2. Python 3.8+
+
+Check Python version:
+
+```bash
+python3 --version
+```
+
+If not installed:
+
+```bash
+sudo apt-get update
+sudo apt-get install python3 python3-pip
+```
+
+### 3. Docker (for Ollama)
+
+Install Docker:
+
+```bash
+curl -fsSL https://get.docker.com | sh
+sudo usermod -aG docker $USER
+```
+
+Log out and back in for group changes to take effect.
+
+## System Dependencies
+
+### Screenshot Tool
+
+Install `scrot` (recommended):
+
+```bash
+sudo apt-get update
+sudo apt-get install -y scrot
+```
+
+Alternative screenshot tools:
+
+```bash
+# GNOME Screenshot
+sudo apt-get install -y gnome-screenshot
+
+# ImageMagick
+sudo apt-get install -y imagemagick
+
+# Maim
+sudo apt-get install -y maim xdotool
+```
+
+### Additional Dependencies
+
+```bash
+sudo apt-get install -y \
+    python3-pip \
+    git \
+    curl
+```
+
+## Ollama Setup
+
+### 1. Pull Ollama Docker Image
+
+```bash
+docker pull ollama/ollama:latest
+```
+
+### 2. Start Ollama Container
+
+```bash
+docker run -d \
+    -p 11434:11434 \
+    --name ollama \
+    --restart unless-stopped \
+    ollama/ollama:latest
+```
+
+For GPU support (NVIDIA):
+
+```bash
+docker run -d \
+    -p 11434:11434 \
+    --name ollama \
+    --gpus all \
+    --restart unless-stopped \
+    ollama/ollama:latest
+```
+
+### 3. Pull Vision Model
+
+```bash
+# MiniCPM-V (recommended - 5.5GB)
+docker exec ollama ollama pull minicpm-v:latest
+
+# Alternative: Llama 3.2 Vision (7.8GB)
+docker exec ollama ollama pull llama3.2-vision:latest
+
+# Alternative: LLaVA (4.5GB)
+docker exec ollama ollama pull llava:latest
+```
+
+### 4. Verify Ollama
+
+```bash
+# Check container status
+docker ps | grep ollama
+
+# Test API
+curl http://localhost:11434/api/tags
+
+# List installed models
+curl -s http://localhost:11434/api/tags | python3 -m json.tool
+```
+
+## Package Installation
+
+### Method 1: Using Makefile (Recommended)
+
+```bash
+cd claude-vision-auto
+
+# Install system dependencies
+make deps
+
+# Install package
+make install
+```
+
+### Method 2: Manual Installation
+
+```bash
+cd claude-vision-auto
+
+# Install system dependencies
+sudo apt-get update
+sudo apt-get install -y scrot python3-pip
+
+# Install Python package
+pip3 install -e .
+```
+
+### Method 3: From Git
+
+```bash
+# Clone repository
+git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
+cd claude-vision-auto
+
+# Install
+pip3 install -e .
+```
+
+## Verification
+
+### 1. Check Command Installation
+
+```bash
+which claude-vision
+```
+
+Expected output: `/home/username/.local/bin/claude-vision`
+
+### 2. Test Ollama Connection
+
+```bash
+curl http://localhost:11434/api/tags
+```
+
+Should return JSON with list of models.
+
+### 3. Test Screenshot
+
+```bash
+scrot /tmp/test_screenshot.png
+ls -lh /tmp/test_screenshot.png
+```
+
+Should create a screenshot file.
+
+### 4. Run Test
+
+```bash
+# Start claude-vision
+claude-vision
+
+# You should see:
+# [Claude Vision Auto] Testing Ollama connection...
+# [Claude Vision Auto] Connected to Ollama
+# [Claude Vision Auto] Using model: minicpm-v:latest
+```
+
+## Troubleshooting
+
+### "claude-vision: command not found"
+
+Add to PATH in `~/.bashrc` or `~/.zshrc`:
+
+```bash
+export PATH="$HOME/.local/bin:$PATH"
+```
+
+Then reload:
+
+```bash
+source ~/.bashrc  # or source ~/.zshrc
+```
+
+### "Cannot connect to Ollama"
+
+Check if Ollama container is running:
+
+```bash
+docker ps | grep ollama
+
+# If not running, start it:
+docker start ollama
+```
+
+Check if port 11434 is open:
+
+```bash
+netstat -tulpn | grep 11434
+# or
+ss -tulpn | grep 11434
+```
+
+### "Model not found"
+
+Pull the model:
+
+```bash
+docker exec ollama ollama pull minicpm-v:latest
+```
+
+List available models:
+
+```bash
+docker exec ollama ollama list
+```
+
+### "Screenshot failed"
+
+Install scrot:
+
+```bash
+sudo apt-get install scrot
+```
+
+Test screenshot:
+
+```bash
+scrot -u /tmp/test.png
+```
+
+If error persists, try alternative tools in config:
+
+```bash
+export SCREENSHOT_TOOL="gnome-screenshot"
+claude-vision
+```
+
+### Permission Issues
+
+If pip install fails with permissions:
+
+```bash
+# Install for user only
+pip3 install --user -e .
+
+# Or use virtual environment
+python3 -m venv venv
+source venv/bin/activate
+pip install -e .
+```
+
+### Docker Permission Denied
+
+Add user to docker group:
+
+```bash
+sudo usermod -aG docker $USER
+```
+
+Log out and back in, then:
+
+```bash
+docker ps  # Should work without sudo
+```
+
+## Uninstallation
+
+### Remove Package
+
+```bash
+make uninstall
+# or
+pip3 uninstall claude-vision-auto
+```
+
+### Remove Ollama
+
+```bash
+docker stop ollama
+docker rm ollama
+docker rmi ollama/ollama
+```
+
+### Remove System Dependencies
+
+```bash
+sudo apt-get remove scrot
+```
+
+## Next Steps
+
+After successful installation:
+
+1. Read [USAGE.md](USAGE.md) for usage examples
+2. Configure environment variables if needed
+3. Test with a simple Claude Code command
+
+## Getting Help
+
+If you encounter issues not covered here:
+
+1. Check the main [README.md](../README.md)
+2. Enable debug mode: `DEBUG=true claude-vision`
+3. Check logs: `~/.cache/claude-vision-auto/`
+4. Report issues: https://git.openharbor.io/svrnty/claude-vision-auto/issues
@@ -0,0 +1,439 @@
+# Usage Guide
+
+Comprehensive usage guide for Claude Vision Auto.
+
+## Table of Contents
+
+1. [Basic Usage](#basic-usage)
+2. [Configuration](#configuration)
+3. [Common Scenarios](#common-scenarios)
+4. [Advanced Usage](#advanced-usage)
+5. [Tips and Best Practices](#tips-and-best-practices)
+
+## Basic Usage
+
+### Starting Interactive Session
+
+Simply replace `claude` with `claude-vision`:
+
+```bash
+claude-vision
+```
+
+Expected output:
+
+```
+[Claude Vision Auto] Testing Ollama connection...
+[Claude Vision Auto] Connected to Ollama
+[Claude Vision Auto] Using model: minicpm-v:latest
+[Claude Vision Auto] Idle threshold: 3.0s
+
+ ▐▛███▜▌   Claude Code v2.0.26
+▝▜█████▛▘  Sonnet 4.5 · Claude Max
+  ▘▘ ▝▝    /home/username/project
+
+>
+```
+
+### With Initial Prompt
+
+```bash
+claude-vision "create a test.md file in /tmp"
+```
+
+### How Auto-Approval Works
+
+When Claude asks for approval:
+
+```
+╭───────────────────────────────────────────╮
+│ Create file                               │
+│ ╭───────────────────────────────────────╮ │
+│ │ /tmp/test.md                          │ │
+│ │                                       │ │
+│ │ # Test File                           │ │
+│ │                                       │ │
+│ │ This is a test.                       │ │
+│ ╰───────────────────────────────────────╯ │
+│ Do you want to create test.md?            │
+│ ❯ 1. Yes                                  │
+│   2. Yes, allow all edits                 │
+│   3. No                                   │
+╰───────────────────────────────────────────╯
+```
+
+Claude Vision Auto will:
+1. Detect idle state (3 seconds)
+2. Take screenshot
+3. Analyze with vision model
+4. Automatically select option 1
+
+Output:
+
+```
+[Vision] Analyzing prompt...
+[Vision] Response: 1
+[Vision] Response sent
+```
+
+## Configuration
+
+### Environment Variables
+
+Set before running `claude-vision`:
+
+```bash
+# Example: Custom configuration
+export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
+export VISION_MODEL="llama3.2-vision:latest"
+export IDLE_THRESHOLD="5.0"
+export RESPONSE_DELAY="2.0"
+export DEBUG="true"
+
+claude-vision
+```
+
+### Persistent Configuration
+
+Add to `~/.bashrc` or `~/.zshrc`:
+
+```bash
+# Claude Vision Auto Configuration
+export CLAUDE_VISION_OLLAMA_URL="http://localhost:11434/api/generate"
+export CLAUDE_VISION_MODEL="minicpm-v:latest"
+export CLAUDE_VISION_IDLE_THRESHOLD="3.0"
+export CLAUDE_VISION_RESPONSE_DELAY="1.0"
+
+# Alias for convenience
+alias cv="claude-vision"
+```
+
+Reload:
+
+```bash
+source ~/.bashrc
+```
+
+### Configuration Options Reference
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `OLLAMA_URL` | URL | `http://localhost:11434/api/generate` | Ollama API endpoint |
+| `VISION_MODEL` | String | `minicpm-v:latest` | Vision model name |
+| `IDLE_THRESHOLD` | Float | `3.0` | Seconds to wait before screenshot |
+| `RESPONSE_DELAY` | Float | `1.0` | Seconds to wait before responding |
+| `OUTPUT_BUFFER_SIZE` | Integer | `4096` | Buffer size in bytes |
+| `SCREENSHOT_TIMEOUT` | Integer | `5` | Screenshot timeout (seconds) |
+| `VISION_TIMEOUT` | Integer | `30` | Vision analysis timeout (seconds) |
+| `DEBUG` | Boolean | `false` | Enable debug logging |
+
+## Common Scenarios
+
+### Scenario 1: File Creation
+
+```bash
+claude-vision "create a new Python script called hello.py"
+```
+
+Auto-approves file creation.
+
+### Scenario 2: File Editing
+
+```bash
+claude-vision "add error handling to main.py"
+```
+
+Auto-approves file edits.
+
+### Scenario 3: Multiple Operations
+
+```bash
+claude-vision "refactor the authentication module and add tests"
+```
+
+Auto-approves each operation sequentially.
+
+### Scenario 4: Longer Wait Time
+
+For slower systems or models:
+
+```bash
+IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision
+```
+
+### Scenario 5: Different Vision Model
+
+```bash
+VISION_MODEL="llama3.2-vision:latest" claude-vision
+```
+
+### Scenario 6: Remote Ollama
+
+```bash
+OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
+```
+
+### Scenario 7: Debug Mode
+
+When troubleshooting:
+
+```bash
+DEBUG=true claude-vision "test command"
+```
+
+Output will include:
+
+```
+[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
+[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
+[DEBUG] Model: minicpm-v:latest
+[DEBUG] Vision model response: 1
+```
+
+## Advanced Usage
+
+### Using Different Models
+
+#### MiniCPM-V (Default)
+
+Best for structured responses:
+
+```bash
+VISION_MODEL="minicpm-v:latest" claude-vision
+```
+
+#### Llama 3.2 Vision
+
+Good alternative:
+
+```bash
+VISION_MODEL="llama3.2-vision:latest" claude-vision
+```
+
+#### LLaVA
+
+Lightweight option:
+
+```bash
+VISION_MODEL="llava:latest" claude-vision
+```
+
+### Shell Aliases
+
+Add to `~/.bashrc`:
+
+```bash
+# Quick aliases
+alias cv="claude-vision"
+alias cvd="DEBUG=true claude-vision"  # Debug mode
+alias cvs="IDLE_THRESHOLD=5.0 claude-vision"  # Slower
+
+# Model-specific
+alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
+alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"
+```
+
+### Integration with Scripts
+
+```bash
+#!/bin/bash
+# auto-refactor.sh
+
+export DEBUG="false"
+export IDLE_THRESHOLD="3.0"
+
+claude-vision "refactor all JavaScript files to use modern ES6 syntax"
+```
+
+### Conditional Auto-Approval
+
+Create wrapper script:
+
+```bash
+#!/bin/bash
+# conditional-claude.sh
+
+if [ "$AUTO_APPROVE" = "true" ]; then
+    claude-vision "$@"
+else
+    claude "$@"
+fi
+```
+
+Usage:
+
+```bash
+AUTO_APPROVE=true ./conditional-claude.sh "create files"
+```
+
+### Multiple Terminal Support
+
+Each terminal needs separate instance:
+
+```bash
+# Terminal 1
+claude-vision  # Project A
+
+# Terminal 2
+claude-vision  # Project B
+```
+
+## Tips and Best Practices
+
+### 1. Adjust Idle Threshold
+
+- **Fast system**: `IDLE_THRESHOLD=2.0`
+- **Slow system**: `IDLE_THRESHOLD=5.0`
+- **Remote Ollama**: `IDLE_THRESHOLD=4.0`
+
+### 2. Model Selection
+
+- **Accuracy**: MiniCPM-V > Llama 3.2 Vision > LLaVA
+- **Speed**: LLaVA > MiniCPM-V > Llama 3.2 Vision
+- **Size**: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)
+
+### 3. Debug When Needed
+
+Enable debug mode if responses are incorrect:
+
+```bash
+DEBUG=true claude-vision
+```
+
+Check vision model output to see what it's detecting.
+
+### 4. Screenshot Quality
+
+Ensure terminal is visible and not obscured by other windows.
+
+### 5. Performance Optimization
+
+For faster responses:
+
+```bash
+# Pre-warm the model
+docker exec ollama ollama run minicpm-v:latest "test" < /dev/null
+
+# Then use normally
+claude-vision
+```
+
+### 6. Clean Up Old Screenshots
+
+Screenshots are auto-cleaned after 1 hour, but manual cleanup:
+
+```bash
+rm -rf ~/.cache/claude-vision-auto/*.png
+```
+
+### 7. Check Model Status
+
+Before starting long sessions:
+
+```bash
+# Verify Ollama is responsive
+curl -s http://localhost:11434/api/tags | python3 -m json.tool
+
+# Check model is loaded
+docker exec ollama ollama ps
+```
+
+## Troubleshooting During Use
+
+### Issue: Not Auto-Responding
+
+**Symptoms**: Vision analyzes but doesn't respond
+
+**Solutions**:
+
+1. Check if "WAIT" is returned:
+   ```bash
+   DEBUG=true claude-vision
+   # Look for: [DEBUG] Vision model response: WAIT
+   ```
+
+2. Increase idle threshold:
+   ```bash
+   IDLE_THRESHOLD="5.0" claude-vision
+   ```
+
+3. Try different model:
+   ```bash
+   VISION_MODEL="llama3.2-vision:latest" claude-vision
+   ```
+
+### Issue: Wrong Response
+
+**Symptoms**: Selects wrong option or types wrong answer
+
+**Solutions**:
+
+1. Enable debug to see what vision model sees:
+   ```bash
+   DEBUG=true claude-vision
+   ```
+
+2. Check screenshot manually:
+   ```bash
+   # Don't auto-clean
+   ls ~/.cache/claude-vision-auto/
+   ```
+
+3. Adjust response delay:
+   ```bash
+   RESPONSE_DELAY="2.0" claude-vision
+   ```
+
+### Issue: Slow Response
+
+**Symptoms**: Takes too long to respond
+
+**Solutions**:
+
+1. Use faster model:
+   ```bash
+   VISION_MODEL="llava:latest" claude-vision
+   ```
+
+2. Reduce vision timeout:
+   ```bash
+   VISION_TIMEOUT="15" claude-vision
+   ```
+
+3. Check Ollama performance:
+   ```bash
+   docker stats ollama
+   ```
+
+### Issue: Too Aggressive
+
+**Symptoms**: Responds to non-approval prompts
+
+**Solutions**:
+
+1. Increase idle threshold:
+   ```bash
+   IDLE_THRESHOLD="5.0" claude-vision
+   ```
+
+2. Check approval keywords in config
+3. Use manual mode for sensitive operations:
+   ```bash
+   claude  # No auto-approval
+   ```
+
+## Getting Help
+
+If issues persist:
+
+1. Check logs with `DEBUG=true`
+2. Review documentation
+3. Report issues with debug output
+4. Include screenshot samples
+
+## Next Steps
+
+- Explore [examples/](../examples/) directory
+- Customize configuration for your workflow
+- Create shell aliases for common tasks
+- Integrate with CI/CD pipelines (if applicable)