Vision-module-auto/docs/USAGE.md
Svrnty 41cecca0e2 Initial release of Claude Vision Auto v1.0.0
Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

Features:
- Automatic detection and response to approval prompts
- Screenshot capture and vision analysis via Ollama
- Support for multiple screenshot tools (scrot, gnome-screenshot, etc.)
- Configurable timing and behavior
- Debug mode for troubleshooting
- Comprehensive documentation

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>
2025-10-29 10:09:01 -04:00

440 lines
8.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Usage Guide
Comprehensive usage guide for Claude Vision Auto.
## Table of Contents
1. [Basic Usage](#basic-usage)
2. [Configuration](#configuration)
3. [Common Scenarios](#common-scenarios)
4. [Advanced Usage](#advanced-usage)
5. [Tips and Best Practices](#tips-and-best-practices)
## Basic Usage
### Starting Interactive Session
Simply replace `claude` with `claude-vision`:
```bash
claude-vision
```
Expected output:
```
[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s
▐▛███▜▌ Claude Code v2.0.26
▝▜█████▛▘ Sonnet 4.5 · Claude Max
▘▘ ▝▝ /home/username/project
>
```
### With Initial Prompt
```bash
claude-vision "create a test.md file in /tmp"
```
### How Auto-Approval Works
When Claude asks for approval:
```
╭───────────────────────────────────────────╮
│ Create file │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md │ │
│ │ │ │
│ │ # Test File │ │
│ │ │ │
│ │ This is a test. │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md? │
1. Yes │
│ 2. Yes, allow all edits │
│ 3. No │
╰───────────────────────────────────────────╯
```
Claude Vision Auto will:
1. Detect idle state (3 seconds)
2. Take screenshot
3. Analyze with vision model
4. Automatically select option 1
Output:
```
[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
```
## Configuration
### Environment Variables
Set before running `claude-vision`:
```bash
# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"
claude-vision
```
### Persistent Configuration
Add to `~/.bashrc` or `~/.zshrc`:
```bash
# Claude Vision Auto Configuration
export CLAUDE_VISION_OLLAMA_URL="http://localhost:11434/api/generate"
export CLAUDE_VISION_MODEL="minicpm-v:latest"
export CLAUDE_VISION_IDLE_THRESHOLD="3.0"
export CLAUDE_VISION_RESPONSE_DELAY="1.0"
# Alias for convenience
alias cv="claude-vision"
```
Reload:
```bash
source ~/.bashrc
```
### Configuration Options Reference
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `OLLAMA_URL` | URL | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | String | `minicpm-v:latest` | Vision model name |
| `IDLE_THRESHOLD` | Float | `3.0` | Seconds to wait before screenshot |
| `RESPONSE_DELAY` | Float | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | Integer | `4096` | Buffer size in bytes |
| `SCREENSHOT_TIMEOUT` | Integer | `5` | Screenshot timeout (seconds) |
| `VISION_TIMEOUT` | Integer | `30` | Vision analysis timeout (seconds) |
| `DEBUG` | Boolean | `false` | Enable debug logging |
## Common Scenarios
### Scenario 1: File Creation
```bash
claude-vision "create a new Python script called hello.py"
```
Auto-approves file creation.
### Scenario 2: File Editing
```bash
claude-vision "add error handling to main.py"
```
Auto-approves file edits.
### Scenario 3: Multiple Operations
```bash
claude-vision "refactor the authentication module and add tests"
```
Auto-approves each operation sequentially.
### Scenario 4: Longer Wait Time
For slower systems or models:
```bash
IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision
```
### Scenario 5: Different Vision Model
```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```
### Scenario 6: Remote Ollama
```bash
OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
```
### Scenario 7: Debug Mode
When troubleshooting:
```bash
DEBUG=true claude-vision "test command"
```
Output will include:
```
[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1
```
## Advanced Usage
### Using Different Models
#### MiniCPM-V (Default)
Best for structured responses:
```bash
VISION_MODEL="minicpm-v:latest" claude-vision
```
#### Llama 3.2 Vision
Good alternative:
```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```
#### LLaVA
Lightweight option:
```bash
VISION_MODEL="llava:latest" claude-vision
```
### Shell Aliases
Add to `~/.bashrc`:
```bash
# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision" # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision" # Slower
# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"
```
### Integration with Scripts
```bash
#!/bin/bash
# auto-refactor.sh
export DEBUG="false"
export IDLE_THRESHOLD="3.0"
claude-vision "refactor all JavaScript files to use modern ES6 syntax"
```
### Conditional Auto-Approval
Create wrapper script:
```bash
#!/bin/bash
# conditional-claude.sh
if [ "$AUTO_APPROVE" = "true" ]; then
claude-vision "$@"
else
claude "$@"
fi
```
Usage:
```bash
AUTO_APPROVE=true ./conditional-claude.sh "create files"
```
### Multiple Terminal Support
Each terminal needs separate instance:
```bash
# Terminal 1
claude-vision # Project A
# Terminal 2
claude-vision # Project B
```
## Tips and Best Practices
### 1. Adjust Idle Threshold
- **Fast system**: `IDLE_THRESHOLD=2.0`
- **Slow system**: `IDLE_THRESHOLD=5.0`
- **Remote Ollama**: `IDLE_THRESHOLD=4.0`
### 2. Model Selection
- **Accuracy**: MiniCPM-V > Llama 3.2 Vision > LLaVA
- **Speed**: LLaVA > MiniCPM-V > Llama 3.2 Vision
- **Size**: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)
### 3. Debug When Needed
Enable debug mode if responses are incorrect:
```bash
DEBUG=true claude-vision
```
Check vision model output to see what it's detecting.
### 4. Screenshot Quality
Ensure terminal is visible and not obscured by other windows.
### 5. Performance Optimization
For faster responses:
```bash
# Pre-warm the model
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null
# Then use normally
claude-vision
```
### 6. Clean Up Old Screenshots
Screenshots are auto-cleaned after 1 hour, but manual cleanup:
```bash
rm -rf ~/.cache/claude-vision-auto/*.png
```
### 7. Check Model Status
Before starting long sessions:
```bash
# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool
# Check model is loaded
docker exec ollama ollama ps
```
## Troubleshooting During Use
### Issue: Not Auto-Responding
**Symptoms**: Vision analyzes but doesn't respond
**Solutions**:
1. Check if "WAIT" is returned:
```bash
DEBUG=true claude-vision
# Look for: [DEBUG] Vision model response: WAIT
```
2. Increase idle threshold:
```bash
IDLE_THRESHOLD="5.0" claude-vision
```
3. Try different model:
```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```
### Issue: Wrong Response
**Symptoms**: Selects wrong option or types wrong answer
**Solutions**:
1. Enable debug to see what vision model sees:
```bash
DEBUG=true claude-vision
```
2. Check screenshot manually:
```bash
# Don't auto-clean
ls ~/.cache/claude-vision-auto/
```
3. Adjust response delay:
```bash
RESPONSE_DELAY="2.0" claude-vision
```
### Issue: Slow Response
**Symptoms**: Takes too long to respond
**Solutions**:
1. Use faster model:
```bash
VISION_MODEL="llava:latest" claude-vision
```
2. Reduce vision timeout:
```bash
VISION_TIMEOUT="15" claude-vision
```
3. Check Ollama performance:
```bash
docker stats ollama
```
### Issue: Too Aggressive
**Symptoms**: Responds to non-approval prompts
**Solutions**:
1. Increase idle threshold:
```bash
IDLE_THRESHOLD="5.0" claude-vision
```
2. Check approval keywords in config
3. Use manual mode for sensitive operations:
```bash
claude # No auto-approval
```
## Getting Help
If issues persist:
1. Check logs with `DEBUG=true`
2. Review documentation
3. Report issues with debug output
4. Include screenshot samples
## Next Steps
- Explore [examples/](../examples/) directory
- Customize configuration for your workflow
- Create shell aliases for common tasks
- Integrate with CI/CD pipelines (if applicable)