Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model. Features: - Automatic detection and response to approval prompts - Screenshot capture and vision analysis via Ollama - Support for multiple screenshot tools (scrot, gnome-screenshot, etc.) - Configurable timing and behavior - Debug mode for troubleshooting - Comprehensive documentation Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>
440 lines
8.9 KiB
Markdown
# Usage Guide

Comprehensive usage guide for Claude Vision Auto.

## Table of Contents

1. [Basic Usage](#basic-usage)
2. [Configuration](#configuration)
3. [Common Scenarios](#common-scenarios)
4. [Advanced Usage](#advanced-usage)
5. [Tips and Best Practices](#tips-and-best-practices)
6. [Troubleshooting During Use](#troubleshooting-during-use)
7. [Getting Help](#getting-help)
8. [Next Steps](#next-steps)

## Basic Usage

### Starting Interactive Session

Run `claude-vision` in place of `claude`:

```bash
claude-vision
```

Expected output:

```
[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s

 ▐▛███▜▌   Claude Code v2.0.26
▝▜█████▛▘  Sonnet 4.5 · Claude Max
  ▘▘ ▝▝    /home/username/project

>
```

### With Initial Prompt

```bash
claude-vision "create a test.md file in /tmp"
```

### How Auto-Approval Works

When Claude asks for approval:

```
╭───────────────────────────────────────────╮
│ Create file                               │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md                          │ │
│ │                                       │ │
│ │ # Test File                           │ │
│ │                                       │ │
│ │ This is a test.                       │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md?            │
│ ❯ 1. Yes                                  │
│   2. Yes, allow all edits                 │
│   3. No                                   │
╰───────────────────────────────────────────╯
```

Claude Vision Auto will:

1. Detect idle state (3 seconds)
2. Take screenshot
3. Analyze with vision model
4. Automatically select option 1

Output:

```
[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
```

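Steps 3 and 4 can be sketched in shell. This is a minimal sketch under stated assumptions, not the tool's actual implementation: `build_payload` and `interpret_reply` are hypothetical helper names, and the prompt text is whatever you choose to pass in. The `model`, `prompt`, `images` (base64-encoded), and `stream` fields are those of Ollama's `/api/generate` request. The `WAIT` reply is the value that appears in the debug output described under troubleshooting; treating any single digit as the option to select is an assumption.

```bash
# Hypothetical helpers for illustration only.

# Build the JSON body for Ollama's /api/generate endpoint.
# $1 = model name, $2 = prompt (must not contain quotes or backslashes),
# $3 = path to the screenshot file
build_payload() {
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}' \
    "$1" "$2" "$(base64 < "$3" | tr -d '\n')"
}

# Decide what to do with the model's reply.
# Prints the option to send and succeeds, or fails when nothing should be sent.
interpret_reply() {
  reply=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$reply" in
    WAIT)  return 1 ;;                 # assumed meaning: no approval prompt yet
    [0-9]) printf '%s' "$reply" ;;     # a single digit selects that option
    *)     return 1 ;;                 # anything else is ignored
  esac
}
```

The payload could then be posted with something like `curl -s "$OLLAMA_URL" -d "$(build_payload "$VISION_MODEL" "Which option approves?" shot.png)"`, where `shot.png` is a placeholder file name.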
## Configuration

### Environment Variables

Set these before running `claude-vision`:

```bash
# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"

claude-vision
```

### Persistent Configuration

Add to `~/.bashrc` or `~/.zshrc`, using the same variable names as in the reference table below:

```bash
# Claude Vision Auto Configuration
export OLLAMA_URL="http://localhost:11434/api/generate"
export VISION_MODEL="minicpm-v:latest"
export IDLE_THRESHOLD="3.0"
export RESPONSE_DELAY="1.0"

# Alias for convenience
alias cv="claude-vision"
```

Reload the file you edited:

```bash
source ~/.bashrc
```

### Configuration Options Reference

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `OLLAMA_URL` | URL | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | String | `minicpm-v:latest` | Vision model name |
| `IDLE_THRESHOLD` | Float | `3.0` | Seconds to wait before screenshot |
| `RESPONSE_DELAY` | Float | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | Integer | `4096` | Buffer size in bytes |
| `SCREENSHOT_TIMEOUT` | Integer | `5` | Screenshot timeout (seconds) |
| `VISION_TIMEOUT` | Integer | `30` | Vision analysis timeout (seconds) |
| `DEBUG` | Boolean | `false` | Enable debug logging |

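To see which values a session would start with, the defaults from the table can be mirrored in a small shell function. This is a sketch only: `show_config` is a hypothetical helper, not part of Claude Vision Auto, and it assumes that an unset or empty variable falls back to the documented default.

```bash
# Hypothetical helper: print the effective settings, falling back to the
# documented defaults when a variable is unset or empty (an assumption).
show_config() {
  printf 'OLLAMA_URL=%s\n'         "${OLLAMA_URL:-http://localhost:11434/api/generate}"
  printf 'VISION_MODEL=%s\n'       "${VISION_MODEL:-minicpm-v:latest}"
  printf 'IDLE_THRESHOLD=%s\n'     "${IDLE_THRESHOLD:-3.0}"
  printf 'RESPONSE_DELAY=%s\n'     "${RESPONSE_DELAY:-1.0}"
  printf 'OUTPUT_BUFFER_SIZE=%s\n' "${OUTPUT_BUFFER_SIZE:-4096}"
  printf 'SCREENSHOT_TIMEOUT=%s\n' "${SCREENSHOT_TIMEOUT:-5}"
  printf 'VISION_TIMEOUT=%s\n'     "${VISION_TIMEOUT:-30}"
  printf 'DEBUG=%s\n'              "${DEBUG:-false}"
}
```

Calling `show_config` before `claude-vision` makes it easy to spot a stale export left over from an earlier session.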
## Common Scenarios

### Scenario 1: File Creation

```bash
claude-vision "create a new Python script called hello.py"
```

Auto-approves file creation.

### Scenario 2: File Editing

```bash
claude-vision "add error handling to main.py"
```

Auto-approves file edits.

### Scenario 3: Multiple Operations

```bash
claude-vision "refactor the authentication module and add tests"
```

Auto-approves each operation sequentially.

### Scenario 4: Longer Wait Time

For slower systems or models:

```bash
IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision
```

### Scenario 5: Different Vision Model

```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```

### Scenario 6: Remote Ollama

```bash
OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
```

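Before pointing a session at a remote host, it can help to confirm that the host answers at all. The sketch below derives Ollama's model-listing URL (`/api/tags`) from the `OLLAMA_URL` value used in this guide. `tags_url` is a hypothetical helper, and it assumes the URL ends in `/api/generate` as in the examples above.

```bash
# Hypothetical helper: turn the generate endpoint into the /api/tags endpoint.
# $1 = value of OLLAMA_URL, assumed to end in /api/generate
tags_url() {
  printf '%s/api/tags' "${1%/api/generate}"
}

# Example (not run here): succeeds only if the remote host responds.
# curl -sf "$(tags_url "http://192.168.1.100:11434/api/generate")" > /dev/null \
#   && echo "remote Ollama reachable"
```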
### Scenario 7: Debug Mode

When troubleshooting:

```bash
DEBUG=true claude-vision "test command"
```

Output will include:

```
[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1
```

## Advanced Usage

### Using Different Models

#### MiniCPM-V (Default)

Best for structured responses:

```bash
VISION_MODEL="minicpm-v:latest" claude-vision
```

#### Llama 3.2 Vision

Good alternative:

```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```

#### LLaVA

Lightweight option:

```bash
VISION_MODEL="llava:latest" claude-vision
```

### Shell Aliases

Add to `~/.bashrc`:

```bash
# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision"           # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision"   # Slower

# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"
```

### Integration with Scripts

```bash
#!/bin/bash
# auto-refactor.sh

export DEBUG="false"
export IDLE_THRESHOLD="3.0"

claude-vision "refactor all JavaScript files to use modern ES6 syntax"
```

### Conditional Auto-Approval

Create a wrapper script that uses `claude-vision` only when `AUTO_APPROVE` is set to `true`:

```bash
#!/bin/bash
# conditional-claude.sh

if [ "$AUTO_APPROVE" = "true" ]; then
    # Auto-approval requested: run through the vision wrapper
    claude-vision "$@"
else
    # Default: plain Claude Code with manual approval
    claude "$@"
fi
```

Usage:

```bash
AUTO_APPROVE=true ./conditional-claude.sh "create files"
```

### Multiple Terminal Support

Each terminal needs its own instance:

```bash
# Terminal 1
claude-vision # Project A

# Terminal 2
claude-vision # Project B
```

## Tips and Best Practices

### 1. Adjust Idle Threshold

- **Fast system**: `IDLE_THRESHOLD=2.0`
- **Slow system**: `IDLE_THRESHOLD=5.0`
- **Remote Ollama**: `IDLE_THRESHOLD=4.0`

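To make the effect of the threshold concrete, the check can be pictured as a comparison between the time of the last terminal output and the current time. The sketch below is an illustration, not the tool's code: `is_idle` is a hypothetical name, and measuring idleness from the last output is an assumption. It uses `awk` because plain shell arithmetic does not handle fractional seconds.

```bash
# Hypothetical check: succeed when at least $3 seconds separate
# $1 (time of last output) from $2 (current time). Times are in seconds
# and may be fractional.
is_idle() {
  awk -v last="$1" -v now="$2" -v threshold="$3" \
    'BEGIN { exit !((now - last) >= threshold) }'
}

# Example: 3.5 seconds of silence against a 3.0 second threshold.
if is_idle 100 103.5 3.0; then
  echo "idle: a screenshot would be taken now"
fi
```

A higher threshold makes the check succeed later, which leaves slower systems more time to finish drawing the prompt.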
### 2. Model Selection

- **Accuracy**: MiniCPM-V > Llama 3.2 Vision > LLaVA
- **Speed**: LLaVA > MiniCPM-V > Llama 3.2 Vision
- **Size**: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)

### 3. Debug When Needed

Enable debug mode if responses are incorrect:

```bash
DEBUG=true claude-vision
```

Check vision model output to see what it's detecting.

### 4. Screenshot Quality

Ensure the terminal is visible and not obscured by other windows.

### 5. Performance Optimization

For faster responses, load the model before starting a session:

```bash
# Pre-warm the model (assumes Ollama runs in a Docker container named "ollama")
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null

# Then use normally
claude-vision
```

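If Ollama is not running in Docker, the model can also be loaded over HTTP: Ollama's `/api/generate` endpoint loads a model into memory when it receives a request that names the model but carries no prompt. The following is a sketch with hypothetical helper names (`preload_body`, `preload_model`); the fallback URL and model are the defaults from the configuration table.

```bash
# Hypothetical helpers for illustration.

# Body of a "load only" request: a model name and no prompt.
preload_body() {
  printf '{"model":"%s"}' "$1"
}

# Send the request to the configured endpoint and discard the reply.
# $1 = model name (optional; defaults to VISION_MODEL or minicpm-v:latest)
preload_model() {
  curl -s "${OLLAMA_URL:-http://localhost:11434/api/generate}" \
    -d "$(preload_body "${1:-${VISION_MODEL:-minicpm-v:latest}}")" > /dev/null
}
```

Running `preload_model` once before `claude-vision` should make the first analysis faster, since the model is already in memory.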
### 6. Clean Up Old Screenshots

Screenshots are cleaned up automatically after 1 hour. To remove them manually:

```bash
rm -f ~/.cache/claude-vision-auto/*.png
```

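A manual cleanup can also be limited to older files so that a screenshot from a running session is left alone. The sketch below uses `find` with `-mmin`; `clean_screenshots` is a hypothetical helper, and the 60-minute default only mirrors the 1-hour figure above rather than any setting of the tool.

```bash
# Hypothetical helper: delete PNG files older than a given age.
# $1 = directory, $2 = age in minutes (optional, default 60)
clean_screenshots() {
  [ -d "$1" ] || return 0     # nothing to do if the directory is missing
  find "$1" -maxdepth 1 -type f -name '*.png' -mmin "+${2:-60}" -delete
}
```

Usage: `clean_screenshots ~/.cache/claude-vision-auto 60`.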
### 7. Check Model Status

Before starting long sessions:

```bash
# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# Check the model is loaded (assumes Ollama runs in a Docker container named "ollama")
docker exec ollama ollama ps
```

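The `/api/tags` response lists installed models under a `name` field, so the first check can be turned into a yes/no test. This is a rough sketch: `has_model` is a hypothetical helper, and it matches text rather than parsing JSON properly.

```bash
# Hypothetical helper: succeed when the /api/tags JSON on stdin
# lists the given model name. Whitespace is stripped so that both
# compact and pretty-printed JSON match.
has_model() {
  tr -d '[:space:]' | grep -Fq "\"name\":\"$1\""
}

# Example (not run here):
# curl -s http://localhost:11434/api/tags | has_model "minicpm-v:latest" \
#   || echo "model not installed; try: ollama pull minicpm-v:latest"
```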
## Troubleshooting During Use

### Issue: Not Auto-Responding

**Symptoms**: The vision model analyzes the prompt, but no response is sent

**Solutions**:

1. Check if "WAIT" is returned:

   ```bash
   DEBUG=true claude-vision
   # Look for: [DEBUG] Vision model response: WAIT
   ```

2. Increase idle threshold:

   ```bash
   IDLE_THRESHOLD="5.0" claude-vision
   ```

3. Try different model:

   ```bash
   VISION_MODEL="llama3.2-vision:latest" claude-vision
   ```

### Issue: Wrong Response

**Symptoms**: Selects wrong option or types wrong answer

**Solutions**:

1. Enable debug to see what vision model sees:

   ```bash
   DEBUG=true claude-vision
   ```

2. Check screenshot manually:

   ```bash
   # List saved screenshots, newest first, before they are auto-cleaned
   ls -lt ~/.cache/claude-vision-auto/
   ```

3. Adjust response delay:

   ```bash
   RESPONSE_DELAY="2.0" claude-vision
   ```

### Issue: Slow Response

**Symptoms**: Takes too long to respond

**Solutions**:

1. Use faster model:

   ```bash
   VISION_MODEL="llava:latest" claude-vision
   ```

2. Reduce vision timeout:

   ```bash
   VISION_TIMEOUT="15" claude-vision
   ```

3. Check Ollama performance (if it runs in a Docker container named `ollama`):

   ```bash
   docker stats ollama
   ```

### Issue: Too Aggressive

**Symptoms**: Responds to non-approval prompts

**Solutions**:

1. Increase idle threshold:

   ```bash
   IDLE_THRESHOLD="5.0" claude-vision
   ```

2. Check approval keywords in config
3. Use manual mode for sensitive operations:

   ```bash
   claude # No auto-approval
   ```

## Getting Help

If issues persist:

1. Check logs with `DEBUG=true`
2. Review documentation
3. Report issues with debug output
4. Include screenshot samples

## Next Steps

- Explore [examples/](../examples/) directory
- Customize configuration for your workflow
- Create shell aliases for common tasks
- Integrate with CI/CD pipelines (if applicable)