Usage Guide
Comprehensive usage guide for Claude Vision Auto.
Table of Contents
- Basic Usage
- Configuration
- Common Scenarios
- Advanced Usage
- Tips and Best Practices
- Troubleshooting During Use
- Getting Help
- Next Steps
Basic Usage
Starting Interactive Session
Simply replace claude with claude-vision:
claude-vision
Expected output:
[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s
▐▛███▜▌ Claude Code v2.0.26
▝▜█████▛▘ Sonnet 4.5 · Claude Max
▘▘ ▝▝ /home/username/project
>
With Initial Prompt
claude-vision "create a test.md file in /tmp"
How Auto-Approval Works
When Claude asks for approval:
╭───────────────────────────────────────────╮
│ Create file │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md │ │
│ │ │ │
│ │ # Test File │ │
│ │ │ │
│ │ This is a test. │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md? │
│ ❯ 1. Yes │
│ 2. Yes, allow all edits │
│ 3. No │
╰───────────────────────────────────────────╯
Claude Vision Auto will:
- Detect that the terminal has been idle (no output for 3 seconds)
- Take a screenshot of the terminal
- Analyze the screenshot with the vision model
- Automatically select option 1
Output:
[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
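Under the hood, this is a simple watch loop: wait for idle output, capture, ask the vision model, and type the answer back. The sketch below is illustrative only; wait_for_idle, ask_vision_model, and send_keys are hypothetical stand-ins for the tool's internals, and scrot is one of the supported capture tools:

```bash
#!/bin/bash
# Illustrative sketch of the auto-approval loop (helper names are hypothetical).
IDLE_THRESHOLD="${IDLE_THRESHOLD:-3.0}"
RESPONSE_DELAY="${RESPONSE_DELAY:-1.0}"

while true; do
  wait_for_idle "$IDLE_THRESHOLD"        # no new terminal output for N seconds
  shot="$HOME/.cache/claude-vision-auto/screenshot_$(date +%s).png"
  scrot "$shot"                          # or gnome-screenshot, etc.
  answer=$(ask_vision_model "$shot")     # sends the image to Ollama; returns "1", "2", ... or "WAIT"
  if [ "$answer" != "WAIT" ]; then
    sleep "$RESPONSE_DELAY"              # let the UI settle before answering
    send_keys "$answer"                  # type the option number into the session
  fi
done
```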
Configuration
Environment Variables
Set before running claude-vision:
# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"
claude-vision
Persistent Configuration
Add to ~/.bashrc or ~/.zshrc:
# Claude Vision Auto Configuration
export CLAUDE_VISION_OLLAMA_URL="http://localhost:11434/api/generate"
export CLAUDE_VISION_MODEL="minicpm-v:latest"
export CLAUDE_VISION_IDLE_THRESHOLD="3.0"
export CLAUDE_VISION_RESPONSE_DELAY="1.0"
# Alias for convenience
alias cv="claude-vision"
Reload:
source ~/.bashrc
Configuration Options Reference
| Variable | Type | Default | Description |
|---|---|---|---|
| OLLAMA_URL | URL | http://localhost:11434/api/generate | Ollama API endpoint |
| VISION_MODEL | String | minicpm-v:latest | Vision model name |
| IDLE_THRESHOLD | Float | 3.0 | Seconds to wait before screenshot |
| RESPONSE_DELAY | Float | 1.0 | Seconds to wait before responding |
| OUTPUT_BUFFER_SIZE | Integer | 4096 | Buffer size in bytes |
| SCREENSHOT_TIMEOUT | Integer | 5 | Screenshot timeout (seconds) |
| VISION_TIMEOUT | Integer | 30 | Vision analysis timeout (seconds) |
| DEBUG | Boolean | false | Enable debug logging |
Common Scenarios
Scenario 1: File Creation
claude-vision "create a new Python script called hello.py"
Auto-approves file creation.
Scenario 2: File Editing
claude-vision "add error handling to main.py"
Auto-approves file edits.
Scenario 3: Multiple Operations
claude-vision "refactor the authentication module and add tests"
Auto-approves each operation sequentially.
Scenario 4: Longer Wait Time
For slower systems or models:
IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision
Scenario 5: Different Vision Model
VISION_MODEL="llama3.2-vision:latest" claude-vision
Scenario 6: Remote Ollama
OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
Scenario 7: Debug Mode
When troubleshooting:
DEBUG=true claude-vision "test command"
Output will include:
[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1
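You can also replay the vision call by hand against Ollama's /api/generate endpoint, which accepts base64-encoded images for multimodal models. The exact prompt the tool uses is internal, so the one below is illustrative:

```bash
# Manually send a saved screenshot to the vision model (illustrative prompt)
IMG=$(base64 -w0 ~/.cache/claude-vision-auto/screenshot_1234567890.png)
curl -s http://localhost:11434/api/generate -d @- <<EOF | python3 -m json.tool
{
  "model": "minicpm-v:latest",
  "prompt": "If this terminal shows an approval prompt, answer with the option number only; otherwise answer WAIT.",
  "images": ["$IMG"],
  "stream": false
}
EOF
```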
Advanced Usage
Using Different Models
MiniCPM-V (Default)
Best for structured responses:
VISION_MODEL="minicpm-v:latest" claude-vision
Llama 3.2 Vision
Good alternative:
VISION_MODEL="llama3.2-vision:latest" claude-vision
LLaVA
Lightweight option:
VISION_MODEL="llava:latest" claude-vision
Shell Aliases
Add to ~/.bashrc:
# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision" # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision" # Slower
# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"
Integration with Scripts
#!/bin/bash
# auto-refactor.sh
export DEBUG="false"
export IDLE_THRESHOLD="3.0"
claude-vision "refactor all JavaScript files to use modern ES6 syntax"
Conditional Auto-Approval
Create wrapper script:
#!/bin/bash
# conditional-claude.sh
if [ "$AUTO_APPROVE" = "true" ]; then
claude-vision "$@"
else
claude "$@"
fi
Usage:
AUTO_APPROVE=true ./conditional-claude.sh "create files"
Multiple Terminal Support
Each terminal needs a separate instance:
# Terminal 1
claude-vision # Project A
# Terminal 2
claude-vision # Project B
Tips and Best Practices
1. Adjust Idle Threshold
- Fast system: IDLE_THRESHOLD=2.0
- Slow system: IDLE_THRESHOLD=5.0
- Remote Ollama: IDLE_THRESHOLD=4.0
2. Model Selection
- Accuracy: MiniCPM-V > Llama 3.2 Vision > LLaVA
- Speed: LLaVA > MiniCPM-V > Llama 3.2 Vision
- Size: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)
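If a model isn't present locally yet, pull it first. Assuming Ollama runs in a Docker container named ollama, as in the other commands in this guide:

```bash
# Pull any of the models above before first use
docker exec ollama ollama pull minicpm-v:latest
docker exec ollama ollama pull llama3.2-vision:latest
docker exec ollama ollama pull llava:latest
```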
3. Debug When Needed
Enable debug mode if responses are incorrect:
DEBUG=true claude-vision
Check the vision model's output to see what it's detecting.
4. Screenshot Quality
Ensure terminal is visible and not obscured by other windows.
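To confirm what a capture actually sees, take one manually with the same kind of tool the wrapper uses (scrot shown here; gnome-screenshot also works):

```bash
# Capture the full screen and inspect the result
scrot /tmp/claude-vision-test.png
xdg-open /tmp/claude-vision-test.png
```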
5. Performance Optimization
For faster responses:
# Pre-warm the model
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null
# Then use normally
claude-vision
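To avoid reload latency on every approval, you can also ask Ollama to keep the model resident between requests via the keep_alive field of /api/generate:

```bash
# Load the model and keep it in memory for 30 minutes
curl -s http://localhost:11434/api/generate \
  -d '{"model": "minicpm-v:latest", "keep_alive": "30m"}'
```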
6. Clean Up Old Screenshots
Screenshots are auto-cleaned after 1 hour, but you can clean up manually:
rm -rf ~/.cache/claude-vision-auto/*.png
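Or, to mirror the built-in one-hour policy instead of deleting everything:

```bash
# Delete only screenshots older than 60 minutes
find ~/.cache/claude-vision-auto -name '*.png' -mmin +60 -delete
```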
7. Check Model Status
Before starting long sessions:
# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool
# Check model is loaded
docker exec ollama ollama ps
Troubleshooting During Use
Issue: Not Auto-Responding
Symptoms: The vision model analyzes the prompt but no response is sent
Solutions:
- Check if "WAIT" is returned:
  DEBUG=true claude-vision
  # Look for: [DEBUG] Vision model response: WAIT
- Increase the idle threshold:
  IDLE_THRESHOLD="5.0" claude-vision
- Try a different model:
  VISION_MODEL="llama3.2-vision:latest" claude-vision
Issue: Wrong Response
Symptoms: Selects wrong option or types wrong answer
Solutions:
- Enable debug to see what the vision model sees:
  DEBUG=true claude-vision
- Check the screenshot manually:
  # Don't auto-clean
  ls ~/.cache/claude-vision-auto/
- Adjust the response delay:
  RESPONSE_DELAY="2.0" claude-vision
Issue: Slow Response
Symptoms: Takes too long to respond
Solutions:
- Use a faster model:
  VISION_MODEL="llava:latest" claude-vision
- Reduce the vision timeout:
  VISION_TIMEOUT="15" claude-vision
- Check Ollama performance:
  docker stats ollama
Issue: Too Aggressive
Symptoms: Responds to non-approval prompts
Solutions:
- Increase the idle threshold:
  IDLE_THRESHOLD="5.0" claude-vision
- Check the approval keywords in the configuration
- Use manual mode for sensitive operations:
  claude  # No auto-approval
Getting Help
If issues persist:
- Check logs with DEBUG=true
- Review the documentation
- Report issues with the debug output included
- Include screenshot samples
Next Steps
- Explore the examples/ directory
- Customize the configuration for your workflow
- Create shell aliases for common tasks
- Integrate with CI/CD pipelines if applicable (see the sketch below)
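For CI, note that screenshot capture needs a display, so a headless runner would typically wrap the session in a virtual framebuffer. A sketch, assuming xvfb is installed on the runner:

```bash
# Run the session under a virtual X display (screenshots need one)
xvfb-run -a claude-vision "run the test suite and fix any failures"
```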