Usage Guide

Comprehensive usage guide for Claude Vision Auto.

Table of Contents

  1. Basic Usage
  2. Configuration
  3. Common Scenarios
  4. Advanced Usage
  5. Tips and Best Practices

Basic Usage

Starting Interactive Session

Simply replace claude with claude-vision:

claude-vision

Expected output:

[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s

 ▐▛███▜▌   Claude Code v2.0.26
▝▜█████▛▘  Sonnet 4.5 · Claude Max
  ▘▘ ▝▝    /home/username/project

>

With Initial Prompt

claude-vision "create a test.md file in /tmp"

How Auto-Approval Works

When Claude asks for approval:

╭───────────────────────────────────────────╮
│ Create file                               │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md                          │ │
│ │                                       │ │
│ │ # Test File                           │ │
│ │                                       │ │
│ │ This is a test.                       │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md?            │
│   1. Yes                                  │
│   2. Yes, allow all edits                 │
│   3. No                                   │
╰───────────────────────────────────────────╯

Claude Vision Auto will:

  1. Detect idle state (3 seconds)
  2. Take screenshot
  3. Analyze with vision model
  4. Automatically select option 1

Output:

[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
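
The analysis step can also be reproduced by hand, which helps when checking what the vision model actually sees. Below is a minimal sketch, assuming scrot, GNU base64, and the default local Ollama endpoint; the prompt wording is illustrative, not the exact prompt the tool sends internally:

# Capture the screen and ask the vision model about it (illustrative sketch)
SHOT=/tmp/claude-vision-test.png
rm -f "$SHOT" && scrot "$SHOT"
IMG_B64="$(base64 -w0 "$SHOT")"
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"minicpm-v:latest\",
  \"prompt\": \"If this terminal shows an approval prompt, reply with the option number, otherwise reply WAIT.\",
  \"images\": [\"$IMG_B64\"],
  \"stream\": false
}" | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])'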

Configuration

Environment Variables

Set before running claude-vision:

# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"

claude-vision

Persistent Configuration

Add to ~/.bashrc or ~/.zshrc:

# Claude Vision Auto Configuration
export CLAUDE_VISION_OLLAMA_URL="http://localhost:11434/api/generate"
export CLAUDE_VISION_MODEL="minicpm-v:latest"
export CLAUDE_VISION_IDLE_THRESHOLD="3.0"
export CLAUDE_VISION_RESPONSE_DELAY="1.0"

# Alias for convenience
alias cv="claude-vision"

Reload:

source ~/.bashrc
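
To confirm a new shell picks these up, list the exported variables (this assumes the CLAUDE_VISION_ prefix used above):

# Show the Claude Vision Auto settings currently exported
env | grep '^CLAUDE_VISION_'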

Configuration Options Reference

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| OLLAMA_URL | URL | http://localhost:11434/api/generate | Ollama API endpoint |
| VISION_MODEL | String | minicpm-v:latest | Vision model name |
| IDLE_THRESHOLD | Float | 3.0 | Seconds to wait before screenshot |
| RESPONSE_DELAY | Float | 1.0 | Seconds to wait before responding |
| OUTPUT_BUFFER_SIZE | Integer | 4096 | Buffer size in bytes |
| SCREENSHOT_TIMEOUT | Integer | 5 | Screenshot timeout (seconds) |
| VISION_TIMEOUT | Integer | 30 | Vision analysis timeout (seconds) |
| DEBUG | Boolean | false | Enable debug logging |

Common Scenarios

Scenario 1: File Creation

claude-vision "create a new Python script called hello.py"

Auto-approves file creation.

Scenario 2: File Editing

claude-vision "add error handling to main.py"

Auto-approves file edits.

Scenario 3: Multiple Operations

claude-vision "refactor the authentication module and add tests"

Auto-approves each operation sequentially.

Scenario 4: Longer Wait Time

For slower systems or models:

IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision

Scenario 5: Different Vision Model

VISION_MODEL="llama3.2-vision:latest" claude-vision

Scenario 6: Remote Ollama

OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
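
Before pointing at a remote instance, it helps to confirm the endpoint is reachable and that the vision model is available there (the address below is just the example from above):

# List the models available on the remote Ollama instance
curl -s http://192.168.1.100:11434/api/tags | python3 -m json.tool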

Scenario 7: Debug Mode

When troubleshooting:

DEBUG=true claude-vision "test command"

Output will include:

[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1

Advanced Usage

Using Different Models

MiniCPM-V (Default)

Best for structured responses:

VISION_MODEL="minicpm-v:latest" claude-vision

Llama 3.2 Vision

Good alternative:

VISION_MODEL="llama3.2-vision:latest" claude-vision

LLaVA

Lightweight option:

VISION_MODEL="llava:latest" claude-vision

Shell Aliases

Add to ~/.bashrc:

# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision"  # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision"  # Slower

# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"

Integration with Scripts

#!/bin/bash
# auto-refactor.sh

export DEBUG="false"
export IDLE_THRESHOLD="3.0"

claude-vision "refactor all JavaScript files to use modern ES6 syntax"

Conditional Auto-Approval

Create wrapper script:

#!/bin/bash
# conditional-claude.sh

if [ "$AUTO_APPROVE" = "true" ]; then
    claude-vision "$@"
else
    claude "$@"
fi

Usage:

AUTO_APPROVE=true ./conditional-claude.sh "create files"

Multiple Terminal Support

Each terminal needs its own instance:

# Terminal 1
claude-vision  # Project A

# Terminal 2
claude-vision  # Project B

Tips and Best Practices

1. Adjust Idle Threshold

  • Fast system: IDLE_THRESHOLD=2.0
  • Slow system: IDLE_THRESHOLD=5.0
  • Remote Ollama: IDLE_THRESHOLD=4.0

2. Model Selection

  • Accuracy: MiniCPM-V > Llama 3.2 Vision > LLaVA
  • Speed: LLaVA > MiniCPM-V > Llama 3.2 Vision
  • Size: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)
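
Whichever model you choose must already be pulled into Ollama; for example, assuming Ollama runs in Docker as in the other examples:

# One-time download of the alternative vision models
docker exec ollama ollama pull llama3.2-vision:latest
docker exec ollama ollama pull llava:latest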

3. Debug When Needed

Enable debug mode if responses are incorrect:

DEBUG=true claude-vision

Check vision model output to see what it's detecting.

4. Screenshot Quality

Ensure terminal is visible and not obscured by other windows.

5. Performance Optimization

For faster responses:

# Pre-warm the model
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null

# Then use normally
claude-vision
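
Alternatively, the model can be loaded and kept resident through the API itself; Ollama's generate endpoint accepts a keep_alive parameter (the 30m value here is just an example):

# Load the model and keep it in memory for 30 minutes
curl -s http://localhost:11434/api/generate \
  -d '{"model": "minicpm-v:latest", "keep_alive": "30m"}'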

6. Clean Up Old Screenshots

Screenshots are auto-cleaned after 1 hour, but you can also clean them up manually:

rm -rf ~/.cache/claude-vision-auto/*.png
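
To mirror the built-in one-hour policy on your own schedule (for example from cron), a find-based cleanup also works:

# Delete screenshots older than 60 minutes
find ~/.cache/claude-vision-auto -name '*.png' -mmin +60 -delete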

7. Check Model Status

Before starting long sessions:

# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# Check model is loaded
docker exec ollama ollama ps

Troubleshooting During Use

Issue: Not Auto-Responding

Symptoms: Vision analyzes but doesn't respond

Solutions:

  1. Check if "WAIT" is returned:

    DEBUG=true claude-vision
    # Look for: [DEBUG] Vision model response: WAIT
    
  2. Increase idle threshold:

    IDLE_THRESHOLD="5.0" claude-vision
    
  3. Try different model:

    VISION_MODEL="llama3.2-vision:latest" claude-vision
    

Issue: Wrong Response

Symptoms: Selects wrong option or types wrong answer

Solutions:

  1. Enable debug to see what vision model sees:

    DEBUG=true claude-vision
    
  2. Check screenshot manually:

    # Screenshots are kept for about an hour, so recent ones can be inspected
    ls ~/.cache/claude-vision-auto/
    
  3. Adjust response delay:

    RESPONSE_DELAY="2.0" claude-vision
    

Issue: Slow Response

Symptoms: Takes too long to respond

Solutions:

  1. Use faster model:

    VISION_MODEL="llava:latest" claude-vision
    
  2. Reduce vision timeout:

    VISION_TIMEOUT="15" claude-vision
    
  3. Check Ollama performance:

    docker stats ollama
    

Issue: Too Aggressive

Symptoms: Responds to non-approval prompts

Solutions:

  1. Increase idle threshold:

    IDLE_THRESHOLD="5.0" claude-vision
    
  2. Check approval keywords in config

  3. Use manual mode for sensitive operations:

    claude  # No auto-approval
    

Getting Help

If issues persist:

  1. Check logs with DEBUG=true (a capture example follows this list)
  2. Review documentation
  3. Report issues with debug output
  4. Include screenshot samples
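
One way to capture a complete debug session for a report, assuming util-linux's script command is available (it records the whole terminal session, including the TUI, to a file):

# Record a full debug session to ~/claude-vision-debug.log
script -c 'DEBUG=true claude-vision "test command"' ~/claude-vision-debug.log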

Next Steps

  • Explore examples/ directory
  • Customize configuration for your workflow
  • Create shell aliases for common tasks
  • Integrate with CI/CD pipelines (if applicable)