Usage Guide

Comprehensive usage guide for Claude Vision Auto.

Table of Contents

  1. Basic Usage
  2. Configuration
  3. Common Scenarios
  4. Advanced Usage
  5. Tips and Best Practices

Basic Usage

Starting Interactive Session

Simply replace claude with claude-vision:

claude-vision

Expected output:

[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s

 ▐▛███▜▌   Claude Code v2.0.26
▝▜█████▛▘  Sonnet 4.5 · Claude Max
  ▘▘ ▝▝    /home/username/project

>

With Initial Prompt

claude-vision "create a test.md file in /tmp"

How Auto-Approval Works

When Claude asks for approval:

╭───────────────────────────────────────────╮
│ Create file                               │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md                          │ │
│ │                                       │ │
│ │ # Test File                           │ │
│ │                                       │ │
│ │ This is a test.                       │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md?            │
│   1. Yes                                  │
│   2. Yes, allow all edits                 │
│   3. No                                   │
╰───────────────────────────────────────────╯

Claude Vision Auto will:

  1. Detect idle state (3 seconds)
  2. Take screenshot
  3. Analyze with vision model
  4. Automatically select option 1

Output:

[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
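
The analysis step can also be reproduced by hand, which helps when checking what the vision model actually sees. Below is a minimal sketch, assuming scrot, GNU base64, and the default local Ollama endpoint; the prompt wording is illustrative, not the exact prompt the tool sends internally:

# Capture the screen and ask the vision model about it (illustrative sketch)
SHOT=/tmp/claude-vision-test.png
rm -f "$SHOT" && scrot "$SHOT"
IMG_B64="$(base64 -w0 "$SHOT")"
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"minicpm-v:latest\",
  \"prompt\": \"If this terminal shows an approval prompt, reply with the option number, otherwise reply WAIT.\",
  \"images\": [\"$IMG_B64\"],
  \"stream\": false
}" | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])'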

Configuration

Environment Variables

Set before running claude-vision:

# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"

claude-vision

Persistent Configuration

Add to ~/.bashrc or ~/.zshrc:

# Claude Vision Auto Configuration
export CLAUDE_VISION_OLLAMA_URL="http://localhost:11434/api/generate"
export CLAUDE_VISION_MODEL="minicpm-v:latest"
export CLAUDE_VISION_IDLE_THRESHOLD="3.0"
export CLAUDE_VISION_RESPONSE_DELAY="1.0"

# Alias for convenience
alias cv="claude-vision"

Reload:

source ~/.bashrc
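
To confirm a new shell picks these up, list the exported variables (this assumes the CLAUDE_VISION_ prefix used above):

# Show the Claude Vision Auto settings currently exported
env | grep '^CLAUDE_VISION_'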

Configuration Options Reference

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| OLLAMA_URL | URL | http://localhost:11434/api/generate | Ollama API endpoint |
| VISION_MODEL | String | minicpm-v:latest | Vision model name |
| IDLE_THRESHOLD | Float | 3.0 | Seconds to wait before screenshot |
| RESPONSE_DELAY | Float | 1.0 | Seconds to wait before responding |
| OUTPUT_BUFFER_SIZE | Integer | 4096 | Buffer size in bytes |
| SCREENSHOT_TIMEOUT | Integer | 5 | Screenshot timeout (seconds) |
| VISION_TIMEOUT | Integer | 30 | Vision analysis timeout (seconds) |
| DEBUG | Boolean | false | Enable debug logging |

Common Scenarios

Scenario 1: File Creation

claude-vision "create a new Python script called hello.py"

Auto-approves file creation.

Scenario 2: File Editing

claude-vision "add error handling to main.py"

Auto-approves file edits.

Scenario 3: Multiple Operations

claude-vision "refactor the authentication module and add tests"

Auto-approves each operation sequentially.

Scenario 4: Longer Wait Time

For slower systems or models:

IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision

Scenario 5: Different Vision Model

VISION_MODEL="llama3.2-vision:latest" claude-vision

Scenario 6: Remote Ollama

OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
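
Before pointing at a remote instance, it helps to confirm the endpoint is reachable and that the vision model is available there (the address below is just the example from above):

# List the models available on the remote Ollama instance
curl -s http://192.168.1.100:11434/api/tags | python3 -m json.tool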

Scenario 7: Debug Mode

When troubleshooting:

DEBUG=true claude-vision "test command"

Output will include:

[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1

Advanced Usage

Using Different Models

MiniCPM-V (Default)

Best for structured responses:

VISION_MODEL="minicpm-v:latest" claude-vision

Llama 3.2 Vision

Good alternative:

VISION_MODEL="llama3.2-vision:latest" claude-vision

LLaVA

Lightweight option:

VISION_MODEL="llava:latest" claude-vision

Shell Aliases

Add to ~/.bashrc:

# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision"  # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision"  # Slower

# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"

Integration with Scripts

#!/bin/bash
# auto-refactor.sh

export DEBUG="false"
export IDLE_THRESHOLD="3.0"

claude-vision "refactor all JavaScript files to use modern ES6 syntax"

Conditional Auto-Approval

Create wrapper script:

#!/bin/bash
# conditional-claude.sh

if [ "$AUTO_APPROVE" = "true" ]; then
    claude-vision "$@"
else
    claude "$@"
fi

Usage:

AUTO_APPROVE=true ./conditional-claude.sh "create files"

Multiple Terminal Support

Each terminal needs its own instance:

# Terminal 1
claude-vision  # Project A

# Terminal 2
claude-vision  # Project B

Tips and Best Practices

1. Adjust Idle Threshold

  • Fast system: IDLE_THRESHOLD=2.0
  • Slow system: IDLE_THRESHOLD=5.0
  • Remote Ollama: IDLE_THRESHOLD=4.0

2. Model Selection

  • Accuracy: MiniCPM-V > Llama 3.2 Vision > LLaVA
  • Speed: LLaVA > MiniCPM-V > Llama 3.2 Vision
  • Size: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)
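
Whichever model you choose must already be pulled into Ollama; for example, assuming Ollama runs in Docker as in the other examples:

# One-time download of the alternative vision models
docker exec ollama ollama pull llama3.2-vision:latest
docker exec ollama ollama pull llava:latest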

3. Debug When Needed

Enable debug mode if responses are incorrect:

DEBUG=true claude-vision

Check vision model output to see what it's detecting.

4. Screenshot Quality

Ensure terminal is visible and not obscured by other windows.

5. Performance Optimization

For faster responses:

# Pre-warm the model
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null

# Then use normally
claude-vision
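
Alternatively, the model can be loaded and kept resident through the API itself; Ollama's generate endpoint accepts a keep_alive parameter (the 30m value here is just an example):

# Load the model and keep it in memory for 30 minutes
curl -s http://localhost:11434/api/generate \
  -d '{"model": "minicpm-v:latest", "keep_alive": "30m"}'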

6. Clean Up Old Screenshots

Screenshots are auto-cleaned after 1 hour, but you can also clean them up manually:

rm -rf ~/.cache/claude-vision-auto/*.png
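
To mirror the built-in one-hour policy on your own schedule (for example from cron), a find-based cleanup also works:

# Delete screenshots older than 60 minutes
find ~/.cache/claude-vision-auto -name '*.png' -mmin +60 -delete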

7. Check Model Status

Before starting long sessions:

# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# Check model is loaded
docker exec ollama ollama ps

Troubleshooting During Use

Issue: Not Auto-Responding

Symptoms: Vision analyzes but doesn't respond

Solutions:

  1. Check if "WAIT" is returned:

    DEBUG=true claude-vision
    # Look for: [DEBUG] Vision model response: WAIT
    
  2. Increase idle threshold:

    IDLE_THRESHOLD="5.0" claude-vision
    
  3. Try different model:

    VISION_MODEL="llama3.2-vision:latest" claude-vision
    

Issue: Wrong Response

Symptoms: Selects wrong option or types wrong answer

Solutions:

  1. Enable debug to see what vision model sees:

    DEBUG=true claude-vision
    
  2. Check screenshot manually:

    # Screenshots are kept for about an hour, so recent ones can be inspected
    ls ~/.cache/claude-vision-auto/
    
  3. Adjust response delay:

    RESPONSE_DELAY="2.0" claude-vision
    

Issue: Slow Response

Symptoms: Takes too long to respond

Solutions:

  1. Use faster model:

    VISION_MODEL="llava:latest" claude-vision
    
  2. Reduce vision timeout:

    VISION_TIMEOUT="15" claude-vision
    
  3. Check Ollama performance:

    docker stats ollama
    

Issue: Too Aggressive

Symptoms: Responds to non-approval prompts

Solutions:

  1. Increase idle threshold:

    IDLE_THRESHOLD="5.0" claude-vision
    
  2. Check approval keywords in config

  3. Use manual mode for sensitive operations:

    claude  # No auto-approval
    

Getting Help

If issues persist:

  1. Check logs with DEBUG=true (a capture example follows this list)
  2. Review documentation
  3. Report issues with debug output
  4. Include screenshot samples
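
One way to capture a complete debug session for a report, assuming util-linux's script command is available (it records the whole terminal session, including the TUI, to a file):

# Record a full debug session to ~/claude-vision-debug.log
script -c 'DEBUG=true claude-vision "test command"' ~/claude-vision-debug.log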

Next Steps

  • Explore examples/ directory
  • Customize configuration for your workflow
  • Create shell aliases for common tasks
  • Integrate with CI/CD pipelines (if applicable)