Vision-module-auto/docs/USAGE.md

# Usage Guide

Comprehensive usage guide for Claude Vision Auto.

## Table of Contents

1. [Basic Usage](#basic-usage)
2. [Configuration](#configuration)
3. [Common Scenarios](#common-scenarios)
4. [Advanced Usage](#advanced-usage)
5. [Tips and Best Practices](#tips-and-best-practices)

## Basic Usage

### Starting Interactive Session

Simply replace `claude` with `claude-vision`:

```bash
claude-vision
```

Expected output:

```
[Claude Vision Auto] Testing Ollama connection...
[Claude Vision Auto] Connected to Ollama
[Claude Vision Auto] Using model: minicpm-v:latest
[Claude Vision Auto] Idle threshold: 3.0s

 ▐▛███▜▌   Claude Code v2.0.26
▝▜█████▛▘  Sonnet 4.5 · Claude Max
  ▘▘ ▝▝    /home/username/project

>
```

### With Initial Prompt

```bash
claude-vision "create a test.md file in /tmp"
```

### How Auto-Approval Works

When Claude asks for approval:

```
╭───────────────────────────────────────────╮
│ Create file                               │
│ ╭───────────────────────────────────────╮ │
│ │ /tmp/test.md                          │ │
│ │                                       │ │
│ │ # Test File                           │ │
│ │                                       │ │
│ │ This is a test.                       │ │
│ ╰───────────────────────────────────────╯ │
│ Do you want to create test.md?            │
│ ❯ 1. Yes                                  │
│   2. Yes, allow all edits                 │
│   3. No                                   │
╰───────────────────────────────────────────╯
```

Claude Vision Auto will:
1. Detect idle state (3 seconds)
2. Take screenshot
3. Analyze with vision model
4. Automatically select option 1

Output:

```
[Vision] Analyzing prompt...
[Vision] Response: 1
[Vision] Response sent
```

## Configuration

### Environment Variables

Set before running `claude-vision`:

```bash
# Example: Custom configuration
export OLLAMA_URL="http://192.168.1.100:11434/api/generate"
export VISION_MODEL="llama3.2-vision:latest"
export IDLE_THRESHOLD="5.0"
export RESPONSE_DELAY="2.0"
export DEBUG="true"

claude-vision
```

### Persistent Configuration

Add to `~/.bashrc` or `~/.zshrc`:

```bash
# Claude Vision Auto Configuration
export CLAUDE_VISION_OLLAMA_URL="http://localhost:11434/api/generate"
export CLAUDE_VISION_MODEL="minicpm-v:latest"
export CLAUDE_VISION_IDLE_THRESHOLD="3.0"
export CLAUDE_VISION_RESPONSE_DELAY="1.0"

# Alias for convenience
alias cv="claude-vision"
```

Reload:

```bash
source ~/.bashrc
```

### Configuration Options Reference

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `OLLAMA_URL` | URL | `http://localhost:11434/api/generate` | Ollama API endpoint |
| `VISION_MODEL` | String | `minicpm-v:latest` | Vision model name |
| `IDLE_THRESHOLD` | Float | `3.0` | Seconds to wait before screenshot |
| `RESPONSE_DELAY` | Float | `1.0` | Seconds to wait before responding |
| `OUTPUT_BUFFER_SIZE` | Integer | `4096` | Buffer size in bytes |
| `SCREENSHOT_TIMEOUT` | Integer | `5` | Screenshot timeout (seconds) |
| `VISION_TIMEOUT` | Integer | `30` | Vision analysis timeout (seconds) |
| `DEBUG` | Boolean | `false` | Enable debug logging |

## Common Scenarios

### Scenario 1: File Creation

```bash
claude-vision "create a new Python script called hello.py"
```

Auto-approves file creation.

### Scenario 2: File Editing

```bash
claude-vision "add error handling to main.py"
```

Auto-approves file edits.

### Scenario 3: Multiple Operations

```bash
claude-vision "refactor the authentication module and add tests"
```

Auto-approves each operation sequentially.

### Scenario 4: Longer Wait Time

For slower systems or models:

```bash
IDLE_THRESHOLD="5.0" RESPONSE_DELAY="2.0" claude-vision
```

### Scenario 5: Different Vision Model

```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```

### Scenario 6: Remote Ollama

```bash
OLLAMA_URL="http://192.168.1.100:11434/api/generate" claude-vision
```

### Scenario 7: Debug Mode

When troubleshooting:

```bash
DEBUG=true claude-vision "test command"
```

Output will include:

```
[DEBUG] Screenshot saved to /home/user/.cache/claude-vision-auto/screenshot_1234567890.png
[DEBUG] Sending to Ollama: http://localhost:11434/api/generate
[DEBUG] Model: minicpm-v:latest
[DEBUG] Vision model response: 1
```

## Advanced Usage

### Using Different Models

#### MiniCPM-V (Default)

Best for structured responses:

```bash
VISION_MODEL="minicpm-v:latest" claude-vision
```

#### Llama 3.2 Vision

Good alternative:

```bash
VISION_MODEL="llama3.2-vision:latest" claude-vision
```

#### LLaVA

Lightweight option:

```bash
VISION_MODEL="llava:latest" claude-vision
```

### Shell Aliases

Add to `~/.bashrc`:

```bash
# Quick aliases
alias cv="claude-vision"
alias cvd="DEBUG=true claude-vision"  # Debug mode
alias cvs="IDLE_THRESHOLD=5.0 claude-vision"  # Slower

# Model-specific
alias cv-mini="VISION_MODEL=minicpm-v:latest claude-vision"
alias cv-llama="VISION_MODEL=llama3.2-vision:latest claude-vision"
```

### Integration with Scripts

```bash
#!/bin/bash
# auto-refactor.sh

export DEBUG="false"
export IDLE_THRESHOLD="3.0"

claude-vision "refactor all JavaScript files to use modern ES6 syntax"
```

### Conditional Auto-Approval

Create wrapper script:

```bash
#!/bin/bash
# conditional-claude.sh

if [ "$AUTO_APPROVE" = "true" ]; then
    claude-vision "$@"
else
    claude "$@"
fi
```

Usage:

```bash
AUTO_APPROVE=true ./conditional-claude.sh "create files"
```

### Multiple Terminal Support

Each terminal needs separate instance:

```bash
# Terminal 1
claude-vision  # Project A

# Terminal 2
claude-vision  # Project B
```

## Tips and Best Practices

### 1. Adjust Idle Threshold

- **Fast system**: `IDLE_THRESHOLD=2.0`
- **Slow system**: `IDLE_THRESHOLD=5.0`
- **Remote Ollama**: `IDLE_THRESHOLD=4.0`

### 2. Model Selection

- **Accuracy**: MiniCPM-V > Llama 3.2 Vision > LLaVA
- **Speed**: LLaVA > MiniCPM-V > Llama 3.2 Vision
- **Size**: LLaVA (4.5GB) < MiniCPM-V (5.5GB) < Llama 3.2 (7.8GB)

### 3. Debug When Needed

Enable debug mode if responses are incorrect:

```bash
DEBUG=true claude-vision
```

Check vision model output to see what it's detecting.

### 4. Screenshot Quality

Ensure terminal is visible and not obscured by other windows.

### 5. Performance Optimization

For faster responses:

```bash
# Pre-warm the model
docker exec ollama ollama run minicpm-v:latest "test" < /dev/null

# Then use normally
claude-vision
```

### 6. Clean Up Old Screenshots

Screenshots are auto-cleaned after 1 hour, but manual cleanup:

```bash
rm -rf ~/.cache/claude-vision-auto/*.png
```

### 7. Check Model Status

Before starting long sessions:

```bash
# Verify Ollama is responsive
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# Check model is loaded
docker exec ollama ollama ps
```

## Troubleshooting During Use

### Issue: Not Auto-Responding

**Symptoms**: Vision analyzes but doesn't respond

**Solutions**:

1. Check if "WAIT" is returned:
   ```bash
   DEBUG=true claude-vision
   # Look for: [DEBUG] Vision model response: WAIT
   ```

2. Increase idle threshold:
   ```bash
   IDLE_THRESHOLD="5.0" claude-vision
   ```

3. Try different model:
   ```bash
   VISION_MODEL="llama3.2-vision:latest" claude-vision
   ```

### Issue: Wrong Response

**Symptoms**: Selects wrong option or types wrong answer

**Solutions**:

1. Enable debug to see what vision model sees:
   ```bash
   DEBUG=true claude-vision
   ```

2. Check screenshot manually:
   ```bash
   # Don't auto-clean
   ls ~/.cache/claude-vision-auto/
   ```

3. Adjust response delay:
   ```bash
   RESPONSE_DELAY="2.0" claude-vision
   ```

### Issue: Slow Response

**Symptoms**: Takes too long to respond

**Solutions**:

1. Use faster model:
   ```bash
   VISION_MODEL="llava:latest" claude-vision
   ```

2. Reduce vision timeout:
   ```bash
   VISION_TIMEOUT="15" claude-vision
   ```

3. Check Ollama performance:
   ```bash
   docker stats ollama
   ```

### Issue: Too Aggressive

**Symptoms**: Responds to non-approval prompts

**Solutions**:

1. Increase idle threshold:
   ```bash
   IDLE_THRESHOLD="5.0" claude-vision
   ```

2. Check approval keywords in config
3. Use manual mode for sensitive operations:
   ```bash
   claude  # No auto-approval
   ```

## Getting Help

If issues persist:

1. Check logs with `DEBUG=true`
2. Review documentation
3. Report issues with debug output
4. Include screenshot samples

## Next Steps

- Explore [examples/](../examples/) directory
- Customize configuration for your workflow
- Create shell aliases for common tasks
- Integrate with CI/CD pipelines (if applicable)