Svrnty 41cecca0e2 Initial release of Claude Vision Auto v1.0.0
Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

Features:
- Automatic detection and response to approval prompts
- Screenshot capture and vision analysis via Ollama
- Support for multiple screenshot tools (scrot, gnome-screenshot, etc.)
- Configurable timing and behavior
- Debug mode for troubleshooting
- Comprehensive documentation

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>
2025-10-29 10:09:01 -04:00

Installation Guide

Detailed installation instructions for Claude Vision Auto.

Table of Contents

  1. Prerequisites
  2. System Dependencies
  3. Ollama Setup
  4. Package Installation
  5. Verification
  6. Troubleshooting

Prerequisites

1. Claude Code CLI

Install Anthropic's official CLI:

npm install -g @anthropic-ai/claude-code

Verify installation:

claude --version

2. Python 3.8+

Check Python version:

python3 --version

If Python is not installed:

sudo apt-get update
sudo apt-get install python3 python3-pip
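
The version check above can also be scripted. The following is a minimal sketch, not part of the project: the helper name `version_ge` is hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option, which is what Debian and Ubuntu ship.

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Assumes GNU coreutils `sort -V`; the function name is illustrative only.
version_ge() {
    # The smaller version sorts first, so A >= B exactly when B comes out on top.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# `python3 --version` prints e.g. "Python 3.10.12"; keep the second field.
if command -v python3 >/dev/null 2>&1; then
    found="$(python3 --version 2>&1 | awk '{print $2}')"
    if version_ge "$found" 3.8; then
        echo "Python $found is new enough"
    else
        echo "Python $found is older than 3.8" >&2
    fi
else
    echo "python3 not found on PATH" >&2
fi
```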

3. Docker (for Ollama)

Install Docker:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

Log out and back in for group changes to take effect.
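
Once the three prerequisites are in place, a quick pre-flight check can confirm that their commands resolve. This is a sketch under the assumption that the tools are exposed as `claude`, `python3`, and `docker`, as in the commands above; `missing_commands` is a hypothetical helper name.

```shell
# missing_commands NAME...: print every NAME that is not on PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

missing="$(missing_commands claude python3 docker)"
if [ -z "$missing" ]; then
    echo "all prerequisites found"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing prerequisites:" $missing >&2
fi
```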

System Dependencies

Screenshot Tool

Install scrot (recommended):

sudo apt-get update
sudo apt-get install -y scrot

Alternative screenshot tools:

# GNOME Screenshot
sudo apt-get install -y gnome-screenshot

# ImageMagick
sudo apt-get install -y imagemagick

# Maim
sudo apt-get install -y maim xdotool
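
To see which of these tools is actually installed, something like the following may help. It is only a sketch: `first_available` is a hypothetical helper, and the binary names are assumptions (in particular, ImageMagick's capture command is usually called `import`).

```shell
# first_available NAME...: print the first NAME that resolves to a command.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Candidates are tried in order of preference, scrot first.
if tool="$(first_available scrot gnome-screenshot maim import)"; then
    echo "screenshot tool available: $tool"
else
    echo "no screenshot tool found; install scrot" >&2
fi
```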

Additional Dependencies

sudo apt-get install -y \
    python3-pip \
    git \
    curl

Ollama Setup

1. Pull Ollama Docker Image

docker pull ollama/ollama:latest

2. Start Ollama Container

docker run -d \
    -p 11434:11434 \
    --name ollama \
    --restart unless-stopped \
    ollama/ollama:latest

For GPU support (NVIDIA), run this instead of the command above. It requires the NVIDIA Container Toolkit on the host:

docker run -d \
    -p 11434:11434 \
    --name ollama \
    --gpus all \
    --restart unless-stopped \
    ollama/ollama:latest

3. Pull Vision Model

# MiniCPM-V (recommended - 5.5GB)
docker exec ollama ollama pull minicpm-v:latest

# Alternative: Llama 3.2 Vision (7.8GB)
docker exec ollama ollama pull llama3.2-vision:latest

# Alternative: LLaVA (4.5GB)
docker exec ollama ollama pull llava:latest

4. Verify Ollama

# Check container status
docker ps | grep ollama

# Test API
curl http://localhost:11434/api/tags

# List installed models
curl -s http://localhost:11434/api/tags | python3 -m json.tool
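
The API may take a few seconds to respond after the container starts. A small retry helper, sketched here under a hypothetical name (`retry`), can poll the endpoint instead of failing on the first attempt.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS run out.
# Output of CMD is discarded; only its exit status matters.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while :; do
        "$@" >/dev/null 2>&1 && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example: poll the tags endpoint up to 30 times, one second apart.
# retry 30 1 curl -sf http://localhost:11434/api/tags && echo "Ollama is up"
```

The example call is left commented out so the snippet can be sourced without touching the network.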

Package Installation

Method 1: Using Make

cd claude-vision-auto

# Install system dependencies
make deps

# Install package
make install

Method 2: Manual Installation

cd claude-vision-auto

# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip

# Install Python package
pip3 install -e .

Method 3: From Git

# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto

# Install
pip3 install -e .

Verification

1. Check Command Installation

which claude-vision

Expected output for a user-level install (with your own username): /home/username/.local/bin/claude-vision

2. Test Ollama Connection

curl http://localhost:11434/api/tags

This should return JSON listing the installed models.
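
To check for one specific model rather than reading the JSON by eye, a plain text match is usually enough. The sketch below assumes each entry in the response carries a `"name"` field such as `"minicpm-v:latest"`; `has_model` is a hypothetical helper and does not parse JSON properly.

```shell
# has_model MODEL: read /api/tags JSON on stdin, succeed if MODEL is listed.
# Whitespace is stripped first so pretty-printed and compact JSON both match.
has_model() {
    tr -d '[:space:]' | grep -qF "\"name\":\"$1\""
}

# Example:
# curl -s http://localhost:11434/api/tags | has_model minicpm-v:latest \
#     && echo "model is installed"
```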

3. Test Screenshot

scrot /tmp/test_screenshot.png
ls -lh /tmp/test_screenshot.png

This should create the screenshot file and list it.

4. Run Test

# Start claude-vision
claude-vision

# You should see:
# [Claude Vision Auto] Testing Ollama connection...
# [Claude Vision Auto] Connected to Ollama
# [Claude Vision Auto] Using model: minicpm-v:latest

Troubleshooting

"claude-vision: command not found"

Add ~/.local/bin to your PATH in ~/.bashrc or ~/.zshrc:

export PATH="$HOME/.local/bin:$PATH"

Then reload:

source ~/.bashrc  # or source ~/.zshrc
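
Before editing the shell startup file, it can be worth checking whether the directory is already on PATH. A minimal sketch, with `path_contains` as a hypothetical helper name:

```shell
# path_contains DIR: succeed when DIR is one of the colon-separated PATH entries.
path_contains() {
    # Wrapping both sides in colons avoids matching a mere prefix of an entry.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is already on PATH"
else
    echo "$HOME/.local/bin is missing from PATH" >&2
fi
```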

"Cannot connect to Ollama"

Check whether the Ollama container is running:

docker ps | grep ollama

# If not running, start it:
docker start ollama

Check if port 11434 is open:

netstat -tulpn | grep 11434
# or
ss -tulpn | grep 11434

"Model not found"

Pull the model:

docker exec ollama ollama pull minicpm-v:latest

List available models:

docker exec ollama ollama list

"Screenshot failed"

Install scrot:

sudo apt-get install scrot

Test screenshot:

scrot -u /tmp/test.png

If the error persists, select an alternative tool with the SCREENSHOT_TOOL environment variable:

export SCREENSHOT_TOOL="gnome-screenshot"
claude-vision

Permission Issues

If pip install fails with a permission error:

# Install for user only
pip3 install --user -e .

# Or use virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -e .

Docker Permission Denied

Add your user to the docker group:

sudo usermod -aG docker $USER

Log out and back in, then:

docker ps  # Should work without sudo

Uninstallation

Remove Package

make uninstall
# or
pip3 uninstall claude-vision-auto

Remove Ollama

docker stop ollama
docker rm ollama
docker rmi ollama/ollama

Remove System Dependencies

sudo apt-get remove scrot

Next Steps

After successful installation:

  1. Read USAGE.md for usage examples
  2. Configure environment variables if needed
  3. Test with a simple Claude Code command

Getting Help

If you encounter issues not covered here:

  1. Check the main README.md
  2. Enable debug mode: DEBUG=true claude-vision
  3. Check logs: ~/.cache/claude-vision-auto/
  4. Report issues: https://git.openharbor.io/svrnty/claude-vision-auto/issues