Vision-module-auto/docs/INSTALLATION.md
Svrnty 41cecca0e2 Initial release of Claude Vision Auto v1.0.0
Vision-based auto-approval system for Claude Code CLI using MiniCPM-V vision model.

Features:
- Automatic detection and response to approval prompts
- Screenshot capture and vision analysis via Ollama
- Support for multiple screenshot tools (scrot, gnome-screenshot, etc.)
- Configurable timing and behavior
- Debug mode for troubleshooting
- Comprehensive documentation

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Jean-Philippe Brule <jp@svrnty.io>
2025-10-29 10:09:01 -04:00

# Installation Guide
Detailed installation instructions for Claude Vision Auto.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [System Dependencies](#system-dependencies)
3. [Ollama Setup](#ollama-setup)
4. [Package Installation](#package-installation)
5. [Verification](#verification)
6. [Troubleshooting](#troubleshooting)
7. [Uninstallation](#uninstallation)
8. [Next Steps](#next-steps)
9. [Getting Help](#getting-help)
## Prerequisites
### 1. Claude Code CLI
Install Anthropic's official CLI:
```bash
npm install -g @anthropic-ai/claude-code
```
Verify installation:
```bash
claude --version
```
### 2. Python 3.8+
Check Python version:
```bash
python3 --version
```
If Python is not installed (Debian/Ubuntu):
```bash
sudo apt-get update
sudo apt-get install python3 python3-pip
```
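To check the 3.8 minimum in a script rather than by eye, a small comparison helper can be used. This is a minimal sketch: `version_at_least` is a hypothetical helper name, not something shipped with Claude Vision Auto, and it assumes version strings of the form `MAJOR.MINOR[.PATCH]`.

```bash
# Succeed if version $1 is at least version $2, comparing MAJOR.MINOR only.
# Hypothetical helper for illustration; expects strings like "3.10.12" or "3.8".
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    # A higher major version always satisfies the requirement.
    [ "${have_major:-0}" -gt "${want_major:-0}" ] && return 0
    # Same major version: compare the minor numbers numerically.
    [ "${have_major:-0}" -eq "${want_major:-0}" ] &&
        [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
}

# `python3 --version` prints e.g. "Python 3.10.12"; keep the second field.
installed=$(python3 --version 2>/dev/null | awk '{print $2}')

if version_at_least "$installed" 3.8; then
    echo "Python $installed satisfies the 3.8 minimum"
else
    echo "Python 3.8 or newer is required (found: ${installed:-none})" >&2
fi
```

The comparison is numeric per component, so 3.10 is correctly treated as newer than 3.8, which a plain string comparison would get wrong.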
### 3. Docker (for Ollama)
Install Docker:
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
```
Log out and back in for group changes to take effect.
## System Dependencies
### Screenshot Tool
Install `scrot` (recommended):
```bash
sudo apt-get update
sudo apt-get install -y scrot
```
Alternative screenshot tools:
```bash
# GNOME Screenshot
sudo apt-get install -y gnome-screenshot
# ImageMagick
sudo apt-get install -y imagemagick
# Maim
sudo apt-get install -y maim xdotool
```
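To see which of these tools is already present before installing anything, a short check along these lines may help. It is a sketch only: `first_available` is a hypothetical helper, and how Claude Vision Auto itself chooses a tool is not described in this guide.

```bash
# Print the first command from the argument list that is found on PATH.
# Hypothetical helper for illustration; returns non-zero if none is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# ImageMagick's screen capture command is named `import`.
if tool=$(first_available scrot gnome-screenshot import maim); then
    echo "Screenshot tool found: $tool"
else
    echo "No screenshot tool found; install scrot as shown above" >&2
fi
```

The candidates are checked in the order given, so `scrot` wins when several tools are installed.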
### Additional Dependencies
```bash
sudo apt-get install -y \
python3-pip \
git \
curl
```
## Ollama Setup
### 1. Pull Ollama Docker Image
```bash
docker pull ollama/ollama:latest
```
### 2. Start Ollama Container
```bash
docker run -d \
-p 11434:11434 \
--name ollama \
--restart unless-stopped \
ollama/ollama:latest
```
For GPU support (NVIDIA; the host needs the NVIDIA Container Toolkit installed):
```bash
docker run -d \
-p 11434:11434 \
--name ollama \
--gpus all \
--restart unless-stopped \
ollama/ollama:latest
```
### 3. Pull Vision Model
```bash
# Pull one vision model; you do not need all three.
# MiniCPM-V (recommended - 5.5GB)
docker exec ollama ollama pull minicpm-v:latest
# Alternative: Llama 3.2 Vision (7.8GB)
# docker exec ollama ollama pull llama3.2-vision:latest
# Alternative: LLaVA (4.5GB)
# docker exec ollama ollama pull llava:latest
```
### 4. Verify Ollama
```bash
# Check container status
docker ps | grep ollama
# Test API
curl http://localhost:11434/api/tags
# List installed models
curl -s http://localhost:11434/api/tags | python3 -m json.tool
```
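The model list can also be checked programmatically instead of read by eye. The sketch below assumes the `/api/tags` response is a JSON object with a `models` array whose entries carry a `name` field; `has_model` is a hypothetical helper, not part of Ollama or Claude Vision Auto.

```bash
# Read an /api/tags response on stdin and succeed if the model named in $1 is listed.
# Exit status: 0 = found, 1 = not found, 2 = input was not valid JSON.
has_model() {
    python3 -c '
import json
import sys

try:
    models = json.load(sys.stdin).get("models", [])
except ValueError:
    sys.exit(2)

wanted = sys.argv[1]
sys.exit(0 if any(m.get("name") == wanted for m in models) else 1)
' "$1"
}

if curl -fsS http://localhost:11434/api/tags 2>/dev/null | has_model minicpm-v:latest; then
    echo "minicpm-v:latest is installed"
else
    echo "minicpm-v:latest not found, or Ollama is not reachable" >&2
fi
```

The match is on the full name including the tag, so `minicpm-v` without `:latest` would not match an entry listed as `minicpm-v:latest`.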
## Package Installation
### Method 1: Using Makefile (Recommended)
```bash
cd claude-vision-auto
# Install system dependencies
make deps
# Install package
make install
```
### Method 2: Manual Installation
```bash
cd claude-vision-auto
# Install system dependencies
sudo apt-get update
sudo apt-get install -y scrot python3-pip
# Install Python package
pip3 install -e .
```
### Method 3: From Git
```bash
# Clone repository
git clone https://git.openharbor.io/svrnty/claude-vision-auto.git
cd claude-vision-auto
# Install
pip3 install -e .
```
## Verification
### 1. Check Command Installation
```bash
which claude-vision
```
Expected output for a user-level install (the exact path depends on your username and install method): `/home/username/.local/bin/claude-vision`
### 2. Test Ollama Connection
```bash
curl http://localhost:11434/api/tags
```
This should return JSON listing the installed models.
### 3. Test Screenshot
```bash
scrot /tmp/test_screenshot.png
ls -lh /tmp/test_screenshot.png
```
This should create the screenshot file, and `ls` should report a non-zero size.
### 4. Run Test
```bash
# Start claude-vision
claude-vision
# You should see:
# [Claude Vision Auto] Testing Ollama connection...
# [Claude Vision Auto] Connected to Ollama
# [Claude Vision Auto] Using model: minicpm-v:latest
```
## Troubleshooting
### "claude-vision: command not found"
Add `~/.local/bin` to your PATH in `~/.bashrc` or `~/.zshrc`:
```bash
export PATH="$HOME/.local/bin:$PATH"
```
Then reload:
```bash
source ~/.bashrc # or source ~/.zshrc
```
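Before editing a shell profile, it can be worth confirming that the directory is really missing from PATH. A minimal sketch, using a hypothetical `path_contains` helper:

```bash
# Succeed if directory $1 is an entry in the colon-separated list $2.
# Hypothetical helper for illustration.
path_contains() {
    # Wrapping both sides in colons makes the first and last entries match too.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/.local/bin" "$PATH"; then
    echo "$HOME/.local/bin is already on PATH"
else
    echo "$HOME/.local/bin is not on PATH; add the export line shown above"
fi
```

The check matches whole entries only, so `/home/user/.local` does not count as a match for `/home/user/.local/bin`.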
### "Cannot connect to Ollama"
Check whether the Ollama container is running:
```bash
docker ps | grep ollama
# If not running, start it:
docker start ollama
```
Check whether anything is listening on port 11434:
```bash
netstat -tulpn | grep 11434
# or
ss -tulpn | grep 11434
```
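Right after `docker start ollama`, the API may take a moment to begin accepting connections, so a single failed request does not necessarily mean something is broken. A generic retry wrapper is one way to wait for it. This is a sketch; `retry` is a hypothetical helper, not part of Docker, Ollama, or Claude Vision Auto.

```bash
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds; fails once the attempts are used up.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_count=1
    while ! "$@"; do
        if [ "$retry_count" -ge "$retry_attempts" ]; then
            return 1
        fi
        retry_count=$((retry_count + 1))
        sleep "$retry_delay"
    done
    return 0
}
```

For example, `retry 10 2 curl -fsS -o /dev/null http://localhost:11434/api/tags` polls the API up to ten times with a two-second pause between attempts, and returns non-zero if Ollama never answers.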
### "Model not found"
Pull the model:
```bash
docker exec ollama ollama pull minicpm-v:latest
```
List available models:
```bash
docker exec ollama ollama list
```
### "Screenshot failed"
Install scrot:
```bash
sudo apt-get install scrot
```
Test screenshot:
```bash
scrot -u /tmp/test.png
```
If the error persists, try an alternative tool by setting the `SCREENSHOT_TOOL` environment variable:
```bash
export SCREENSHOT_TOOL="gnome-screenshot"
claude-vision
```
### Permission Issues
If `pip install` fails with a permissions error:
```bash
# Install for user only
pip3 install --user -e .
# Or use virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -e .
```
### Docker Permission Denied
Add user to docker group:
```bash
sudo usermod -aG docker $USER
```
Log out and back in, then:
```bash
docker ps # Should work without sudo
```
## Uninstallation
### Remove Package
```bash
make uninstall
# or
pip3 uninstall claude-vision-auto
```
### Remove Ollama
```bash
docker stop ollama
docker rm ollama
docker rmi ollama/ollama
```
### Remove System Dependencies
```bash
sudo apt-get remove scrot
```
## Next Steps
After successful installation:
1. Read [USAGE.md](USAGE.md) for usage examples
2. Configure environment variables if needed
3. Test with a simple Claude Code command
## Getting Help
If you encounter issues not covered here:
1. Check the main [README.md](../README.md)
2. Enable debug mode: `DEBUG=true claude-vision`
3. Check logs: `~/.cache/claude-vision-auto/`
4. Report issues: https://git.openharbor.io/svrnty/claude-vision-auto/issues