svrnty-vision/CLAUDE.md

# svrnty-vision — orientation for Claude

*Inherits the four Karpathy rules from `~/.claude/CLAUDE.md` and the workspace
contract from `/home/svrnty/workspaces/hermes/CLAUDE.md`. Read both before
touching anything here.*

## What this is

Standalone sovereign vision HTTP gateway. Four vision endpoints plus a liveness
probe: two proxy to remote backends, two run in-process.

| Endpoint | Impl | Backend |
|---|---|---|
| `POST /vlm/analyze` | HTTP proxy | Qwen3-VL 32B · Ollama · svrnty-steev (Strix Halo) · `100.88.167.87:11434` |
| `POST /flux/render` | HTTP proxy + poll | FLUX.2-dev · ComfyUI · gx10-f38f · `100.90.100.10:8188` |
| `POST /palette/extract` | In-process | Pillow median-cut quantization |
| `POST /rembg/cutout` | In-process | rembg u2net ONNX |
| `GET /healthz` | Liveness probe | Always 200 |
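
The "HTTP proxy + poll" pattern listed for `/flux/render` can be sketched as below. This is a minimal illustration, not the code in `flux.py`: it assumes ComfyUI's history endpoint (`GET /history/<prompt_id>`) returns an empty object until the job finishes and an object keyed by the prompt id afterwards. The names `poll_history` and `http_fetcher` are hypothetical, and the fetch function is injected so the loop can be exercised without a live host.

```python
import json
import time
import urllib.request
from typing import Callable


def http_fetcher(base_url: str, timeout_s: float = 10.0) -> Callable[[str], dict]:
    """Return a function that GETs <base_url>/history/<prompt_id> as JSON."""
    def fetch(prompt_id: str) -> dict:
        url = f"{base_url.rstrip('/')}/history/{prompt_id}"
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return json.loads(resp.read())
    return fetch


def poll_history(
    fetch_history: Callable[[str], dict],
    prompt_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the history contains prompt_id, then return that entry."""
    deadline = clock() + timeout_s
    while True:
        entry = fetch_history(prompt_id).get(prompt_id)
        if entry is not None:
            return entry  # job finished; entry holds its outputs
        if clock() >= deadline:
            raise TimeoutError(f"prompt {prompt_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

A caller would typically pass `http_fetcher(settings_flux_url)` as `fetch_history`, where `settings_flux_url` stands for whatever `FLUX_URL` resolves to; the 120-second default mirrors `VISION_REQUEST_TIMEOUT_SECONDS` but is only an assumption about how the timeout is applied.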

**Sibling of `bte/`** — BTE calls it over HTTP via `SvrntyVisionGatewayClient`.
**Usable by any agent** — no BTE coupling in this repo. Agents can call the
endpoints directly (see `L4-svrnty.tool-vision` in cortex/ for the Go wrapper).
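
As a rough illustration of calling an endpoint directly, the sketch below builds a JSON `POST` with the standard library only. The request schema is not documented in this file, so the `image_b64` field is a hypothetical placeholder, as are the helper names `build_post` and `call`; check the router's request model before relying on any field name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8092"  # the port this gateway serves on


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for one of the gateway endpoints."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: dict, timeout_s: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout_s) as resp:
        return json.loads(resp.read())


# Hypothetical payload shape; the real field names live in the router models.
request = build_post("/palette/extract", {"image_b64": "<base64 PNG>"})
```

Keeping request construction separate from the network call makes the client easy to unit-test without the service running.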

## Hard invariants

- **VLM + FLUX are thin proxies only.** No model weights loaded in-process.
  Pillow + rembg are the only in-process ML.
- **No cloud providers.** Sovereign-first. Anthropic/OpenAI/Google/Higgsfield
  must never be re-introduced here.
- **Config via env only.** pydantic-settings + `.env` (gitignored). No
  hardcoded IPs in code — all in settings.py defaults or overridden by `.env`.
- **Port 8092.** BTE is configured to call `http://localhost:8092`.

## Phase status (BTE Phase 4 sub-phases)

| Phase | Scope | State |
|---|---|---|
| 4a | FastAPI scaffold + /healthz + 4 route stubs | ✅ done |
| 4b | Implement vlm.py, flux.py, palette.py, rembg.py | ✅ done (2026-05-25) |
| 4c | Delete .NET vision providers from BTE | ✅ done (BTE Phase 4 commit 3112135) |
| 4d | Wire BTE → svrnty-vision via SvrntyVisionGatewayClient | ✅ done (BTE Phase 4 commit 3112135) |

## Infrastructure (Tailscale)

```
svrnty-steev  100.88.167.87   Strix Halo — Ollama — qwen3-vl:32b (VLM)
gx10-f38f     100.90.100.10   NVIDIA GB10 128GB — ComfyUI v0.18.1 (FLUX)
```

**ComfyUI FLUX.2 model set (gx10):**
- `diffusion_models/flux2_dev_fp8mixed.safetensors`
- `text_encoders/mistral_3_small_flux2_fp8.safetensors`
- `vae/flux2-vae.safetensors`

## Layout

```
src/svrnty_vision/
    server.py         # FastAPI app + /healthz + router includes
    settings.py       # pydantic-settings — all config here, no hardcodes
    routers/
        vlm.py        # POST /vlm/analyze   → Ollama (Qwen3-VL 32B)
        flux.py       # POST /flux/render   → ComfyUI (FLUX.2-dev)
        palette.py    # POST /palette/extract  in-process (Pillow)
        rembg.py      # POST /rembg/cutout     in-process (rembg)
tests/
    conftest.py              # fixtures: TestClient, red_png_b64, gradient_png_b64
    test_healthz.py          # liveness + 501 stubs (pre-4b kept for regression)
    test_vlm_parse.py        # pure-function: rubric prompt + score parsing
    test_flux_workflow.py    # pure-function: stopgap FLUX.2 workflow builder
    test_palette.py          # unit: palette extraction (no network)
    test_rembg.py            # unit: background removal (no network)
    test_integration_e2e.py  # live e2e: VLM + FLUX + palette + rembg
```
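
For readers unfamiliar with the median-cut quantization that `palette.py` delegates to Pillow, here is a pure-Python sketch of the idea. It is an illustration of the general algorithm under simplifying assumptions (RGB tuples in a list, box colour taken as the per-channel average), not Pillow's implementation and not the code in this repo; `median_cut` is a hypothetical name.

```python
Pixel = tuple[int, int, int]


def _channel_range(box: list[Pixel], channel: int) -> int:
    """Spread of one colour channel within a box of pixels."""
    values = [p[channel] for p in box]
    return max(values) - min(values)


def _widest(box: list[Pixel]) -> int:
    """Largest channel spread in the box (0 for empty or uniform boxes)."""
    if not box:
        return 0
    return max(_channel_range(box, c) for c in range(3))


def median_cut(pixels: list[Pixel], n_colors: int) -> list[Pixel]:
    """Split the widest box at its median until n_colors boxes exist."""
    boxes = [list(pixels)]
    while len(boxes) < n_colors:
        # Choose the box with the largest colour spread.
        idx = max(range(len(boxes)), key=lambda i: _widest(boxes[i]))
        box = boxes[idx]
        if len(box) < 2 or _widest(box) == 0:
            break  # nothing left that can be split further
        channel = max(range(3), key=lambda c: _channel_range(box, c))
        box.sort(key=lambda p: p[channel])
        mid = len(box) // 2
        boxes[idx:idx + 1] = [box[:mid], box[mid:]]
    # Represent each non-empty box by its per-channel integer average.
    return [
        tuple(sum(p[c] for p in b) // len(b) for c in range(3))
        for b in boxes if b
    ]
```

An image with fewer distinct colours than requested simply yields a shorter palette, which is one reasonable behaviour to test for in `test_palette.py`-style unit tests.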

## Run / test

```sh
# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && pip install -e .

# Serve (reads .env automatically)
uvicorn svrnty_vision.server:app --host 0.0.0.0 --port 8092

# Unit tests (no network)
pytest tests/ -m "not integration"

# Full e2e (requires Tailscale + live backend hosts svrnty-steev and gx10-f38f)
pytest tests/ -m integration -v
```
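
The `-m integration` and `-m "not integration"` selections above rely on a custom pytest marker. One common way to register it is a `pytest_configure` hook in `conftest.py`, sketched below; this repo may instead declare the marker in `pyproject.toml` or `pytest.ini`, so treat the snippet and its description string as assumptions.

```python
# conftest.py (sketch)

def pytest_configure(config):
    """Register the custom marker so pytest does not warn about it."""
    config.addinivalue_line(
        "markers",
        "integration: requires Tailscale and live backend hosts",
    )


# In a test module, a live test would then be tagged like this:
#
#     import pytest
#
#     @pytest.mark.integration
#     def test_vlm_analyze_live(): ...
```

With the marker registered, untagged tests are the ones selected by `-m "not integration"`, which keeps the default unit run free of network access.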

## Config (.env)

```
SVRNTY_VISION_PORT=8092
FLUX_URL=http://100.90.100.10:8188
VLM_URL=http://100.88.167.87:11434
VLM_MODEL=qwen3-vl:32b
VISION_REQUEST_TIMEOUT_SECONDS=120
```
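
The precedence these variables follow (a default in `settings.py`, overridden by the environment) can be illustrated with a standard-library stand-in. The repo itself uses pydantic-settings, which additionally reads the `.env` file; the sketch below deliberately swaps that for a plain dataclass so it runs without third-party packages. `VisionSettings` and `load_settings` are hypothetical names, and the defaults are copied from the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VisionSettings:
    port: int
    flux_url: str
    vlm_url: str
    vlm_model: str
    request_timeout_seconds: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> VisionSettings:
    """Read each setting from env, falling back to the documented default."""
    env = os.environ if env is None else env
    return VisionSettings(
        port=int(env.get("SVRNTY_VISION_PORT", "8092")),
        flux_url=env.get("FLUX_URL", "http://100.90.100.10:8188"),
        vlm_url=env.get("VLM_URL", "http://100.88.167.87:11434"),
        vlm_model=env.get("VLM_MODEL", "qwen3-vl:32b"),
        request_timeout_seconds=float(
            env.get("VISION_REQUEST_TIMEOUT_SECONDS", "120")
        ),
    )
```

Routers would read URLs only from such a settings object, which is what keeps backend addresses out of the endpoint code.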

## When extending

- New endpoint → new router under `routers/`, register in `server.py`, tests in `tests/`.
- New backend → add URL to `settings.py` + `.env.example`, never hardcode.
- Surgical only. No cross-endpoint refactors while implementing one feature.