# svrnty-vision — orientation for Claude

*Inherits the 4 Karpathy rules from `~/.claude/CLAUDE.md` and the workspace contract from `/home/svrnty/workspaces/hermes/CLAUDE.md`. Read both before touching anything here.*

## What this is

Standalone sovereign vision HTTP gateway: four functional endpoints plus a liveness probe. Two endpoints proxy to remote backends; two run in-process.

| Endpoint | Impl | Backend |
|---|---|---|
| `POST /vlm/analyze` | HTTP proxy | Qwen3-VL 32B · Ollama · svrnty-steev (Strix Halo) · `100.88.167.87:11434` |
| `POST /flux/render` | HTTP proxy + poll | FLUX.2-dev · ComfyUI · gx10-f38f · `100.90.100.10:8188` |
| `POST /palette/extract` | In-process | Pillow median-cut quantization |
| `POST /rembg/cutout` | In-process | rembg u2net ONNX |
| `GET /healthz` | Liveness probe | None (always returns 200) |

**Sibling of `bte/`:** BTE calls it over HTTP via `SvrntyVisionGatewayClient`.

**Usable by any agent:** there is no BTE coupling in this repo. Agents can call the endpoints directly (see `L4-svrnty.tool-vision` in `cortex/` for the Go wrapper).

## Hard invariants

- **VLM + FLUX are thin proxies only.** No VLM or FLUX model weights are loaded in-process. Pillow and rembg are the only in-process work, and rembg's u2net ONNX model is the only model loaded in-process.
- **No cloud providers.** Sovereign-first. Anthropic/OpenAI/Google/Higgsfield must never be re-introduced here.
- **Config via env only.** pydantic-settings + `.env` (gitignored). No hardcoded IPs in code: every address lives in the `settings.py` defaults or is overridden by `.env`.
- **Port 8092.** BTE is configured to call `http://localhost:8092`.
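The FLUX row's "HTTP proxy + poll" is the only multi-step flow in the gateway. Below is a minimal sketch of that loop, assuming ComfyUI's standard HTTP API (`POST /prompt` returns a `prompt_id`; `GET /history/{prompt_id}` stays an empty object until the prompt has executed). It is not the code in `routers/flux.py`: `render`, `http_json` and the injected `transport`/`sleep` hooks are hypothetical names, and the 120 s deadline is only assumed to match `VISION_REQUEST_TIMEOUT_SECONDS`.

```python
# Hypothetical sketch of the FLUX "proxy + poll" flow; not routers/flux.py.
# Assumes ComfyUI's standard HTTP API: POST /prompt queues a workflow and
# GET /history/{prompt_id} returns {} until that prompt has executed.
import json
import time
import urllib.request
from typing import Any, Callable

# (method, url, json_body_or_None) -> decoded JSON response
Transport = Callable[[str, str, Any], dict]


def http_json(method: str, url: str, body: Any = None, timeout: float = 120.0) -> dict:
    """Stdlib transport: send optional JSON, decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def render(
    workflow: dict,
    base_url: str,
    transport: Transport = http_json,
    poll_interval: float = 1.0,
    deadline_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Queue a workflow, poll /history until it finishes, return image refs."""
    queued = transport("POST", f"{base_url}/prompt", {"prompt": workflow})
    prompt_id = queued["prompt_id"]
    waited = 0.0  # counts slept time only; request latency is not included
    while waited < deadline_s:
        history = transport("GET", f"{base_url}/history/{prompt_id}", None)
        entry = history.get(prompt_id)
        if entry:  # the prompt appears in history once it has executed
            # Assumed shape: status.status_str is "success" or "error".
            if entry.get("status", {}).get("status_str") == "error":
                raise RuntimeError(f"ComfyUI job {prompt_id} failed")
            # Each output node may list images as {filename, subfolder, type}.
            return [
                img
                for node_out in entry.get("outputs", {}).values()
                for img in node_out.get("images", [])
            ]
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"ComfyUI job {prompt_id} not finished after {deadline_s}s")
```

Injecting the transport keeps the poll loop testable offline, in line with the repo's split between pure-function tests and `-m integration` runs. Fetching the actual image bytes (ComfyUI serves them from `GET /view` with `filename`, `subfolder` and `type` query parameters) is left out of the sketch.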
## Phase status (BTE Phase 4 sub-phases)

| Phase | Scope | State |
|---|---|---|
| 4a | FastAPI scaffold + `/healthz` + 4 route stubs | ✅ done |
| 4b | Implement `vlm.py`, `flux.py`, `palette.py`, `rembg.py` | ✅ done (2026-05-25) |
| 4c | Delete .NET vision providers from BTE | ✅ done (BTE Phase 4 commit 3112135) |
| 4d | Wire BTE → svrnty-vision via `SvrntyVisionGatewayClient` | ✅ done (BTE Phase 4 commit 3112135) |

## Infrastructure (Tailscale)

```
svrnty-steev   100.88.167.87   Strix Halo — Ollama — qwen3-vl:32b (VLM)
gx10-f38f      100.90.100.10   NVIDIA GB10 128GB — ComfyUI v0.18.1 (FLUX)
```

**ComfyUI FLUX.2 model set (gx10):**

- `diffusion_models/flux2_dev_fp8mixed.safetensors`
- `text_encoders/mistral_3_small_flux2_fp8.safetensors`
- `vae/flux2-vae.safetensors`

## Layout

```
src/svrnty_vision/
  server.py              # FastAPI app + /healthz + router includes
  settings.py            # pydantic-settings — all config here, no hardcodes
  routers/
    vlm.py               # POST /vlm/analyze → Ollama (Qwen3-VL 32B)
    flux.py              # POST /flux/render → ComfyUI (FLUX.2-dev)
    palette.py           # POST /palette/extract, in-process (Pillow)
    rembg.py             # POST /rembg/cutout, in-process (rembg)
tests/
  conftest.py            # fixtures: TestClient, red_png_b64, gradient_png_b64
  test_healthz.py        # liveness + 501 stubs (pre-4b, kept for regression)
  test_vlm_parse.py      # pure-function: rubric prompt + score parsing
  test_flux_workflow.py  # pure-function: stopgap FLUX.2 workflow builder
  test_palette.py        # unit: palette extraction (no network)
  test_rembg.py          # unit: background removal (no network)
  test_integration_e2e.py  # live e2e: VLM + FLUX + palette + rembg
```

## Run / test

```sh
# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && pip install -e .

# Serve (reads .env automatically)
uvicorn svrnty_vision.server:app --host 0.0.0.0 --port 8092

# Unit tests (no network)
pytest tests/ -m "not integration"

# Full e2e (requires Tailscale + live svrnty-steev and gx10-f38f hosts)
pytest tests/ -m integration -v
```

## Config (.env)

```
SVRNTY_VISION_PORT=8092
FLUX_URL=http://100.90.100.10:8188
VLM_URL=http://100.88.167.87:11434
VLM_MODEL=qwen3-vl:32b
VISION_REQUEST_TIMEOUT_SECONDS=120
```

## When extending

- New endpoint → new router under `routers/`, register it in `server.py`, tests in `tests/`.
- New backend → add its URL to `settings.py` + `.env.example`; never hardcode it.
- Surgical only. No cross-endpoint refactors while implementing one feature.
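The `VLM_URL`, `VLM_MODEL` and `VISION_REQUEST_TIMEOUT_SECONDS` values in the Config block are all the VLM proxy needs. Below is a minimal stdlib sketch of how they might turn into one Ollama call, assuming Ollama's `POST /api/generate` endpoint (base64 `images`, `stream: false`, reply text in the `response` field). `vlm_settings`, `build_ollama_request` and `analyze` are hypothetical names. The real repo resolves these values through pydantic-settings in `settings.py` (which also reads `.env`), whereas this sketch reads only the process environment; the rubric prompt and score parsing are out of scope.

```python
# Hypothetical sketch: env config -> Ollama /api/generate request.
# Stdlib stand-in for the pydantic-settings class; reads os.environ only, not .env.
import json
import os
import urllib.request
from typing import Mapping


def vlm_settings(env: Mapping[str, str] = os.environ) -> dict:
    # Env wins; fallbacks mirror the Config (.env) block (in the repo they live in settings.py).
    return {
        "url": env.get("VLM_URL", "http://100.88.167.87:11434").rstrip("/"),
        "model": env.get("VLM_MODEL", "qwen3-vl:32b"),
        "timeout": float(env.get("VISION_REQUEST_TIMEOUT_SECONDS", "120")),
    }


def build_ollama_request(
    prompt: str, image_b64: str, env: Mapping[str, str] = os.environ
) -> urllib.request.Request:
    """Build (but do not send) the POST to Ollama's /api/generate."""
    cfg = vlm_settings(env)
    body = {
        "model": cfg["model"],
        "prompt": prompt,
        "images": [image_b64],  # raw base64 strings, no data: URI prefix
        "stream": False,        # one JSON object instead of streamed chunks
    }
    return urllib.request.Request(
        f"{cfg['url']}/api/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(prompt: str, image_b64: str, env: Mapping[str, str] = os.environ) -> str:
    """Send the request and return the model's text reply."""
    req = build_ollama_request(prompt, image_b64, env)
    with urllib.request.urlopen(req, timeout=vlm_settings(env)["timeout"]) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from `urlopen` lets a unit test check the URL and payload with no network, matching the `-m "not integration"` split.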