svrnty-vision/CLAUDE.md
Svrnty f6e09dbff2 feat(svrnty-vision): Phase 4b complete — full impl + e2e test suite
- palette.py + rembg.py: implement from stubs (Pillow median-cut + rembg u2net)
- vlm.py: rename Spark2→steev (Strix Halo / Ollama); bump max_tokens 1024→4096
  (qwen3-vl:32b thinking mode consumes budget tokens — 4096 min for valid output)
- settings.py: rename spark2_vlm_*/spark1_flux_* → vlm_*/flux_*; real defaults
  (steev 100.88.167.87:11434 Ollama, gx10 100.90.100.10:8188 ComfyUI)
- tests/: conftest.py + test_palette.py + test_rembg.py + test_integration_e2e.py
  (28 unit + 10 integration; 38/38 passing — VLM raw/polished/ugc + FLUX render)
- CLAUDE.md: rewrite to accurate phase status + infra + layout
- requirements.txt + pyproject.toml: add Pillow, rembg, pytest-asyncio deps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 06:44:21 -04:00


# svrnty-vision — orientation for Claude
*Inherits Karpathy 4 rules from `~/.claude/CLAUDE.md` and the workspace
contract from `/home/svrnty/workspaces/hermes/CLAUDE.md`. Read both before
touching anything here.*
## What this is
Standalone sovereign vision HTTP gateway. Four vision endpoints plus a liveness probe. Two proxy to remote backends; two run in-process:
| Endpoint | Impl | Backend |
|---|---|---|
| `POST /vlm/analyze` | HTTP proxy | Qwen3-VL 32B · Ollama · svrnty-steev (Strix Halo) · `100.88.167.87:11434` |
| `POST /flux/render` | HTTP proxy + poll | FLUX.2-dev · ComfyUI · gx10-f38f · `100.90.100.10:8188` |
| `POST /palette/extract` | In-process | Pillow median-cut quantization |
| `POST /rembg/cutout` | In-process | rembg u2net ONNX |
| `GET /healthz` | Liveness probe | Always 200 |
**Sibling of `bte/`** — BTE calls it over HTTP via `SvrntyVisionGatewayClient`.
**Usable by any agent** — no BTE coupling in this repo. Agents can call the
endpoints directly (see `L4-svrnty.tool-vision` in cortex/ for the Go wrapper).
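The `/palette/extract` row above names Pillow median-cut quantization. As a rough illustration of what median cut does (this is a pure-Python sketch, not the code in `palette.py`, which delegates to Pillow), the pixel set is repeatedly split at the median of its widest channel and each resulting box is averaged. The function name `median_cut` and its signature are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def median_cut(pixels: List[RGB], n_colors: int) -> List[RGB]:
    """Return up to n_colors representative colors (illustrative only)."""
    if not pixels or n_colors < 1:
        return []
    boxes = [list(pixels)]
    while len(boxes) < n_colors:
        # Find the splittable box and channel with the widest value range.
        best = None  # (spread, box_index, channel)
        for i, box in enumerate(boxes):
            if len(box) < 2:
                continue
            for ch in range(3):
                values = [p[ch] for p in box]
                spread = max(values) - min(values)
                if best is None or spread > best[0]:
                    best = (spread, i, ch)
        if best is None or best[0] == 0:
            break  # nothing left that is worth splitting
        _, i, ch = best
        # Sort the chosen box along that channel and cut it at the median.
        box = sorted(boxes.pop(i), key=lambda p: p[ch])
        mid = len(box) // 2
        boxes.extend([box[:mid], box[mid:]])
    # Represent each box by its per-channel integer mean.
    return [
        tuple(sum(p[ch] for p in box) // len(box) for ch in range(3))
        for box in boxes
    ]
```

For example, an image made of equal parts pure red and pure blue yields exactly those two colors when asked for a two-color palette, and a single purple-ish average when asked for one.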
## Hard invariants
- **VLM + FLUX are thin proxies only.** No model weights loaded in-process.
Pillow + rembg are the only in-process ML.
- **No cloud providers.** Sovereign-first. Anthropic/OpenAI/Google/Higgsfield
must never be re-introduced here.
- **Config via env only.** pydantic-settings + `.env` (gitignored). No IPs
hardcoded outside `settings.py`: defaults live there and `.env` overrides them.
- **Port 8092.** BTE is configured to call `http://localhost:8092`.
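The "config via env only" rule boils down to a precedence order: built-in defaults, overridden by environment values. The sketch below shows that precedence with the standard library only; it is a stand-in, not the real `settings.py`, which uses pydantic-settings. The environment variable names and default values are taken from the Config section of this file; the class name `VisionSettings`, its field names, and `load_settings` are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VisionSettings:
    # Defaults mirror the values documented in this file.
    port: int = 8092
    flux_url: str = "http://100.90.100.10:8188"
    vlm_url: str = "http://100.88.167.87:11434"
    vlm_model: str = "qwen3-vl:32b"
    request_timeout_seconds: float = 120.0


# Hypothetical field -> environment variable mapping.
ENV_NAMES = {
    "port": "SVRNTY_VISION_PORT",
    "flux_url": "FLUX_URL",
    "vlm_url": "VLM_URL",
    "vlm_model": "VLM_MODEL",
    "request_timeout_seconds": "VISION_REQUEST_TIMEOUT_SECONDS",
}


def load_settings(env: Mapping[str, str] = os.environ) -> VisionSettings:
    """Build settings from defaults, overriding any field set in env."""
    defaults = VisionSettings()
    overrides = {}
    for field_name, env_name in ENV_NAMES.items():
        if env_name in env:
            # Cast the raw string to the type of the field's default.
            caster = type(getattr(defaults, field_name))
            overrides[field_name] = caster(env[env_name])
    return VisionSettings(**overrides)
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({"VLM_MODEL": "other-model"})` changes only the model and leaves every other default in place.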
## Phase status (BTE Phase 4 sub-phases)
| Phase | Scope | State |
|---|---|---|
| 4a | FastAPI scaffold + /healthz + 4 route stubs | ✅ done |
| 4b | Implement vlm.py, flux.py, palette.py, rembg.py | ✅ done (2026-05-25) |
| 4c | Delete .NET vision providers from BTE | ✅ done (BTE Phase 4 commit 3112135) |
| 4d | Wire BTE → svrnty-vision via SvrntyVisionGatewayClient | ✅ done (BTE Phase 4 commit 3112135) |
## Infrastructure (Tailscale)
```
svrnty-steev 100.88.167.87 Strix Halo — Ollama — qwen3-vl:32b (VLM)
gx10-f38f 100.90.100.10 NVIDIA GB10 128GB — ComfyUI v0.18.1 (FLUX)
```
**ComfyUI FLUX.2 model set (gx10):**
- `diffusion_models/flux2_dev_fp8mixed.safetensors`
- `text_encoders/mistral_3_small_flux2_fp8.safetensors`
- `vae/flux2-vae.safetensors`
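The `/flux/render` endpoint is described as "HTTP proxy + poll". A minimal sketch of the polling half follows, assuming the usual ComfyUI pattern in which a queued prompt's entry appears in the history response, keyed by its prompt id, once the job has finished. The function `poll_history` and its parameters are hypothetical and are not taken from `flux.py`; the HTTP call is injected as `fetch_history` so the loop can be exercised without a live ComfyUI host.

```python
import time
from typing import Any, Callable, Dict


def poll_history(
    fetch_history: Callable[[str], Dict[str, Any]],
    prompt_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the history contains prompt_id, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        history = fetch_history(prompt_id)
        entry = history.get(prompt_id)
        if entry is not None:
            return entry  # job finished; the entry holds its outputs
        if clock() >= deadline:
            raise TimeoutError(f"prompt {prompt_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

In a real router, `fetch_history` would wrap a GET against the configured FLUX URL, and `timeout_s` would come from the request timeout setting rather than a literal.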
## Layout
```
src/svrnty_vision/
  server.py                 # FastAPI app + /healthz + router includes
  settings.py               # pydantic-settings: all config here, no hardcodes
  routers/
    vlm.py                  # POST /vlm/analyze → Ollama (Qwen3-VL 32B)
    flux.py                 # POST /flux/render → ComfyUI (FLUX.2-dev)
    palette.py              # POST /palette/extract in-process (Pillow)
    rembg.py                # POST /rembg/cutout in-process (rembg)
tests/
  conftest.py               # fixtures: TestClient, red_png_b64, gradient_png_b64
  test_healthz.py           # liveness + 501 stubs (pre-4b kept for regression)
  test_vlm_parse.py         # pure-function: rubric prompt + score parsing
  test_flux_workflow.py     # pure-function: stopgap FLUX.2 workflow builder
  test_palette.py           # unit: palette extraction (no network)
  test_rembg.py             # unit: background removal (no network)
  test_integration_e2e.py   # live e2e: VLM + FLUX + palette + rembg
```
## Run / test
```sh
# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && pip install -e .
# Serve (reads .env automatically)
uvicorn svrnty_vision.server:app --host 0.0.0.0 --port 8092
# Unit tests (no network)
pytest tests/ -m "not integration"
# Full e2e (requires Tailscale + live svrnty-steev and gx10-f38f backends)
pytest tests/ -m integration -v
```
## Config (.env)
```
SVRNTY_VISION_PORT=8092
FLUX_URL=http://100.90.100.10:8188
VLM_URL=http://100.88.167.87:11434
VLM_MODEL=qwen3-vl:32b
VISION_REQUEST_TIMEOUT_SECONDS=120
```
## When extending
- New endpoint → new router under `routers/`, register in `server.py`, tests in `tests/`.
- New backend → add URL to `settings.py` + `.env.example`, never hardcode.
- Surgical only. No cross-endpoint refactors while implementing one feature.