Svrnty f6e09dbff2 feat(svrnty-vision): Phase 4b complete — full impl + e2e test suite

- palette.py + rembg.py: implement from stubs (Pillow median-cut + rembg u2net)
- vlm.py: rename Spark2→steev (Strix Halo / Ollama); bump max_tokens 1024→4096
  (qwen3-vl:32b thinking mode consumes budget tokens — 4096 min for valid output)
- settings.py: rename spark2_vlm_*/spark1_flux_* → vlm_*/flux_*; real defaults
  (steev 100.88.167.87:11434 Ollama, gx10 100.90.100.10:8188 ComfyUI)
- tests/: conftest.py + test_palette.py + test_rembg.py + test_integration_e2e.py
  (28 unit + 10 integration; 38/38 passing — VLM raw/polished/ugc + FLUX render)
- CLAUDE.md: rewrite to accurate phase status + infra + layout
- requirements.txt + pyproject.toml: add Pillow, rembg, pytest-asyncio deps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-25 06:44:21 -04:00

4.0 KiB

Raw Permalink Blame History

svrnty-vision — orientation for Claude

Inherits Karpathy 4 rules from ~/.claude/CLAUDE.md and the workspace contract from /home/svrnty/workspaces/hermes/CLAUDE.md. Read both before touching anything here.

What this is

Standalone sovereign vision HTTP gateway. Four endpoints, two backends:

Endpoint	Impl	Backend
`POST /vlm/analyze`	HTTP proxy	Qwen3-VL 32B · Ollama · svrnty-steev (Strix Halo) · `100.88.167.87:11434`
`POST /flux/render`	HTTP proxy + poll	FLUX.2-dev · ComfyUI · gx10-f38f · `100.90.100.10:8188`
`POST /palette/extract`	In-process	Pillow median-cut quantization
`POST /rembg/cutout`	In-process	rembg u2net ONNX
`GET /healthz`	Liveness probe	Always 200

Sibling of bte/ — BTE calls it over HTTP via SvrntyVisionGatewayClient. Usable by any agent — no BTE coupling in this repo. Agents can call the endpoints directly (see L4-svrnty.tool-vision in cortex/ for the Go wrapper).

Hard invariants

VLM + FLUX are thin proxies only. No model weights loaded in-process. Pillow + rembg are the only in-process ML.
No cloud providers. Sovereign-first. Anthropic/OpenAI/Google/Higgsfield must never be re-introduced here.
Config via env only. pydantic-settings + .env (gitignored). No hardcoded IPs in code — all in settings.py defaults or overridden by .env.
Port 8092. BTE is configured to call http://localhost:8092.

Phase status (BTE Phase 4 sub-phases)

Phase	Scope	State
4a	FastAPI scaffold + /healthz + 4 route stubs	✅ done
4b	Implement vlm.py, flux.py, palette.py, rembg.py	✅ done (2026-05-25)
4c	Delete .NET vision providers from BTE	✅ done (BTE Phase 4 commit 3112135)
4d	Wire BTE → svrnty-vision via SvrntyVisionGatewayClient	✅ done (BTE Phase 4 commit 3112135)

Infrastructure (Tailscale)

svrnty-steev  100.88.167.87   Strix Halo — Ollama — qwen3-vl:32b (VLM)
gx10-f38f     100.90.100.10   NVIDIA GB10 128GB — ComfyUI v0.18.1 (FLUX)

ComfyUI FLUX.2 model set (gx10):

diffusion_models/flux2_dev_fp8mixed.safetensors
text_encoders/mistral_3_small_flux2_fp8.safetensors
vae/flux2-vae.safetensors

Layout

src/svrnty_vision/
    server.py         # FastAPI app + /healthz + router includes
    settings.py       # pydantic-settings — all config here, no hardcodes
    routers/
        vlm.py        # POST /vlm/analyze   → Ollama (Qwen3-VL 32B)
        flux.py       # POST /flux/render   → ComfyUI (FLUX.2-dev)
        palette.py    # POST /palette/extract  in-process (Pillow)
        rembg.py      # POST /rembg/cutout     in-process (rembg)
tests/
    conftest.py              # fixtures: TestClient, red_png_b64, gradient_png_b64
    test_healthz.py          # liveness + 501 stubs (pre-4b kept for regression)
    test_vlm_parse.py        # pure-function: rubric prompt + score parsing
    test_flux_workflow.py    # pure-function: stopgap FLUX.2 workflow builder
    test_palette.py          # unit: palette extraction (no network)
    test_rembg.py            # unit: background removal (no network)
    test_integration_e2e.py  # live e2e: VLM + FLUX + palette + rembg

Run / test

# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && pip install -e .

# Serve (reads .env automatically)
uvicorn svrnty_vision.server:app --host 0.0.0.0 --port 8092

# Unit tests (no network)
pytest tests/ -m "not integration"

# Full e2e (requires Tailscale + live Spark hosts)
pytest tests/ -m integration -v

Config (.env)

SVRNTY_VISION_PORT=8092
FLUX_URL=http://100.90.100.10:8188
VLM_URL=http://100.88.167.87:11434
VLM_MODEL=qwen3-vl:32b
VISION_REQUEST_TIMEOUT_SECONDS=120

When extending

New endpoint → new router under routers/, register in server.py, tests in tests/.
New backend → add URL to settings.py + .env.example, never hardcode.
Surgical only. No cross-endpoint refactors while implementing one feature.

4.0 KiB Raw Permalink Blame History