- palette.py + rembg.py: implement from stubs (Pillow median-cut + rembg u2net) - vlm.py: rename Spark2→steev (Strix Halo / Ollama); bump max_tokens 1024→4096 (qwen3-vl:32b thinking mode consumes budget tokens — 4096 min for valid output) - settings.py: rename spark2_vlm_*/spark1_flux_* → vlm_*/flux_*; real defaults (steev 100.88.167.87:11434 Ollama, gx10 100.90.100.10:8188 ComfyUI) - tests/: conftest.py + test_palette.py + test_rembg.py + test_integration_e2e.py (28 unit + 10 integration; 38/38 passing — VLM raw/polished/ugc + FLUX render) - CLAUDE.md: rewrite to accurate phase status + infra + layout - requirements.txt + pyproject.toml: add Pillow, rembg, pytest-asyncio deps Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4.0 KiB
4.0 KiB
svrnty-vision — orientation for Claude
Inherits Karpathy 4 rules from ~/.claude/CLAUDE.md and the workspace
contract from /home/svrnty/workspaces/hermes/CLAUDE.md. Read both before
touching anything here.
What this is
Standalone sovereign vision HTTP gateway. Four endpoints, two backends:
| Endpoint | Impl | Backend |
|---|---|---|
POST /vlm/analyze |
HTTP proxy | Qwen3-VL 32B · Ollama · svrnty-steev (Strix Halo) · 100.88.167.87:11434 |
POST /flux/render |
HTTP proxy + poll | FLUX.2-dev · ComfyUI · gx10-f38f · 100.90.100.10:8188 |
POST /palette/extract |
In-process | Pillow median-cut quantization |
POST /rembg/cutout |
In-process | rembg u2net ONNX |
GET /healthz |
Liveness probe | Always 200 |
Sibling of bte/ — BTE calls it over HTTP via SvrntyVisionGatewayClient.
Usable by any agent — no BTE coupling in this repo. Agents can call the
endpoints directly (see L4-svrnty.tool-vision in cortex/ for the Go wrapper).
Hard invariants
- VLM + FLUX are thin proxies only. No model weights loaded in-process. Pillow + rembg are the only in-process ML.
- No cloud providers. Sovereign-first. Anthropic/OpenAI/Google/Higgsfield must never be re-introduced here.
- Config via env only. pydantic-settings +
.env(gitignored). No hardcoded IPs in code — all in settings.py defaults or overridden by.env. - Port 8092. BTE is configured to call
http://localhost:8092.
Phase status (BTE Phase 4 sub-phases)
| Phase | Scope | State |
|---|---|---|
| 4a | FastAPI scaffold + /healthz + 4 route stubs | ✅ done |
| 4b | Implement vlm.py, flux.py, palette.py, rembg.py | ✅ done (2026-05-25) |
| 4c | Delete .NET vision providers from BTE | ✅ done (BTE Phase 4 commit 3112135) |
| 4d | Wire BTE → svrnty-vision via SvrntyVisionGatewayClient | ✅ done (BTE Phase 4 commit 3112135) |
Infrastructure (Tailscale)
svrnty-steev 100.88.167.87 Strix Halo — Ollama — qwen3-vl:32b (VLM)
gx10-f38f 100.90.100.10 NVIDIA GB10 128GB — ComfyUI v0.18.1 (FLUX)
ComfyUI FLUX.2 model set (gx10):
diffusion_models/flux2_dev_fp8mixed.safetensorstext_encoders/mistral_3_small_flux2_fp8.safetensorsvae/flux2-vae.safetensors
Layout
src/svrnty_vision/
server.py # FastAPI app + /healthz + router includes
settings.py # pydantic-settings — all config here, no hardcodes
routers/
vlm.py # POST /vlm/analyze → Ollama (Qwen3-VL 32B)
flux.py # POST /flux/render → ComfyUI (FLUX.2-dev)
palette.py # POST /palette/extract in-process (Pillow)
rembg.py # POST /rembg/cutout in-process (rembg)
tests/
conftest.py # fixtures: TestClient, red_png_b64, gradient_png_b64
test_healthz.py # liveness + 501 stubs (pre-4b kept for regression)
test_vlm_parse.py # pure-function: rubric prompt + score parsing
test_flux_workflow.py # pure-function: stopgap FLUX.2 workflow builder
test_palette.py # unit: palette extraction (no network)
test_rembg.py # unit: background removal (no network)
test_integration_e2e.py # live e2e: VLM + FLUX + palette + rembg
Run / test
# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && pip install -e .
# Serve (reads .env automatically)
uvicorn svrnty_vision.server:app --host 0.0.0.0 --port 8092
# Unit tests (no network)
pytest tests/ -m "not integration"
# Full e2e (requires Tailscale + live Spark hosts)
pytest tests/ -m integration -v
Config (.env)
SVRNTY_VISION_PORT=8092
FLUX_URL=http://100.90.100.10:8188
VLM_URL=http://100.88.167.87:11434
VLM_MODEL=qwen3-vl:32b
VISION_REQUEST_TIMEOUT_SECONDS=120
When extending
- New endpoint → new router under
routers/, register inserver.py, tests intests/. - New backend → add URL to
settings.py+.env.example, never hardcode. - Surgical only. No cross-endpoint refactors while implementing one feature.