svrnty-vision/docs/VISION-PACKAGE-CANDIDATE.md
2026-06-06 08:25:14 -04:00

VISION Package Candidate

Status: child-local candidate only. No Core promotion, Seed installation, Runtime start, Profile Exposure, or provider admission is authorized. No wildcard grant is authorized by this document.
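One way a grant check could honor the no-wildcard rule is sketched below. It assumes grants are stored as plain tool-id strings. The function `is_granted` and the grant-set shape are hypothetical and do not describe an existing Cortex OS API.

```python
# Hypothetical sketch: a grant authorizes only the exact tool id it names.
WILDCARD_CHARS = set("*?[")


def is_granted(tool_id: str, grants: set[str]) -> bool:
    """Return True only if tool_id appears verbatim in grants.

    Any grant that contains a wildcard character is rejected outright, so a
    pattern such as "vision.*" can never authorize a tool.
    """
    for grant in grants:
        if WILDCARD_CHARS & set(grant):
            raise ValueError(f"wildcard grant not allowed: {grant!r}")
    return tool_id in grants
```

Failing loudly on a wildcard grant, instead of silently ignoring it, makes an accidental broad grant visible at check time.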

Intent

svrnty-vision is the generic visual-perception package candidate for the canonical Cortex OS sense VISION. It owns tools that inspect or produce pixels, images, screenshots, browser observations, layouts, charts, diagrams, grounded regions, segmentations, video frames, or generated/edited images.

Research also falls under the VISION sense family, but it owns textual/source reading and research workflows. The boundary is drawn by capability, not by sense name: Research reads sources; Vision sees media.

Current Route Adapters

| Current route | Candidate tool id | Capability |
| --- | --- | --- |
| POST /vlm/analyze | vision.image_analyze | Analyze image input with a VLM and return a normalized observation. |
| POST /flux/render | vision.image_generate | Generate image output through the existing FLUX route. |
| POST /palette/extract | vision.palette_extract | Extract dominant colors from image input. |
| POST /rembg/cutout | vision.background_cutout | Remove the image background and return cutout output. |
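The adapter table can be read as a fixed mapping from each existing HTTP route to its candidate tool id. The sketch below assumes a simple in-process lookup. `ROUTE_ADAPTERS` and `tool_id_for` are illustrative names, not part of any existing package.

```python
# Hypothetical sketch of the route-to-tool-id adapter table.
ROUTE_ADAPTERS: dict[tuple[str, str], str] = {
    ("POST", "/vlm/analyze"): "vision.image_analyze",
    ("POST", "/flux/render"): "vision.image_generate",
    ("POST", "/palette/extract"): "vision.palette_extract",
    ("POST", "/rembg/cutout"): "vision.background_cutout",
}


def tool_id_for(method: str, path: str) -> str:
    """Return the candidate tool id for an existing route.

    Raises KeyError for a route with no adapter instead of guessing one.
    """
    key = (method.upper(), path)
    if key not in ROUTE_ADAPTERS:
        raise KeyError(f"no VISION adapter for {method} {path}")
    return ROUTE_ADAPTERS[key]
```

Keeping the mapping as data means a route without an adapter fails loudly rather than falling through to a default tool.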

Planned Tool Candidates

The complete VISION visual-perception package should cover:

  • vision.ocr_read
  • vision.screenshot_observe
  • vision.browser_observe
  • vision.document_layout_read
  • vision.chart_read
  • vision.table_read
  • vision.diagram_read
  • vision.object_detect
  • vision.visual_ground
  • vision.segment
  • vision.video_read
  • vision.image_edit

These are not implemented or granted by this slice. They are named so future work has a canonical target and does not duplicate Research capabilities.
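A registry can keep the route adapters and the planned names side by side without implying that planned tools are implemented or granted. The sketch below copies the tool ids from this document. The `ToolStatus` enum, the `build_registry` helper, and the `vision.<snake_case>` naming rule are assumptions made for illustration.

```python
import re
from enum import Enum


class ToolStatus(Enum):
    ROUTE_ADAPTER = "route_adapter"  # wraps an existing route
    PLANNED = "planned"              # canonical name only; not implemented or granted


# Tool ids copied from this document; the registry shape itself is hypothetical.
ADAPTER_TOOLS = [
    "vision.image_analyze",
    "vision.image_generate",
    "vision.palette_extract",
    "vision.background_cutout",
]
PLANNED_TOOLS = [
    "vision.ocr_read",
    "vision.screenshot_observe",
    "vision.browser_observe",
    "vision.document_layout_read",
    "vision.chart_read",
    "vision.table_read",
    "vision.diagram_read",
    "vision.object_detect",
    "vision.visual_ground",
    "vision.segment",
    "vision.video_read",
    "vision.image_edit",
]

# Assumed naming rule: "vision." followed by a lowercase snake_case name.
TOOL_ID_PATTERN = re.compile(r"vision\.[a-z][a-z0-9_]*")


def build_registry() -> dict[str, ToolStatus]:
    """Register every candidate id once, rejecting malformed or duplicate ids."""
    registry: dict[str, ToolStatus] = {}
    entries = [(t, ToolStatus.ROUTE_ADAPTER) for t in ADAPTER_TOOLS]
    entries += [(t, ToolStatus.PLANNED) for t in PLANNED_TOOLS]
    for tool_id, status in entries:
        if not TOOL_ID_PATTERN.fullmatch(tool_id):
            raise ValueError(f"malformed tool id: {tool_id!r}")
        if tool_id in registry:
            raise ValueError(f"duplicate tool id: {tool_id!r}")
        registry[tool_id] = status
    return registry
```

Rejecting duplicates at registration time keeps exactly one canonical name per planned capability.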

Boundary

Owned here:

  • Pixel/media perception.
  • Visual evidence production.
  • Image generation or editing.
  • Visual extraction from screenshots, browser views, image files, video frames, charts, diagrams, and layouts.

Not owned here:

  • Web search.
  • Page fetch.
  • PDF text extraction.
  • Research synthesis.
  • Deep research planning.
  • Capsule writing.
  • Profile Exposure.
  • Runtime startup.
  • Provider admission.
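Because the boundary is drawn by capability rather than by sense name, an ownership check can key on the capability alone. The minimal sketch below assumes capabilities are identified by the hypothetical snake_case keys shown, which mirror the two lists above. It deliberately does not say who owns the excluded capabilities, because this document does not say so either.

```python
# Hypothetical capability keys mirroring the "Owned here" list.
OWNED = frozenset({
    "pixel_media_perception",
    "visual_evidence_production",
    "image_generation_or_editing",
    "visual_extraction",
})
# Hypothetical capability keys mirroring the "Not owned here" list.
NOT_OWNED = frozenset({
    "web_search",
    "page_fetch",
    "pdf_text_extraction",
    "research_synthesis",
    "deep_research_planning",
    "capsule_writing",
    "profile_exposure",
    "runtime_startup",
    "provider_admission",
})


def vision_owns(capability: str) -> bool:
    """Decide ownership by capability, never by sense name.

    Unknown capabilities raise instead of defaulting to either side, so a
    new capability has to be placed on the boundary explicitly.
    """
    if capability in OWNED:
        return True
    if capability in NOT_OWNED:
        return False
    raise LookupError(f"capability not on the boundary yet: {capability!r}")
```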

Research can consume Visual Evidence only through an explicit handoff contract. Returning evidence never turns Vision into a research synthesizer.
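The handoff can be pictured as Vision emitting immutable evidence records that Research receives under a named contract. The sketch below is not a defined schema. `VisualEvidence`, `EvidenceHandoff`, `hand_off`, and every field name are hypothetical, and the contract is an arbitrary caller-supplied label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VisualEvidence:
    """Hypothetical evidence record produced by a vision.* tool."""
    tool_id: str
    source_ref: str    # reference to the image, screenshot, or frame inspected
    observation: str   # what was seen, not what it means for a research question
    region: Optional[tuple[int, int, int, int]] = None  # x, y, width, height if grounded


@dataclass(frozen=True)
class EvidenceHandoff:
    """Hypothetical explicit contract: Research receives evidence, not conclusions."""
    contract: str
    evidence: tuple[VisualEvidence, ...]


def hand_off(evidence: list[VisualEvidence], contract: str) -> EvidenceHandoff:
    """Package visual evidence for Research under a named contract."""
    if not contract:
        raise ValueError("handoff requires a named contract")
    for item in evidence:
        if not item.tool_id.startswith("vision."):
            raise ValueError(f"not visual evidence: {item.tool_id!r}")
    return EvidenceHandoff(contract=contract, evidence=tuple(evidence))
```

Frozen records that carry only observations leave synthesis entirely on the Research side of the contract.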