VISION Package Candidate
Status: child-local candidate only. No Core promotion, Seed installation, Runtime start, Profile Exposure, or provider admission is authorized. No wildcard grant is authorized by this document.
Intent
svrnty-vision is the generic visual-perception package candidate for the
canonical Cortex OS sense VISION. It owns tools that inspect or produce pixels,
images, screenshots, browser observations, layouts, charts, diagrams, grounded
regions, segmentations, video frames, or generated/edited images.
research is also under the VISION sense family, but it owns textual/source
reading and research workflows. The boundary is by capability, not by sense name:
Research reads sources; Vision sees media.
Current Route Adapters
| Current route | Candidate tool id | Capability |
|---|---|---|
| POST /vlm/analyze | vision.image_analyze | Analyze image input with a VLM and return a normalized observation. |
| POST /flux/render | vision.image_generate | Generate image output through the existing FLUX route. |
| POST /palette/extract | vision.palette_extract | Extract dominant colors from image input. |
| POST /rembg/cutout | vision.background_cutout | Remove image background and return cutout output. |
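
The route-to-tool mapping above can be sketched as a small lookup table. This is an illustrative sketch only: the routes and tool ids come from the table, while `RouteAdapter`, `ROUTE_ADAPTERS`, and `tool_id_for_route` are hypothetical names that this package does not define.

```python
from typing import NamedTuple, Optional


class RouteAdapter(NamedTuple):
    """One row of the route adapter table (hypothetical structure)."""
    method: str
    path: str
    tool_id: str


# Rows copied from the table; the container itself is an assumption.
ROUTE_ADAPTERS = (
    RouteAdapter("POST", "/vlm/analyze", "vision.image_analyze"),
    RouteAdapter("POST", "/flux/render", "vision.image_generate"),
    RouteAdapter("POST", "/palette/extract", "vision.palette_extract"),
    RouteAdapter("POST", "/rembg/cutout", "vision.background_cutout"),
)


def tool_id_for_route(method: str, path: str) -> Optional[str]:
    """Return the candidate tool id for a route, or None if it is unmapped."""
    for adapter in ROUTE_ADAPTERS:
        if adapter.method == method.upper() and adapter.path == path:
            return adapter.tool_id
    return None
```

For example, `tool_id_for_route("POST", "/flux/render")` resolves to `vision.image_generate`, while any route outside the table resolves to `None` rather than to a guessed id.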
Planned Tool Candidates
The complete VISION visual-perception package should cover:
- vision.ocr_read
- vision.screenshot_observe
- vision.browser_observe
- vision.document_layout_read
- vision.chart_read
- vision.table_read
- vision.diagram_read
- vision.object_detect
- vision.visual_ground
- vision.segment
- vision.video_read
- vision.image_edit
These are not implemented or granted by this slice. They are named so future work has a canonical target and does not duplicate Research capabilities.
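
A minimal sketch of how a grant check might treat these names, assuming a simple string-based tool id. The ids are the planned candidates named in this section; `PLANNED_TOOL_IDS`, `is_planned`, and `validate_grant_request` are hypothetical helpers, not part of any existing API. The check encodes only two rules stated in this document: planned candidates are not granted by this slice, and no wildcard grant is authorized.

```python
# Planned candidates named in this section; none is implemented or granted.
PLANNED_TOOL_IDS = frozenset({
    "vision.ocr_read",
    "vision.screenshot_observe",
    "vision.browser_observe",
    "vision.document_layout_read",
    "vision.chart_read",
    "vision.table_read",
    "vision.diagram_read",
    "vision.object_detect",
    "vision.visual_ground",
    "vision.segment",
    "vision.video_read",
    "vision.image_edit",
})


def is_planned(tool_id: str) -> bool:
    """True if the id is a named future target rather than a current adapter."""
    return tool_id in PLANNED_TOOL_IDS


def validate_grant_request(tool_id: str) -> str:
    """Reject wildcard and planned-only ids (hypothetical check).

    Passing this check does not mean a grant exists; it only means these
    two rules do not reject the request.
    """
    if "*" in tool_id:
        raise ValueError(f"wildcard grants are not authorized: {tool_id!r}")
    if is_planned(tool_id):
        raise ValueError(f"planned candidate, not granted by this slice: {tool_id!r}")
    return tool_id
```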
Boundary
Owned here:
- Pixel/media perception.
- Visual evidence production.
- Image generation or editing.
- Visual extraction from screenshots, browser views, image files, video frames, charts, diagrams, and layouts.
Not owned here:
- Web search.
- Page fetch.
- PDF text extraction.
- Research synthesis.
- Deep research planning.
- Capsule writing.
- Profile Exposure.
- Runtime startup.
- Provider admission.
Research can consume Visual Evidence only through an explicit handoff contract. Vision never becomes a research synthesizer by returning evidence.
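
One way to picture the handoff rule, as a sketch under stated assumptions: this document does not define the contract's shape, so `VisualEvidence`, `HandoffContract`, `hand_off`, and every field name below are hypothetical. The sketch shows only the two properties stated above: evidence crosses the boundary only under an explicit contract, and Vision passes it through unchanged rather than synthesizing anything.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VisualEvidence:
    """Evidence produced by a Vision tool (hypothetical fields)."""
    tool_id: str      # producing tool, e.g. "vision.image_analyze"
    media_ref: str    # reference to the inspected media
    observation: str  # normalized observation text


@dataclass(frozen=True)
class HandoffContract:
    """Explicit agreement naming which evidence a consumer may receive."""
    consumer: str
    allowed_tool_ids: FrozenSet[str]


def hand_off(evidence: VisualEvidence,
             contract: Optional[HandoffContract]) -> VisualEvidence:
    """Release evidence only under an explicit contract, without altering it."""
    if contract is None:
        raise PermissionError("no explicit handoff contract")
    if evidence.tool_id not in contract.allowed_tool_ids:
        raise PermissionError(
            f"{evidence.tool_id!r} is not covered by the contract "
            f"for {contract.consumer!r}"
        )
    # Vision returns the evidence as-is; any synthesis belongs to the consumer.
    return evidence
```

Making both records immutable keeps the evidence identical on either side of the handoff, which matches the rule that returning evidence does not turn Vision into a synthesizer.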