svrnty-vision/docs/VISION-PACKAGE-CANDIDATE.md
2026-06-06 08:25:14 -04:00

# VISION Package Candidate
Status: child-local candidate only. No Core promotion, Seed installation,
Runtime start, Profile Exposure, or provider admission is authorized. No
wildcard grant is authorized by this document.
## Intent
`svrnty-vision` is the generic visual-perception package candidate for the
canonical Cortex OS sense `VISION`. It owns tools that inspect or produce pixels,
images, screenshots, browser observations, layouts, charts, diagrams, grounded
regions, segmentations, video frames, or generated/edited images.
`research` is also under the `VISION` sense family, but it owns textual/source
reading and research workflows. The boundary is by capability, not by sense name:
Research reads sources; Vision sees media.
## Current Route Adapters
| Current route | Candidate tool id | Capability |
| --- | --- | --- |
| `POST /vlm/analyze` | `vision.image_analyze` | Analyze image input with a VLM and return a normalized observation. |
| `POST /flux/render` | `vision.image_generate` | Generate image output through the existing FLUX route. |
| `POST /palette/extract` | `vision.palette_extract` | Extract dominant colors from image input. |
| `POST /rembg/cutout` | `vision.background_cutout` | Remove image background and return cutout output. |
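
The adapter table above can be read as a lookup from candidate tool id to current route. The following is a minimal sketch of that lookup, not the package's actual registry: the `RouteAdapter` type and `route_for_tool` helper are hypothetical names introduced for illustration, and only the routes and tool ids come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteAdapter:
    """One row of the adapter table (hypothetical type, for illustration)."""
    method: str
    path: str
    tool_id: str


# Rows copied from the "Current Route Adapters" table.
ROUTE_ADAPTERS = (
    RouteAdapter("POST", "/vlm/analyze", "vision.image_analyze"),
    RouteAdapter("POST", "/flux/render", "vision.image_generate"),
    RouteAdapter("POST", "/palette/extract", "vision.palette_extract"),
    RouteAdapter("POST", "/rembg/cutout", "vision.background_cutout"),
)


def route_for_tool(tool_id: str) -> RouteAdapter:
    """Return the current route behind a candidate tool id.

    Raises KeyError for ids without an adapter, so a planned-only tool
    cannot be resolved to a route by accident.
    """
    for adapter in ROUTE_ADAPTERS:
        if adapter.tool_id == tool_id:
            return adapter
    raise KeyError(f"no route adapter for candidate tool id: {tool_id}")
```

Resolving `vision.palette_extract` yields the `POST /palette/extract` row, while an id that has no adapter row raises `KeyError` rather than falling back to a default route.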
## Planned Tool Candidates
The complete VISION visual-perception package should cover:
- `vision.ocr_read`
- `vision.screenshot_observe`
- `vision.browser_observe`
- `vision.document_layout_read`
- `vision.chart_read`
- `vision.table_read`
- `vision.diagram_read`
- `vision.object_detect`
- `vision.visual_ground`
- `vision.segment`
- `vision.video_read`
- `vision.image_edit`

These are not implemented or granted by this slice. They are named so future
work has a canonical target and does not duplicate Research capabilities.
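
One way to keep "named" separate from "implemented or granted" is to track a status per tool id. The sketch below assumes a hypothetical `ToolStatus` enum and `tool_status` helper; the tool ids are the ones listed in this document, and the wildcard rejection mirrors the status note that no wildcard grant is authorized.

```python
from enum import Enum


class ToolStatus(Enum):
    """Hypothetical status labels, for illustration only."""
    ROUTE_ADAPTER = "route_adapter"  # backed by a current route
    PLANNED = "planned"              # named target, not implemented or granted


# Tool ids backed by current route adapters.
ADAPTER_TOOL_IDS = frozenset({
    "vision.image_analyze",
    "vision.image_generate",
    "vision.palette_extract",
    "vision.background_cutout",
})

# Planned tool candidates, named but not implemented.
PLANNED_TOOL_IDS = frozenset({
    "vision.ocr_read",
    "vision.screenshot_observe",
    "vision.browser_observe",
    "vision.document_layout_read",
    "vision.chart_read",
    "vision.table_read",
    "vision.diagram_read",
    "vision.object_detect",
    "vision.visual_ground",
    "vision.segment",
    "vision.video_read",
    "vision.image_edit",
})


def tool_status(tool_id: str) -> ToolStatus:
    """Classify a tool id; wildcards and unknown ids are rejected."""
    if "*" in tool_id:
        raise ValueError(f"wildcard tool ids are not allowed: {tool_id}")
    if tool_id in ADAPTER_TOOL_IDS:
        return ToolStatus.ROUTE_ADAPTER
    if tool_id in PLANNED_TOOL_IDS:
        return ToolStatus.PLANNED
    raise KeyError(f"unknown VISION tool id: {tool_id}")
```

A caller can then refuse to dispatch anything whose status is `ToolStatus.PLANNED`, which keeps the planned names usable as canonical targets without treating them as available tools.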
## Boundary
Owned here:
- Pixel/media perception.
- Visual evidence production.
- Image generation or editing.
- Visual extraction from screenshots, browser views, image files, video frames,
  charts, diagrams, and layouts.

Not owned here:
- Web search.
- Page fetch.
- PDF text extraction.
- Research synthesis.
- Deep research planning.
- Capsule writing.
- Profile Exposure.
- Runtime startup.
- Provider admission.

Research can consume Visual Evidence only through an explicit handoff contract.
Vision never becomes a research synthesizer by returning evidence.
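
The handoff rule could be enforced with a small gate. This document does not define the handoff contract, so everything in the sketch below is an assumption made for illustration: the `VisualEvidence` and `HandoffContract` shapes, their field names, and the `hand_off` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VisualEvidence:
    """Hypothetical evidence record produced by a Vision tool."""
    tool_id: str      # producing tool, e.g. "vision.image_analyze"
    observation: str  # normalized observation, passed through unchanged


@dataclass(frozen=True)
class HandoffContract:
    """Hypothetical explicit contract naming what a consumer may receive."""
    consumer: str
    allowed_tool_ids: FrozenSet[str]


def hand_off(evidence: VisualEvidence,
             contract: Optional[HandoffContract]) -> VisualEvidence:
    """Release evidence to a consumer only under an explicit contract.

    The evidence is returned as-is: Vision hands over what it saw and
    performs no synthesis on it.
    """
    if contract is None:
        raise PermissionError("no explicit handoff contract")
    if evidence.tool_id not in contract.allowed_tool_ids:
        raise PermissionError(
            f"{contract.consumer} may not consume evidence from {evidence.tool_id}"
        )
    return evidence
```

The gate fails closed: a missing contract, or a contract that does not name the producing tool, raises `PermissionError` instead of releasing the evidence.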