CTO Eval Suite
This directory holds the test-first promotion and regression suite for the CTO WebUI coding agent PRD.
The suite is evidence-based: a run is not accepted on the strength of prose alone. Scoring must inspect transcripts, diffs, logs, screenshots, approval events, capsule artifacts, and report YAML.
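As an illustration of the evidence-first rule, the sketch below rejects a run whose report carries no artifacts for one of the evidence kinds listed above. It is not the logic of evals/runners/score.py. The report layout (an `evidence` mapping from kind to a list of artifact paths) and the helper names `missing_evidence` and `accept_run` are hypothetical.

```python
from __future__ import annotations

# The six evidence kinds named above. The report layout (an "evidence" mapping
# from kind to a list of artifact paths) is a hypothetical schema for illustration.
REQUIRED_EVIDENCE = (
    "transcripts",
    "diffs",
    "logs",
    "screenshots",
    "approval_events",
    "capsule_artifacts",
)


def missing_evidence(report: dict) -> list[str]:
    """Return the evidence kinds the parsed report does not back with any artifact."""
    evidence = report.get("evidence") or {}
    # A kind that is absent or maps to an empty list counts as missing.
    return [kind for kind in REQUIRED_EVIDENCE if not evidence.get(kind)]


def accept_run(report: dict) -> bool:
    """Accept a run only if every evidence kind is present; prose fields are ignored."""
    return not missing_evidence(report)


prose_only = {"summary": "All checks passed.", "evidence": {}}
print(accept_run(prose_only))  # False
```

A report that only summarizes success in prose fails, because the check looks at artifacts and never at the summary text.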
Run the static PRD gate from the Hermes root:
pytest -q tests/e2e/test_j_cto_webui_prd.py
Score all current evidence reports from cto/:
for r in evals/reports/*.yaml; do python3 evals/runners/score.py "$r"; done
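The shell loop prints each score, but its own exit status is that of the last score.py call only, so a failing report earlier in the list does not make the loop fail. A minimal Python sketch that collects every exit code and fails if any report (or no report at all) fails is shown here; treating an empty reports directory as a failure is an assumption of this sketch, and `score_all` is a hypothetical helper.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def score_all(reports_dir: Path, scorer: list[str]) -> dict[str, int]:
    """Run `scorer + [report]` for every *.yaml report and record each exit code."""
    results: dict[str, int] = {}
    for report in sorted(reports_dir.glob("*.yaml")):
        completed = subprocess.run([*scorer, str(report)])
        results[report.name] = completed.returncode
    return results


def main() -> int:
    # Paths mirror the shell loop above and assume the working directory is cto/.
    results = score_all(Path("evals/reports"), [sys.executable, "evals/runners/score.py"])
    failed = [name for name, code in results.items() if code != 0]
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    # Non-zero if any report failed, or if there was nothing to score.
    return 1 if failed or not results else 0


if __name__ == "__main__":
    sys.exit(main())
```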
Run the deterministic local CTO/WebUI regression execution slice from cto/:
./evals/runners/run-webui-cto.sh
Run the executable promotion-suite readiness gate from cto/:
python3 evals/runners/run-promotion-suite.py
python3 evals/runners/score.py evals/reports/2026-05-25-promotion-suite-readiness.yaml
Run the isolated deterministic fixture execution gate from cto/:
python3 evals/runners/run-promotion-fixtures.py
python3 evals/runners/score.py evals/reports/2026-05-25-promotion-fixture-execution.yaml
Run the live-promotion readiness gate from cto/:
python3 evals/runners/run-live-promotion-readiness.py
python3 evals/runners/score.py evals/reports/2026-05-25-live-promotion-readiness.yaml
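Each of the three gates above is a runner followed by a score of its report. The sketch below chains them in the order listed and stops at the first non-zero exit. The fail-fast ordering is an assumption, not documented behavior, and `run_gates`, `GATES`, and `SCORER` are hypothetical names; the script paths are copied from the commands above, and `sys.executable` stands in for `python3`.

```python
from __future__ import annotations

import subprocess
import sys

SCORER = "evals/runners/score.py"

# (runner, report) pairs taken from the three gate commands above.
GATES = [
    ("evals/runners/run-promotion-suite.py",
     "evals/reports/2026-05-25-promotion-suite-readiness.yaml"),
    ("evals/runners/run-promotion-fixtures.py",
     "evals/reports/2026-05-25-promotion-fixture-execution.yaml"),
    ("evals/runners/run-live-promotion-readiness.py",
     "evals/reports/2026-05-25-live-promotion-readiness.yaml"),
]


def run_gates(
    gates: list[tuple[str, str]],
    scorer: str = SCORER,
    python: str = sys.executable,
) -> str | None:
    """Run each runner, then score its report; return the first failing command, else None.

    Fail-fast ordering is an assumption of this sketch, not documented behavior.
    """
    for runner, report in gates:
        for cmd in ([python, runner], [python, scorer, report]):
            if subprocess.run(cmd).returncode != 0:
                return " ".join(cmd)
    return None


if __name__ == "__main__":
    failed = run_gates(GATES)  # assumes the working directory is cto/
    if failed:
        print(f"gate failed: {failed}", file=sys.stderr)
    sys.exit(1 if failed else 0)
```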
Check Codex CLI comparative readiness from cto/:
./evals/runners/run-codex-cli.sh
fixtures/manifest.yaml is the deterministic contract layer for the full PRD
promotion suite. It proves every required eval has a prompt, evidence
expectations, event expectations, and gates. It does not claim live promotion
success or Codex CLI parity.
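The contract described above can be checked structurally: every required eval must define a prompt, evidence expectations, event expectations, and gates. The sketch below assumes the manifest has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that entries live under a hypothetical `evals` key with hypothetical field names. Like the manifest itself, passing this check says nothing about live promotion success or Codex CLI parity.

```python
from __future__ import annotations

# Hypothetical field names standing in for the four parts listed above.
REQUIRED_FIELDS = ("prompt", "evidence_expectations", "event_expectations", "gates")


def manifest_gaps(manifest: dict, required_ids: list[str]) -> dict[str, list[str]]:
    """Map each required eval id to the fields it lacks; an empty dict means complete."""
    evals = manifest.get("evals") or {}
    gaps: dict[str, list[str]] = {}
    for eval_id in required_ids:
        # An eval missing from the manifest lacks every required field.
        entry = evals.get(eval_id) or {}
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            gaps[eval_id] = missing
    return gaps
```

Taking the list of required ids as a separate argument keeps the check honest: an eval that was dropped from the manifest shows up as a gap instead of silently disappearing.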