# CTO Eval Suite

This directory holds the test-first promotion and regression suite for the CTO WebUI coding agent PRD.

The suite is evidence-based: a run is not accepted from prose alone. Scoring must inspect transcripts, diffs, logs, screenshots, approval events, capsule artifacts, and report YAML.

Run the static PRD gate from the Hermes root:

```bash
pytest -q tests/e2e/test_j_cto_webui_prd.py
```

Score all current evidence reports from `cto/`:

```bash
for r in evals/reports/*.yaml; do python3 evals/runners/score.py "$r"; done
```

Run the deterministic local CTO/WebUI regression execution slice from `cto/`:

```bash
./evals/runners/run-webui-cto.sh
```

Run the executable promotion-suite readiness gate from `cto/`:

```bash
python3 evals/runners/run-promotion-suite.py
python3 evals/runners/score.py evals/reports/2026-05-25-promotion-suite-readiness.yaml
```

Run the isolated deterministic fixture execution gate from `cto/`:

```bash
python3 evals/runners/run-promotion-fixtures.py
python3 evals/runners/score.py evals/reports/2026-05-25-promotion-fixture-execution.yaml
```

Check Codex comparative readiness from `cto/`:

```bash
./evals/runners/run-codex-cli.sh
```

`fixtures/manifest.yaml` is the deterministic contract layer for the full PRD promotion suite. It proves every required eval has a prompt, evidence expectations, event expectations, and gates. It does not claim live promotion success or Codex CLI parity.
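
To make the manifest contract concrete, here is a minimal sketch of a completeness check of the kind described above. It is not the suite's actual validator. The keys `evals`, `prompt`, `evidence`, `events`, and `gates` are hypothetical names chosen for illustration, and the real schema of `fixtures/manifest.yaml` may differ. The manifest is passed in as an already-parsed dictionary so the example has no YAML dependency.

```python
from typing import Any

# Hypothetical field names; the real manifest schema may differ.
REQUIRED_FIELDS = ("prompt", "evidence", "events", "gates")


def missing_fields(manifest: dict[str, Any]) -> dict[str, list[str]]:
    """Map each eval id to the required fields it lacks or leaves empty."""
    problems: dict[str, list[str]] = {}
    for eval_id, entry in (manifest.get("evals") or {}).items():
        entry = entry or {}  # tolerate an eval declared with no body
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems[eval_id] = missing
    return problems


if __name__ == "__main__":
    # Inline stand-in for a parsed manifest; the ids and values are made up.
    sample = {
        "evals": {
            "example-complete": {
                "prompt": "Fix the failing test.",
                "evidence": ["diff", "transcript"],
                "events": ["approval"],
                "gates": ["tests-pass"],
            },
            "example-incomplete": {"prompt": "Refactor the module."},
        }
    }
    for eval_id, fields in missing_fields(sample).items():
        print(f"{eval_id}: missing {', '.join(fields)}")
```

A check like this only confirms that each entry is fully specified. Like the manifest itself, it says nothing about whether a live run would pass.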