Upgrade CTO webui coding profile

2026-05-25 12:57:33 -04:00
parent 0ca5ffc8ed
commit 4ed306928a
40 changed files with 3435 additions and 113 deletions
@@ -0,0 +1,51 @@
+# CTO Eval Suite
+
+This directory holds the test-first promotion and regression suite for the CTO
+WebUI coding agent PRD.
+
+The suite is evidence-based: a run is never accepted on a prose summary alone.
+Scoring must inspect transcripts, diffs, logs, screenshots, approval events,
+capsule artifacts, and report YAML.
+
+Run the static PRD gate from the Hermes root:
+
+```bash
+pytest -q tests/e2e/test_j_cto_webui_prd.py
+```
+
+Score all current evidence reports from `cto/`:
+
+```bash
+for r in evals/reports/*.yaml; do python3 evals/runners/score.py "$r"; done
+```
+
+Run the deterministic local CTO/WebUI regression execution slice from `cto/`:
+
+```bash
+./evals/runners/run-webui-cto.sh
+```
+
+Run the executable promotion-suite readiness gate from `cto/`:
+
+```bash
+python3 evals/runners/run-promotion-suite.py
+python3 evals/runners/score.py evals/reports/2026-05-25-promotion-suite-readiness.yaml
+```
+
+Run the isolated deterministic fixture execution gate from `cto/`:
+
+```bash
+python3 evals/runners/run-promotion-fixtures.py
+python3 evals/runners/score.py evals/reports/2026-05-25-promotion-fixture-execution.yaml
+```
+
+Check Codex comparative readiness from `cto/`:
+
+```bash
+./evals/runners/run-codex-cli.sh
+```
+
+`fixtures/manifest.yaml` is the deterministic contract layer for the full PRD
+promotion suite. It proves every required eval has a prompt, evidence
+expectations, event expectations, and gates. It does not claim live promotion
+success or Codex CLI parity.
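
The contract check described for `fixtures/manifest.yaml` can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the suite's actual validator: the top-level `evals` mapping, the field names `prompt`, `evidence`, `events`, and `gates`, and the eval ids `E01`/`E02` are all hypothetical, since the manifest schema is not shown in this README. The function takes an already-parsed manifest (a plain dict), so no YAML library is required.

```python
from typing import Any

# Hypothetical field names; the real manifest schema is not shown in the README.
REQUIRED_FIELDS = ("prompt", "evidence", "events", "gates")


def missing_fields(manifest: dict[str, Any]) -> dict[str, list[str]]:
    """Map each eval id to the required fields it lacks or leaves empty."""
    problems: dict[str, list[str]] = {}
    for eval_id, entry in (manifest.get("evals") or {}).items():
        entry = entry or {}  # tolerate an eval declared with no body
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems[eval_id] = missing
    return problems


if __name__ == "__main__":
    # Hypothetical sample: E01 is complete, E02 has no event expectations.
    sample = {
        "evals": {
            "E01": {
                "prompt": "Fix the failing test",
                "evidence": ["diff", "transcript"],
                "events": ["approval"],
                "gates": ["tests_pass"],
            },
            "E02": {
                "prompt": "Refactor the module",
                "evidence": ["diff"],
                "events": [],
                "gates": ["tests_pass"],
            },
        }
    }
    print(missing_fields(sample))  # {'E02': ['events']}
```

An empty result means every eval satisfies the static contract. As the README notes, that says nothing about live promotion success or Codex CLI parity.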