Upgrade CTO webui coding profile
@@ -0,0 +1,51 @@
# CTO Eval Suite

This directory holds the test-first promotion and regression suite for the CTO
WebUI coding agent PRD.

The suite is evidence-based: a run is not accepted from prose alone. Scoring
must inspect transcripts, diffs, logs, screenshots, approval events, capsule
artifacts, and report YAML.
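To make the acceptance rule concrete, here is a minimal sketch of an evidence-presence check. It is not the logic of `evals/runners/score.py`, which the source does not show. The `evidence` mapping, its key names, and the `missing_evidence` helper are all assumptions made up for illustration.

```python
from pathlib import Path

# Hypothetical evidence kinds, named after the list above. The real report
# schema may use different keys.
REQUIRED_EVIDENCE = (
    "transcript", "diff", "log", "screenshot", "approval_events", "capsule",
)


def missing_evidence(report: dict, root: Path) -> list[str]:
    """Return the evidence kinds that are absent, not a file, or empty.

    Assumes report["evidence"] maps each kind to a path relative to root.
    """
    evidence = report.get("evidence") or {}
    missing = []
    for kind in REQUIRED_EVIDENCE:
        rel = evidence.get(kind)
        path = root / rel if rel else None
        # A claim in the report without a non-empty file on disk does not count.
        if path is None or not path.is_file() or path.stat().st_size == 0:
            missing.append(kind)
    return missing
```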

Run the static PRD gate from the Hermes root:

```bash
pytest -q tests/e2e/test_j_cto_webui_prd.py
```

Score all current evidence reports from `cto/`:

```bash
for r in evals/reports/*.yaml; do python3 evals/runners/score.py "$r"; done
```
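The shell loop above does not stop at a failing report, and its exit status reflects only the last report scored. If a single pass/fail result is wanted, the same loop can be sketched in Python. This assumes, without confirmation from the source, that `score.py` exits non-zero when a report fails; `score_all` and its `scorer` parameter are hypothetical names. From `cto/` it could be called as `score_all(Path("evals/reports"), [sys.executable, "evals/runners/score.py"])`, treating a non-empty result as failure.

```python
import subprocess
import sys
from pathlib import Path


def score_all(reports_dir: Path, scorer: list[str]) -> list[Path]:
    """Run `scorer <report>` for each YAML report and return those that failed.

    Assumption: the scorer signals failure through a non-zero exit code.
    """
    failed = []
    for report in sorted(reports_dir.glob("*.yaml")):
        result = subprocess.run([*scorer, str(report)])
        if result.returncode != 0:
            failed.append(report)
    return failed
```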

Run the deterministic local CTO/WebUI regression execution slice from `cto/`:

```bash
./evals/runners/run-webui-cto.sh
```

Run the executable promotion-suite readiness gate from `cto/`:

```bash
python3 evals/runners/run-promotion-suite.py
python3 evals/runners/score.py evals/reports/2026-05-25-promotion-suite-readiness.yaml
```

Run the isolated deterministic fixture execution gate from `cto/`:

```bash
python3 evals/runners/run-promotion-fixtures.py
python3 evals/runners/score.py evals/reports/2026-05-25-promotion-fixture-execution.yaml
```

Check Codex comparative readiness from `cto/`:

```bash
./evals/runners/run-codex-cli.sh
```

`fixtures/manifest.yaml` is the deterministic contract layer for the full PRD
promotion suite. It proves every required eval has a prompt, evidence
expectations, event expectations, and gates. It does not claim live promotion
success or Codex CLI parity.
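The contract described above amounts to a completeness check over the manifest. Here is a minimal sketch, assuming a manifest shape that the source does not show: a top-level `evals` list whose entries carry `id`, `prompt`, `evidence`, `events`, and `gates`. All of these key names, and the `contract_gaps` helper, are hypothetical.

```python
# Hypothetical field names for the four things each eval must provide.
REQUIRED_FIELDS = ("prompt", "evidence", "events", "gates")


def contract_gaps(manifest: dict, required_ids: set[str]) -> dict[str, list[str]]:
    """Map each required eval id to the fields it lacks.

    An id missing from the manifest is reported as lacking every field.
    An empty result means the contract is satisfied.
    """
    by_id = {entry.get("id"): entry for entry in manifest.get("evals", [])}
    gaps = {}
    for eval_id in sorted(required_ids):
        entry = by_id.get(eval_id, {})
        # Empty strings and empty lists count as missing.
        lacking = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if lacking:
            gaps[eval_id] = lacking
    return gaps
```

In practice `manifest` would come from parsing `fixtures/manifest.yaml`, for example with PyYAML's `yaml.safe_load`. Passing a check like this shows only that the contract is complete, not that any eval ran or passed.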