Record Case OpenAI compatibility evidence

Svrnty 2026-05-31 22:50:42 -04:00
parent b4d2ca2709
commit 9d3d988983
4 changed files with 128 additions and 0 deletions


@@ -258,6 +258,90 @@ Latest evidence:
- Run artifact directory: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T023119Z-r1-string-slugify-2759949`.
- Report status: `blocked`.

## Response Probe Budget Correction - 2026-06-01

Hermes commit `bbe7c72 Use realistic Case local response probe` corrected the
local response-shape probe budget.

Observed:
- Spark vLLM returned reasoning-only output at a very small probe budget.
- The same route returned assistant content at realistic Case-sized budgets.
- The Hermes probe now uses a larger budget before classifying a provider as reasoning-only.
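
The observations above suggest a probe that only classifies a provider as reasoning-only after a realistic budget has also been tried. A minimal sketch, assuming a hypothetical `probe(budget)` callable that returns `content` and `reasoning` strings; the budget values and every name below are illustrative and are not taken from the Hermes source:

```python
from typing import Callable, Dict, Sequence

ProbeResult = Dict[str, str]


def classify_response_shape(
    probe: Callable[[int], ProbeResult],
    budgets: Sequence[int] = (64, 2048),  # hypothetical token budgets
) -> str:
    """Return 'content', 'reasoning-only', or 'empty' for a provider route."""
    saw_reasoning = False
    for budget in budgets:
        result = probe(budget)
        if result.get("content", "").strip():
            # Assistant content at any budget is enough to allow the route.
            return "content"
        if result.get("reasoning", "").strip():
            saw_reasoning = True
    # Only reached when every budget, including the largest, gave no content.
    return "reasoning-only" if saw_reasoning else "empty"


def fake_probe(budget: int) -> ProbeResult:
    """Stand-in for a real endpoint call: content appears only at larger budgets."""
    if budget < 1024:
        return {"content": "", "reasoning": "thinking..."}
    return {"content": "ok", "reasoning": "thinking..."}
```

With `fake_probe`, a single small budget yields `reasoning-only`, while the default budgets yield `content`, which mirrors the delayed-content behaviour recorded above.
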
Evidence:
- Focused validator passed with `local_provider_delayed_content_allows_case`.
- Real Case Qwen loop artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T023532Z-r1-string-slugify-2776187`.
- Report status: `fail`.
- Failure reason: `case agent result protocol failed`.
- Case process started: `true`.
- Case model provider: `qwen-local`.
- Case model: `qwen3.6-35b-a3b`.
- Result: response shape is no longer the active blocker for this route.

## OpenAI-Compatible Runtime Bridge Evidence - 2026-06-01

Hermes commit `5c5448b Bridge Case Qwen through OpenAI-compatible runtime`
adds a non-vendor compatibility route for Case/Pi local model execution.

The CTO admission identity remains `qwen-local` / `qwen3.6-35b-a3b`. The Case
runtime provider identity is mapped to Pi's built-in `openai` provider only
inside the harness-owned Case process environment. The Spark endpoint value is
supplied only at runtime and is not recorded in SOT.
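
The identity split described above can be sketched as a child-process environment builder. This is an illustration only: the environment variable names and both function names are hypothetical, and the actual mechanism used by the harness is not shown in this commit.

```python
import os
from typing import Dict, Mapping, Optional

ADMISSION_PROVIDER = "qwen-local"
ADMISSION_MODEL = "qwen3.6-35b-a3b"
RUNTIME_PROVIDER = "openai"  # Pi's built-in provider, used only at runtime


def build_case_env(
    endpoint: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return an environment for the Case child process; the input is not mutated."""
    if not endpoint:
        raise ValueError("endpoint must be supplied at runtime")
    env = dict(os.environ if base_env is None else base_env)
    # Hypothetical variable names, for illustration only.
    env["CASE_RUNTIME_PROVIDER"] = RUNTIME_PROVIDER
    env["CASE_RUNTIME_MODEL"] = ADMISSION_MODEL
    env["CASE_RUNTIME_BASE_URL"] = endpoint
    return env


def report_identity() -> Dict[str, str]:
    """Identity fields for the run report; the endpoint value is deliberately absent."""
    return {
        "case_model_provider": ADMISSION_PROVIDER,
        "case_model": ADMISSION_MODEL,
        "case_runtime_model_provider": RUNTIME_PROVIDER,
    }
```

Keeping the endpoint in the child environment and out of `report_identity()` is one way to satisfy the rule that the endpoint value is never recorded in SOT.
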
Evidence:
- Focused validator passed: `python3 harness/runner/validate-case-provider-adapter.py --harness-root harness --json`.
- Focused validator artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024443Z-r1-string-slugify-2817037`.
- Focused validator includes `qwen_local_openai_compat_allows_case`.
- Post-merge aggregate validator passed: `harness/evals/health.sh --json`.
- Post-merge provider-adapter artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024714Z-r1-string-slugify-2832755`.
- Post-merge matrix artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024712Z-run-all-fake-2832397`.
- Real Case Qwen loop artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024456Z-r1-string-slugify-2819659`.
- Real run report status: `fail`.
- Real run failure reason: `case engine failed with exit code 124`.
- Case process started: `true`.
- Case model provider: `qwen-local`.
- Case model: `qwen3.6-35b-a3b`.
- Case runtime model provider: `openai`.
- Case model admission status: `admitted`.
- Source admission status: `not_admitted`.
- Tests command: `python3 -m pytest -q`.
- Tests passed: `true`.
- Patch artifact: `patch.diff`.
- Patch digest: `4706a667d3e66f3a9a00da37d274263c5ab776b0cce0971f7ac4efc5f341da54`.
- The committed fixture diff was captured after the harness learned to diff from the fixture baseline commit.
- Result: Case can reach Spark through the compatibility route and produce a valid artificial-fixture patch, but Stage 2 is not validated because Case timed out before a clean Harness Evidence Interface pass.
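
The patch digest recorded above is 64 hexadecimal characters, which is consistent with SHA-256. The commit does not name the algorithm, so the sketch below assumes SHA-256 over the raw patch bytes; `patch_digest` is a hypothetical helper, not the harness function.

```python
import hashlib


def patch_digest(patch_bytes: bytes) -> str:
    """Return the hex SHA-256 digest of a patch; reject an empty patch."""
    if not patch_bytes.strip():
        # An empty diff cannot count as patch evidence.
        raise ValueError("patch is empty")
    return hashlib.sha256(patch_bytes).hexdigest()
```

A caller would typically read `patch.diff` from the run artifact directory as bytes and compare the result against the digest in the report.
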
## CTO-WORK-032 - Case Lifecycle Timeout After Valid Patch

Status: blocked.

The current active blocker is no longer provider admission, endpoint reachability,
response shape, or absence of a patch. The active blocker is lifecycle completion
after Case has produced a valid patch and passing tests.

Acceptance:
- Real Case Stage 2 remains blocked until Case exits cleanly and the harness emits a pass report.
- Evidence must show a non-empty allowed diff from the artificial fixture baseline.
- Evidence must show the fixture tests pass.
- Evidence must show required events pass through the Harness Evidence Interface.
- Evidence must show no Target Repository path was inspected or copied.
- Evidence must preserve admitted provider identity as `qwen-local` / `qwen3.6-35b-a3b`.
- Evidence may use the harness-owned OpenAI-compatible runtime bridge, but must not promote `openai` as CTO admission identity.
- Timeout-after-valid-patch evidence must remain fail evidence, not pass evidence.
- No copied-repo, sandbox-repo, owned-repo, default-candidate, or Core promotion stage may use timeout evidence as pass evidence.
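
The acceptance list above can be read as a set of independent gates that must all hold before a report counts as pass evidence. A minimal sketch, assuming a flat report dictionary; the field names (`exit_code`, `patch_nonempty`, `events_passed`, `target_repo_touched`) are hypothetical and may not match the real report schema:

```python
from typing import Dict, List

ADMITTED_IDENTITY = ("qwen-local", "qwen3.6-35b-a3b")


def unmet_acceptance(report: Dict[str, object]) -> List[str]:
    """Return the acceptance gates a report fails; an empty list means pass-eligible."""
    checks = {
        "clean exit and pass report": (
            report.get("status") == "pass" and report.get("exit_code") == 0
        ),
        "non-empty allowed diff": bool(report.get("patch_nonempty")),
        "fixture tests pass": report.get("tests_passed") is True,
        "required events pass": report.get("events_passed") is True,
        "no Target Repository access": report.get("target_repo_touched") is False,
        "admitted identity preserved": (
            report.get("case_model_provider"),
            report.get("case_model"),
        ) == ADMITTED_IDENTITY,
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative report shaped like the timed-out real run (values partly assumed).
timed_out_report = {
    "status": "fail",
    "exit_code": 124,
    "patch_nonempty": True,
    "tests_passed": True,
    "events_passed": False,
    "target_repo_touched": False,
    "case_model_provider": "qwen-local",
    "case_model": "qwen3.6-35b-a3b",
}
```

Under these assumptions the timed-out run still fails two gates even though the patch and tests are valid, so it stays fail evidence.
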
Required next route:
- Keep the OpenAI-compatible bridge behind the Hermes CTO Harness seam.
- Add or adjust only harness-side lifecycle control outside vendor Case source.
- Prefer a minimal fix that makes Case stop after the required patch, test, commit, and `AGENT_RESULT` envelope.
- If timeout persists after valid patch/tests, classify it explicitly as lifecycle timeout after valid patch.
- Do not mark Stage 2 validated without a clean pass report.
- Failure reason: `provider response shape unavailable`.
- Marker: `backend/provider-reasoning-only.txt`.
- Case process started: `false`.
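
The `CTO-WORK-032` route above asks for a timeout after a valid patch to be classified explicitly. A minimal sketch of such a classifier, assuming exit code 124 marks the timeout as in the real run report; the function name and the labels other than the one quoted from the route are hypothetical:

```python
TIMEOUT_EXIT_CODE = 124  # exit code recorded in the real run's failure reason


def classify_case_exit(exit_code: int, patch_nonempty: bool, tests_passed: bool) -> str:
    """Label how the Case process ended; every label except 'clean exit' is fail evidence."""
    if exit_code == 0:
        return "clean exit"
    if exit_code == TIMEOUT_EXIT_CODE and patch_nonempty and tests_passed:
        return "lifecycle timeout after valid patch"
    if exit_code == TIMEOUT_EXIT_CODE:
        return "timeout before valid patch"
    return f"case engine failed with exit code {exit_code}"
```

A clean exit alone does not make a run pass; the remaining acceptance evidence still has to be present in the report.
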


@@ -157,6 +157,27 @@ Validation Evidence:
- `CTO-WORK-016` remains blocked because no real Case Stage 2 pass report exists.
- Current downstream blocker returns to `CTO-WORK-028`.

## OpenAI-Compatible Runtime Bridge Evidence - 2026-06-01

- Hermes commit: `5c5448b Bridge Case Qwen through OpenAI-compatible runtime`.
- The harness preserves CTO admission identity as `qwen-local` / `qwen3.6-35b-a3b`.
- The harness maps the Case runtime provider to Pi's built-in `openai` provider only inside the harness-owned Case process.
- The Spark endpoint value was supplied only through runtime environment and is not recorded in SOT.
- Focused validator passed with `qwen_local_openai_compat_allows_case`.
- Focused validator artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024443Z-r1-string-slugify-2817037`.
- Post-merge Hermes health passed.
- Post-merge provider-adapter artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024714Z-r1-string-slugify-2832755`.
- Real Case Stage 2 retry artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024456Z-r1-string-slugify-2819659`.
- Report status was `fail`.
- Failure reason was `case engine failed with exit code 124`.
- Case process started was `true`.
- Case runtime model provider was `openai`.
- Tests passed.
- Patch artifact was non-empty.
- Patch digest was `4706a667d3e66f3a9a00da37d274263c5ab776b0cce0971f7ac4efc5f341da54`.
- `CTO-WORK-016` remains blocked because no clean real Case Stage 2 pass report exists.
- Current downstream blocker is `CTO-WORK-032`.

## Isolated Pi Config Runtime Evidence - 2026-06-01

- Hermes commit: `09b5851 Isolate Case Pi provider config`.


@@ -104,6 +104,24 @@ Current evidence:
- Case model admission status: `admitted`.
- Result: Spark endpoint availability is no longer the current unknown; Stage 2 remains blocked by the Case agent-result protocol seam.

## Spark OpenAI-Compatible Runtime Bridge Evidence - 2026-06-01

- Hermes commit: `5c5448b Bridge Case Qwen through OpenAI-compatible runtime`.
- The harness reached Spark through an OpenAI-compatible Case runtime provider bridge.
- CTO admission identity stayed `qwen-local` / `qwen3.6-35b-a3b`.
- Runtime endpoint value was supplied only through environment and is not recorded in SOT.
- Focused validator artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024443Z-r1-string-slugify-2817037`.
- Real Case Qwen loop artifact: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T024456Z-r1-string-slugify-2819659`.
- Report status: `fail`.
- Failure reason: `case engine failed with exit code 124`.
- Case process started: `true`.
- Case runtime model provider: `openai`.
- Tests passed: `true`.
- Patch artifact was non-empty.
- This proves endpoint config and runtime provider bridging are sufficient for Case to produce a fixture patch.
- This does not validate `CTO-WORK-016`, `CTO-WORK-020`, `CTO-WORK-022`, `CTO-WORK-028`, or `CTO-WORK-032`.
- The current active blocker is the Case lifecycle timeout that occurs after valid patch evidence has been produced.

## Hermes Case Qwen Loop Evidence - 2026-06-01

- Hermes commit: `6c453ee Add Case Qwen loop entrypoint`.


@@ -155,3 +155,8 @@ items:
    status: blocked
    source: .sot/03-PROTOCOLS/CTO-CASE-AGENT-PROTOCOL-BLOCKER.md
    owner: jp
  - id: CTO-WORK-032
    title: Case Lifecycle Timeout After Valid Patch
    status: blocked
    source: .sot/03-PROTOCOLS/CTO-CASE-AGENT-PROTOCOL-BLOCKER.md
    owner: jp