Record Case response probe correction

This commit is contained in:
Svrnty 2026-05-31 22:37:17 -04:00
parent fca300afa7
commit b4d2ca2709
3 changed files with 74 additions and 0 deletions

View File

@ -297,3 +297,43 @@ Forbidden routes:
- Do not patch `/tmp/workos-case` as the durable fix.
- Do not make Case default before Stage 2 pass evidence.
- Do not treat reasoning text as a completed `AGENT_RESULT` unless a governed adapter proves the result envelope and file diff.
## Response Probe Budget Correction - 2026-06-01
Hermes commit `bbe7c72 Use realistic Case local response probe` corrects the
first response-shape gate.
Direct Spark evidence showed:
- `qwen3.6-35b-a3b` can return reasoning only when the probe uses `max_tokens=32`.
- The same model route returns assistant content when the probe uses `max_tokens=256` or more.
- A 32-token probe can therefore create a false response-shape blocker.
Harness effect:
- The Case local response-shape probe now uses `max_tokens=1024`.
- Focused validator added `local_provider_delayed_content_allows_case`.
- True reasoning-only responses still block before Case process start.
- Delayed-content local providers can pass the response-shape probe and reach the Case stub in validation.
Latest real evidence:
- Run artifact directory: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T023532Z-r1-string-slugify-2776187`.
- Report status: `fail`.
- Failure reason: `case agent result protocol failed`.
- Protocol marker: `backend/provider-agent-protocol.txt`.
- Case process started: `true`.
- Case model provider: `qwen-local`.
- Case model: `qwen3.6-35b-a3b`.
- Case model admission status: `admitted`.
- Changed files: none.
- Patch artifact: `patch.diff`.
- Tests passed: `false`.
- Required events passed: `false`.
- Result: Stage 2 is still blocked.
Current interpretation:
- `CTO-WORK-031` remains useful as a guardrail for true reasoning-only local provider responses.
- `CTO-WORK-031` is not the current primary blocker for the Spark Qwen route.
- The active blocker returns to `CTO-WORK-028`: Case reaches execution but does not produce the required `AGENT_RESULT` envelope or workspace diff.

View File

@ -140,6 +140,23 @@ Validation Evidence:
- `CTO-WORK-016` remains blocked because no real Case Stage 2 pass report exists.
- Current downstream blocker is `CTO-WORK-031`.
## Response Probe Budget Correction Evidence - 2026-06-01
- Hermes commit: `bbe7c72 Use realistic Case local response probe`.
- The local provider response-shape probe now allows delayed assistant content before classifying a provider as reasoning-only.
- Focused validator passed with `local_provider_delayed_content_allows_case`.
- Aggregate Hermes health passed after merge.
- Real Case Stage 2 retry with admitted `qwen-local` / `qwen3.6-35b-a3b` produced report `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T023532Z-r1-string-slugify-2776187/report.json`.
- Case process started after admission and response-shape probe passed.
- Backend exit code was `1`.
- Failure reason was `case agent result protocol failed`.
- Protocol marker was recorded at `backend/provider-agent-protocol.txt`.
- The harness recorded no changed files.
- The patch artifact was empty.
- Tests failed because the artificial fixture bug remained unchanged.
- `CTO-WORK-016` remains blocked because no real Case Stage 2 pass report exists.
- Current downstream blocker returns to `CTO-WORK-028`.
## Isolated Pi Config Runtime Evidence - 2026-06-01
- Hermes commit: `09b5851 Isolate Case Pi provider config`.

View File

@ -87,6 +87,23 @@ Current evidence:
- `CTO-WORK-030` remains blocked until a configured endpoint can support a real Stage 2 pass.
- Current downstream blocker is `CTO-WORK-031`.
## Spark Response Probe Budget Correction - 2026-06-01
- Hermes commit: `bbe7c72 Use realistic Case local response probe`.
- Direct Spark probe showed `qwen3.6-35b-a3b` returns reasoning only at `max_tokens=32`.
- Direct Spark probe showed the same route returns assistant content at `max_tokens=256` and `max_tokens=1024`.
- The Hermes response-shape probe now uses `max_tokens=1024`.
- Focused validator added `local_provider_delayed_content_allows_case`.
- Runtime endpoint value was supplied only through environment and is not recorded in SOT.
- Real Case Qwen loop artifact after the correction: `/home/svrnty/.hermes/profiles/cto-planb/harness-runs/20260601T023532Z-r1-string-slugify-2776187`.
- Report status: `fail`.
- Failure reason: `case agent result protocol failed`.
- Case process started: `true`.
- Case model provider: `qwen-local`.
- Case model: `qwen3.6-35b-a3b`.
- Case model admission status: `admitted`.
- Result: Spark endpoint availability is no longer the current unknown; Stage 2 remains blocked by the Case agent-result protocol seam.
## Hermes Case Qwen Loop Evidence - 2026-06-01
- Hermes commit: `6c453ee Add Case Qwen loop entrypoint`.