feat(plugin): STT migration via audio_attachment_processor hook (L1-L6)

Closes Phase 2.A. STT now lives entirely in the plugin via the new public-API
method `api.register_audio_attachment_processor` added to the loader hook
(Rule 1 — extended API, no forced-internal). The fork patch stays minimal:
streaming.py gains a small loop that calls registered processors, and the
loader adds the one new method.

Plugin additions:
  routes/transcribe.py            POST /api/transcribe + audio_attachment_processor
                                  - _external_stt_transcribe: multipart POST to STT endpoint
                                  - _handle_transcribe: one-shot transcription route
                                  - _transcribe_audio_attachments: voice-message processor
                                  - _parse_multipart_file: stdlib email-based multipart
                                    (Python 3.13 dropped cgi per PEP 594)
  tests/unit/test_transcribe.py   8 tests (register, processor, route, multipart parser)
  tests/evals/test_features.py    + 1 eval (audio processor signature contract)

Config (read at call time, never persisted):
  HERMES_WEBUI_STT_URL  external STT endpoint (OpenAI or WhisperX shape)
  HERMES_WEBUI_STT_KEY  optional bearer token
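
A minimal sketch of what "read at call time, never persisted" can look like in practice. The helper names `read_stt_config` and `auth_headers` are hypothetical and not part of the plugin; only the two environment variable names come from this commit.

```python
import os


def read_stt_config(environ=os.environ):
    """Return (url, key) looked up on every call; nothing is cached or stored."""
    url = environ.get("HERMES_WEBUI_STT_URL", "").strip()
    key = environ.get("HERMES_WEBUI_STT_KEY", "").strip()
    return url, key


def auth_headers(key):
    """Build the optional bearer header; an empty key means no header at all."""
    return {"Authorization": "Bearer " + key} if key else {}
```

Because the lookup happens per call, changing the variables in the process environment takes effect on the next request without reloading the plugin.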

CONNECTION-MAP regenerated: 9 public-API · 0 forced-internal · 1 frontend.
20/20 tests PASS.

Loader API extended in hermes-webui (next commit there) — 7th method:
register_audio_attachment_processor. streaming.py gets a small loop that
calls registered processors before _build_native_multimodal_message.
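
The loader and streaming.py changes live in hermes-webui and are not part of this diff, so the following is only a sketch of how such a hook could work. `ProcessorRegistry` and `prepend_transcripts` are hypothetical names; the one thing taken from this commit is the contract that a processor maps a list of attachments to a string.

```python
class ProcessorRegistry:
    """Hypothetical stand-in for the loader side of the hook."""

    def __init__(self):
        self._processors = []

    def register_audio_attachment_processor(self, fn):
        # fn: callable taking a list of attachment dicts and returning str
        self._processors.append(fn)

    def run(self, attachments):
        blocks = []
        for fn in self._processors:
            try:
                text = fn(attachments)
            except Exception:
                # Assumption: a failing processor is skipped, not fatal.
                continue
            if isinstance(text, str) and text:
                blocks.append(text)
        return "\n\n".join(blocks)


def prepend_transcripts(message_text, attachments, registry):
    """Put processor output in front of the agent-visible text, if any."""
    prefix = registry.run(attachments)
    return (prefix + "\n\n" + message_text) if prefix else message_text
```

Keeping the contract at "attachments in, text out" means the streaming side needs no knowledge of STT at all.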

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Svrnty 2026-05-23 10:14:29 -04:00
parent cbf53a0d55
commit 37123f570b
6 changed files with 314 additions and 36 deletions

CONNECTION-MAP.md

@@ -2,7 +2,7 @@
 **Upstream version:** v0.51.117
 **Plugin version:** 0.1.0
-**Total dependencies:** 7 (6 public API · 0 forced internal · 1 frontend)
+**Total dependencies:** 10 (9 public API · 0 forced internal · 1 frontend)
 > **Auto-generated by `scripts/ast-connection-map.py`. Do not hand-edit.**
 > To change a justification, edit the `# CONNECTION:` comment above the
@@ -18,6 +18,9 @@
 | `plugin.py:34` | `api.register_static` | `api.register_static(STATIC_PREFIX, str(STATIC_DIR))` |
 | `plugin.py:35` | `api.inject_stylesheet` | `api.inject_stylesheet(f"/plugins/{STATIC_PREFIX}/app.css")` |
 | `plugin.py:36` | `api.inject_script` | `api.inject_script(f"/plugins/{STATIC_PREFIX}/app.js")` |
+| `routes/transcribe.py:37` | `api.logger` | `log = api.logger("svrnty.routes.transcribe")` |
+| `routes/transcribe.py:38` | `api.register_route` | `api.register_route("/api/transcribe", "POST", _handle_transcribe)` |
+| `routes/transcribe.py:39` | `api.register_audio_attachment_processor` | `api.register_audio_attachment_processor(_transcribe_audio_attachments)` |
 | `routes/vault_status.py:19` | `api.logger` | `log = api.logger("svrnty.routes.vault_status")` |
 | `routes/vault_status.py:20` | `api.register_route` | `api.register_route("/api/vault/status", "GET", _handle_vault_status)` |

plugin.py

@@ -59,6 +59,6 @@ def _phase2_routes():
     ImportError is logged + swallowed so the plugin loads cleanly.
     """
     return [
-        # "transcribe",  # P2.A — STT (deferred — needs streaming.py integration refactor)
+        "transcribe",    # P2.A — STT + voice-message audio processor ✓
         "vault_status",  # P2.B — vault connections status ✓
     ]

routes/transcribe.py

@@ -1,37 +1,187 @@
-"""GET /api/transcribe — STT route — DEFERRED MIGRATION (P2.A).
-
-The STT feature in the original fork commit 014b9eef touches THREE upstream
-modules:
-
-  1. api/upload.py     handle_transcribe() + _external_stt_transcribe()
-  2. api/streaming.py  _transcribe_audio_attachments() injects transcripts
-                       into the agent-visible message during streaming
-  3. static/boot.js    mic button + MediaRecorder fallback (iOS WKWebView)
-
-Migration #1 is straightforward (route + helper move cleanly). Migrations #2
-and #3 cross-cut the streaming engine and the bootstrap JS — refactoring them
-to live in the plugin requires either:
-
-  (a) New public-API hooks: api.streaming_hook(name, callback) so the plugin
-      can register an attachment processor that runs inside the streaming
-      pipeline. Adds ~50 LOC to the loader + amends Protocol PRD §5.1.
-
-  (b) Accept STT as a forced-internal dependency. Adds CONNECTION-MAP entries
-      under forced_internal/ with the streaming.py + boot.js touch points and
-      their rebase-risk notes.
-
-Phase 2.1 decides between (a) and (b). Until that's resolved, the STT route
-stays in the fork (commit 014b9eef remains). This stub exists so the migration
-plan is co-located with the code and tooling can flag the gap.
-
-Test status: vault_status migration proves the loader works. STT is a deeper
-integration test for the loader's expressiveness.
-"""
-
-# Intentionally NOT registered yet. The plugin loader's _phase2_routes() does
-# not include "transcribe" — see plugin.py.
-#
-# When Phase 2.1 lands, this file will host either:
-#   - A new route handler using a streaming_hook to register the attachment
-#     processor (option a), or
-#   - The route handler + CONNECTION-MAP forced-internal entries for the
-#     remaining touch points (option b).
+"""POST /api/transcribe + voice-message audio processor.
+
+Migrated from hermes-webui fork commit 014b9eef (now reverted) per Phase 2.1
+of the SVRNTY-HERMES Plugin Protocol. Uses the loader's new public API method
+`api.register_audio_attachment_processor` so streaming.py can pull transcripts
+of voice-message attachments into the agent-visible text WITHOUT any further
+fork patch.
+
+Configuration (read at call time, never persisted):
+  HERMES_WEBUI_STT_URL   external STT endpoint (OpenAI-shape or WhisperX)
+  HERMES_WEBUI_STT_KEY   optional bearer token
+
+Endpoints + processors:
+  POST /api/transcribe         direct one-shot transcription
+  audio_attachment_processor   called by streaming.py before agent receives msg
+
+Public API surface used: register_route, register_audio_attachment_processor, logger.
+No forced internal dependencies.
+"""
+import email
+import email.parser
+import email.policy
+import io
+import json
+import mimetypes
+import os
+import re
+import tempfile
+import urllib.request
+import uuid
+
+_VOICE_MSG_AUDIO_EXTS = ('.m4a', '.aac', '.oga', '.opus', '.wav', '.mp3', '.flac', '.ogg', '.webm')
+
+
+def register(api):
+    """Wire route + audio processor."""
+    log = api.logger("svrnty.routes.transcribe")
+    api.register_route("/api/transcribe", "POST", _handle_transcribe)
+    api.register_audio_attachment_processor(_transcribe_audio_attachments)
+    log.info("transcribe endpoint + audio processor registered")
+
+
+def _external_stt_transcribe(audio_path: str, url: str, api_key: str) -> str:
+    """POST audio to an external STT endpoint (multipart `file`).
+
+    Handles OpenAI-shaped servers (top-level `text`) and WhisperX-style servers
+    (`segments[].text`). Stdlib only.
+    """
+    boundary = '----webui' + uuid.uuid4().hex
+    fname = os.path.basename(audio_path) or 'audio.webm'
+    ctype = mimetypes.guess_type(fname)[0] or 'application/octet-stream'
+    with open(audio_path, 'rb') as f:
+        audio = f.read()
+    body = b''.join([
+        ('--' + boundary + '\r\n'
+         'Content-Disposition: form-data; name="file"; filename="' + fname + '"\r\n'
+         'Content-Type: ' + ctype + '\r\n\r\n').encode(),
+        audio,
+        ('\r\n--' + boundary + '\r\n'
+         'Content-Disposition: form-data; name="model"\r\n\r\nwhisper-1').encode(),
+        ('\r\n--' + boundary + '--\r\n').encode(),
+    ])
+    headers = {'Content-Type': 'multipart/form-data; boundary=' + boundary}
+    if api_key:
+        headers['Authorization'] = 'Bearer ' + api_key
+    req = urllib.request.Request(url, data=body, headers=headers)
+    with urllib.request.urlopen(req, timeout=300) as resp:
+        data = json.loads(resp.read())
+    text = str(data.get('text') or '').strip()
+    if not text:
+        segs = data.get('segments') or []
+        text = ' '.join(str(s.get('text', '')).strip() for s in segs).strip()
+    return text
+
+
+def _transcribe_audio_attachments(attachments) -> str:
+    """Audio-attachment processor — registered via the loader.
+
+    Scans attachments for voice-message audio files; transcribes each via the
+    configured STT endpoint; returns a single text block to prepend to the
+    agent-visible message. Empty string when no audio / STT not configured.
+    """
+    stt_url = os.environ.get('HERMES_WEBUI_STT_URL', '').strip()
+    if not stt_url or not attachments:
+        return ''
+    stt_key = os.environ.get('HERMES_WEBUI_STT_KEY', '').strip()
+    parts = []
+    for att in attachments or []:
+        if not isinstance(att, dict):
+            continue
+        path = str(att.get('path') or '')
+        mime = str(att.get('mime') or '').lower()
+        name = str(att.get('name') or '') or path
+        is_audio = (
+            os.path.basename(name).startswith('voice-message')
+            or mime.startswith('audio/')
+            or os.path.splitext(name)[1].lower() in _VOICE_MSG_AUDIO_EXTS
+        )
+        if not is_audio or not path:
+            continue
+        try:
+            text = _external_stt_transcribe(path, stt_url, stt_key)
+        except Exception as exc:
+            # Best effort: a failed transcription must not block the message.
+            print(f'[svrnty] voice-message transcription failed for {name}: {exc}', flush=True)
+            text = ''
+        if text:
+            parts.append(text)
+    if not parts:
+        return ''
+    return '[Voice message transcript]\n' + '\n\n'.join(parts)
+
+
+def _handle_transcribe(handler, parsed):
+    """POST /api/transcribe — direct one-shot transcription.
+
+    Reads a multipart form with field `file` (the recorded audio blob), writes
+    it to a temp file, sends it to the configured STT endpoint, returns
+    `{"ok": true, "transcript": "..."}`.
+    """
+    stt_url = os.environ.get('HERMES_WEBUI_STT_URL', '').strip()
+    if not stt_url:
+        return _send_json(handler, {'ok': False, 'error': 'HERMES_WEBUI_STT_URL not configured'}, 503)
+    ctype = handler.headers.get('Content-Type', '')
+    if 'multipart' not in ctype.lower():
+        return _send_json(handler, {'ok': False, 'error': 'multipart/form-data required'}, 400)
+    length = int(handler.headers.get('Content-Length', '0') or 0)
+    if not length:
+        return _send_json(handler, {'ok': False, 'error': 'empty body'}, 400)
+    body = handler.rfile.read(length)
+    file_bytes, fname = _parse_multipart_file(body, ctype, field_name='file')
+    if file_bytes is None:
+        return _send_json(handler, {'ok': False, 'error': "missing 'file' field"}, 400)
+    fname = fname or 'audio.webm'
+    suffix = os.path.splitext(fname)[1] or '.webm'
+    temp_path = None
+    try:
+        with tempfile.NamedTemporaryFile(prefix='svrnty-stt-', suffix=suffix, delete=False) as tmp:
+            temp_path = tmp.name
+            tmp.write(file_bytes)
+        transcript = _external_stt_transcribe(
+            temp_path, stt_url, os.environ.get('HERMES_WEBUI_STT_KEY', '').strip())
+        return _send_json(handler, {'ok': True, 'transcript': transcript}, 200)
+    except Exception as e:
+        return _send_json(handler, {'ok': False, 'error': str(e)}, 500)
+    finally:
+        if temp_path and os.path.exists(temp_path):
+            try:
+                os.remove(temp_path)
+            except OSError:
+                pass
+
+
+def _parse_multipart_file(body: bytes, content_type: str, field_name: str = 'file'):
+    """Parse a multipart body and return (file_bytes, filename) for the named field.
+
+    Stdlib only. cgi.FieldStorage was removed in Python 3.13 (PEP 594), so we
+    parse via the email module which is the documented replacement.
+    Returns (None, None) when the named field is absent.
+    """
+    # Construct a fake email message so email.parser handles the multipart split.
+    full = b'Content-Type: ' + content_type.encode() + b'\r\n\r\n' + body
+    parser = email.parser.BytesParser(policy=email.policy.default)
+    msg = parser.parsebytes(full)
+    if not msg.is_multipart():
+        return None, None
+    for part in msg.iter_parts():
+        # Read parameters by name: a bare `name="..."` regex also matches
+        # inside `filename="..."` when filename is listed first.
+        if part.get_param('name', header='content-disposition') != field_name:
+            continue
+        return part.get_payload(decode=True), part.get_filename()
+    return None, None
+
+
+def _send_json(handler, payload: dict, status: int) -> bool:
+    body = json.dumps(payload).encode('utf-8')
+    handler.send_response(status)
+    handler.send_header('Content-Type', 'application/json; charset=utf-8')
+    handler.send_header('Content-Length', str(len(body)))
+    handler.send_header('Cache-Control', 'no-store')
+    handler.end_headers()
+    handler.wfile.write(body)
+    return True
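
The two response shapes that `_external_stt_transcribe` accepts can be illustrated in isolation. This is a sketch that mirrors the fallback logic in the diff above; `extract_transcript` is a hypothetical helper name, and the sample payloads in the usage note are made up for illustration rather than captured from a real server.

```python
def extract_transcript(data: dict) -> str:
    """Prefer a top-level `text`; otherwise join `segments[].text`."""
    text = str(data.get("text") or "").strip()
    if text:
        return text
    segments = data.get("segments") or []
    return " ".join(str(s.get("text", "")).strip() for s in segments).strip()
```

Under this sketch, `{"text": " hello "}` and `{"segments": [{"text": "hello"}]}` both reduce to the same transcript, and a payload with neither key reduces to an empty string, which the processor treats as "nothing to prepend".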

scripts/ast-connection-map.py

@@ -34,6 +34,7 @@ MAP_PATH = REPO / "CONNECTION-MAP.md"
 PUBLIC_API = {
     "register_route", "register_static", "inject_script",
     "inject_stylesheet", "config_get", "logger",
+    "register_audio_attachment_processor",
 }

tests/evals/test_features.py

@@ -10,18 +10,18 @@ ROOT = Path(__file__).resolve().parents[2]
 def test_eval_loader_contract_unchanged():
-    """The 6-method public API is the protocol contract — adding methods needs a PRD bump."""
+    """The 7-method public API is the protocol contract — adding methods needs a PRD bump."""
     import sys
     sys.path.insert(0, str(ROOT.parent / "hermes-webui"))
     try:
         from api.svrnty_plugin_loader import _PluginAPI
     except ImportError:
-        # If hermes-webui not next to the plugin, skip — integration env.
         import pytest
         pytest.skip("hermes-webui fork not adjacent; loader contract eval skipped")
     api = _PluginAPI()
     required = {"register_route", "register_static", "inject_script",
-                "inject_stylesheet", "config_get", "logger"}
+                "inject_stylesheet", "config_get", "logger",
+                "register_audio_attachment_processor"}
     actual = {m for m in dir(api) if not m.startswith("_")}
     assert required == actual, (
         f"public API drift: expected {required}, got {actual}. "
@@ -29,6 +29,13 @@ def test_eval_loader_contract_unchanged():
     )
 
 
+def test_eval_audio_processor_signature_unchanged():
+    """The audio_attachment_processor takes attachments → str. Loader hook + plugin agree."""
+    from routes import transcribe
+    out = transcribe._transcribe_audio_attachments([])
+    assert isinstance(out, str), f"audio processor must return str, got {type(out).__name__}"
+
+
 def test_eval_vault_status_payload_shape():
     """Vault status returns {'secrets': [{'name': ...}, ...]} — schema lock."""
     import json

tests/unit/test_transcribe.py

@@ -0,0 +1,117 @@
+"""Unit tests for routes/transcribe.py (P3.B + L6).
+
+Cover the route handler shape + the audio_attachment_processor contract.
+Network calls to the external STT endpoint are mocked.
+"""
+import json
+import os
+from unittest.mock import MagicMock, patch
+
+from routes import transcribe
+
+
+class _FakeHandler:
+    def __init__(self, body=b"", headers=None):
+        self.status = None
+        self.headers = headers or {}
+        self.body_out = b""
+        self.rfile = MagicMock()
+        self.rfile.read.return_value = body
+
+    def send_response(self, code):
+        self.status = code
+
+    def send_header(self, k, v):
+        pass
+
+    def end_headers(self):
+        pass
+
+    @property
+    def wfile(self):
+        h = self
+
+        class _W:
+            def write(self_, b): h.body_out += b
+        return _W()
+
+
+def test_register_wires_route_and_processor():
+    api = MagicMock()
+    api.logger.return_value = MagicMock()
+    transcribe.register(api)
+    api.register_route.assert_called_once_with(
+        "/api/transcribe", "POST", transcribe._handle_transcribe)
+    api.register_audio_attachment_processor.assert_called_once_with(
+        transcribe._transcribe_audio_attachments)
+
+
+def test_processor_returns_empty_when_stt_url_unset():
+    with patch.dict(os.environ, {"HERMES_WEBUI_STT_URL": ""}, clear=False):
+        assert transcribe._transcribe_audio_attachments(
+            [{"path": "/tmp/foo.webm", "mime": "audio/webm"}]) == ""
+
+
+def test_processor_returns_empty_when_no_audio_attachments():
+    with patch.dict(os.environ, {"HERMES_WEBUI_STT_URL": "http://stt:8000/transcribe"}):
+        assert transcribe._transcribe_audio_attachments([]) == ""
+        assert transcribe._transcribe_audio_attachments(
+            [{"path": "/tmp/doc.pdf", "mime": "application/pdf"}]) == ""
+
+
+def test_processor_transcribes_audio_attachments():
+    """End-to-end: audio attachment → STT call → transcript block."""
+    attachments = [{
+        "path": "/tmp/voice-message-123.webm",
+        "mime": "audio/webm",
+        "name": "voice-message-123.webm",
+    }]
+    with patch.dict(os.environ, {"HERMES_WEBUI_STT_URL": "http://stt:8000/v1/audio/transcriptions"}):
+        with patch.object(transcribe, "_external_stt_transcribe",
+                          return_value="hello world"):
+            out = transcribe._transcribe_audio_attachments(attachments)
+    assert out.startswith("[Voice message transcript]")
+    assert "hello world" in out
+
+
+def test_processor_detects_audio_by_filename_prefix():
+    """voice-message-* prefix triggers transcription even with non-audio mime."""
+    attachments = [{
+        "path": "/tmp/voice-message-abc.mp4",
+        "mime": "video/mp4",  # browser may upload as video/* per upload handler
+        "name": "voice-message-abc.mp4",
+    }]
+    with patch.dict(os.environ, {"HERMES_WEBUI_STT_URL": "http://stt:8000/v1"}):
+        with patch.object(transcribe, "_external_stt_transcribe",
+                          return_value="hi"):
+            assert "hi" in transcribe._transcribe_audio_attachments(attachments)
+
+
+def test_handle_transcribe_503_when_stt_url_missing():
+    with patch.dict(os.environ, {"HERMES_WEBUI_STT_URL": ""}, clear=False):
+        h = _FakeHandler()
+        transcribe._handle_transcribe(h, None)
+        assert h.status == 503
+
+
+def test_handle_transcribe_400_on_non_multipart():
+    with patch.dict(os.environ, {"HERMES_WEBUI_STT_URL": "http://stt:8000/v1"}):
+        h = _FakeHandler(headers={"Content-Type": "application/json", "Content-Length": "10"})
+        transcribe._handle_transcribe(h, None)
+        assert h.status == 400
+
+
+def test_multipart_parser_extracts_file_field():
+    """_parse_multipart_file pulls the named field's bytes + filename."""
+    boundary = "----boundary"
+    body = (
+        f"--{boundary}\r\n"
+        f'Content-Disposition: form-data; name="file"; filename="hello.wav"\r\n'
+        f"Content-Type: audio/wav\r\n\r\n"
+        f"FAKEAUDIO\r\n"
+        f"--{boundary}--\r\n"
+    ).encode()
+    data, fname = transcribe._parse_multipart_file(
+        body, f"multipart/form-data; boundary={boundary}", "file")
+    assert data == b"FAKEAUDIO"
+    assert fname == "hello.wav"
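
The parser test above only covers `name` listed before `filename`. The order of Content-Disposition parameters is not fixed, so a case with the order reversed may be worth covering too. The following is a self-contained sketch of the same email-based approach that looks parameters up by name; `parse_file_field` and the sample body are hypothetical and stand apart from the plugin code.

```python
import email.parser
import email.policy


def parse_file_field(body: bytes, content_type: str, field_name: str = "file"):
    """Return (file_bytes, filename) for `field_name`, or (None, None)."""
    # Prepend a Content-Type header so the email parser can split the parts.
    head = b"Content-Type: " + content_type.encode() + b"\r\n\r\n"
    parser = email.parser.BytesParser(policy=email.policy.default)
    msg = parser.parsebytes(head + body)
    if not msg.is_multipart():
        return None, None
    for part in msg.iter_parts():
        # Look the parameter up by name instead of pattern-matching the header.
        if part.get_param("name", header="content-disposition") == field_name:
            return part.get_payload(decode=True), part.get_filename()
    return None, None


# Hypothetical request body with `filename` listed before `name`.
BOUNDARY = "----boundary"
BODY = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; filename="clip.wav"; name="file"\r\n'
    "Content-Type: audio/wav\r\n\r\n"
    "FAKEAUDIO\r\n"
    f"--{BOUNDARY}--\r\n"
).encode()
CONTENT_TYPE = f"multipart/form-data; boundary={BOUNDARY}"
```

Calling `parse_file_field(BODY, CONTENT_TYPE)` exercises the reversed-order case; asking for a field that is not present returns `(None, None)`.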