An LLM in code is just a function call over HTTP. There's no magic "spinning up" — you either hit a local daemon or a cloud endpoint, send {model, messages, options}, and get text back. For this spike I'd use Ollama (local): it's the on-prem, llama.cpp-family runtime we identified as the production sharp tool, so what you learn transfers directly; no API key; and it has a JSON mode. If ollama serve is running, you POST http://localhost:11434/api/chat with {"model": "llama3.1" (or any instruct model you've pulled), "messages": [...], "format": "json", "stream": false, "options": {"temperature": 0}}, and the JSON you want is in response ["message"]["content"] (as a string you still have to json.loads). urllib from the stdlib is enough — no new deps.
Three concepts that are the whole skill, each a deliberate choice:
- The system prompt is the component's contract, not chit-chat. You're not conversing; you're defining a function in English. It must state the job ("identify spans likely to be ASR mis-hearings — proper nouns, organizations, technical terms"), hard-forbid the thing your values forbid ("never propose the correct word, never rewrite — only point"), and specify the exact output schema (e.g.
{"flags": [{"span": "...", "reason": "name|org|term|garbled"}]}, spans copied verbatim from the text). - Structured output is the crux — the single skill that separates "LLM as pipeline stage" from "LLM as chatbot." You demand JSON (the prompt + Ollama's format: "json") and then you parse and validate it into Flags, because the model will occasionally wrap it, add a stray key, or hand you an empty list. Treat the model's output like untrusted input crossing a typed boundary —
json.loads, then build Flags defensively. This is the same instinct as your RustResulthandling: never trust the edge, impose the type. - Temperature 0. A tool wants determinism; creativity is the enemy here. (Callback to our whisper beam/temperature thread — same knob, opposite goal: there we wanted fallback randomness to escape loops; here we want a flat, repeatable verdict.)