Skip to content

Instantly share code, notes, and snippets.

@phyllisstein
Created June 24, 2026 14:58
Show Gist options
  • Select an option

  • Save phyllisstein/a3ae7530952e7a46a909383a0d2654b2 to your computer and use it in GitHub Desktop.

Select an option

Save phyllisstein/a3ae7530952e7a46a909383a0d2654b2 to your computer and use it in GitHub Desktop.
Claude explains LLM "contracts"

An LLM in code is just a function call over HTTP. There's no magic "spinning up" — you either hit a local daemon or a cloud endpoint, send {model, messages, options}, and get text back. For this spike I'd use Ollama (local): it's the on-prem, llama.cpp-family runtime we identified as the production sharp tool, so what you learn transfers directly; no API key; and it has a JSON mode. If ollama serve is running, you POST http://localhost:11434/api/chat with {"model": "llama3.1" (or any instruct model you've pulled), "messages": [...], "format": "json", "stream": false, "options": {"temperature": 0}}, and the JSON you want is in response ["message"]["content"] (as a string you still have to json.loads). urllib from the stdlib is enough — no new deps.

Three concepts that are the whole skill, each a deliberate choice:

  1. The system prompt is the component's contract, not chit-chat. You're not conversing; you're defining a function in English. It must state the job ("identify spans likely to be ASR mis-hearings — proper nouns, organizations, technical terms"), hard-forbid the thing your values forbid ("never propose the correct word, never rewrite — only point"), and specify the exact output schema (e.g. {"flags": [{"span": "...", "reason": "name|org|term|garbled"}]}, spans copied verbatim from the text).
  2. Structured output is the crux — the single skill that separates "LLM as pipeline stage" from "LLM as chatbot." You demand JSON (the prompt + Ollama's format: "json") and then you parse and validate it into Flags, because the model will occasionally wrap it, add a stray key, or hand you an empty list. Treat the model's output like untrusted input crossing a typed boundary — json.loads, then build Flags defensively. This is the same instinct as your Rust Result handling: never trust the edge, impose the type.
  3. Temperature 0. A tool wants determinism; creativity is the enemy here. (Callback to our whisper beam/temperature thread — same knob, opposite goal: there we wanted fallback randomness to escape loops; here we want a flat, repeatable verdict.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment