@alileza
Created March 31, 2026 11:44
Claude Code Internals: How Context Assembly & Processing Works

Analysis of the extracted source code of Claude Code CLI v2.1.88 (@anthropic-ai/claude-code)


Architecture Overview

Claude Code is a sophisticated agentic loop wrapped in a terminal UI (~1,884 TypeScript files). The core data flow:

User Input → CLI Parser → Query Loop → Anthropic API → Tool Execution → Terminal UI

Key Subsystems

| Subsystem | Location | Purpose |
| --- | --- | --- |
| Entry & Init | src/main.tsx | Commander CLI setup, bootstrap sequence |
| Query Engine | src/query.ts, src/QueryEngine.ts | Async generator driving the conversation loop |
| Tool System | src/Tool.ts, src/tools/ | ~45 tools (Bash, file ops, search, Agent, MCP, etc.) |
| Bridge | src/bridge/ | Remote mode + claude.ai web sync |
| Commands | src/commands/ | Slash commands (/commit, /config, etc.) |
| UI | src/ink/, src/screens/ | Ink (React for terminals) rendering |
| Skills | src/skills/ | Loadable markdown/shell scripts from .claude/skills/ |
| MCP | src/services/mcp/ | Model Context Protocol server connections |
| State | src/state/ | Zustand store for settings, permissions, tasks, UI |
| API Client | src/services/api/claude.ts | Anthropic SDK wrapper with streaming, retries, caching |

Pre-Processing Pipeline (User → Model)

Everything that happens between the user typing a prompt and the API call being made.

1. Input Parsing (processUserInput.ts)

  • Slash commands detected (anything starting with /) and routed separately
  • Images resized/downsampled to fit API limits
  • Pasted content extracted, stored to disk, metadata collected

2. User Prompt Submit Hooks

  • Shell hooks configured in settings.json execute before the prompt is sent
  • Hooks can block the prompt, prevent continuation, or inject additional context
  • Hook output capped at 10k chars
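The capping and blocking behavior can be sketched as follows (the HookResult shape and field names are assumptions for illustration, not the actual hook protocol):

```typescript
// Hypothetical shape of a hook's result; the real protocol may differ.
interface HookResult {
  stdout: string;
  exitCode: number;
}

const HOOK_OUTPUT_CAP = 10_000; // the 10k-char cap mentioned above

// Cap hook output and decide whether the prompt may proceed.
function applyHook(result: HookResult): { blocked: boolean; injectedContext: string } {
  const capped = result.stdout.slice(0, HOOK_OUTPUT_CAP);
  // Assumption: a non-zero exit code blocks the prompt;
  // otherwise stdout becomes additional injected context.
  return { blocked: result.exitCode !== 0, injectedContext: capped };
}
```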

3. Attachment Extraction

  • IDE selections, file references, agent mentions scanned from prompt text
  • Creates extra AttachmentMessage objects (memory files, git diffs, etc.) injected alongside the user message

4. System Prompt Assembly (systemPrompt.ts)

Priority chain (first match wins):

  1. Override system prompt (e.g., loop mode)
  2. Coordinator system prompt (multi-agent)
  3. Agent system prompt (if running as subagent)
  4. Custom --system-prompt flag
  5. Default system prompt (the standard big prompt)
  6. appendSystemPrompt always tacked on at end

Then appended to the system prompt:

  • Git status snapshot (branch, status, recent commits)
  • Attribution fingerprint header
  • Advisor instructions (if enabled)
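The first-match-wins chain reduces to a null-coalescing cascade. A minimal sketch (the PromptSources shape is hypothetical; the real code selects among richer objects):

```typescript
// Hypothetical container for the competing prompt sources.
interface PromptSources {
  override?: string;           // loop mode
  coordinator?: string;        // multi-agent
  agent?: string;              // subagent definition
  customFlag?: string;         // --system-prompt
  appendSystemPrompt?: string; // always added at the end
}

const DEFAULT_PROMPT = "You are Claude Code...";

// First match wins; appendSystemPrompt is always tacked on last.
function assembleSystemPrompt(s: PromptSources): string {
  const base = s.override ?? s.coordinator ?? s.agent ?? s.customFlag ?? DEFAULT_PROMPT;
  return s.appendSystemPrompt ? `${base}\n${s.appendSystemPrompt}` : base;
}
```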

5. User Context Injection (context.ts)

A <system-reminder> meta message prepended to the conversation containing:

  • CLAUDE.md files (auto-discovered from directory tree + ~/.claude/)
  • Current date
  • Marked isMeta so it's invisible in the REPL but sent to the model

6. Message Normalization (messages.ts::normalizeMessagesForAPI())

  • Virtual messages stripped (display-only, never sent)
  • Consecutive user messages merged (Bedrock compatibility)
  • Attachment messages bubbled up to correct position
  • Error-triggered media blocks stripped (if a PDF/image was too large last turn, the block is removed)
  • Tool reference blocks get a sibling "Tool loaded." text block injected
  • System/progress messages filtered out
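As one example of these normalizations, merging consecutive user messages might look like this (a minimal sketch, not the shipped normalizeMessagesForAPI()):

```typescript
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Merge runs of consecutive user messages into one (Bedrock compatibility).
function mergeConsecutiveUserMessages(msgs: Msg[]): Msg[] {
  const out: Msg[] = [];
  for (const m of msgs) {
    const prev = out[out.length - 1];
    if (prev && prev.role === "user" && m.role === "user") {
      prev.content += "\n" + m.content; // fold into the previous user message
    } else {
      out.push({ ...m });
    }
  }
  return out;
}
```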

7. Tool Definition Serialization (api.ts::toolToAPISchema())

  • Each tool's Zod schema → JSON Schema
  • Deferred tools marked with defer_loading: true (model sees the name but can't call until fetched)
  • Cache control markers added (ephemeral with scope global/org)
  • Schemas cached per session to prevent mid-conversation flips
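A minimal sketch of the serialization step (the ToolDef shape is hypothetical; the real code converts each tool's Zod schema to JSON Schema first):

```typescript
// Hypothetical internal tool shape; the real Tool.ts interface is richer.
interface ToolDef {
  name: string;
  description: string;
  jsonSchema: object; // already converted from the tool's Zod schema
  deferred?: boolean;
}

// Serialize one tool into an API-facing definition.
function toolToAPISchema(tool: ToolDef) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.jsonSchema,
    // Deferred tools expose only the name until fetched.
    ...(tool.deferred && { defer_loading: true }),
  };
}
```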

8. Prompt Caching (claude.ts::buildSystemPromptBlocks())

  • System prompt split into prefix (globally cacheable) and rest (org-scoped)
  • Cache control with TTL (1h for eligible users)
  • If MCP tools present, global caching disabled (dynamic tool definitions)
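The prefix/rest split could be sketched as follows (the cache_control field names follow the scopes described above; the actual wire format may differ):

```typescript
type CacheScope = "global" | "org";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; scope: CacheScope; ttl?: string };
}

// Split the system prompt into a globally cacheable prefix and an org-scoped rest.
// When MCP tools are present, the global-scope cache marker is skipped
// (dynamic tool definitions would invalidate it).
function buildSystemPromptBlocks(prefix: string, rest: string, hasMcpTools: boolean): SystemBlock[] {
  return [
    {
      type: "text",
      text: prefix,
      ...(!hasMcpTools && {
        cache_control: { type: "ephemeral" as const, scope: "global" as const, ttl: "1h" },
      }),
    },
    {
      type: "text",
      text: rest,
      cache_control: { type: "ephemeral", scope: "org" },
    },
  ];
}
```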

9. Final API Request

  • Media item count capped at 100 (oldest stripped)
  • Every tool_use block verified to have a matching tool_result (synthetic placeholders inserted if missing)
  • Thinking blocks stripped if model changed mid-conversation
  • Headers: API key, client ID, request attribution fingerprint
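The tool_use/tool_result pairing check can be sketched like this (a simplified model; the real code operates on full API message structures):

```typescript
type Block =
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

// Ensure every tool_use has a matching tool_result;
// append a synthetic placeholder for any orphan.
function ensureToolResults(blocks: Block[]): Block[] {
  const resultIds = new Set(
    blocks
      .filter((b): b is Extract<Block, { type: "tool_result" }> => b.type === "tool_result")
      .map((b) => b.tool_use_id),
  );
  const out: Block[] = [...blocks];
  for (const b of blocks) {
    if (b.type === "tool_use" && !resultIds.has(b.id)) {
      out.push({ type: "tool_result", tool_use_id: b.id, content: "(no result recorded)" });
    }
  }
  return out;
}
```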

Post-Processing Pipeline (Model → Next Iteration)

Everything that happens after Claude's API response comes back.

1. Streaming & Backfill

  • Tool inputs backfilled during streaming (e.g., FileReadTool expands relative paths to absolute)
  • On model fallback, tombstone messages replace orphaned assistant messages

2. Error Withholding

  • 413 (prompt too long), max-output-tokens, and media-too-large errors are withheld from the user
  • Recovery is attempted first — only shown if recovery fails

3. Streaming Tool Execution

  • Tools execute concurrently during model streaming (not after)
  • Results yielded to UI as they complete, before the turn ends

4. Post-Sampling Hooks

  • Internal hook registry runs after streaming completes
  • Fire-and-forget (async, errors logged but don't block)
  • Used for analytics/instrumentation

5. Stop Hooks (stopHooks.ts)

  • All configured stop hooks execute after the model says "I'm done"
  • Can produce blocking errors → triggers a retry with the error as new context
  • Can set preventContinuation → forces loop exit

6. Error Recovery Chain

If no tool calls and an error was withheld:

| Recovery | What it does |
| --- | --- |
| Context-collapse drain | Removes staged collapses to free tokens |
| Reactive compact | Full conversation summary on 413/media errors |
| Max output tokens escalation | Retry at 64k tokens, then inject a "resume" meta message |
| Token budget check | If the budget is exhausted, exit; otherwise inject a nudge with the remaining percentage |

Each recovery continues the loop with new state rather than erroring out.
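The first-success chain behaves roughly like this (a sketch; the real recoveries mutate loop state rather than returning booleans):

```typescript
// A recovery attempt reports whether it made progress.
type Recovery = () => boolean;

// Try recoveries in order; the first success means the loop can continue.
// If none succeed, the withheld error is finally surfaced to the user.
function attemptRecovery(recoveries: Recovery[]): boolean {
  for (const recover of recoveries) {
    if (recover()) return true;
  }
  return false;
}
```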

7. Tool Result Formatting

  • Results become ToolResultBlockParam with is_error flag
  • Content replacement applied if result exceeds maxResultSizeChars
  • Images handled separately from text in results
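A sketch of the size check (the section above says oversized content is replaced; keeping a truncated head, as here, is an assumption):

```typescript
// Replace oversized tool result content, keeping a truncated head.
function capToolResult(content: string, maxResultSizeChars: number): string {
  if (content.length <= maxResultSizeChars) return content;
  return content.slice(0, maxResultSizeChars) + "\n[... result truncated ...]";
}
```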

8. Auto-Compaction (compact/)

Before the next API call:

  • Tool result budget enforced — oversized results replaced with placeholders
  • Microcompaction (feature-gated) — clears old cached tool results from previous turns
  • Autocompaction — if token count exceeds threshold, a separate Claude call summarizes the conversation
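The autocompaction trigger reduces to a threshold check like this (the 0.8 ratio is an assumed default, not the shipped value):

```typescript
// Decide whether to summarize the conversation before the next API call.
function shouldAutocompact(tokenCount: number, contextWindow: number, threshold = 0.8): boolean {
  return tokenCount > contextWindow * threshold;
}
```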

9. Loop Decision

```
Has tool_use blocks? → Execute tools → Continue loop
Stop hooks blocked?  → Inject error  → Continue loop
No more work?        → Return terminal state → Exit
```

Context Assembly: The Three-Layer Model

Context is built in three separate layers, each injected differently into the API request.

Visual Layout

```
┌────────────────────────────────────────────────────────┐
│  SYSTEM PROMPT  (the "system" field in API request)    │
│  ├── Attribution header (billing fingerprint)          │
│  ├── CLI prefix ("You are Claude Code...")             │
│  ├── Default prompt (tools, behavior, instructions)    │
│  ├── MCP server instructions                           │
│  ├── Memory mechanics prompt                           │
│  └── System Context (appended):                        │
│      ├── gitStatus: branch, status, recent commits     │
│      └── cacheBreaker (debug, if enabled)              │
├────────────────────────────────────────────────────────┤
│  SYNTHETIC USER MESSAGE  (prepended to messages[0])    │
│  <system-reminder>                                     │
│  # claudeMd                                            │
│  [Contents of all CLAUDE.md files]                     │
│  # currentDate                                         │
│  Today's date is 2026-03-31.                           │
│  IMPORTANT: this context may or may not be relevant... │
│  </system-reminder>                                    │
├────────────────────────────────────────────────────────┤
│  ACTUAL CONVERSATION  (the real messages)              │
│  user → assistant → tool_use → tool_result → ...       │
└────────────────────────────────────────────────────────┘
```

Layer 1: System Context → appended to system prompt

From src/context.ts:

```typescript
export const getSystemContext = memoize(async () => {
  const gitStatus = isRemote() || !shouldIncludeGitInstructions()
    ? null
    : await getGitStatus()   // branch, status (2k cap), last 5 commits

  return {
    ...(gitStatus && { gitStatus }),
  }
})
```

Then in src/utils/api.ts, it's flattened and appended:

```typescript
function appendSystemContext(systemPrompt, context) {
  return [
    ...systemPrompt,
    Object.entries(context)
      .map(([key, value]) => `${key}: ${value}`)
      .join('\n'),
  ]
}
```

Git context becomes plain text at the tail end of the system prompt.

Layer 2: User Context → synthetic first message

From src/context.ts:

```typescript
export const getUserContext = memoize(async () => {
  const claudeMd = shouldDisableClaudeMd
    ? null
    : getClaudeMds(filterInjectedMemoryFiles(await getMemoryFiles()))

  return {
    ...(claudeMd && { claudeMd }),
    currentDate: `Today's date is ${getLocalISODate()}.`,
  }
})
```

Then in src/utils/api.ts, it becomes a fake user message prepended before the real conversation:

```typescript
function prependUserContext(messages, context) {
  return [
    createUserMessage({
      content: `<system-reminder>
As you answer the user's questions, you can use the following context:
${Object.entries(context)
  .map(([key, value]) => `# ${key}\n${value}`)
  .join('\n')}

IMPORTANT: this context may or may not be relevant to your tasks.
</system-reminder>`,
      isMeta: true,  // invisible in REPL, but sent to model
    }),
    ...messages,
  ]
}
```

Layer 3: System Prompt Priority Chain

From src/utils/systemPrompt.ts:

```
1. Override prompt (loop mode)         → REPLACES everything
2. Coordinator prompt (multi-agent)    → REPLACES default
3. Agent prompt (subagent definition)  → REPLACES or appends to default
4. Custom --system-prompt flag         → REPLACES default
5. Default prompt                      → The standard big prompt
   +
   appendSystemPrompt                  → Always added at end
```

CLAUDE.md Discovery Order

Files are discovered bottom-up from src/utils/claudemd.ts:

| Priority | Source | Path |
| --- | --- | --- |
| 1 (lowest) | Managed | /etc/claude-code/CLAUDE.md |
| 2 | User | ~/.claude/CLAUDE.md |
| 3 | Project | CLAUDE.md, .claude/CLAUDE.md, .claude/rules/*.md |
| 4 (highest) | Local | CLAUDE.local.md |

Files closer to cwd load last → model pays more attention (recency bias in context window). Referenced files (via @path syntax) are resolved and inlined.
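The priority ordering can be sketched as a sort (MemoryFile and its priority field are illustrative names, not the actual claudemd.ts types):

```typescript
interface MemoryFile {
  path: string;
  priority: number; // 1 = managed (lowest) ... 4 = local (highest)
  content: string;
}

// Lower priority loads first, so higher-priority files land later in context,
// where recency bias gives them more weight.
function orderMemoryFiles(files: MemoryFile[]): MemoryFile[] {
  return [...files].sort((a, b) => a.priority - b.priority);
}
```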

Final Assembly in services/api/claude.ts

```typescript
// 1. Prepend attribution + CLI prefix to system prompt
systemPrompt = [
  getAttributionHeader(fingerprint),
  getCLISyspromptPrefix(),
  ...systemPrompt,           // already has systemContext appended
  ...advisorBlocks,          // if advisor enabled
]

// 2. Split for prompt caching
system = buildSystemPromptBlocks(systemPrompt, enableCaching)
//  → Block 1: prefix (global cache scope)
//  → Block 2: rest (org cache scope)

// 3. Messages get user context prepended + cache breakpoints
messages = addCacheBreakpoints(
  prependUserContext(messagesForQuery, userContext)
)

// 4. Final request
{ model, system, messages, tools, max_tokens, thinking, betas, ... }
```

Caching Strategy

  • getSystemContext and getUserContext are memoized — computed once per session
  • CLAUDE.md changes mid-session won't be picked up (until /clear)
  • Git status is a snapshot from session start
  • System prompt blocks get prompt caching headers (reused across API requests)
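The memoize wrapping getSystemContext and getUserContext can be as small as this (a sketch consistent with the session-snapshot behavior above):

```typescript
// Session-scoped memoization: the wrapped async function runs at most once;
// later calls reuse the same promise, which is why CLAUDE.md edits and git
// status changes are invisible until the cache is dropped (e.g., on /clear).
function memoize<T>(fn: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= fn());
}
```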

The <system-reminder> Pattern

This tag wraps all system-injected context: user context, tool results, file warnings, memory notes. It tells the model "this is system-generated context, not user input." The model is trained to weight these as optional background information.


What Actually Improves Results (Beyond Raw Model)

Things that shape output quality

  1. Tool deferred loading — Not all ~45 tools shown at once. Reduces tool confusion, improves selection accuracy.

  2. Stop hooks as quality gates — External validators (linters, tests) can reject output and force retries with error context.

  3. Compaction by separate model call — Long conversations get summarized by a separate Claude call, preserving semantic quality instead of naive truncation.

  4. Max output tokens recovery — Injects a guided "Resume directly..." meta message for coherent continuation.

  5. Tool result budgeting — Oversized results replaced with placeholders, keeping the model focused.

  6. Microcompaction — Old tool results from previous turns get content cleared between iterations.

  7. Backfill mechanism — Tools retroactively enrich their own inputs (e.g., expand relative paths), so future turns have better context.
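Path backfilling can be sketched with Node's path module (backfillFilePath is an illustrative name; the real backfill logic is per-tool and runs during streaming):

```typescript
import * as path from "node:path";

// Expand a relative file_path in a tool's input to an absolute one,
// so later turns see an unambiguous path.
function backfillFilePath(input: { file_path: string }, cwd: string): { file_path: string } {
  return path.isAbsolute(input.file_path)
    ? input
    : { ...input, file_path: path.resolve(cwd, input.file_path) };
}
```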

What's NOT there

  • No best-of-N sampling or reranking
  • No output quality classifier
  • No self-reflection loop
  • No post-editing of model output
  • No client-side chain-of-thought injection

The real leverage is context assembly — putting the model in the best position to succeed on the first try, then handling recovery when it doesn't.


Query Loop State Machine

```
while (true) {
  1. SETUP
     - Destructure state, start memory prefetch

  2. API CALL & STREAM
     - Call model (with fallback logic)
     - Backfill tool_use inputs
     - Withhold errors if recoverable
     - Yield streamed messages + concurrent tool execution

  3. POST-SAMPLING
     - Execute post-sampling hooks (fire-and-forget)

  4. STOP CONDITIONS
     - Check abort
     - Try context-collapse drain
     - Try reactive compact
     - Try max output tokens recovery
     - Execute stop hooks
     - Check token budget

  5. TOOL EXECUTION (if needed)
     - Partition into batches (read-only concurrent, write serial)
     - Execute, normalize results, accumulate

  6. PREPARE NEXT ITERATION
     - Normalize messages for API
     - Apply tool result budget
     - Apply microcompaction
     - Check autocompaction threshold
     - Update state → continue
}
```

Source: Extracted from @anthropic-ai/claude-code v2.1.88 npm package source map (cli.js.map)
