@alileza
Created March 31, 2026 11:44
Claude Code Internals: How Context Assembly & Processing Works

Analysis of the extracted source code of Claude Code CLI v2.1.88 (@anthropic-ai/claude-code)


Architecture Overview

Claude Code is a sophisticated agentic loop wrapped in a terminal UI (~1,884 TypeScript files). The core data flow:

User Input → CLI Parser → Query Loop → Anthropic API → Tool Execution → Terminal UI

Key Subsystems

| Subsystem | Location | Purpose |
| --- | --- | --- |
| Entry & Init | src/main.tsx | Commander CLI setup, bootstrap sequence |
| Query Engine | src/query.ts, src/QueryEngine.ts | Async generator driving the conversation loop |
| Tool System | src/Tool.ts, src/tools/ | ~45 tools (Bash, file ops, search, Agent, MCP, etc.) |
| Bridge | src/bridge/ | Remote mode + claude.ai web sync |
| Commands | src/commands/ | Slash commands (/commit, /config, etc.) |
| UI | src/ink/, src/screens/ | Ink (React for terminals) rendering |
| Skills | src/skills/ | Loadable markdown/shell scripts from .claude/skills/ |
| MCP | src/services/mcp/ | Model Context Protocol server connections |
| State | src/state/ | Zustand store for settings, permissions, tasks, UI |
| API Client | src/services/api/claude.ts | Anthropic SDK wrapper with streaming, retries, caching |

Pre-Processing Pipeline (User → Model)

Everything that happens between the user typing a prompt and the API call being made.

1. Input Parsing (processUserInput.ts)

  • Slash commands detected (anything starting with /) and routed separately
  • Images resized/downsampled to fit API limits
  • Pasted content extracted, stored to disk, metadata collected

2. User Prompt Submit Hooks

  • Shell hooks configured in settings.json execute before the prompt is sent
  • Hooks can block the prompt, prevent continuation, or inject additional context
  • Hook output capped at 10k chars
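The capping and blocking behavior can be sketched as follows (the HookResult shape and field names are assumptions for illustration, not the actual hook protocol):

```typescript
// Hypothetical shape of a hook's result; the real protocol may differ.
interface HookResult {
  stdout: string;
  exitCode: number;
}

const HOOK_OUTPUT_CAP = 10_000; // the 10k-char cap mentioned above

// Cap hook output and decide whether the prompt may proceed.
function applyHook(result: HookResult): { blocked: boolean; injectedContext: string } {
  const capped = result.stdout.slice(0, HOOK_OUTPUT_CAP);
  // Assumption: a non-zero exit code blocks the prompt;
  // otherwise stdout becomes additional injected context.
  return { blocked: result.exitCode !== 0, injectedContext: capped };
}
```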

3. Attachment Extraction

  • IDE selections, file references, agent mentions scanned from prompt text
  • Creates extra AttachmentMessage objects (memory files, git diffs, etc.) injected alongside the user message

4. System Prompt Assembly (systemPrompt.ts)

Priority chain (first match wins):

  1. Override system prompt (e.g., loop mode)
  2. Coordinator system prompt (multi-agent)
  3. Agent system prompt (if running as subagent)
  4. Custom --system-prompt flag
  5. Default system prompt (the standard big prompt)
  6. appendSystemPrompt always tacked on at end

Then appended to the system prompt:

  • Git status snapshot (branch, status, recent commits)
  • Attribution fingerprint header
  • Advisor instructions (if enabled)
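The first-match-wins chain reduces to a null-coalescing cascade. A minimal sketch (the PromptSources shape is hypothetical; the real code selects among richer objects):

```typescript
// Hypothetical container for the competing prompt sources.
interface PromptSources {
  override?: string;           // loop mode
  coordinator?: string;        // multi-agent
  agent?: string;              // subagent definition
  customFlag?: string;         // --system-prompt
  appendSystemPrompt?: string; // always added at the end
}

const DEFAULT_PROMPT = "You are Claude Code...";

// First match wins; appendSystemPrompt is always tacked on last.
function assembleSystemPrompt(s: PromptSources): string {
  const base = s.override ?? s.coordinator ?? s.agent ?? s.customFlag ?? DEFAULT_PROMPT;
  return s.appendSystemPrompt ? `${base}\n${s.appendSystemPrompt}` : base;
}
```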

5. User Context Injection (context.ts)

A <system-reminder> meta message prepended to the conversation containing:

  • CLAUDE.md files (auto-discovered from directory tree + ~/.claude/)
  • Current date
  • Marked isMeta so it's invisible in the REPL but sent to the model

6. Message Normalization (messages.ts::normalizeMessagesForAPI())

  • Virtual messages stripped (display-only, never sent)
  • Consecutive user messages merged (Bedrock compatibility)
  • Attachment messages bubbled up to correct position
  • Error-triggered media blocks stripped (if a PDF/image was too large last turn, the block is removed)
  • Tool reference blocks get a sibling "Tool loaded." text block injected
  • System/progress messages filtered out
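As one example of these normalizations, merging consecutive user messages might look like this (a minimal sketch, not the shipped normalizeMessagesForAPI()):

```typescript
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Merge runs of consecutive user messages into one (Bedrock compatibility).
function mergeConsecutiveUserMessages(msgs: Msg[]): Msg[] {
  const out: Msg[] = [];
  for (const m of msgs) {
    const prev = out[out.length - 1];
    if (prev && prev.role === "user" && m.role === "user") {
      prev.content += "\n" + m.content; // fold into the previous user message
    } else {
      out.push({ ...m });
    }
  }
  return out;
}
```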

7. Tool Definition Serialization (api.ts::toolToAPISchema())

  • Each tool's Zod schema → JSON Schema
  • Deferred tools marked with defer_loading: true (model sees the name but can't call until fetched)
  • Cache control markers added (ephemeral with scope global/org)
  • Schemas cached per session to prevent mid-conversation flips
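A minimal sketch of the serialization step (the ToolDef shape is hypothetical; the real code converts each tool's Zod schema to JSON Schema first):

```typescript
// Hypothetical internal tool shape; the real Tool.ts interface is richer.
interface ToolDef {
  name: string;
  description: string;
  jsonSchema: object; // already converted from the tool's Zod schema
  deferred?: boolean;
}

// Serialize one tool into an API-facing definition.
function toolToAPISchema(tool: ToolDef) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.jsonSchema,
    // Deferred tools expose only the name until fetched.
    ...(tool.deferred && { defer_loading: true }),
  };
}
```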

8. Prompt Caching (claude.ts::buildSystemPromptBlocks())

  • System prompt split into prefix (globally cacheable) and rest (org-scoped)
  • Cache control with TTL (1h for eligible users)
  • If MCP tools present, global caching disabled (dynamic tool definitions)
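The prefix/rest split could be sketched as follows (the cache_control field names follow the scopes described above; the actual wire format may differ):

```typescript
type CacheScope = "global" | "org";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; scope: CacheScope; ttl?: string };
}

// Split the system prompt into a globally cacheable prefix and an org-scoped rest.
// When MCP tools are present, the global-scope cache marker is skipped
// (dynamic tool definitions would invalidate it).
function buildSystemPromptBlocks(prefix: string, rest: string, hasMcpTools: boolean): SystemBlock[] {
  return [
    {
      type: "text",
      text: prefix,
      ...(!hasMcpTools && {
        cache_control: { type: "ephemeral" as const, scope: "global" as const, ttl: "1h" },
      }),
    },
    {
      type: "text",
      text: rest,
      cache_control: { type: "ephemeral", scope: "org" },
    },
  ];
}
```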

9. Final API Request

  • Media item count capped at 100 (oldest stripped)
  • Every tool_use block verified to have a matching tool_result (synthetic placeholders inserted if missing)
  • Thinking blocks stripped if model changed mid-conversation
  • Headers: API key, client ID, request attribution fingerprint
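The tool_use/tool_result pairing check can be sketched like this (a simplified model; the real code operates on full API message structures):

```typescript
type Block =
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

// Ensure every tool_use has a matching tool_result;
// append a synthetic placeholder for any orphan.
function ensureToolResults(blocks: Block[]): Block[] {
  const resultIds = new Set(
    blocks
      .filter((b): b is Extract<Block, { type: "tool_result" }> => b.type === "tool_result")
      .map((b) => b.tool_use_id),
  );
  const out: Block[] = [...blocks];
  for (const b of blocks) {
    if (b.type === "tool_use" && !resultIds.has(b.id)) {
      out.push({ type: "tool_result", tool_use_id: b.id, content: "(no result recorded)" });
    }
  }
  return out;
}
```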

Post-Processing Pipeline (Model → Next Iteration)

Everything that happens after Claude's API response comes back.

1. Streaming & Backfill

  • Tool inputs backfilled during streaming (e.g., FileReadTool expands relative paths to absolute)
  • On model fallback, tombstone messages replace orphaned assistant messages

2. Error Withholding

  • 413 (prompt too long), max-output-tokens, and media-too-large errors are withheld from the user
  • Recovery is attempted first — only shown if recovery fails

3. Streaming Tool Execution

  • Tools execute concurrently during model streaming (not after)
  • Results yielded to UI as they complete, before the turn ends

4. Post-Sampling Hooks

  • Internal hook registry runs after streaming completes
  • Fire-and-forget (async, errors logged but don't block)
  • Used for analytics/instrumentation

5. Stop Hooks (stopHooks.ts)

  • All configured stop hooks execute after the model says "I'm done"
  • Can produce blocking errors → triggers a retry with the error as new context
  • Can set preventContinuation → forces loop exit

6. Error Recovery Chain

If no tool calls and an error was withheld:

| Recovery | What it does |
| --- | --- |
| Context-collapse drain | Removes staged collapses to free tokens |
| Reactive compact | Full conversation summary on 413/media errors |
| Max output tokens escalation | Retry at 64k tokens, then inject a "resume" meta message |
| Token budget check | If the budget is exhausted, exit; otherwise inject a nudge with the remaining percentage |

Each recovery continues the loop with new state rather than erroring out.
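The first-success chain behaves roughly like this (a sketch; the real recoveries mutate loop state rather than returning booleans):

```typescript
// A recovery attempt reports whether it made progress.
type Recovery = () => boolean;

// Try recoveries in order; the first success means the loop can continue.
// If none succeed, the withheld error is finally surfaced to the user.
function attemptRecovery(recoveries: Recovery[]): boolean {
  for (const recover of recoveries) {
    if (recover()) return true;
  }
  return false;
}
```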

7. Tool Result Formatting

  • Results become ToolResultBlockParam with is_error flag
  • Content replacement applied if result exceeds maxResultSizeChars
  • Images handled separately from text in results
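A sketch of the size check (the section above says oversized content is replaced; keeping a truncated head, as here, is an assumption):

```typescript
// Replace oversized tool result content, keeping a truncated head.
function capToolResult(content: string, maxResultSizeChars: number): string {
  if (content.length <= maxResultSizeChars) return content;
  return content.slice(0, maxResultSizeChars) + "\n[... result truncated ...]";
}
```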

8. Auto-Compaction (compact/)

Before the next API call:

  • Tool result budget enforced — oversized results replaced with placeholders
  • Microcompaction (feature-gated) — clears old cached tool results from previous turns
  • Autocompaction — if token count exceeds threshold, a separate Claude call summarizes the conversation
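The autocompaction trigger reduces to a threshold check like this (the 0.8 ratio is an assumed default, not the shipped value):

```typescript
// Decide whether to summarize the conversation before the next API call.
function shouldAutocompact(tokenCount: number, contextWindow: number, threshold = 0.8): boolean {
  return tokenCount > contextWindow * threshold;
}
```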

9. Loop Decision

```
Has tool_use blocks? → Execute tools → Continue loop
Stop hooks blocked?  → Inject error  → Continue loop
No more work?        → Return terminal state → Exit
```

Context Assembly: The Three-Layer Model

Context is built in three separate layers, each injected differently into the API request.

Visual Layout

```
┌────────────────────────────────────────────────────────┐
│  SYSTEM PROMPT  (the "system" field in API request)    │
│  ├── Attribution header (billing fingerprint)          │
│  ├── CLI prefix ("You are Claude Code...")             │
│  ├── Default prompt (tools, behavior, instructions)    │
│  ├── MCP server instructions                           │
│  ├── Memory mechanics prompt                           │
│  └── System Context (appended):                        │
│      ├── gitStatus: branch, status, recent commits     │
│      └── cacheBreaker (debug, if enabled)              │
├────────────────────────────────────────────────────────┤
│  SYNTHETIC USER MESSAGE  (prepended to messages[0])    │
│  <system-reminder>                                     │
│  # claudeMd                                            │
│  [Contents of all CLAUDE.md files]                     │
│  # currentDate                                         │
│  Today's date is 2026-03-31.                           │
│  IMPORTANT: this context may or may not be relevant... │
│  </system-reminder>                                    │
├────────────────────────────────────────────────────────┤
│  ACTUAL CONVERSATION  (the real messages)              │
│  user → assistant → tool_use → tool_result → ...       │
└────────────────────────────────────────────────────────┘
```

Layer 1: System Context → appended to system prompt

From src/context.ts:

```typescript
export const getSystemContext = memoize(async () => {
  const gitStatus = isRemote() || !shouldIncludeGitInstructions()
    ? null
    : await getGitStatus()   // branch, status (2k cap), last 5 commits

  return {
    ...(gitStatus && { gitStatus }),
  }
})
```

Then in src/utils/api.ts, it's flattened and appended:

```typescript
function appendSystemContext(systemPrompt, context) {
  return [
    ...systemPrompt,
    Object.entries(context)
      .map(([key, value]) => `${key}: ${value}`)
      .join('\n'),
  ]
}
```

Git context becomes plain text at the tail end of the system prompt.

Layer 2: User Context → synthetic first message

From src/context.ts:

```typescript
export const getUserContext = memoize(async () => {
  const claudeMd = shouldDisableClaudeMd
    ? null
    : getClaudeMds(filterInjectedMemoryFiles(await getMemoryFiles()))

  return {
    ...(claudeMd && { claudeMd }),
    currentDate: `Today's date is ${getLocalISODate()}.`,
  }
})
```

Then in src/utils/api.ts, it becomes a fake user message prepended before the real conversation:

```typescript
function prependUserContext(messages, context) {
  return [
    createUserMessage({
      content: `<system-reminder>
As you answer the user's questions, you can use the following context:
${Object.entries(context)
  .map(([key, value]) => `# ${key}\n${value}`)
  .join('\n')}

IMPORTANT: this context may or may not be relevant to your tasks.
</system-reminder>`,
      isMeta: true,  // invisible in REPL, but sent to model
    }),
    ...messages,
  ]
}
```

Layer 3: System Prompt Priority Chain

From src/utils/systemPrompt.ts:

```
1. Override prompt (loop mode)         → REPLACES everything
2. Coordinator prompt (multi-agent)    → REPLACES default
3. Agent prompt (subagent definition)  → REPLACES or appends to default
4. Custom --system-prompt flag         → REPLACES default
5. Default prompt                      → The standard big prompt
   +
   appendSystemPrompt                  → Always added at end
```

CLAUDE.md Discovery Order

Files are discovered bottom-up from src/utils/claudemd.ts:

| Priority | Source | Path |
| --- | --- | --- |
| 1 (lowest) | Managed | /etc/claude-code/CLAUDE.md |
| 2 | User | ~/.claude/CLAUDE.md |
| 3 | Project | CLAUDE.md, .claude/CLAUDE.md, .claude/rules/*.md |
| 4 (highest) | Local | CLAUDE.local.md |

Files closer to cwd load last → model pays more attention (recency bias in context window). Referenced files (via @path syntax) are resolved and inlined.
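The priority ordering can be sketched as a sort (MemoryFile and its priority field are illustrative names, not the actual claudemd.ts types):

```typescript
interface MemoryFile {
  path: string;
  priority: number; // 1 = managed (lowest) ... 4 = local (highest)
  content: string;
}

// Lower priority loads first, so higher-priority files land later in context,
// where recency bias gives them more weight.
function orderMemoryFiles(files: MemoryFile[]): MemoryFile[] {
  return [...files].sort((a, b) => a.priority - b.priority);
}
```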

Final Assembly in services/api/claude.ts

```typescript
// 1. Prepend attribution + CLI prefix to system prompt
systemPrompt = [
  getAttributionHeader(fingerprint),
  getCLISyspromptPrefix(),
  ...systemPrompt,           // already has systemContext appended
  ...advisorBlocks,          // if advisor enabled
]

// 2. Split for prompt caching
system = buildSystemPromptBlocks(systemPrompt, enableCaching)
//  → Block 1: prefix (global cache scope)
//  → Block 2: rest (org cache scope)

// 3. Messages get user context prepended + cache breakpoints
messages = addCacheBreakpoints(
  prependUserContext(messagesForQuery, userContext)
)

// 4. Final request
{ model, system, messages, tools, max_tokens, thinking, betas, ... }
```

Caching Strategy

  • getSystemContext and getUserContext are memoized — computed once per session
  • CLAUDE.md changes mid-session won't be picked up (until /clear)
  • Git status is a snapshot from session start
  • System prompt blocks get prompt caching headers (reused across API requests)
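The memoize wrapping getSystemContext and getUserContext can be as small as this (a sketch consistent with the session-snapshot behavior above):

```typescript
// Session-scoped memoization: the wrapped async function runs at most once;
// later calls reuse the same promise, which is why CLAUDE.md edits and git
// status changes are invisible until the cache is dropped (e.g., on /clear).
function memoize<T>(fn: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= fn());
}
```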

The <system-reminder> Pattern

This tag wraps all system-injected context: user context, tool results, file warnings, memory notes. It tells the model "this is system-generated context, not user input." The model is trained to weight these as optional background information.


What Actually Improves Results (Beyond Raw Model)

Things that shape output quality

  1. Tool deferred loading — Not all ~45 tools shown at once. Reduces tool confusion, improves selection accuracy.

  2. Stop hooks as quality gates — External validators (linters, tests) can reject output and force retries with error context.

  3. Compaction by separate model call — Long conversations get summarized by a separate Claude call, preserving semantic quality instead of naive truncation.

  4. Max output tokens recovery — Injects a guided "Resume directly..." meta message for coherent continuation.

  5. Tool result budgeting — Oversized results replaced with placeholders, keeping the model focused.

  6. Microcompaction — Old tool results from previous turns get content cleared between iterations.

  7. Backfill mechanism — Tools retroactively enrich their own inputs (e.g., expand relative paths), so future turns have better context.
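Path backfilling can be sketched with Node's path module (backfillFilePath is an illustrative name; the real backfill logic is per-tool and runs during streaming):

```typescript
import * as path from "node:path";

// Expand a relative file_path in a tool's input to an absolute one,
// so later turns see an unambiguous path.
function backfillFilePath(input: { file_path: string }, cwd: string): { file_path: string } {
  return path.isAbsolute(input.file_path)
    ? input
    : { ...input, file_path: path.resolve(cwd, input.file_path) };
}
```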

What's NOT there

  • No best-of-N sampling or reranking
  • No output quality classifier
  • No self-reflection loop
  • No post-editing of model output
  • No client-side chain-of-thought injection

The real leverage is context assembly — putting the model in the best position to succeed on the first try, then handling recovery when it doesn't.


Query Loop State Machine

```
while (true) {
  1. SETUP
     - Destructure state, start memory prefetch

  2. API CALL & STREAM
     - Call model (with fallback logic)
     - Backfill tool_use inputs
     - Withhold errors if recoverable
     - Yield streamed messages + concurrent tool execution

  3. POST-SAMPLING
     - Execute post-sampling hooks (fire-and-forget)

  4. STOP CONDITIONS
     - Check abort
     - Try context-collapse drain
     - Try reactive compact
     - Try max output tokens recovery
     - Execute stop hooks
     - Check token budget

  5. TOOL EXECUTION (if needed)
     - Partition into batches (read-only concurrent, write serial)
     - Execute, normalize results, accumulate

  6. PREPARE NEXT ITERATION
     - Normalize messages for API
     - Apply tool result budget
     - Apply microcompaction
     - Check autocompaction threshold
     - Update state → continue
}
```

Source: Extracted from @anthropic-ai/claude-code v2.1.88 npm package source map (cli.js.map)
