Analysis of the extracted source code of Claude Code CLI v2.1.88 (`@anthropic-ai/claude-code`)
Claude Code is a sophisticated agentic loop wrapped in a terminal UI (~1,884 TypeScript files). The core data flow:
User Input → CLI Parser → Query Loop → Anthropic API → Tool Execution → Terminal UI
| Subsystem | Location | Purpose |
|---|---|---|
| Entry & Init | `src/main.tsx` | Commander CLI setup, bootstrap sequence |
| Query Engine | `src/query.ts`, `src/QueryEngine.ts` | Async generator driving the conversation loop |
| Tool System | `src/Tool.ts`, `src/tools/` | ~45 tools (Bash, file ops, search, Agent, MCP, etc.) |
| Bridge | `src/bridge/` | Remote mode + claude.ai web sync |
| Commands | `src/commands/` | Slash commands (`/commit`, `/config`, etc.) |
| UI | `src/ink/`, `src/screens/` | Ink (React for terminal) rendering |
| Skills | `src/skills/` | Loadable markdown/shell scripts from `.claude/skills/` |
| MCP | `src/services/mcp/` | Model Context Protocol server connections |
| State | `src/state/` | Zustand store for settings, permissions, tasks, UI |
| API Client | `src/services/api/claude.ts` | Anthropic SDK wrapper with streaming, retries, caching |
Everything that happens between the user typing a prompt and the API call being made.
- Slash commands detected (anything starting with `/`) and routed separately
- Images resized/downsampled to fit API limits
- Pasted content extracted, stored to disk, metadata collected
- Shell hooks configured in `settings.json` execute before the prompt is sent
- Hooks can block the prompt, prevent continuation, or inject additional context
- Hook output capped at 10k chars
- IDE selections, file references, and agent mentions scanned from prompt text
- Creates extra `AttachmentMessage` objects (memory files, git diffs, etc.) injected alongside the user message
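The hook output cap is simple to picture. A minimal sketch — the constant name is invented; only the 10k figure comes from the source:

```typescript
// Illustrative only: hook output is capped at 10k chars before being
// injected into the prompt context.
const HOOK_OUTPUT_CAP = 10_000

function capHookOutput(output: string): string {
  return output.length > HOOK_OUTPUT_CAP
    ? output.slice(0, HOOK_OUTPUT_CAP)
    : output
}
```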
Priority chain (first match wins):
- Override system prompt (e.g., loop mode)
- Coordinator system prompt (multi-agent)
- Agent system prompt (if running as subagent)
- Custom `--system-prompt` flag
- Default system prompt (the standard big prompt)

`appendSystemPrompt` is always tacked on at the end.
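The priority chain can be sketched as a first-match-wins resolver. All names here are illustrative assumptions, not identifiers from the actual source:

```typescript
// Hypothetical sketch of first-match-wins system prompt resolution.
interface PromptSources {
  overridePrompt?: string      // loop mode
  coordinatorPrompt?: string   // multi-agent
  agentPrompt?: string         // subagent definition
  customPrompt?: string        // --system-prompt flag
  appendSystemPrompt?: string  // always appended, regardless of which base won
}

function resolveSystemPrompt(src: PromptSources, defaultPrompt: string): string {
  // First match wins, top to bottom.
  const base =
    src.overridePrompt ??
    src.coordinatorPrompt ??
    src.agentPrompt ??
    src.customPrompt ??
    defaultPrompt
  return src.appendSystemPrompt ? `${base}\n\n${src.appendSystemPrompt}` : base
}
```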
Then appended to the system prompt:
- Git status snapshot (branch, status, recent commits)
- Attribution fingerprint header
- Advisor instructions (if enabled)
A <system-reminder> meta message prepended to the conversation containing:
- CLAUDE.md files (auto-discovered from the directory tree + `~/.claude/`)
- Current date
- Marked `isMeta`, so it's invisible in the REPL but sent to the model
- Virtual messages stripped (display-only, never sent)
- Consecutive user messages merged (Bedrock compatibility)
- Attachment messages bubbled up to correct position
- Error-triggered media blocks stripped (if a PDF/image was too large last turn, the block is removed)
- Tool reference blocks get a sibling "Tool loaded." text block injected
- System/progress messages filtered out
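One of these normalization passes — merging consecutive user messages for Bedrock compatibility — might look like this minimal sketch (simplified message types, not the real SDK shapes):

```typescript
// Sketch: fold runs of consecutive user messages into one message.
type Role = "user" | "assistant"
interface Msg { role: Role; content: string }

function mergeConsecutiveUserMessages(messages: Msg[]): Msg[] {
  const out: Msg[] = []
  for (const msg of messages) {
    const prev = out[out.length - 1]
    if (prev && prev.role === "user" && msg.role === "user") {
      // Fold this user message into the previous one.
      out[out.length - 1] = { role: "user", content: `${prev.content}\n${msg.content}` }
    } else {
      out.push(msg)
    }
  }
  return out
}
```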
- Each tool's Zod schema → JSON Schema
- Deferred tools marked with `defer_loading: true` (the model sees the name but can't call the tool until it's fetched)
- Cache control markers added (`ephemeral` with scope `global`/`org`)
- Schemas cached per session to prevent mid-conversation flips
- System prompt split into a prefix (globally cacheable) and the rest (org-scoped)
- Cache control with TTL (`1h` for eligible users)
- If MCP tools are present, global caching is disabled (dynamic tool definitions)
- Media item count capped at 100 (oldest stripped)
- Every `tool_use` block verified to have a matching `tool_result` (synthetic placeholders inserted if missing)
- Thinking blocks stripped if the model changed mid-conversation
- Headers: API key, client ID, request attribution fingerprint
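The `tool_use`/`tool_result` pairing check could be sketched as follows; the block shapes and placeholder text are simplified assumptions:

```typescript
// Sketch: ensure every tool_use block has a matching tool_result,
// inserting synthetic placeholders where one is missing.
interface ToolUse { type: "tool_use"; id: string }
interface ToolResult { type: "tool_result"; tool_use_id: string; content: string }

function ensureToolResults(uses: ToolUse[], results: ToolResult[]): ToolResult[] {
  const have = new Set(results.map(r => r.tool_use_id))
  const synthetic = uses
    .filter(u => !have.has(u.id))
    .map(u => ({
      type: "tool_result" as const,
      tool_use_id: u.id,
      content: "(tool result unavailable)", // synthetic placeholder
    }))
  return [...results, ...synthetic]
}
```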
Everything that happens after Claude's API response comes back.
- Tool inputs backfilled during streaming (e.g., `FileReadTool` expands relative paths to absolute)
- On model fallback, tombstone messages replace orphaned assistant messages
- 413 (prompt too long), max output tokens, media too large errors are withheld from the user
- Recovery is attempted first — only shown if recovery fails
- Tools execute concurrently during model streaming (not after)
- Results yielded to UI as they complete, before the turn ends
- Internal hook registry runs after streaming completes
- Fire-and-forget (async, errors logged but don't block)
- Used for analytics/instrumentation
- All configured stop hooks execute after the model says "I'm done"
- Can produce blocking errors → triggers a retry with the error as new context
- Can set `preventContinuation` → forces loop exit
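How stop hook results steer the loop might look roughly like this; the names and the precedence between the two outcomes are invented for illustration:

```typescript
// Sketch: stop hooks can force a retry (with their error as new context)
// or force the loop to exit via preventContinuation.
interface StopHookResult {
  blockingError?: string
  preventContinuation?: boolean
}

type LoopAction =
  | { kind: "exit" }
  | { kind: "retry"; errorContext: string }
  | { kind: "continue" }

function decideAfterStopHooks(results: StopHookResult[]): LoopAction {
  if (results.some(r => r.preventContinuation)) return { kind: "exit" }
  const blocked = results.find(r => r.blockingError)
  if (blocked?.blockingError) {
    // The hook's error is fed back as new context for a retry.
    return { kind: "retry", errorContext: blocked.blockingError }
  }
  return { kind: "continue" }
}
```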
If no tool calls and an error was withheld:
| Recovery | What it does |
|---|---|
| Context-collapse drain | Removes staged collapses to free tokens |
| Reactive compact | Full conversation summary on 413/media errors |
| Max output tokens escalation | Retry at 64k tokens, then inject "resume" meta message |
| Token budget check | If budget exhausted → exit; else inject nudge with remaining % |
Each recovery continues the loop with new state rather than erroring out.
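That first-applicable-recovery-wins behavior can be sketched generically; the `Recovery` signature and names are assumptions, not the source's types:

```typescript
// Sketch: try each recovery strategy in order; the first one that
// applies lets the loop continue with new state instead of erroring out.
type Recovery = (ctx: { error: string }) => boolean // true = handled

function runRecoveries(ctx: { error: string }, recoveries: Recovery[]): boolean {
  for (const recover of recoveries) {
    if (recover(ctx)) return true // loop continues with new state
  }
  return false // nothing applied: surface the withheld error
}
```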
- Results become `ToolResultBlockParam` with an `is_error` flag
- Content replacement applied if a result exceeds `maxResultSizeChars`
- Images handled separately from text in results
Before the next API call:
- Tool result budget enforced — oversized results replaced with placeholders
- Microcompaction (feature-gated) — clears old cached tool results from previous turns
- Autocompaction — if token count exceeds threshold, a separate Claude call summarizes the conversation
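A rough sketch of these pre-call budget checks, with illustrative names and thresholds (only `maxResultSizeChars` appears in the source; the rest is assumed):

```typescript
// Sketch of the pre-call budget checks described above.
interface BudgetConfig {
  maxResultSizeChars: number
  autocompactTokenThreshold: number
}

function enforceToolResultBudget(result: string, cfg: BudgetConfig): string {
  // Oversized results are replaced with a placeholder.
  return result.length > cfg.maxResultSizeChars
    ? `(result truncated: ${result.length} chars exceeded budget)`
    : result
}

function needsAutocompaction(tokenCount: number, cfg: BudgetConfig): boolean {
  // When true, a separate Claude call summarizes the conversation.
  return tokenCount > cfg.autocompactTokenThreshold
}
```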
Has tool_use blocks? → Execute tools → Continue loop
Stop hooks blocked? → Inject error → Continue loop
No more work? → Return terminal state → Exit
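The three branches above, as a sketch (state shape invented):

```typescript
// Sketch of the end-of-turn decision.
interface TurnState {
  hasToolUse: boolean
  stopHooksBlocked: boolean
}

function nextStep(state: TurnState): "execute_tools" | "retry" | "exit" {
  if (state.hasToolUse) return "execute_tools" // execute tools, continue loop
  if (state.stopHooksBlocked) return "retry"   // inject error, continue loop
  return "exit"                                // terminal state
}
```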
Context is built in three separate layers, each injected differently into the API request.
```
┌────────────────────────────────────────────────────────┐
│ SYSTEM PROMPT (the "system" field in API request)      │
│ ├── Attribution header (billing fingerprint)           │
│ ├── CLI prefix ("You are Claude Code...")              │
│ ├── Default prompt (tools, behavior, instructions)     │
│ ├── MCP server instructions                            │
│ ├── Memory mechanics prompt                            │
│ └── System Context (appended):                         │
│     ├── gitStatus: branch, status, recent commits      │
│     └── cacheBreaker (debug, if enabled)               │
├────────────────────────────────────────────────────────┤
│ SYNTHETIC USER MESSAGE (prepended to messages[0])      │
│ <system-reminder>                                      │
│   # claudeMd                                           │
│   [Contents of all CLAUDE.md files]                    │
│   # currentDate                                        │
│   Today's date is 2026-03-31.                          │
│ IMPORTANT: this context may or may not be relevant...  │
│ </system-reminder>                                     │
├────────────────────────────────────────────────────────┤
│ ACTUAL CONVERSATION (the real messages)                │
│ user → assistant → tool_use → tool_result → ...        │
└────────────────────────────────────────────────────────┘
```
From `src/context.ts`:

```typescript
export const getSystemContext = memoize(async () => {
  const gitStatus = isRemote() || !shouldIncludeGitInstructions()
    ? null
    : await getGitStatus() // branch, status (2k cap), last 5 commits
  return {
    ...(gitStatus && { gitStatus }),
  }
})
```

Then in `src/utils/api.ts`, it's flattened and appended:
```typescript
function appendSystemContext(systemPrompt, context) {
  return [
    ...systemPrompt,
    Object.entries(context)
      .map(([key, value]) => `${key}: ${value}`)
      .join('\n'),
  ]
}
```

Git context becomes plain text at the tail end of the system prompt.
From `src/context.ts`:

```typescript
export const getUserContext = memoize(async () => {
  const claudeMd = shouldDisableClaudeMd
    ? null
    : getClaudeMds(filterInjectedMemoryFiles(await getMemoryFiles()))
  return {
    ...(claudeMd && { claudeMd }),
    currentDate: `Today's date is ${getLocalISODate()}.`,
  }
})
```

Then in `src/utils/api.ts`, it becomes a fake user message prepended before the real conversation:
```typescript
function prependUserContext(messages, context) {
  return [
    createUserMessage({
      content: `<system-reminder>
As you answer the user's questions, you can use the following context:
${Object.entries(context)
  .map(([key, value]) => `# ${key}\n${value}`)
  .join('\n')}
IMPORTANT: this context may or may not be relevant to your tasks.
</system-reminder>`,
      isMeta: true, // invisible in REPL, but sent to model
    }),
    ...messages,
  ]
}
```

From `src/utils/systemPrompt.ts`:
1. Override prompt (loop mode) → REPLACES everything
2. Coordinator prompt (multi-agent) → REPLACES default
3. Agent prompt (subagent definition) → REPLACES or appends to default
4. Custom --system-prompt flag → REPLACES default
5. Default prompt → The standard big prompt
Plus `appendSystemPrompt`, which is always added at the end.
Files are discovered bottom-up from src/utils/claudemd.ts:
| Priority | Source | Path |
|---|---|---|
| 1 (lowest) | Managed | /etc/claude-code/CLAUDE.md |
| 2 | User | ~/.claude/CLAUDE.md |
| 3 | Project | CLAUDE.md, .claude/CLAUDE.md, .claude/rules/*.md |
| 4 (highest) | Local | CLAUDE.local.md |
Files closer to cwd load last → model pays more attention (recency bias in context window). Referenced files (via @path syntax) are resolved and inlined.
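The load ordering implied by the table — lowest priority first, so the files closest to cwd land last in the context window — could be sketched as (types and function invented for illustration):

```typescript
// Sketch: order memory files so higher-priority sources load later,
// exploiting the model's recency bias in the context window.
interface MemorySource { priority: number; path: string }

function orderForContext(sources: MemorySource[]): string[] {
  // Lowest priority first → highest priority (closest to cwd) last.
  return [...sources].sort((a, b) => a.priority - b.priority).map(s => s.path)
}
```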
```typescript
// 1. Prepend attribution + CLI prefix to system prompt
systemPrompt = [
  getAttributionHeader(fingerprint),
  getCLISyspromptPrefix(),
  ...systemPrompt, // already has systemContext appended
  ...advisorBlocks, // if advisor enabled
]

// 2. Split for prompt caching
system = buildSystemPromptBlocks(systemPrompt, enableCaching)
// → Block 1: prefix (global cache scope)
// → Block 2: rest (org cache scope)

// 3. Messages get user context prepended + cache breakpoints
messages = addCacheBreakpoints(
  prependUserContext(messagesForQuery, userContext)
)

// 4. Final request
{ model, system, messages, tools, max_tokens, thinking, betas, ... }
```

- `getSystemContext` and `getUserContext` are memoized: computed once per session
- CLAUDE.md changes mid-session won't be picked up (until `/clear`)
- Git status is a snapshot from session start
- System prompt blocks get prompt caching headers (reused across API requests)
This tag wraps all system-injected context: user context, tool results, file warnings, memory notes. It tells the model "this is system-generated context, not user input." The model is trained to weight these as optional background information.
- **Tool deferred loading** — not all ~45 tools are shown at once, which reduces tool confusion and improves selection accuracy.
- **Stop hooks as quality gates** — external validators (linters, tests) can reject output and force retries with error context.
- **Compaction by separate model call** — long conversations are summarized by a separate Claude call, preserving semantic quality instead of naive truncation.
- **Max output tokens recovery** — injects a guided "Resume directly..." meta message for coherent continuation.
- **Tool result budgeting** — oversized results are replaced with placeholders, keeping the model focused.
- **Microcompaction** — old tool results from previous turns have their content cleared between iterations.
- **Backfill mechanism** — tools retroactively enrich their own inputs (e.g., expand relative paths), so future turns have better context.
- No best-of-N sampling or reranking
- No output quality classifier
- No self-reflection loop
- No post-editing of model output
- No client-side chain-of-thought injection
The real leverage is context assembly — putting the model in the best position to succeed on the first try, then handling recovery when it doesn't.
```
while (true) {

  1. SETUP
     - Destructure state, start memory prefetch

  2. API CALL & STREAM
     - Call model (with fallback logic)
     - Backfill tool_use inputs
     - Withhold errors if recoverable
     - Yield streamed messages + concurrent tool execution

  3. POST-SAMPLING
     - Execute post-sampling hooks (fire-and-forget)

  4. STOP CONDITIONS
     - Check abort
     - Try context-collapse drain
     - Try reactive compact
     - Try max output tokens recovery
     - Execute stop hooks
     - Check token budget

  5. TOOL EXECUTION (if needed)
     - Partition into batches (read-only concurrent, write serial)
     - Execute, normalize results, accumulate

  6. PREPARE NEXT ITERATION
     - Normalize messages for API
     - Apply tool result budget
     - Apply microcompaction
     - Check autocompaction threshold
     - Update state → continue
}
```
Source: Extracted from @anthropic-ai/claude-code v2.1.88 npm package source map (cli.js.map)