@w601sxs
Last active April 1, 2026 19:57

Claude Code Memory compaction — detailed walkthrough (based on rust/crates/runtime/src/compact.rs)

This gist explains, step by step, how the compaction logic in rust/crates/runtime/src/compact.rs works. It covers:

  • When compaction is triggered
  • How tokens are estimated
  • How messages are summarized and merged
  • How the compacted session is constructed
  • Edge cases and tests that illustrate behavior

TL;DR

Compaction replaces older messages in a Session with a single System message that contains a generated summary (plus a short "continuation" preamble). The newest N messages are preserved verbatim. The system only compacts if there are enough messages to compact and the estimated token count of the compactable messages exceeds a configured threshold. If a previous compaction summary already exists in the first (system) message, it is preserved and merged with the new summary during subsequent compactions.


Key types and constants

  • CompactionConfig

    • preserve_recent_messages: how many most recent messages to keep verbatim (default 4)
    • max_estimated_tokens: threshold to trigger compaction (default 10000)
  • CompactionResult

    • summary: raw generated XML-like summary (with tags)
    • formatted_summary: human-friendly "Summary:" text produced by format_compact_summary
    • compacted_session: new Session with system summary + preserved messages
    • removed_message_count: number of messages removed/compacted
  • Important constants

    • COMPACT_CONTINUATION_PREAMBLE: the preamble text placed before the formatted summary in the System message
    • COMPACT_RECENT_MESSAGES_NOTE: "Recent messages are preserved verbatim."
    • COMPACT_DIRECT_RESUME_INSTRUCTION: instruction telling the assistant to resume without asking the user more questions

When do we compact? (should_compact)

Code summary (conceptual):

  1. If the session already has a compaction summary as its first message, we ignore that message when deciding whether to compact again (we don't count that system message as part of the content to measure).
  2. Let compactable = session.messages[start..] where start = 1 if first message is an existing compacted summary else 0.
  3. Compacting happens when BOTH:
    • compactable.len() > preserve_recent_messages
    • sum(estimate_message_tokens(each message in compactable)) >= max_estimated_tokens

So: compaction requires there are more messages than the preserve threshold and the estimated token count meets or exceeds the configured limit.
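In code, that decision reduces to a few lines. A minimal sketch with simplified stand-in types (the real should_compact operates on a Session, not a list of per-message token estimates):

```rust
// Simplified stand-in for the config described above.
struct CompactionConfig {
    preserve_recent_messages: usize,
    max_estimated_tokens: usize,
}

// `token_estimates` stands in for estimate_message_tokens applied per message;
// `has_existing_summary` is whether the first message is a prior compaction summary.
fn should_compact(
    token_estimates: &[usize],
    has_existing_summary: bool,
    config: &CompactionConfig,
) -> bool {
    // Skip a leading compacted-summary System message when measuring.
    let start = if has_existing_summary { 1 } else { 0 };
    let compactable = &token_estimates[start..];
    // BOTH conditions must hold.
    compactable.len() > config.preserve_recent_messages
        && compactable.iter().sum::<usize>() >= config.max_estimated_tokens
}

fn main() {
    let config = CompactionConfig { preserve_recent_messages: 4, max_estimated_tokens: 10_000 };
    // Five real messages, well over the token threshold: compacts.
    assert!(should_compact(&[3000, 3000, 3000, 3000, 3000], false, &config));
    // Same tokens but only four compactable messages: does not compact.
    assert!(!should_compact(&[3000, 3000, 3000, 3000], false, &config));
}
```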


Token estimation

Function: estimate_message_tokens(message)

  • For each ContentBlock inside a message:
    • Text: estimate = text.len() / 4 + 1
    • ToolUse: estimate = (name.len() + input.len()) / 4 + 1
    • ToolResult: estimate = (tool_name.len() + output.len()) / 4 + 1
  • Message tokens = sum of block estimates

Note: This is a rough heuristic (chars ÷ 4 + 1). It is good enough to trigger compaction, but it does not match a real model tokenizer's counts.
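The heuristic can be sketched as follows, with a simplified stand-in for the actual ContentBlock type in compact.rs:

```rust
// Stand-in content blocks; field names are simplified, not the real types.
enum ContentBlock {
    Text(String),
    ToolUse { name: String, input: String },
    ToolResult { tool_name: String, output: String },
}

// chars / 4 + 1 per block, as described in the walkthrough.
fn estimate_block_tokens(block: &ContentBlock) -> usize {
    match block {
        ContentBlock::Text(text) => text.len() / 4 + 1,
        ContentBlock::ToolUse { name, input } => (name.len() + input.len()) / 4 + 1,
        ContentBlock::ToolResult { tool_name, output } => {
            (tool_name.len() + output.len()) / 4 + 1
        }
    }
}

fn estimate_message_tokens(blocks: &[ContentBlock]) -> usize {
    blocks.iter().map(estimate_block_tokens).sum()
}

fn main() {
    let msg = [
        ContentBlock::Text("hello world".into()), // 11 / 4 + 1 = 3
        ContentBlock::ToolUse { name: "grep".into(), input: "foo".into() }, // 7 / 4 + 1 = 2
    ];
    assert_eq!(estimate_message_tokens(&msg), 5);
}
```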


The compaction flow (compact_session) — step by step

  1. If should_compact(session, config) is false:

    • Return a CompactionResult with empty summary and the original session unchanged.
  2. If compaction should occur:

    • Detect existing_summary by checking whether the first message is a System message that begins with COMPACT_CONTINUATION_PREAMBLE. If present, compacted_prefix_len = 1; otherwise 0.
    • Determine keep_from = session.messages.len().saturating_sub(config.preserve_recent_messages). Everything from keep_from.. is preserved verbatim.
    • removed = session.messages[compacted_prefix_len..keep_from] — these messages will be summarized and removed.
    • preserved = session.messages[keep_from..] — preserved messages appended after the new system summary.
    • Create a summary string with:
      • summarize_messages(removed) to generate a new block summarizing the removed messages
      • merge_compact_summaries(existing_summary, new_summary) which merges the previous summary (if any) into the new summary (keeping "Previously compacted context:" and "Newly compacted context:" sections)
    • Format the human-facing summary with format_compact_summary(&summary)
    • Build the continuation/system message with get_compact_continuation_message(&summary, true, !preserved.is_empty()):
      • This is COMPACT_CONTINUATION_PREAMBLE + formatted summary
      • If preserved messages exist, append the "Recent messages are preserved verbatim." note
      • If suppress_follow_up_questions is true (it is), append the direct resume instruction
    • Construct the new session:
      • messages = [ ConversationMessage::System( continuation_text ) ] + preserved
    • removed_message_count = removed.len()

Result: a Session significantly shorter (older messages replaced by a single system summary message).
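The index arithmetic behind the removed/preserved split in step 2 can be sketched with plain strings standing in for messages (the real code operates on ConversationMessage values):

```rust
// Split a message list into (removed, preserved) as compact_session does.
fn split_for_compaction(
    messages: &[&str],
    compacted_prefix_len: usize, // 1 if the first message is a prior summary
    preserve_recent_messages: usize,
) -> (Vec<String>, Vec<String>) {
    // Mirror saturating_sub, and never cut into the summary prefix itself.
    let keep_from = messages
        .len()
        .saturating_sub(preserve_recent_messages)
        .max(compacted_prefix_len);
    let removed = messages[compacted_prefix_len..keep_from]
        .iter()
        .map(|m| m.to_string())
        .collect();
    let preserved = messages[keep_from..].iter().map(|m| m.to_string()).collect();
    (removed, preserved)
}

fn main() {
    let msgs = ["old summary", "m1", "m2", "m3", "m4", "m5"];
    let (removed, preserved) = split_for_compaction(&msgs, 1, 2);
    assert_eq!(removed, vec!["m1", "m2", "m3"]);
    assert_eq!(preserved, vec!["m4", "m5"]);
}
```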


Summarization details (summarize_messages)

summarize_messages builds an XML-ish block whose body looks like:

```
Conversation summary:
- Scope: X earlier messages compacted (user=U, assistant=A, tool=T).
- Tools mentioned: ...
- Recent user requests:
  - ...
- Pending work:
  - ...
- Key files referenced: ...
- Current work: ...
- Key timeline:
  - role: content
  - ...
```

Important subroutines:

  • summarize_block(ContentBlock)

    • Summarizes each block into a short string:
      • Text block: raw text (later truncated to 160 chars)
      • ToolUse: "tool_use NAME(INPUT)"
      • ToolResult: "tool_result NAME: [error ]OUTPUT"
    • Then truncates with truncate_summary (max 160 chars; adds '…' if trimmed)
  • collect_recent_role_summaries(messages, MessageRole::User, 3)

    • Collects up to 3 most recent user text blocks (non-empty) and truncates them
  • infer_pending_work(messages)

    • Scans recent text blocks for keywords like "todo", "next", "pending", "follow up", "remaining" and returns up to 3 truncated items
  • collect_key_files(messages)

    • Scans all text/tool input/output tokens for path-like tokens with interesting extensions (rs, ts, tsx, js, json, md)
    • Returns up to 8 unique file paths
  • infer_current_work(messages)

    • Returns the most recent non-empty text block (truncated to 200 chars) as the "current work"
  • Key timeline

    • Appends a "- role: content" line for each message being summarized; content is the concatenation of summarize_block results for the message blocks
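The truncation helper these subroutines share could look like this. This is a sketch of the described behavior (160-char limit, '…' when trimmed); the real truncate_summary may differ in detail:

```rust
// Truncate to a character-count limit, appending '…' when content is cut.
fn truncate_summary(text: &str, max_chars: usize) -> String {
    let trimmed = text.trim();
    if trimmed.chars().count() <= max_chars {
        trimmed.to_string()
    } else {
        let cut: String = trimmed.chars().take(max_chars).collect();
        format!("{cut}…")
    }
}

fn main() {
    assert_eq!(truncate_summary("short", 160), "short");
    let long = "x".repeat(200);
    let t = truncate_summary(&long, 160);
    assert!(t.ends_with('…'));
    assert_eq!(t.chars().count(), 161); // 160 kept + the ellipsis marker
}
```

Counting chars rather than bytes keeps the cut from landing inside a multi-byte UTF-8 sequence, which would panic with a byte-index slice.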

Merging previous summaries (merge_compact_summaries)

  • If there was no previous summary: return the newly generated summary as-is.
  • If there was a previous summary:
    • extract highlights (everything before the "- Key timeline:" section) from the existing and from the new formatted summaries
    • extract new timeline lines from the new summary
    • Build a merged that contains:
      • "Previously compacted context:" with previous highlights
      • "Newly compacted context:" with new highlights
      • "- Key timeline:" with the new timeline lines
    • This keeps earlier summarization context visible and avoids throwing it away when compacting again.

Helpers used:

  • extract_summary_highlights(summary)
    • uses format_compact_summary(summary) then returns non-empty lines excluding the timeline
  • extract_summary_timeline(summary)
    • returns the timeline lines under "- Key timeline:" from format_compact_summary(summary)
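The merged structure can be sketched by treating each summary as highlight lines plus timeline lines (the real code extracts these from the formatted summaries via the helpers above):

```rust
// Assemble the merged summary sections described above.
fn merge_compact_summaries(
    previous_highlights: Option<Vec<String>>,
    new_highlights: Vec<String>,
    new_timeline: Vec<String>,
) -> String {
    let mut out = String::new();
    match previous_highlights {
        // No earlier summary: the new highlights pass through unchanged.
        None => out.push_str(&new_highlights.join("\n")),
        // Earlier summary: keep both contexts, clearly labeled.
        Some(prev) => {
            out.push_str("Previously compacted context:\n");
            out.push_str(&prev.join("\n"));
            out.push_str("\nNewly compacted context:\n");
            out.push_str(&new_highlights.join("\n"));
        }
    }
    // Only the NEW timeline is kept, as described above.
    out.push_str("\n- Key timeline:\n");
    out.push_str(&new_timeline.join("\n"));
    out
}

fn main() {
    let merged = merge_compact_summaries(
        Some(vec!["- Scope: 3 earlier messages".into()]),
        vec!["- Scope: 2 earlier messages".into()],
        vec!["  - user: hi".into()],
    );
    assert!(merged.starts_with("Previously compacted context:"));
    assert!(merged.contains("Newly compacted context:"));
    assert!(merged.contains("- Key timeline:"));
}
```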

Formatting the summary for human reading (format_compact_summary)

  • Removes blocks entirely with strip_tag_block(summary, "analysis")
  • If there is a "..." block, replace it with:
    • "Summary:\n" + the interior of the block (trimmed)
  • collapse_blank_lines to avoid repeated blank lines
  • trim final whitespace

So the formatted summary is a clean plain-text summary that starts with "Summary:\n" and contains the items assembled earlier.
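The collapse_blank_lines step could be implemented roughly like this (a sketch inferred from the name and described behavior, not the exact code):

```rust
// Squeeze runs of blank lines down to a single blank line, then trim.
fn collapse_blank_lines(text: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    let mut last_blank = false;
    for line in text.lines() {
        let blank = line.trim().is_empty();
        if blank && last_blank {
            continue; // drop repeated blank lines
        }
        out.push(line);
        last_blank = blank;
    }
    out.join("\n").trim().to_string()
}

fn main() {
    let s = "Summary:\n\n\n\n- Scope: 2 messages\n";
    assert_eq!(collapse_blank_lines(s), "Summary:\n\n- Scope: 2 messages");
}
```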


Continuation system message (get_compact_continuation_message)

  • Base content: COMPACT_CONTINUATION_PREAMBLE + formatted summary
  • If recent_messages_preserved: append blank line + COMPACT_RECENT_MESSAGES_NOTE
  • If suppress_follow_up_questions: append COMPACT_DIRECT_RESUME_INSTRUCTION
  • The resulting string is placed into the first message (a System message) of the compacted session.

This System message tells the assistant (and any downstream model) that older context was summarized and that it should continue directly.
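The assembly can be sketched as below. The preamble and resume-instruction strings here are placeholders, not the literal constants in compact.rs; only the recent-messages note is quoted verbatim earlier in this gist:

```rust
// Placeholder constant values (the note text is the one quoted above).
const COMPACT_CONTINUATION_PREAMBLE: &str =
    "This session is being continued from a previous conversation.\n\n";
const COMPACT_RECENT_MESSAGES_NOTE: &str = "Recent messages are preserved verbatim.";
const COMPACT_DIRECT_RESUME_INSTRUCTION: &str =
    "Continue directly without asking the user further questions.";

fn get_compact_continuation_message(
    formatted_summary: &str,
    suppress_follow_up_questions: bool,
    recent_messages_preserved: bool,
) -> String {
    let mut text = format!("{COMPACT_CONTINUATION_PREAMBLE}{formatted_summary}");
    if recent_messages_preserved {
        text.push_str("\n\n");
        text.push_str(COMPACT_RECENT_MESSAGES_NOTE);
    }
    if suppress_follow_up_questions {
        text.push_str("\n\n");
        text.push_str(COMPACT_DIRECT_RESUME_INSTRUCTION);
    }
    text
}

fn main() {
    let msg = get_compact_continuation_message("Summary:\n- Scope: 2", true, true);
    assert!(msg.starts_with(COMPACT_CONTINUATION_PREAMBLE));
    assert!(msg.ends_with(COMPACT_DIRECT_RESUME_INSTRUCTION));
}
```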


Extracting an existing compacted summary (extract_existing_compacted_summary)

Given a ConversationMessage (expected to be System-type), this helper:

  • Checks role == System
  • Finds first non-empty text block and verifies it starts with COMPACT_CONTINUATION_PREAMBLE
  • Removes the preamble, and strips off any trailing COMPACT_RECENT_MESSAGES_NOTE or COMPACT_DIRECT_RESUME_INSTRUCTION if present
  • Returns the "raw" summary substring (trimmed) if found

This is used to discover if a session already contains a compaction summary at the front and to pull the existing summary for merging.
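The prefix/suffix stripping can be sketched generically (preamble and trailer values would be the constants discussed above; this is illustrative, not the exact code):

```rust
// Strip a required preamble, then any trailing notes/instructions, and
// return the trimmed interior if the preamble matched.
fn extract_summary(text: &str, preamble: &str, trailers: &[&str]) -> Option<String> {
    let mut rest = text.strip_prefix(preamble)?;
    for trailer in trailers {
        if let Some(stripped) = rest.strip_suffix(*trailer) {
            rest = stripped;
        }
    }
    Some(rest.trim().to_string())
}

fn main() {
    let text = "PREAMBLE Summary: stuff NOTE";
    assert_eq!(
        extract_summary(text, "PREAMBLE ", &[" NOTE"]),
        Some("Summary: stuff".to_string())
    );
    // A message without the preamble is not a compacted summary.
    assert_eq!(extract_summary("plain message", "PREAMBLE ", &[]), None);
}
```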


Idempotence & repeated compaction behavior

  • should_compact ignores the first (system) message if it contains a previous compacted summary, so decisions about whether to re-compact are based only on the "real" user/assistant/tool messages.
  • When compacting again, previous summary content is preserved and merged into the new summary so the compacted context grows in a controlled way (previous highlights + new highlights + new timeline).
  • If there are not enough "real" messages to exceed the preserve_recent_messages threshold or token threshold, compact_session returns the original session unchanged.

Visual flow (mermaid)

Paste the following into a renderer that supports mermaid:

```mermaid
sequenceDiagram
  participant U as User
  participant A as Assistant
  participant S as Session (messages list)
  participant C as Compactor

  U->>A: many conversational turns ...
  Note over S: S contains many messages: [maybe System summary], msg0, msg1, ..., msgN
  C->>S: should_compact?
  alt not enough messages or tokens
    C-->>S: return unchanged
  else compact
    C->>S: identify removed (older) messages and preserved (N recent)
    C->>C: summarize removed -> new_summary
    C->>C: merge existing_summary? -> merged_summary
    C->>S: build System continuation text with merged_summary
    C-->>S: new session = [System continuation] + preserved messages
  end
```

ASCII snapshot before/after:

Before:

  • [maybe System(summary-of-old)] (optional)
  • msg0
  • msg1
  • ...
  • msgK
  • msgK+1
  • ...
  • msgN (last)

After:

  • System( COMPACT_CONTINUATION_PREAMBLE + formatted merged summary + maybe notes/instruction )
  • msgK+1
  • ...
  • msgN

Where K+1..N are the preserved (most recent preserve_recent_messages) messages.


Example (based on tests in file)

Given session messages:

  1. (possibly) System(summary-of-old)
  2. big user message "one ... repeated"
  3. big assistant message "two ... repeated"
  4. tool result with big output
  5. recent assistant message "recent"

If config:

  • preserve_recent_messages = 2
  • max_estimated_tokens = 1 (low threshold)

Then compact_session will:

  • detect removed messages = indices after the optional system summary up to len - 2 (so it removes the older large messages)
  • preserved = last 2 messages (e.g., the tool result and the recent assistant)
  • generate a summary with Scope, Tools mentioned, Key timeline, etc.
  • build a System message with the continuation preamble + "Summary: ..." + "Recent messages are preserved verbatim." + direct resume instruction
  • final session = [the above System message] + preserved messages

Tests assert:

  • removed_message_count == number removed
  • first message role is System and contains "Summary:"
  • formatted_summary contains "Scope:" and "Key timeline:"

Important edge cases & behavior notes

  • Existing compacted System message is ignored for compaction decision; but its content is re-used via merge_compact_summaries so context is not lost.
  • compact_session is conservative: it requires both more-than-preserve count and a token threshold to compact.
  • The token estimation is very approximate (character length ÷ 4). This is acceptable as a trigger heuristic, but it does not match real model tokenizer counts.
  • Truncation: content in timeline and snippets is truncated to prevent long timeline lines (truncate_summary uses a char count limit and appends '…').
  • File references are heuristically detected by tokenizing whitespace and checking for “/” and certain file extensions. This can miss or include spurious tokens.
  • Pending work detection is heuristic via substring search ("todo", "next", etc.) — not perfect.

Where to look in the source (quick map)

  • should_compact -> lines ~37..47
  • compact_session -> lines ~89..131
  • summarize_messages -> lines ~143..228
  • merge_compact_summaries -> lines ~230..263
  • format_compact_summary -> lines ~50..62
  • get_compact_continuation_message -> lines ~65..86
  • estimate_message_tokens -> lines ~391..403
  • extract_existing_compacted_summary -> lines ~442..456

(Line numbers are approximate and refer to the version of compact.rs this walkthrough was written against.)


Practical implications & recommended tuning

  • preserve_recent_messages controls how "sticky" the most recent context is. Increasing it keeps more verbatim context for the assistant.
  • max_estimated_tokens controls how aggressively the system compacts; lower -> compacts more frequently.
  • Because token estimation is approximate, set max_estimated_tokens with some margin. If you observe premature compaction, raise the threshold or improve the estimation function to use tokenization consistent with your model.
  • If you rely heavily on file references, consider expanding has_interesting_extension or improving file extraction logic.
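A hypothetical tuning example, using the field names and defaults stated earlier in this gist (the real CompactionConfig may expose these differently):

```rust
struct CompactionConfig {
    preserve_recent_messages: usize,
    max_estimated_tokens: usize,
}

impl Default for CompactionConfig {
    fn default() -> Self {
        // Defaults as stated in the walkthrough.
        Self { preserve_recent_messages: 4, max_estimated_tokens: 10_000 }
    }
}

fn main() {
    // Raise both knobs: keep more verbatim context, compact less often.
    let tuned = CompactionConfig {
        preserve_recent_messages: 8,
        max_estimated_tokens: 20_000,
    };
    assert!(tuned.preserve_recent_messages > CompactionConfig::default().preserve_recent_messages);
    assert!(tuned.max_estimated_tokens > CompactionConfig::default().max_estimated_tokens);
}
```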

Tests included (behavioral samples)

The file includes unit tests that cover:

  • formatting of summaries that contain tagged blocks
  • not compacting small sessions
  • compacting older messages and verifying removed count & summary content
  • preserving previous compacted context when compacting again
  • ignoring existing compacted summary when deciding whether to compact
  • truncation behavior
  • file extraction behavior
  • pending work inference

These tests give a good set of scenarios to verify any tuning or refactor.
