@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
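
A minimal sketch of how the three layers might map onto a directory tree. The names `raw/` and `wiki/`, the default schema filename, and the `scaffold` helper are all assumptions made for illustration; the layout is deliberately left open here.

```python
from pathlib import Path


def scaffold(root: Path, schema_name: str = "AGENTS.md") -> dict[str, Path]:
    """Create an empty three-layer layout under `root` (hypothetical names)."""
    layers = {
        "raw": root / "raw",           # curated sources; the LLM only reads these
        "wiki": root / "wiki",         # LLM-owned markdown pages
        "schema": root / schema_name,  # conventions and workflows for the agent
    }
    layers["raw"].mkdir(parents=True, exist_ok=True)
    layers["wiki"].mkdir(parents=True, exist_ok=True)
    if not layers["schema"].exists():
        # Placeholder text only; the real schema is co-written with the agent.
        layers["schema"].write_text("# Wiki schema\n", encoding="utf-8")
    return layers
```

Calling `scaffold(Path("my-wiki"))` twice is harmless: existing directories and an existing schema file are left untouched.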

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
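
Two of the lint checks above (orphan pages and concepts mentioned without a page) are mechanical enough to script. The sketch below assumes Obsidian-style `[[wikilinks]]`, a flat directory of `.md` files, and that `index.md` and `log.md` should not count as inbound links; the function name and those choices are illustrative, not part of the pattern. Contradictions and stale claims still need the LLM.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Catalog-style files link to everything, so they are ignored as link sources.
SPECIAL = {"index", "log"}


def lint_links(wiki_dir: Path) -> tuple[set[str], set[str]]:
    """Return (orphans, missing): pages nobody links to, and link targets with no page."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.glob("*.md")}
    linked: set[str] = set()
    for name, text in pages.items():
        if name in SPECIAL:
            continue
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target != name:  # a self-link is not an inbound link
                linked.add(target)
    orphans = set(pages) - linked - SPECIAL
    missing = linked - set(pages)
    return orphans, missing
```

The two sets can be handed back to the LLM as a to-do list: link or merge the orphans, and create or rename pages for the missing targets.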

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
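
In the flow described above the LLM maintains index.md itself, but a deterministic rebuild can be handy as a cross-check. A rough sketch, assuming one subdirectory per category, wikilink-style entries, and that the first non-heading line of each page works as its one-line summary (all three are assumptions, as is the `build_index` name):

```python
from pathlib import Path


def one_line_summary(text: str) -> str:
    """Return the first non-empty line that is not a markdown heading."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return "(no summary)"


def build_index(wiki_dir: Path) -> str:
    """Render a category-grouped catalog: one link and one summary per page."""
    lines = ["# Index", ""]
    for category in sorted(p for p in wiki_dir.iterdir() if p.is_dir()):
        lines.append(f"## {category.name}")
        for page in sorted(category.glob("*.md")):
            summary = one_line_summary(page.read_text(encoding="utf-8"))
            lines.append(f"- [[{page.stem}]]: {summary}")
        lines.append("")
    return "\n".join(lines)
```

Diffing this output against the LLM-maintained index is a cheap way to spot pages that were created but never catalogued.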

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
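
The same prefix convention is easy to read and write programmatically. A small sketch, assuming entries look exactly like the `## [2026-04-02] ingest | Article Title` example above with a single-word action; the helper names are hypothetical.

```python
import re
from datetime import date

# Matches lines such as: ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_entry(day: date, action: str, title: str) -> str:
    """Build the heading line for a new log entry."""
    return f"## [{day.isoformat()}] {action} | {title}"


def parse_log(text: str) -> list[tuple[date, str, str]]:
    """Extract (date, action, title) from every entry heading, in file order."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            day, action, title = match.groups()
            entries.append((date.fromisoformat(day), action, title))
    return entries
```

`parse_log(text)[-5:]` is then the Python counterpart of the `grep ... | tail -5` one-liner.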

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
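
For a sense of what "a naive search script" could look like, here is a sketch that ranks pages by raw term counts. It is not BM25, has nothing to do with how qmd works internally, and the `search` function and its scoring are assumptions chosen for brevity.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(wiki_dir: Path, query: str, limit: int = 5) -> list[tuple[str, int]]:
    """Rank pages by how often the query terms occur in them."""
    terms = set(tokenize(query))
    results = []
    for page in wiki_dir.rglob("*.md"):
        counts = Counter(tokenize(page.read_text(encoding="utf-8")))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((page.relative_to(wiki_dir).as_posix(), score))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:limit]
```

Raw counts favor long pages, which is one reason to move to a proper ranking function once the wiki outgrows this.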

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@equationalapplications

Prisma Adapter for Core LLM Wiki

https://www.npmjs.com/package/@equationalapplications/prisma-outbox

  • Synchronizing Edge/Local AI Memory to the Cloud: If you have a local AI agent (using @equationalapplications/core-llm-wiki) writing memories to a local SQLite database, you can use this adapter to reliably replicate those memories up to your central cloud database (like PostgreSQL or MySQL) managed by Prisma.
  • Reliable, Lossless Data Replication: Because it uses the transactional outbox pattern, it ensures that every change the LLM makes locally is eventually synced to the main database. If the network goes down or the main database is temporarily unavailable, the events wait safely in the local SQLite outbox until they can be successfully processed and acknowledged.
  • Triggering Downstream Application Side Effects: When the LLM updates a specific record locally (e.g., learning a new user preference), you can use the mapEvent function to trigger broader changes in your main Prisma database, such as updating a user's main profile, sending a notification, or adjusting application settings based on what the LLM learned.
  • Offline-First Architectures: It allows the primary application (and the LLM) to interact with a fast, local SQLite database without waiting for network calls. The adapter handles the slow, asynchronous work of syncing that data to the main backend in the background.
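
The transactional outbox pattern mentioned above can be sketched with nothing but SQLite. This is not the API of the prisma-outbox package; the table names, event shape, and the `write_memory` / `drain` helpers are hypothetical and only illustrate the idea: the data change and its outbox event commit together, and events stay queued until the remote side acknowledges them.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local database and create the (hypothetical) tables."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            acked INTEGER NOT NULL DEFAULT 0
        );
    """)
    return db


def write_memory(db: sqlite3.Connection, body: str) -> int:
    """Insert a memory and its outbox event in one transaction."""
    with db:  # commits both rows together, or rolls both back on error
        cur = db.execute("INSERT INTO memories (body) VALUES (?)", (body,))
        event = {"type": "memory.created", "id": cur.lastrowid, "body": body}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
    return cur.lastrowid


def drain(db: sqlite3.Connection, send) -> int:
    """Deliver pending events in order; stop at the first failure and retry later."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE acked = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, payload in rows:
        try:
            send(json.loads(payload))
        except Exception:
            break  # e.g. network down: leave the event queued
        with db:
            db.execute("UPDATE outbox SET acked = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

Because an event is only marked as acknowledged after `send` returns, a crash between the two steps can deliver it twice, so the receiving side should treat events as idempotent.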

@alxraun

alxraun commented May 17, 2026

What about storing wiki pages in this format? Compressed the original from 2500 tokens down to 1000. Try feeding it to an LLM like this:

# LLM_WIKI_PATTERN
* source: `https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f#file-llm-wiki-md` 

## PARADIGM_SHIFT
* standard_rag
  * flow: query -> search(raw_docs) -> transient_synthesis
  * attributes: stateless
  * result: knowledge_accumulation == 0
* **llm_wiki**
  * flow: ingest(raw_docs) -> update(persistent_graph) -> query() -> compound_growth
  * attributes: stateful
  * result: knowledge_accumulation -> max
* **llm_wiki** != standard_rag

## ROLE_ALLOCATION
* human_user
  * responsibilities: {sourcing, exploration, inquiry, direction_setting}
* llm_agent
  * instances: {openai_codex, claude_code, open_code, pi}
  * responsibilities: {summarizing, cross_referencing, bookkeeping, updating, formatting}
  * attributes: {tireless, maintenance_cost == 0}
* system_analogy: {obsidian == ide, llm_agent == programmer, **llm_wiki** == codebase}

## ARCHITECTURE
* system_layers == raw_sources + wiki_core + schema
* raw_sources
  * format: {articles, papers, images, data_files}
  * attributes: {immutable, source_of_truth}
  * permissions: human_user -> write, llm_agent -> read_only
* wiki_core
  * format: markdown_directory
  * content: {summaries, entity_pages, concept_pages, comparisons, syntheses}
  * attributes: {mutable, compounding, interlinked}
  * permissions: llm_agent -> write_owner, human_user -> read_explorer
* schema
  * instances: {`CLAUDE.md`, `AGENTS.md`}
  * role: llm_behavior_definition
  * content: {structure_rules, conventions, workflows}
  * transformation: llm_chatbot + schema => disciplined_wiki_maintainer

## OPERATIONS
* ingest(new_source)
  * trigger: human_user -> add(raw_sources)
  * flow: read(new_source) -> discuss() -> write(summary) -> update(`index.md`) -> update(entities_concepts) -> append(`log.md`)
  * scope: 1_source => modifies(10...15_pages)
* query(question)
  * flow: question -> read(`index.md`) -> read(relevant_pages) -> synthesize(answer, citations)
  * output_formats: {markdown_page, comparison_table, marp_deck, matplotlib_chart, canvas}
  * core_heuristic: valuable_answer -> file_back(wiki_core) => compound_growth
* lint()
  * frequency: periodic
  * targets: {contradictions, stale_claims, orphan_pages, missing_links, missing_concepts, data_gaps}
  * actions: fix(targets) + suggest(new_questions, new_sources)
  * result: wiki_health -> max

## NAVIGATION_STATE
* `index.md`
  * nature: content_catalog
  * structure: list[link, 1_line_summary, metadata]
  * organization: category_based
  * utility: rag_infrastructure_replacement @ scale ~ 100_sources, ~ hundreds_of_pages
  * trigger: update() @ ingest()
* `log.md`
  * nature: chronological_ledger
  * attributes: append_only
  * syntax_rule: `## [YYYY-MM-DD] action | title`
  * utility: unix_parsing_compatibility + temporal_context_awareness

## TOOLING_ECOSYSTEM
* search_engine: {`qmd` | custom_script}
  * features: {bm25, vector_search, llm_reranking}
  * interfaces: {cli, mcp_server}
* obsidian_web_clipper -> html_to_markdown_ingestion
* local_assets: download_images -> local_path => link_rot_prevention
* llm_vision_constraint: inline_images -> read_text_first -> view_images_separately
* obsidian_graph_view -> visualize(topology, hubs, orphans)
* marp -> markdown_to_slides_transformation
* dataview -> query(yaml_frontmatter) => dynamic_tables
* git -> {version_control, history, branching, collaboration}

## USE_CASES
* domains: {personal_tracking, research, book_companion, team_knowledge_base, deep_dives}
* book_companion_scale: human_user + llm_agent ~ community_effort @ tolkien_gateway

## TELEOLOGY
* failure_mode(human_wiki): maintenance_burden > value => abandonment
* success_mode(**llm_wiki**): llm_agent -> bookkeeping => maintenance_burden == 0 => persistence
* lineage: **llm_wiki** ~ vannevar_bush_memex_1945
  * shared_traits: [private, actively_curated, associative_trails]
  * solved_problem: memex_maintenance == llm_agent
* modularity: implementation_details -> adaptable(domain, preferences, llm_choice)

@eslamgenio

eslamgenio commented May 17, 2026

AI Agents forget. Every session, they start from zero!

Andrej Karpathy keeps making the point that agents need durable memory beyond the context window.

I kept turning that over, then looked at my Obsidian vault and asked: what if this was the memory?!

So I built it with the help of Hermes Agent: "long-term-agent-memory"

A filesystem-first system for storing sessions, decisions, procedures, and linked knowledge that agents can keep forever!

If you're working on agent workflows or long-horizon AI, I'd love your thoughts.

Inspired by Andrej Karpathy's LLM Wiki: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

long-term-agent-memory Repo:

🔗 https://github.com/eslamgenio/long-term-agent-memory

@skyllwt

skyllwt commented May 18, 2026

ΩmegaWiki (680+⭐) just shipped v1.4.0: the research loop is now end-to-end.
Repo: https://github.com/skyllwt/OmegaWiki

• 26 Claude Code skills, covering the full paper lifecycle:
discover → ingest → graph → ideate → experiment → draft → poster → rebuttal
• 8 typed entities · 20 typed edges
• Bilingual (EN + 中文), every skill ships in both languages
• 4 new skills shipped this month

v1.4.0 highlight: /poster

Paper accepted? Run one command, get a 1400×900 conference poster +
print-quality PNG. Reads your LaTeX source, auto-rasterizes TikZ figures,
renders booktabs tables as live HTML, preserves math via KaTeX, walks you
through figure picks and header config. PDF export from any browser. Section
distillation is grounded in your wiki's paper-plan, ideas, experiments — not a
blind LLM summary of the abstract.

Try ΩmegaWiki in Claude Code and run the full LLM-Wiki loop:

/discover --venue iclr --year 2026 → ranked reading list
/ingest → typed entry, linked into the graph
/ideate + /novelty → ideas with bibliographic grounding
/exp-design → /exp-run → /exp-eval → pilot-first experiment workflow
/paper-plan → /paper-draft → /paper-compile
/poster → conference poster (NEW)
/rebuttal → reviewer response

End to end. One wiki. No chunks.

If ΩmegaWiki looks interesting, a ⭐ would encourage and motivate us a lot 😀
https://github.com/skyllwt/OmegaWiki

@Sistema2D

I've built FrameCode VibeWork, a documentation and governance framework based on this LLM Wiki concept to reduce context loss during AI-assisted development. It organizes plans, changelogs, and an incremental technical memory. Check it out here: https://github.com/Sistema2D/FrameCode-VibeWork

Now available in EN-US! (P.S.: I'm not a bot)

@ojuschugh1

  ███████╗ ██████╗ ███████╗
  ██╔════╝██╔═══██╗╚══███╔╝
  ███████╗██║   ██║  ███╔╝
  ╚════██║██║▄▄ ██║ ███╔╝
  ███████║╚██████╔╝███████╗
  ╚══════╝ ╚══▀▀═╝ ╚══════╝
  

Compress LLM context to save tokens and reduce costs

Real session stats: 3,003 compressions · 178,442 tokens saved · 24.7% avg reduction · up to 92% with dedup

sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.

The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.

Without sqz:                    With sqz:

File read #1:  2,000 tokens     File read #1:  ~800 tokens (compressed)
File read #2:  2,000 tokens     File read #2:  ~13 tokens  (dedup ref)
File read #3:  2,000 tokens     File read #3:  ~13 tokens  (dedup ref)
───────────────────────         ───────────────────────
Total:         6,000 tokens     Total:         ~826 tokens (86% saved)
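
The arithmetic behind that comparison fits in a few lines. This is only a sketch: the function and parameter names are made up for illustration, and the 2,000 / ~800 / ~13 token figures are the approximate ones quoted above.

```python
def session_tokens(full: int, compressed: int, ref: int, reads: int) -> tuple[int, int, float]:
    """Compare re-sending a file on every read with one compressed send plus dedup refs."""
    without = full * reads
    with_dedup = compressed + ref * (reads - 1)  # first read compressed, repeats are refs
    saved = 1 - with_dedup / without
    return without, with_dedup, saved


without, with_dedup, saved = session_tokens(full=2000, compressed=800, ref=13, reads=3)
print(f"{without} -> {with_dedup} tokens ({saved:.0%} saved)")  # 6000 -> 826 tokens (86% saved)
```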

Token Savings

24.7% average reduction across 3,003 real compressions ·
92% saved on repeated file reads ·
86% on shell/git output ·
13-token refs for cached content

One developer's week, measured from actual sqz gain output:

$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
  04-13 │                              │   2,329 saved
  04-14 │                              │       0 saved
  04-15 │███                           │  12,954 saved
  04-16 │██                            │   9,223 saved
  04-17 │████                          │  14,752 saved
  04-18 │██████████████████████████████│ 105,569 saved
  04-19 │████████                      │  30,882 saved
  04-20 │█                             │   4,334 saved
──────────────────────────────────────────────────
  Total: 3,003 compressions, 178,442 tokens saved (24.7% avg reduction)

Per-command compression

Single-command compression (measured via cargo test -p sqz-engine benchmarks):

| Content | Before | After | Saved |
| --- | --- | --- | --- |
| Repeated log lines | 148 | 62 | 58% |
| Large JSON array | 259 | 142 | 45% |
| JSON API response | 64 | 53 | 17% |
| Git diff | 61 | 54 | 12% |
| Prose/docs | 124 | 121 | 2% |
| Stack trace (safe mode) | 82 | 82 | 0% |

Session-level with dedup

Where the real savings live — the cache sends each file once, repeats cost 13 tokens:

| Scenario | Without sqz | With sqz | Saved |
| --- | --- | --- | --- |
| Same file read 5× | 10,000 | 826 | 92% |
| Same JSON response 3× | 192 | 79 | 59% |
| Test-fix-test cycle (3 runs) | 15,000 | 5,186 | 65% |

Single-command compression ranges from 2–58% depending on content. Repeated reads drop to 13 tokens each. Your mileage will vary with how repetitive your tool calls are — agentic sessions with many file re-reads see the biggest wins.

Install

Prebuilt binaries (no compiler required — works on every platform):

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.ps1 | iex

# Any platform via npm
npm install -g sqz-cli

# macOS / Linux via Homebrew
brew tap ojuschugh1/sqz
brew install sqz

Build from source via Cargo:

cargo install sqz-cli sqz-mcp

sqz-cli provides the sqz binary; sqz-mcp provides the MCP server. sqz-engine is a library dependency — it compiles automatically and does not need to be installed separately.

Building from source needs a C toolchain:

  • Linux: build-essential (apt) or equivalent
  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Windows: Visual Studio Build Tools with the "Desktop development with C++" workload. Without these, cargo install fails with linker link.exe not found. If you don't already have them, use the PowerShell or npm install above instead.

Then initialize:

sqz init --global     # hooks apply to every project on this machine
# or
sqz init              # hooks apply to just this project (.claude/settings.local.json)

--global writes to ~/.claude/settings.json (the user scope per the
Anthropic scope table),
so the sqz hook fires in every Claude Code session on this machine. This is
the common case on first install. Your existing permissions, env,
statusLine, and unrelated hooks in ~/.claude/settings.json are
preserved — sqz merges its entries rather than overwriting.

Plain sqz init (project scope) is useful when you want sqz active only
inside one repo.

Only using one agent? Pass --only (or --skip) to limit which
configs are written:

sqz init --only opencode              # just OpenCode, nothing else
sqz init --only opencode,codex        # OpenCode and Codex
sqz init --skip cursor,windsurf       # everything except Cursor and Windsurf

Accepted names: claude, cursor, windsurf, cline, gemini,
kiro, opencode, codex. Aliases (claude-code, gemini-cli, roo,
kiro-cli) also work. --only and --skip can't be combined.

Manual installation (preserve comments in your config)

sqz init round-trips your config file through a JSON parser to merge
the sqz entry, which drops any comments in your opencode.jsonc (and
the analogous JSON-with-comments files other tools accept). If you've
commented your config carefully and want to keep those comments, install by hand
instead.

OpenCode — two steps:

  1. Drop the plugin file in place. sqz prints the generated TS to
    stdout so you don't have to hand-write the path-escaping logic:

    mkdir -p ~/.config/opencode/plugins
    sqz print-opencode-plugin > ~/.config/opencode/plugins/sqz.ts
  2. Add the MCP entry to your existing opencode.jsonc yourself.
    Append this block inside the top-level mcp object (create the
    mcp object if it doesn't exist):

    "sqz": {
      "type": "local",
      "command": ["sqz-mcp", "--transport", "stdio"],
      "enabled": true
    }

Comments in the rest of your file stay put. OpenCode auto-discovers
the plugin file; no plugin array entry needed (adding one causes
double-loading, see issue #10).

Other tools — Claude Code, Cursor, Windsurf, Cline, Gemini CLI,
and Codex use plain JSON configs without comment support, so the
automated path is non-destructive there. Use sqz init --only <tool>
for those.

That's it. Shell hooks installed, AI tool hooks configured.

How It Works

sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.

Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)

What gets compressed:

  • Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
  • JSON — strips nulls, compact encoding
  • Logs — collapses repeated lines
  • Test output — shows failures only

What doesn't get compressed:

  • Stack traces, error messages, secrets — routed to safe mode (0% compression)
  • Your prompts and the AI's responses — controlled by the AI tool, not sqz
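
To make two of the transformations listed above concrete (null stripping with compact JSON encoding, and collapsing repeated log lines), here is an illustrative sketch. It is not sqz's implementation, which is a Rust binary; the function names and the `(xN)` suffix format are assumptions.

```python
import json
from itertools import groupby


def strip_nulls(value):
    """Recursively drop object keys whose value is null.

    Nulls inside arrays are kept so that element positions do not shift.
    """
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


def compact_json(text: str) -> str:
    """Strip null-valued keys and re-encode without insignificant whitespace."""
    return json.dumps(strip_nulls(json.loads(text)), separators=(",", ":"))


def collapse_repeats(lines: list[str]) -> list[str]:
    """Replace each run of identical consecutive lines with one line and a count."""
    out = []
    for line, group in groupby(lines):
        n = len(list(group))
        out.append(line if n == 1 else f"{line}  (x{n})")
    return out
```

Both steps are lossy in a controlled way: a consumer that distinguishes a missing key from an explicit null would need the original payload.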

Supported Tools

| Tool | Integration | Setup |
| --- | --- | --- |
| Claude Code | PreToolUse hook (transparent) | sqz init |
| Cursor | PreToolUse hook (transparent) | sqz init |
| Windsurf | PreToolUse hook (transparent) | sqz init |
| Cline | PreToolUse hook (transparent) | sqz init |
| Gemini CLI | BeforeTool hook (transparent) | sqz init |
| Kiro | PreToolUse hook (transparent) | sqz init |
| OpenCode | TypeScript plugin (transparent) | sqz init |
| VS Code | Extension | Install from Marketplace |
| JetBrains | Plugin | Install from Marketplace |
| Chrome | Browser extension | ChatGPT, Claude.ai, Gemini, Grok, Perplexity |
| Firefox | Browser extension | Same sites |

CLI

sqz init --global             # Install hooks for every project on this machine
sqz init                      # Install hooks for just this project
sqz init --only kiro          # Only configure Kiro (skip the rest)
sqz init --only opencode      # Only configure OpenCode (skip the rest)
sqz init --skip cursor        # Configure every agent except Cursor
sqz compress <text>           # Compress (or pipe from stdin)
sqz compress --no-cache       # Compress without dedup (always full output)
sqz expand <ref>              # Recover original content from a §ref:HASH§ token
sqz compact                   # Evict stale context to free tokens
sqz gain                      # Show daily token savings (bar chart)
sqz gain --project .          # Per-project daily gains
sqz gain --days 30            # Last 30 days
sqz stats                     # Cumulative compression report
sqz stats --breakdown         # Per-command token usage breakdown
sqz stats --project .         # Stats for current project only
sqz stats --project list      # List all tracked projects
sqz discover                  # Find missed savings
sqz resume                    # Re-inject session context after compaction
sqz vizit                     # Live terminal dashboard (like htop for AI agents)
sqz hook claude               # Process a PreToolUse hook (Claude Code)
sqz hook kiro                 # Process a PreToolUse hook (Kiro)
sqz print-opencode-plugin     # Print OpenCode plugin TS for manual install
sqz proxy --port 8080         # API proxy (compresses full request payloads)

Dedup Escape Hatch

When sqz sees the same content twice, it returns a compact §ref:HASH§ token
instead of the full text. Most models handle this fine, but some (e.g., GLM 5.1)
can't parse the ref format and loop. Four ways to work around this:

# 1. Recover original content from a ref
sqz expand a1b2c3d4              # prefix match
sqz expand '§ref:a1b2c3d4§'     # paste the whole token

# 2. Compress without dedup (per-invocation)
echo "..." | sqz compress --no-cache

# 3. Disable dedup globally (env var)
export SQZ_NO_DEDUP=1

# 4. MCP passthrough tool (returns input byte-exact, zero transforms)
# Available via tools/list when sqz-mcp is running

Track Your Own Savings

Run sqz gain in your shell any time to see your own daily breakdown (see the
Token Savings section above for what the output looks like), and sqz stats
for the full cumulative report:

$ sqz stats
  📊 sqz compression stats
  ──────────────────────────────────────────────────

  178,442  tokens saved
  ↓  24.7% average reduction

  Compressions           3,003
  Tokens in              721,840
  Tokens out             543,398
  Tokens saved           178,442
  Avg reduction          24.7%

  🗄️  Cache
  ──────────────────────────────────────────────────
  Entries                43
  Size                   39.1 KB

Add --breakdown to see exactly which commands consume the most tokens:

$ sqz stats --breakdown

  🔍 Top Token Consumers
  ──────────────────────────────────────────────────────────────────────
  command               calls  tokens in        out    saved
  ──────────────────────────────────────────────────────────────────────
  dedup                   249      45541       3237      93%
  stdin                    51      30851      24289      21%
  auto                    132      18288       7740      58%
  echo                     17       1050        558      47%
  ls -la                    8        948        948       0%
  cargo build               7        170        145      15%
  git status                4         56          8      86%
  ──────────────────────────────────────────────────────────────────────
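
The "saved" column and the cumulative "Avg reduction" figure can be re-derived from the token counts shown above. The sketch below assumes the percentage is simply (tokens in − tokens out) / tokens in; the helper name `reduction_pct` is made up for illustration.

```python
def reduction_pct(tokens_in: int, tokens_out: int) -> float:
    """Percentage of tokens removed; 0.0 when there was no input."""
    if tokens_in == 0:
        return 0.0
    return 100.0 * (tokens_in - tokens_out) / tokens_in


# (tokens in, tokens out) copied from the breakdown table above
rows = {
    "dedup": (45541, 3237),
    "stdin": (30851, 24289),
    "auto": (18288, 7740),
    "echo": (1050, 558),
    "ls -la": (948, 948),
    "cargo build": (170, 145),
    "git status": (56, 8),
}

for command, (tokens_in, tokens_out) in rows.items():
    print(f"{command:<12} {reduction_pct(tokens_in, tokens_out):5.1f}%")

# Cumulative figures from the sqz stats output
cumulative = reduction_pct(721_840, 543_398)
```

Rounded to whole percentages, these values agree with the table, and the cumulative figure rounds to the reported 24.7%.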

Per-project filtering:

sqz stats --project .           # stats for current project only
sqz stats --project list        # list all tracked projects
sqz gain --project .            # daily gains for current project
sqz gain --days 30              # last 30 days instead of 7
sqz gain --days 30 --project .  # combine both

Stats are stored locally in SQLite under ~/.sqz/sessions.db — nothing leaves your machine.

How Compression Works

  1. Per-command formatters: git status → compact summary, cargo test → failures only, docker ps → name/image/status table
  2. Structural summaries: code files compressed to imports + function signatures + call graph (~70% reduction). The model sees the architecture, not implementation noise.
  3. Dedup cache: SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
  4. JSON pipeline: strip nulls → project out debug fields → flatten → collapse arrays → TOON encoding (lossless compact format)
  5. Safe mode: stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
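
As an illustration of the structural-summary step, the sketch below reduces a Python file to its imports and function signatures using the standard `ast` module. It is only an approximation of the idea: it handles Python source only, omits the call graph, and the function names `structural_summary` and `_signature` are hypothetical rather than anything sqz exposes.

```python
import ast


def _signature(fn) -> str:
    """Render a function definition as a one-line signature with an elided body."""
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    return f"{prefix} {fn.name}({ast.unparse(fn.args)}){returns}: ..."


def structural_summary(source: str) -> str:
    """Keep import lines plus function and method signatures; drop all bodies."""
    out = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            out.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    out.append("    " + _signature(item))
    return "\n".join(out)


sample = (
    "import os\n"
    "\n"
    "def load(path: str) -> str:\n"
    "    with open(path) as f:\n"
    "        return f.read()\n"
)
summary = structural_summary(sample)
```

The summary keeps what a model needs to navigate the file (names, parameters, return types) while the bodies, which usually dominate the token count, are dropped.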

For the full technical details, see docs/.

Configuration

# ~/.sqz/presets/default.toml
[preset]
name = "default"
version = "1.0"

[compression.condense]
enabled = true
max_repeated_lines = 3

[compression.strip_nulls]
enabled = true

[budget]
warning_threshold = 0.70
default_window_size = 200000

Privacy

  • Zero telemetry — no data transmitted, no crash reports
  • Fully offline — works in air-gapped environments
  • All processing local

Development

git clone https://github.com/ojuschugh1/sqz.git
cd sqz
cargo test --workspace
cargo build --release

License

Elastic License 2.0 (ELv2) — use, fork, modify freely. Two restrictions: no competing hosted service, no removing license notices.


@kytmanov

Synto v0.2.0 is out.

https://github.com/kytmanov/synto

Main addition: synto add for PDFs, Markdown, and text files.

That means you can drop a paper or doc into Synto, turn it into clean Obsidian-compatible Markdown, and run the full pipeline on it:
ingest -> compile -> query -> synthesize.

It stays local-first and private:

  • works with local LLMs
  • great with Ollama and LM Studio
  • plain Markdown
  • Obsidian-friendly
  • no vector DB
  • no cloud required

Also includes a lot of hardening from real E2E testing: better PDF import, duplicate detection, source-type prompts, semantic cache, lineage, and smoke fixes.

Star it if you want to support local-first AI tools. Fork it if you want to build on it.

@paulmchen

The Synthadoc demo is available on YouTube.
https://youtu.be/rIGO6zi9XQE

Synthadoc is an open-source AI knowledge engine that converts scattered documents into reliable domain intelligence.

⭐ Star the project if you find it useful!
GitHub: https://github.com/axoviq-ai/synthadoc

@gschleder

SciAI Wiki - Scientific Artificial Intelligence Wikis (arXiv manuscript)

Possibly the first manuscript to discuss applying LLM wiki ideas specifically to science (including content and repositories from this gist thread).

The arXiv-submitted manuscript (May 18th) is here, and the repository is just a minimal representative example of it; advanced users can (and possibly should!) move towards more feature-ready implementations such as those in this thread.

SciAI Wiki is a framework for building persistent scientific memory with Large Language Models, building on Karpathy's LLM Wiki discussions. Instead of treating papers, notes, and research questions as isolated chat sessions, it organises scientific context into human-readable markdown pages (as a wiki) that preserve provenance, relationships, and accumulated reasoning over time.

The repository acts as a structured knowledge substrate for AI-assisted research. Agents can ingest sources, link concepts, answer questions from the accumulated context, audit the knowledge graph, and generate literature syntheses, while researchers remain in control of interpretation, validation, and research direction.

@gowtham0992

Link v1.2.0 is live

Local, source-backed memory for LLM agents.

Link stores agent memory as plain Markdown on your machine. Raw sources become a source-backed wiki. Explicit “remember this” requests become reviewable agent memory. MCP tools let Codex, Claude, Cursor, Kiro, VS Code, Copilot, and other clients query that memory without dumping the whole wiki into context.


Install:

brew install gowtham0992/link/link
link demo
link serve link-demo

MCP package:

python3 -m pip install --upgrade link-mcp

What changed in v1.2.0:

  • Homebrew install path.
  • PyPI and MCP Registry updated to 1.2.0.
  • GitHub Pages product docs.
  • Cleaner README and architecture explanation.
  • UI, CLI, and MCP walkthrough visuals.
  • Local /health page for readiness, validation, interrupted writes, and repair commands.
  • link operations for inspecting interrupted or failed writes.
  • Bounded wiki/log.md rotation for active local wikis.
  • Safer generated commands that work from any terminal directory.
  • Stronger first-use, HTTP, MCP stdio, graph, and large-wiki validation.
  • SQLite FTS search and bounded graph/query payloads for larger local wikis.
  • Windows source workflow is covered by CI portability checks.

Links:

GitHub: https://github.com/gowtham0992/link
Docs: https://gowtham0992.github.io/link/
MCP: https://registry.modelcontextprotocol.io/?q=io.github.gowtham0992%2Flink
PyPI: https://pypi.org/project/link-mcp/

@pathakutsav

pathakutsav commented May 20, 2026

The first time we saw this, we were like, let's build this into an experience and we built MindHub.

If you want to use Knowledge Base with any Anthropic, OpenAI or Gemini model, try it out at https://www.trymindhub.com/

Check us out here: https://youtu.be/KQaw6H3lQMk

@timfong888

Built a desktop editor implementation of this idea — nohmitaina. Works with Claude Code or Codex CLI (no API key), local Markdown, macOS.

After a month of feeding it my own notes, three problems showed up that I think most implementations of this pattern will hit:

  1. Identity — The same concept gets extracted under slightly different names from related sources. The wiki ends up with duplicate pages ("Cognitive Dissonance Marketing" and "Cognitive Dissonance and Urgency" from the same book, in my case).
  2. Level — Life-scale themes ("Personal AGI") end up at the same level as tactical findings ("Urgency Trigger"). When everything is flat, importance disappears.
  3. Relationship — Concepts get linked as "related," but the type is lost. Similar, contains, contradicts — all collapsed into one word, which makes the graph useful for navigation but not for thinking.

I did a DDD event-storming pass on the wiki domain and treated each as a first-class domain event (DuplicateCandidateDetected, ConceptsMerged, ConceptRelationshipTyped, ConceptLevelChanged). These run on what I call a Dream cycle — a background pass borrowed from how human memory consolidates during sleep. It also handles the "lint" operation mentioned in the gist.
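
A rough Python sketch of what the Identity and Relationship events might look like as data. Only the event names `DuplicateCandidateDetected` and `ConceptRelationshipTyped` come from the comment above; the fields, the word-overlap similarity measure, the stopword list, and the 0.5 threshold are all assumptions for illustration, not how nohmitaina works.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DuplicateCandidateDetected:
    page_a: str
    page_b: str
    similarity: float


@dataclass(frozen=True)
class ConceptRelationshipTyped:
    source: str
    target: str
    relation: str  # e.g. "similar", "contains", "contradicts"


STOPWORDS = {"and", "the", "of", "a", "in"}  # hypothetical, deliberately tiny


def title_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase title words, ignoring stopwords."""
    words_a = {w for w in a.lower().split() if w not in STOPWORDS}
    words_b = {w for w in b.lower().split() if w not in STOPWORDS}
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def find_duplicate_candidates(titles, threshold=0.5):
    """Emit one event per pair of page titles at or above the threshold."""
    events = []
    for a, b in combinations(titles, 2):
        score = title_similarity(a, b)
        if score >= threshold:
            events.append(DuplicateCandidateDetected(a, b, score))
    return events


pages = [
    "Cognitive Dissonance Marketing",
    "Cognitive Dissonance and Urgency",
    "Urgency Trigger",
    "Personal AGI",
]
events = find_duplicate_candidates(pages)
typed = ConceptRelationshipTyped("Personal AGI", "Urgency Trigger", "contains")
```

A background pass could queue such events for review rather than merging pages automatically, since title overlap alone cannot tell a true duplicate from a closely related concept.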

Found another commenter (Andrii) on X who's solving Level a different way — by extracting citable claims first, then building the concept layer on top of claim collections. The claim approach makes Level fall out structurally (high-claim concepts are heavyweight, low-claim ones are light), which feels more elegant than my event-driven approach. I'm going to try integrating both.

Thanks for the framing — it's already shaped how a small group of us is thinking about this.

I just went to the website, this is interesting, is it just you building this?

@sametbrr

Implemented this pattern as a full Claude Code skill:
https://github.com/sametbrr/llm-wiki-manager

Install: git clone https://github.com/sametbrr/llm-wiki-manager ~/.claude/skills/llm-wiki-manager

Covers all 8 operating modes (bootstrap, ingest, query, update, lint, multi-wiki routing, and more),
4 idempotent Python scripts for bookkeeping, 8 page templates, and 9 reference docs.
The LLM owns all writing, cross-referencing, and contradiction-flagging — you just drop sources and ask questions.
Works with Claude Code out of the box.

@sametbrr

I implemented this pattern as a full Claude Code skill:
https://github.com/sametbrr/llm-wiki-manager

Install: git clone https://github.com/sametbrr/llm-wiki-manager ~/.claude/skills/llm-wiki-manager

It includes 8 operating modes (bootstrap, ingest, query, update, lint, multi-wiki routing, and more),
4 idempotent Python scripts, 8 page templates, and 9 reference docs.
The LLM handles all the writing, cross-referencing, and contradiction-flagging; you just drop in sources and ask questions.
It works with Claude Code out of the box.

@ptgrstrat

Thank you!!!! where can I buy you coffee!?!?!?

@skyllwt

skyllwt commented May 21, 2026

ΩmegaWiki (740+ ⭐) just shipped v1.5.0, and the ideate and experiment modules are excellent!
Repo: https://github.com/skyllwt/OmegaWiki

• 26 Claude Code skills, covering the full paper lifecycle:
discover → ingest → graph → ideate → experiment → draft → poster → rebuttal
• 8 typed entities · 20 typed edges
• Bilingual (EN + 中文), every skill ships in both languages
• 4 new skills shipped this month


Past v1.4.0 highlight: /poster

Paper accepted? Run one command, get a 1400×900 conference poster +
print-quality PNG. Reads your LaTeX source, auto-rasterizes TikZ figures,
renders booktabs tables as live HTML, preserves math via KaTeX, walks you
through figure picks and header config. PDF export from any browser. Section
distillation is grounded in your wiki's paper-plan, ideas, experiments — not a
blind LLM summary of the abstract.


Try ΩmegaWiki in Claude Code and run the full LLM-Wiki loop:

/discover --venue iclr --year 2026 → ranked reading list
/ingest → typed entry, linked into the graph
/ideate + /novelty → ideas with bibliographic grounding
/exp-design → /exp-run → /exp-eval → pilot-first experiment workflow
/paper-plan → /paper-draft → /paper-compile
/poster → conference poster (NEW)
/rebuttal → reviewer response

End to end. One wiki. No chunks.

If ΩmegaWiki looks interesting, a ⭐ would encourage and motivate us a lot 😀
https://github.com/skyllwt/OmegaWiki


@Electro-resonance

Really interesting write-up. Thanks for sharing it.

I was directly influenced by your article. I’d previously spent around two years experimenting with custom RAG search layers and recursive use of RAG to create near “infinite context” chatbots, but I wanted to try out your LLM Wiki design pattern more explicitly: the idea of moving from query-time retrieval to a persistent, compounding knowledge artefact.

Rather than only express the workflow as SKILLS.md / agent instructions, I wanted to make the pattern more concrete and callable, so I implemented it as an MCP server:

https://github.com/Electro-resonance/LLM-WIKI-MCP

I then added a CLI so I could test and interact with it directly. The CLI can connect to a configurable local Ollama instance, so once you set your preferred open-weights model, or a larger cloud model via Ollama, you can ingest one or more directories and immediately start chatting with your local files.

One nice recursive detail is that you can run:

/ingest .

so the system ingests its own documentation, then ask the LLM about the CLI and MCP from inside the CLI itself.

The LLM usage is configured to be recursive: each call updates the rolling conversation/context history, with compression when needed, so you can keep chatting with an agent grounded in your own documents. If you use a local model, the whole workflow can run locally.

The main differences from some of the other implementations are:

  • it is an MCP server, not only an agent skill or Obsidian workflow;
  • it has a CLI for testing and direct use;
  • it supports local Ollama/OpenAI-compatible model use;
  • it tracks file provenance using timestamps and hashes;
  • repeated ingest skips unchanged files and updates changed ones;
  • it supports single-file and directory ingest;
  • the wiki can ingest its own docs and answer questions about itself;
  • recursive ask/history gives a more persistent local chat experience over your own knowledge base.
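
The provenance and skip-unchanged behavior in the list above can be sketched with a small manifest of content hashes and timestamps. This is a generic illustration, not code from LLM-WIKI-MCP: the manifest layout, the JSON file, and the function names are assumptions.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Content hash used to decide whether a file changed since the last ingest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_ingest(root: Path, manifest_path: Path) -> dict[str, list[str]]:
    """Classify files under root as new, changed, or unchanged, then update the manifest."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    plan = {"new": [], "changed": [], "unchanged": []}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.resolve() == manifest_path.resolve():
            continue  # never ingest the manifest itself
        key = path.relative_to(root).as_posix()
        digest = file_sha256(path)
        previous = manifest.get(key)
        if previous is None:
            plan["new"].append(key)
        elif previous["sha256"] != digest:
            plan["changed"].append(key)
        else:
            plan["unchanged"].append(key)
        # Record both hash and modification time as provenance for this file
        manifest[key] = {"sha256": digest, "mtime": path.stat().st_mtime}
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return plan
```

Only the "new" and "changed" entries would need to go back through the LLM, which is what makes a repeated ingest of a large directory cheap.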

To me the big shift is exactly the one you described: RAG retrieves information, but an LLM-maintained wiki accumulates understanding.

The MCP can also be used directly within an agentic workflow, so other agents can call the wiki tools programmatically rather than going through the CLI.

Enjoy, and thanks for the inspiration.

@dtmoura

dtmoura commented May 21, 2026

Preventing the wiki from going stale, and getting agents to proactively add to and maintain it during a conversation without a precise prompt, seems to be the real problem. Just putting a "remember to update..." line in claude.md won't do it.

@paulmchen

Synthadoc v0.5.0 is released.

Two new features that close the epistemic gap:

  1. Adversarial Review: a second independent LLM interrogates every compiled page for overreach, unsupported generalisations, and contradictions. You appoint the judge in config.toml (different provider or model family recommended for independence). Warnings surface in a lint tab, in page frontmatter, and in the audit trail.
    → Quick-start demo - Step 9 (https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md#adversarial-review)

  2. Claim-Level Provenance: during ingest, every compiled paragraph receives a ^[filename:L–L] citation marker tracing it back to the exact source lines. In Obsidian, markers render as chips; one click opens a Source Viewer (with PDF page resolution). Full provenance is queryable via CLI and HTTP API.
    → Quick-start demo - Step 18 (https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md#claim-provenance)

Also shipping in v0.5.0: routing + alias system, candidates staging, and a redesigned Obsidian plugin.
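
To show how claim-level markers of the `^[filename:L–L]` kind could be consumed, here is a small Python sketch that extracts citations from a compiled paragraph and resolves them to source lines. The concrete marker shape used here (for example `^[paper.md:L2–L3]`, with either an en dash or a hyphen) is an assumption; Synthadoc's actual syntax and API may differ, and the names `Citation`, `extract_citations`, and `cited_lines` are hypothetical.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    filename: str
    start: int  # 1-based first source line
    end: int    # 1-based last source line, inclusive


# Assumed concrete form: ^[notes.md:L12–L18]; an en dash or a plain hyphen is accepted.
MARKER = re.compile(r"\^\[(?P<file>[^\]:]+):L(?P<start>\d+)[–-]L(?P<end>\d+)\]")


def extract_citations(paragraph: str) -> list[Citation]:
    """Find every provenance marker in a compiled wiki paragraph."""
    return [
        Citation(m["file"], int(m["start"]), int(m["end"]))
        for m in MARKER.finditer(paragraph)
    ]


def cited_lines(citation: Citation, source_text: str) -> list[str]:
    """Return the source lines a citation points at."""
    lines = source_text.splitlines()
    return lines[citation.start - 1 : citation.end]


paragraph = "Dedup yields the largest savings. ^[paper.md:L2–L3]"
source = "Intro\nDedup results\nSavings table\nConclusion"
citations = extract_citations(paragraph)
evidence = cited_lines(citations[0], source)
```

Resolving a marker back to its lines is what lets a reviewer, human or adversarial LLM, check a compiled claim against the exact passage it was derived from.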

If any of this is useful or you have thoughts on the direction, feedback is very welcome — and a ⭐ always helps the project reach more people.

👉 https://github.com/axoviq-ai/synthadoc

@ojuschugh1

  ███████╗ ██████╗ ███████╗
  ██╔════╝██╔═══██╗╚══███╔╝
  ███████╗██║   ██║  ███╔╝
  ╚════██║██║▄▄ ██║ ███╔╝
  ███████║╚██████╔╝███████╗
  ╚══════╝ ╚══▀▀═╝ ╚══════╝
  

Compress LLM context to save tokens and reduce costs

Real session stats: 3,003 compressions · 178,442 tokens saved · 24.7% avg reduction · up to 92% with dedup



sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.

The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.

Without sqz:                    With sqz:

File read #1:  2,000 tokens     File read #1:  ~800 tokens (compressed)
File read #2:  2,000 tokens     File read #2:  ~13 tokens  (dedup ref)
File read #3:  2,000 tokens     File read #3:  ~13 tokens  (dedup ref)
───────────────────────         ───────────────────────
Total:         6,000 tokens     Total:         ~826 tokens (86% saved)
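
The totals in the comparison above follow from simple arithmetic: every read costs the raw size without dedup, while with dedup the first read costs the compressed size and each repeat costs one reference. A short Python sketch, using the example figures above (2,000 raw, ~800 compressed, 13 per ref) as defaults; the function name is made up.

```python
def session_tokens(reads: int, raw: int = 2000, compressed: int = 800,
                   ref: int = 13) -> tuple[int, int]:
    """Return (tokens without dedup, tokens with dedup) for repeated reads of one file."""
    if reads <= 0:
        return (0, 0)
    without = reads * raw
    with_dedup = compressed + (reads - 1) * ref
    return without, with_dedup


without, with_dedup = session_tokens(3)
saved_pct = 100 * (without - with_dedup) / without
print(f"{without} -> {with_dedup} tokens ({saved_pct:.0f}% saved)")
```

Because each repeat adds only the reference cost, the saved fraction keeps growing with the number of re-reads.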

Token Savings

24.7% average reduction across 3,003 real compressions ·
92% saved on repeated file reads ·
86% on shell/git output ·
13-token refs for cached content

One developer's week, measured from actual sqz gain output:

$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
  04-13 │                              │   2,329 saved
  04-14 │                              │       0 saved
  04-15 │███                           │  12,954 saved
  04-16 │██                            │   9,223 saved
  04-17 │████                          │  14,752 saved
  04-18 │██████████████████████████████│ 105,569 saved
  04-19 │████████                      │  30,882 saved
  04-20 │█                             │   4,334 saved
──────────────────────────────────────────────────
  Total: 3,003 compressions, 178,442 tokens saved (24.7% avg reduction)

Per-command compression

Single-command compression (measured via cargo test -p sqz-engine benchmarks):

Content                    Before   After   Saved
Repeated log lines            148      62     58%
Large JSON array              259     142     45%
JSON API response              64      53     17%
Git diff                       61      54     12%
Prose/docs                    124     121      2%
Stack trace (safe mode)        82      82      0%

Session-level with dedup

Where the real savings live — the cache sends each file once, repeats cost 13 tokens:

Scenario                        Without sqz   With sqz   Saved
Same file read 5×                    10,000        826     92%
Same JSON response 3×                   192         79     59%
Test-fix-test cycle (3 runs)         15,000      5,186     65%

Single-command compression ranges from 2–58% depending on content. Repeated reads drop to 13 tokens each. Your mileage will vary with how repetitive your tool calls are — agentic sessions with many file re-reads see the biggest wins.

Install

Prebuilt binaries (no compiler required — works on every platform):

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.ps1 | iex

# Any platform via npm
npm install -g sqz-cli

# macOS / Linux via Homebrew
brew tap ojuschugh1/sqz
brew install sqz

Build from source via Cargo:

cargo install sqz-cli sqz-mcp

sqz-cli provides the sqz binary; sqz-mcp provides the MCP server. sqz-engine is a library dependency — it compiles automatically and does not need to be installed separately.

Building from source with cargo install needs a C toolchain:

  • Linux: build-essential (apt) or equivalent
  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Windows: Visual Studio Build Tools with the "Desktop development with C++" workload. Without these, cargo install fails with linker link.exe not found. If you don't already have them, use the PowerShell or npm install above instead.

Then initialize:

sqz init --global     # hooks apply to every project on this machine
# or
sqz init              # hooks apply to just this project (.claude/settings.local.json)

--global writes to ~/.claude/settings.json (the user scope per the
Anthropic scope table),
so the sqz hook fires in every Claude Code session on this machine. This is
the common case on first install. Your existing permissions, env,
statusLine, and unrelated hooks in ~/.claude/settings.json are
preserved — sqz merges its entries rather than overwriting.

Plain sqz init (project scope) is useful when you want sqz active only
inside one repo.

Only using one agent? Pass --only (or --skip) to limit which
configs are written:

sqz init --only opencode              # just OpenCode, nothing else
sqz init --only opencode,codex        # OpenCode and Codex
sqz init --skip cursor,windsurf       # everything except Cursor and Windsurf

Accepted names: claude, cursor, windsurf, cline, gemini,
kiro, opencode, codex. Aliases (claude-code, gemini-cli, roo,
kiro-cli) also work. --only and --skip can't be combined.

Manual installation (preserve comments in your config)

sqz init round-trips your config file through a JSON parser to merge
the sqz entry, which drops any comments in your opencode.jsonc (and
the analogous JSON-with-comments files other tools accept). If you've
commented your config carefully and want to keep them, install by hand
instead.

OpenCode — two steps:

  1. Drop the plugin file in place. sqz prints the generated TS to
    stdout so you don't have to hand-write the path-escaping logic:

    mkdir -p ~/.config/opencode/plugins
    sqz print-opencode-plugin > ~/.config/opencode/plugins/sqz.ts
  2. Add the MCP entry to your existing opencode.jsonc yourself.
    Append this block inside the top-level mcp object (create the
    mcp object if it doesn't exist):

    "sqz": {
      "type": "local",
      "command": ["sqz-mcp", "--transport", "stdio"],
      "enabled": true
    }

Comments in the rest of your file stay put. OpenCode auto-discovers
the plugin file; no plugin array entry needed (adding one causes
double-loading, see issue #10).

Other tools — Claude Code, Cursor, Windsurf, Cline, Gemini CLI,
and Codex use plain JSON configs without comment support, so the
automated path is non-destructive there. Use sqz init --only <tool>
for those.

That's it. Shell hooks installed, AI tool hooks configured.

How It Works


sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently — the AI tool never knows.

Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)

What gets compressed:

  • Shell output — git, cargo, npm, docker, kubectl, ls, grep, etc.
  • JSON — strips nulls, compact encoding
  • Logs — collapses repeated lines
  • Test output — shows failures only

What doesn't get compressed:

  • Stack traces, error messages, secrets — routed to safe mode (0% compression)
  • Your prompts and the AI's responses — controlled by the AI tool, not sqz

Supported Tools

Tool          Integration                      Setup
Claude Code   PreToolUse hook (transparent)    sqz init
Cursor        PreToolUse hook (transparent)    sqz init
Windsurf      PreToolUse hook (transparent)    sqz init
Cline         PreToolUse hook (transparent)    sqz init
Gemini CLI    BeforeTool hook (transparent)    sqz init
Kiro          PreToolUse hook (transparent)    sqz init
OpenCode      TypeScript plugin (transparent)  sqz init
VS Code       Extension                        Install from Marketplace
JetBrains     Plugin                           Install from Marketplace
Chrome        Browser extension                ChatGPT, Claude.ai, Gemini, Grok, Perplexity
Firefox       Browser extension                Same sites

CLI

sqz init --global             # Install hooks for every project on this machine
sqz init                      # Install hooks for just this project
sqz init --only kiro          # Only configure Kiro (skip the rest)
sqz init --only opencode      # Only configure OpenCode (skip the rest)
sqz init --skip cursor        # Configure every agent except Cursor
sqz compress <text>           # Compress (or pipe from stdin)
sqz compress --no-cache       # Compress without dedup (always full output)
sqz expand <ref>              # Recover original content from a §ref:HASH§ token
sqz compact                   # Evict stale context to free tokens
sqz gain                      # Show daily token savings (bar chart)
sqz gain --project .          # Per-project daily gains
sqz gain --days 30            # Last 30 days
sqz stats                     # Cumulative compression report
sqz stats --breakdown         # Per-command token usage breakdown
sqz stats --project .         # Stats for current project only
sqz stats --project list      # List all tracked projects
sqz discover                  # Find missed savings
sqz resume                    # Re-inject session context after compaction
sqz vizit                     # Live terminal dashboard (like htop for AI agents)
sqz hook claude               # Process a PreToolUse hook (Claude Code)
sqz hook kiro                 # Process a PreToolUse hook (Kiro)
sqz print-opencode-plugin     # Print OpenCode plugin TS for manual install
sqz proxy --port 8080         # API proxy (compresses full request payloads)

@ahumanft

Built this as the next handoff in the chain. V3 proposes segmentation as the architectural answer to scaling V1: every component of a wiki system (ingestion, schemas, roles, retrieval, lint) should stay narrow enough that the LLM executing it never gets overwhelmed. Also includes a finding on explicit versus implicit schema instructions and why timing matters when invoking them. Intentionally incomplete, same spirit as V1 and V2. https://gist.github.com/ahumanft/6c96385be6ca4af578cc9b20e0f79e66

@rstoye

rstoye commented May 23, 2026

Very curious how many of these comments are from bots. There can't be this many people that think an LLM wiki is a good solution for knowledge at scale.

I've seen a few voices of reason here, but most everything is "What a great and novel idea". Which is wrong on both counts. Folks - for your own sake, please research information retrieval and storage to understand why this doesn't work.

If you're still reading... Check out CIBFE or https://headkey.ai for a pluggable cognitive solution that's more than memory.

So you shit on the idea, criticize other people for believing in it, don't provide any explanation, tell people to do their own research, and then you link to a website of a product that is currently gated behind a waitlist, closed source, and created by a company you work for.

Man a lot to unpack here.

@gowtham0992

Link v1.3.0 is live.

Link is local, source-backed memory for AI agents. Raw sources become an inspectable Markdown wiki. Explicit “remember this” requests become reviewable agent memory. MCP tools let Codex, Claude, Cursor, Kiro, VS Code, Antigravity, and other clients query that memory without dumping the whole wiki into context.


Install:

brew install gowtham0992/link/link
link demo
link serve link-demo

MCP:

python3 -m pip install --upgrade link-mcp

What changed:

- link next for first-run prompts.
- link health and /api/health for readiness checks.
- MCP link_operations for interrupted-write inspection.
- Better graph controls for larger wikis.
- Persistent-cache diagnostics in status/health/benchmark.
- More copyable prompts and recovery actions in the local UI.
- Paste-safe CLI commands that work from any terminal directory.
- Threaded local HTTP handling and request timeouts.
- Expanded user-level smoke coverage for CLI, HTTP viewer, MCP stdio, and large wikis.

Links:

GitHub: https://github.com/gowtham0992/link
Docs: https://gowtham0992.github.io/link/
MCP Registry: https://registry.modelcontextprotocol.io/?q=io.github.gowtham0992%2Flink
PyPI: https://pypi.org/project/link-mcp/

@andkir1

andkir1 commented May 24, 2026

I've been building a search-behavior-grounded knowledge tree since 2012 — synonym clustering plus hierarchical navigation built from actual query data, not academic taxonomy. Your maintenance insight is exactly right, and it reframes something I couldn't solve with human contributors alone. One question your bottom-up approach leaves open: what if the LLM also had a pre-built cognitive skeleton showing not just what exists but what's missing and in what order those gaps matter?

Wrote about this here: https://medium.com/@thebooq/the-right-problem-89624f7a461a
