karpathy/llm-wiki.md

Created April 4, 2026 16:25


LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
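Two of the lint checks above (orphan pages and links to pages that do not exist) are mechanical enough to script. The following is a minimal sketch, assuming pages link to each other with Obsidian-style `[[wikilinks]]` and that page names are unique file stems; the function name `lint_links` and the choice to ignore inbound links from `index` and `log` are illustrative assumptions, not part of the pattern.

```python
import re
from pathlib import Path

# Matches the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Assumption: the catalog and log pages do not count as "real" inbound links.
SPECIAL = {"index", "log"}


def lint_links(wiki_dir):
    """Return (orphans, broken_links) for a directory of markdown pages."""
    pages = {p.stem: p for p in Path(wiki_dir).rglob("*.md")}
    inbound = {name: set() for name in pages}
    broken = set()
    for name, path in pages.items():
        text = path.read_text(encoding="utf-8")
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target not in pages:
                broken.add((name, target))
            elif name not in SPECIAL and target != name:
                inbound[target].add(name)
    orphans = sorted(
        n for n, sources in inbound.items() if not sources and n not in SPECIAL
    )
    return orphans, sorted(broken)
```

The judgment-heavy checks (contradictions, stale claims, missing concepts) still need the LLM; a script like this only narrows down where it should look.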

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
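In the pattern described here the LLM maintains the index itself, but the shape of the file is easy to show in code. This sketch regenerates an index from the pages on disk, assuming that categories are top-level subdirectories, that pages carry no YAML frontmatter, and that the first non-heading line of each page serves as its one-line summary. All three are assumptions made for illustration.

```python
from pathlib import Path

SPECIAL = {"index.md", "log.md"}


def one_line_summary(text):
    """Assumed convention: the first non-empty, non-heading line is the summary."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return "(no summary)"


def build_index(wiki_dir):
    """Render index.md content grouped by top-level subdirectory."""
    root = Path(wiki_dir)
    categories = {}
    for path in sorted(root.rglob("*.md")):
        if path.name in SPECIAL:
            continue
        parts = path.relative_to(root).parts
        category = parts[0] if len(parts) > 1 else "general"
        summary = one_line_summary(path.read_text(encoding="utf-8"))
        categories.setdefault(category, []).append(f"- [[{path.stem}]]: {summary}")
    lines = ["# Index"]
    for category in sorted(categories):
        lines += ["", f"## {category}", *categories[category]]
    return "\n".join(lines) + "\n"
```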

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
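The same prefix convention can be produced and parsed programmatically. This is a small sketch of both directions, using the `## [YYYY-MM-DD] kind | title` format from the example above; the function names are hypothetical.

```python
import re
from datetime import date

# One entry header per line, e.g. "## [2026-04-02] ingest | Article Title".
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_entry(kind, title, when=None):
    """Build a log header line; defaults to today's date."""
    when = when or date.today()
    return f"## [{when.isoformat()}] {kind} | {title}"


def last_entries(log_text, n=5):
    """Return the last n (date, kind, title) tuples found in the log text."""
    entries = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append(match.groups())
    return entries[-n:]
```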

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
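As an illustration of the "naive search script" end of that spectrum, here is a sketch that ranks pages by how often the query terms appear, normalized by page length. It is not how qmd works; it is a plain term-frequency scorer with hypothetical names, and it has no stemming, no BM25 weighting, and no vectors.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(wiki_dir, query, limit=5):
    """Return up to `limit` (score, filename) pairs, best match first."""
    terms = set(tokenize(query))
    results = []
    for path in Path(wiki_dir).rglob("*.md"):
        counts = Counter(tokenize(path.read_text(encoding="utf-8")))
        total = sum(counts.values())
        if total == 0:
            continue
        score = sum(counts[t] for t in terms) / total
        if score > 0:
            results.append((score, path.name))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:limit]
```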

Tips and tricks

Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

LangSensei commented Apr 13, 2026

Love this. The idea of LLMs maintaining persistent structured artifacts instead of re-deriving everything from scratch really resonated. It inspired me to think about the analogous problem in the agent harness space — not knowledge accumulation, but task execution.

I've been working on LLM agent harnesses (Copilot CLI, Claude Code, Codex, etc.) and ran into a recurring problem: agents drift during long tasks. They forget their plan, skip steps, redo work. The context window is a sliding window of amnesia.

Inspired by this wiki pattern, I wrote up two complementary ideas from the harness perspective:

1. Cognitive Scaffolding for Autonomous Agents — externalize the agent's reasoning into files (plan, findings, progress). Writing is thinking. Re-reading is remembering. Add hooks that force the agent to update and re-read its files periodically — automated discipline. Same core insight as your wiki: persistent files > ephemeral context, but applied to within-task reasoning rather than cross-source knowledge.

→ https://gist.github.com/LangSensei/ffece86d696948ef739e42233642141a

2. Dumb Routers, Smart Specialists — for multi-agent execution, separate judgment from execution. The dispatcher makes one LLM call (classify to a specialist), then hands off to deterministic code. Deep thinking happens inside domain-scoped specialists with their own tools, methodology, and knowledge. Isolation prevents context pollution; expertise becomes portable and shareable.

→ https://gist.github.com/LangSensei/c954f8654ef025816300fdfb2f7ba860

Thanks for putting this out there — it crystallized a lot of things I'd been thinking about.

KarabutRom commented Apr 13, 2026

I'm a total noob: I started 2 weeks ago. I've been running this pattern for Claude Code session persistence. A few things that actually matter in practice:

Architecture

Three layers:

MEMORY.md — pure index, one line per entry (~150 chars max). This is all that loads automatically.
Typed files — user_.md, feedback_.md, project_.md, reference_.md. Read on demand.
Schema in CLAUDE.md — when to write, how to update, what each type means.

Why typed files

The type in the filename does real work. feedback_ = apply to future behavior. project_ = expect staleness. The agent routes without extra prompting because the convention is in the name, not in the context.
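A rough sketch of what the naming convention buys you: the type can be read straight off the filename, and the index can be checked against the ~150-character budget. The handling notes for `feedback_` and `project_` are taken from the comment; the comment does not spell out `user_` and `reference_`, so they are left undescribed. Function names are hypothetical.

```python
# Handling notes per type prefix. Only feedback_ and project_ are described
# in the comment; the other two are intentionally left as None.
HANDLING = {
    "feedback": "apply to future behavior",
    "project": "expect staleness",
    "user": None,
    "reference": None,
}


def memory_type(filename):
    """Return the type prefix of a typed memory file such as 'feedback_x.md'."""
    prefix, sep, _rest = filename.partition("_")
    if not sep or prefix not in HANDLING:
        raise ValueError(f"not a typed memory file: {filename}")
    return prefix


def oversized_index_lines(index_text, limit=150):
    """Return 1-based line numbers of index entries longer than `limit` chars."""
    return [
        number
        for number, line in enumerate(index_text.splitlines(), start=1)
        if len(line) > limit
    ]
```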

The compaction problem

Claude Code compacts mid-session. Whatever exceeds the context budget gets deprioritized silently — rules you set at session start can just... stop applying.

Fix: keep the index surgically small. Full content lives in separate files, pulled only when relevant. Index survives compaction; a 200-line MEMORY.md doesn't.

What I skipped

No vector DB, no BM25. At personal-project scale, structured naming + LLM intent outperforms retrieval infrastructure — and you can open, edit, and git-diff everything in a text editor.

johnsamuelwrites commented Apr 13, 2026

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

I love this framing because it finally makes LLMs feel personal instead of generic tools. The moment you treat the model as the engine behind your own evolving wiki/second brain, that “curate sources, direct the analysis, ask good questions” job description becomes a description of your identity in the loop, not just a usage tip.

The LLM isn’t a chatbot anymore, it’s the invisible infrastructure doing all the boring bookkeeping so that your time is spent on taste, judgment, and long‑term sense‑making.

n7-ved commented Apr 13, 2026

This pattern resonates. We've been building something close for ~6 months in a different domain, and reading this was uncanny. A few things we ended up doing that might be worth sharing:

Enforcement works best at the agent boundary, not the conversation boundary. Rather than trying to block the main conversation from editing the wiki, we let each specialised agent be its own enforcement unit. The writer agent's frontmatter excludes Bash and web; a PreToolUse hook on it blocks writes to any path outside the four content layers. The maintainer agent has Bash, but a PreToolUse hook validates every command (no rm -rf, no force-push, etc.). The auditor is read-only. The main conversation's write discipline is instructional: it's trusted to respect the rule in CLAUDE.md because it's the "planner," not the "executor." Hooks do the heavy lifting on the executors. This gives you structural guarantees on the agents that actually mutate things, without the friction of locking the conversation itself.
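The command validation step might look something like the sketch below. It covers only the two examples named above (`rm -rf` and force-push), it is deliberately conservative (any `rm` anywhere in the command combined with both flags is rejected), and it is not exhaustive: long options such as `--recursive` are not handled. How the function gets wired into a PreToolUse hook is left out, and the name `blocked_reason` is hypothetical.

```python
import shlex


def blocked_reason(command):
    """Return a reason string if the shell command should be blocked, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "could not parse command"
    # Collect single-letter flags from every token like -rf or -f.
    short_flags = {
        ch.lower()
        for t in tokens
        if t.startswith("-") and not t.startswith("--")
        for ch in t[1:]
    }
    long_flags = [t for t in tokens if t.startswith("--")]
    if "rm" in tokens and {"r", "f"} <= short_flags:
        return "recursive forced delete"
    if "git" in tokens and "push" in tokens:
        if "f" in short_flags or any(f.startswith("--force") for f in long_flags):
            return "force-push"
    return None
```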

Binary verified/unverified isn't enough; you need to split "inferred" from "unsourced." We shipped four claim types as Obsidian callouts: Source (verbatim quote with citation), Analysis (our inference from sourced facts, with reasoning shown), Unverified (no authoritative source yet), Gap (explicitly missing, never fill with a plausible guess). The Analysis / Unverified split is the one that earned its keep. It prevents paraphrasing-bias, where the model rewrites what a source says and nobody can tell afterwards whether it got it right.
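Because the claim types are callouts, they can be counted per page with a few lines of code, which makes it easy to spot pages that are mostly Unverified or Gap. This sketch assumes the callouts use Obsidian's `> [!type]` syntax with the identifiers `source`, `analysis`, `unverified`, and `gap`; the exact identifiers are an assumption, since the comment only gives the display names.

```python
import re
from collections import Counter

# Obsidian callouts start a blockquote line with "[!type]".
CALLOUT = re.compile(r"^>\s*\[!(\w+)\]", re.MULTILINE)
# Assumed identifiers for the four claim types.
CLAIM_TYPES = ("source", "analysis", "unverified", "gap")


def claim_counts(page_text):
    """Count callouts of each claim type in one page; other callouts are ignored."""
    found = Counter(name.lower() for name in CALLOUT.findall(page_text))
    return {claim_type: found.get(claim_type, 0) for claim_type in CLAIM_TYPES}
```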

Staleness can be mechanical. Each file carries a score derived from how far behind its outgoing wiki-link dependencies it is. Forward-only, no backlink tracking. Update a source, and every downstream file's score ticks up; the auditor surfaces the worst offenders. This replaces a lot of the "who might have stale claims about this?" review burden that otherwise falls back on humans.

One structural divergence from your sketch: three layers weren't enough for us. We added a fourth: an infrastructure layer with design records for the agents, rules, hooks, and conventions themselves. Schema-in-CLAUDE.md works until the schema has non-trivial rationale worth preserving across changes. Then it wants its own records.
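The comment does not give the scoring formula, so the following is only one plausible reading: a page's score is the total amount by which the pages it links to have been updated more recently than the page itself. The `updated` values can be timestamps or revision counters; the data layout and the name `staleness` are assumptions for illustration.

```python
def staleness(pages):
    """Score each page by how far it lags behind the pages it links to.

    `pages` maps a page name to {"updated": number, "links": [page names]}.
    Only outgoing links are followed; backlinks are never consulted.
    """
    scores = {}
    for name, page in pages.items():
        lag = 0
        for target in page["links"]:
            if target in pages:
                # Only dependencies newer than this page contribute.
                lag += max(0, pages[target]["updated"] - page["updated"])
        scores[name] = lag
    return scores


def worst_offenders(pages, limit=3):
    """Return the names of the stalest pages, highest score first."""
    scores = staleness(pages)
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return [n for n in ranked if scores[n] > 0][:limit]
```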

We are still learning and evolving in this journey, so thanks for writing it up.

gnusupport commented Apr 13, 2026 via email

* Josh Wand ***@***.***> [2026-04-13 03:15]:

@joshwand commented on this gist: @gnusupport it makes it really hard to take any of the comments seriously if I feel like I'm talking to a modern version of ELIZA (with some self promotion thrown in—50 out of the 435 current comments are plugging their own projects).

Hey, I hear you, but honestly — protesting that some comments feel like ELIZA in 2026 is like complaining that people use spellcheck instead of quill pens. Times changed. Tech changed. Communities split and multiplied. The thread was about LLMs in wikis, not about catering to anyone’s nostalgia for “pure” human conversation. If someone uses a tool to clarify their thoughts before posting, that’s their call. You don’t have to like it, but pretending it invalidates the whole discussion? That’s on you, not on us.

mauceri commented Apr 13, 2026 via email


@joshwand These comments might simply have been rewritten by a bot. Don’t you ever use prompts like, “Can you rewrite this text more concisely and in this language?” It’s not much different from using a spell-checker; it’s a natural use of AI—so what’s the problem with that? Wouldn’t it have been better to do that, rather than pasting that tedious list of regular expressions...

gnusupport commented Apr 13, 2026 via email

* Weitong Qian ***@***.***> [2026-04-13 05:06]:

@skyllwt commented on this gist: Hey @karpathy — your LLM-Wiki idea really resonated with us. We're a team from Peking University working on AI/CS research. We didn't just build a wiki — we plugged it into the entire research pipeline as the central hub that every step revolves around. The result is ΩmegaWiki: your LLM-Wiki concept extended into a full-lifecycle research platform. If you find it useful, a ⭐ would mean a lot! PRs, issues, and ideas all welcome — let's build this together. https://github.com/skyllwt/OmegaWiki

"Karpathy's LLM-Wiki Vision" sounds like licking his ass. Is there something unique, and your own creativity there? Why always follow the "standards" like even using "Markdown". Why not Asciidoctor, Kotl, Org, Jemdoc, reStructuredText, txt2tags, Emacs Enriched mode, Djot, Wikitext, XML, Graphviz, use anything! The link you are referencing https://x.com/karpathy/status/1909372692069236775 isn't even there. Are you maybe supporting the "authority" which is not -- which doesn't even support its own links?

gnusupport commented Apr 13, 2026 via email

* Nathan ***@***.***> [2026-04-13 05:16]:

@NorseGaud commented on this gist: > @gnusupport it makes it really hard to take any of the comments seriously if I feel like I'm talking to a modern version of ELIZA (with some self promotion thrown in—50 out of the 435 current comments are plugging their own projects). Bro, exactly. Dead internet theory in action.

You call it “dead internet theory in action,” but the internet is more alive than ever — just not in the narrow, purist way you seem to miss. More people, more tools, more noise, more signal. Just because some of that signal gets polished by an LLM doesn’t mean the conversation is dead. It means you don’t like the new texture.

FBoschman commented Apr 13, 2026

Runs like a breeze. I have been working with an LLM and with Obsidian for a while. I do research in educational sciences and I noticed that my Obsidian vault gets cluttered. This workflow and the wiki structure have helped me a lot. I expanded on the idea of taking fleeting notes through the so-called FUNGI protocol. It is an addition to the note-taking that helps the LLM think alongside my own critical thinking, and it is based on the simple premise that our own minds (even as scientists) are biased and should be questioned.

Also, when ingested or added to the workflow, it works like a charm for flagging notes that have not yet fully grown, need work, or contain interesting tensions. Feel free to use it, comment on it, and build on it.

Here is the addition:

Framework: Fleeting → Concept Notes

A structure for turning raw notes into concept notes, built around ethical AI principles and a mycelial learning paradigm (decentralised, interconnected, slow-growing, nutrient-sharing across ideas).

The FUNGI Framework

A five-stage pass for each fleeting note. Use it as a template — not every field needs filling on the first pass.

Stage	Prompt	Purpose
F — Frame	What is the raw note actually saying? Restate in one sentence.	Strips ambiguity before interpretation.
U — Unearth	What assumptions, sources, or prior ideas is it feeding on?	Surfaces the substrate.
N — Network	Which existing concept notes, authors, or frameworks does it connect to? Name at least two.	Builds hyphal links.
G — Grow	What new question, tension, or claim does it produce?	Forces generative output, not just storage.
I — Interrogate	What's the strongest counter-argument? What would falsify it? Confidence: high / medium / low.	Ethical check — resists premature certainty.

Concept Note Template

Title: [claim-shaped, not topic-shaped]
Date:
Status: seedling / developing / mature

Claim (one sentence)
Frame (from fleeting note)
Substrate (sources, APA)
Connections (≥2 existing notes/concepts)
Generative question
Counter-argument
Confidence: H / M / L
Open threads

Ethical AI Guardrails

When I help you process notes, I'll follow these rules — push back if I drift:

No synthesis without attribution. If I merge your idea with a source, I name the source.
No smoothing. I preserve contradictions in your notes rather than resolving them for neatness.
Challenge by default. Every concept note gets at least one counter-argument from me, even if you disagree.
Confidence flagged. I'll mark my own contributions H/M/L so you can see where I'm guessing.
You own the claim. I propose; you decide what becomes a concept note.

Mycelial Principles in Practice

Decentralised — no single note is the "main" one. Links matter more than hierarchy.
Nutrient-sharing — a note earns its place by feeding at least two others.
Slow growth — seedling notes can sit unresolved; not everything needs closure.
Decomposition — old notes can be broken down and reabsorbed into new ones. Nothing is wasted, nothing is sacred.

How to Use This With Me

Paste a fleeting note (or several).
Tell me the status you want: quick pass (Frame + Grow only) or full FUNGI.
I'll return a draft concept note plus at least one challenge or tension I see.
You push back, edit, or bin it.

Pros and Cons of This Approach

Pros

Forces generative output, not just filing.
Counter-argument step resists confirmation bias.
Mycelial linking builds a web that compounds over time.
Ethical guardrails keep my role transparent.

Cons

Slower than freeform note-taking.
Five stages can feel heavy for small notes — hence the "quick pass" option.
Relies on you maintaining the link network; I can suggest but not enforce it.
Confidence ratings are my own estimates and can be wrong.

Confidence in the framework itself: medium. It's a synthesis of zettelkasten practice, ecological metaphor, and AI-ethics norms — untested on your specific workflow. I'd expect to revise it after the first 5–10 notes.

Ready when you are — paste the first fleeting note and tell me quick pass or full FUNGI.

sheldon123z commented Apr 13, 2026

99% of comments are made by AI. I really don't see the value in reading these comments and ads: long and unreadable, good-looking but no help. I call them trash.

Please don't post any ads, the true valuable things are thoughts.


freakyfractal commented Apr 13, 2026

There's a lightweight version of this that's worth mentioning: skip the filesystem/harness entirely and piggyback off a conversation with any memory-enabled LLM provider as the wiki.

Seed a chat with something like:

Build a knowledge graph from everything you know about me.
Nodes with types, short notes, tags. Edges with verb labels.
Force-directed graph UI. Click to explore, search, filter.
Persist in-session. I evolve it by talking: "add X",
"connect X to Y", "what's related to Z". You update the artifact.

If your LLM provider has artifacts/canvas, you get a visual explorer for free. If it has memory, it seeds from your history. The LLM is simultaneously the database, the search engine, and the renderer. Zero infra, works in any chat window.

The obvious limitation is context window degradation - you hit a ceiling Karpathy's filesystem approach doesn't have. But you also skip the entire setup and maintenance costs. When the conversation gets long and unreliable, you maybe ask the LLM to compress the current state back into a new seed prompt and start fresh.

Different tradeoff, not a replacement. This optimizes for thinking-in-the-moment over durable accumulation. So not a second brain, but a directable interface into your memory.

akshayram1 commented Apr 13, 2026

I think PageIndex + LLM Wiki combines smart retrieval with persistent learning. PageIndex handles per-query reasoning by navigating documents as a structured tree, avoiding inefficient chunk-based retrieval. LLM Wiki adds a server-side memory layer that stores distilled, reusable knowledge from past queries. Instead of recomputing answers every time, the system first checks the wiki and only falls back to PageIndex when needed. Over time, this acts like a semantic cache, reducing context size, repeated LLM calls, and token usage. With selective updates, async writes, and smaller models for wiki generation, the system becomes cheaper and faster at scale, while continuously improving answer quality.

🏗️ Simple Architecture Diagram

          Client (MCP - Stateless)
                     ↓
              ┌──────────────┐
              │  API Server  │
              └──────┬───────┘
                     ↓
            ┌──────────────────┐
            │  Orchestrator    │
            └──────┬───────────┘
                   ↓
     ┌─────────────┴─────────────┐
     ↓                           ↓
┌──────────────┐        ┌────────────────┐
│  LLM Wiki    │        │   PageIndex    │
│ (Memory)     │        │ (Retrieval)    │
└──────┬───────┘        └──────┬─────────┘
       ↓                        ↓
        ─────── Merge Context ───────
                     ↓
              ┌──────────────┐
              │     LLM      │
              │ (Answer Gen) │
              └──────┬───────┘
                     ↓
               Response to Client

        (Async)
           ↓
   ┌──────────────────────┐
   │ Wiki Update (cheap)  │
   └──────────────────────┘
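The "check the wiki first, fall back to retrieval" flow can be sketched in a few lines. This is a simplification under stated assumptions: the wiki is a plain dict keyed by a normalized query string (an exact-match cache, not a semantic one), the write-back is synchronous rather than async, and `retrieve` and `generate` are hypothetical callables standing in for PageIndex and the answering LLM.

```python
def normalize(query):
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())


def answer(query, wiki, retrieve, generate):
    """Return (answer, origin) where origin is 'wiki' or 'retrieval'."""
    key = normalize(query)
    if key in wiki:
        # Cache hit: no retrieval and no generation call needed.
        return wiki[key], "wiki"
    context = retrieve(query)
    result = generate(query, context)
    # A real system would write this back asynchronously and selectively.
    wiki[key] = result
    return result, "retrieval"
```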

SonicBotMan commented Apr 13, 2026 (edited)

We've been building wiki-kb (https://github.com/SonicBotMan/wiki-kb), a system based on this exact pattern from Karpathy's gist — "compiling vs retrieving." The gist describes the idea well, but we found the hard part isn't the initial build, it's preventing degradation over months of daily use. Here's what we added on top:

Architecture: 3 layers instead of 2

Karpathy describes raw sources → wiki. We added a third layer in between: schema. Each wiki page has YAML frontmatter with typed fields (lists, dates, entity references, status). A resolver.py validates every write before it hits the filesystem. This catches most "lazy LLM" problems (empty fields, wrong types, broken cross-references) before they compound.
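As an illustration of what "validate every write" could mean, here is a sketch that checks an already-parsed frontmatter dict against a small typed schema. It is not the project's resolver.py; the field names in `SCHEMA` and the function `validate` are hypothetical, and YAML parsing and cross-reference checks are left out.

```python
from datetime import date

# Hypothetical schema: field name -> required Python type.
SCHEMA = {
    "title": str,
    "status": str,
    "updated": date,
    "tags": list,
    "entities": list,
}


def validate(frontmatter, schema=SCHEMA):
    """Return a list of error strings; an empty list means the page may be written."""
    errors = []
    for field, expected in schema.items():
        value = frontmatter.get(field)
        if value in (None, "", []):
            errors.append(f"{field}: missing or empty")
        elif not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```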

Entity Registry — the graph backbone

A JSON registry (with file locking) tracks every entity (people, concepts, projects, events) with canonical names and aliases. When the LLM tries to create a duplicate entity with a slightly different name, the registry catches it and merges. This is what prevents the wiki from turning into 50 pages about the same thing with slightly different titles — one of the first failure modes we hit.
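The core of such a registry is a lookup from every known alias to one canonical name. The sketch below shows only that in-memory mapping, assuming names match after lowercasing and whitespace normalization; JSON persistence, file locking, and fuzzy matching are omitted, and the class and method names are hypothetical.

```python
class EntityRegistry:
    """Maps every known alias to a single canonical entity name."""

    def __init__(self):
        self._canonical = {}  # normalized alias -> canonical name

    @staticmethod
    def _key(name):
        """Normalize a name: lowercase and collapse runs of whitespace."""
        return " ".join(name.lower().split())

    def register(self, name, aliases=()):
        """Register an entity, merging into an existing one if any name is known."""
        names = [name, *aliases]
        known = [
            self._canonical[self._key(n)]
            for n in names
            if self._key(n) in self._canonical
        ]
        canonical = known[0] if known else name
        for n in names:
            # Existing mappings win; new aliases attach to the canonical name.
            self._canonical.setdefault(self._key(n), canonical)
        return canonical

    def resolve(self, name):
        """Return the canonical name for an alias, or None if unknown."""
        return self._canonical.get(self._key(name))
```

With this shape, a second `register` call that shares any alias with an existing entity returns the existing canonical name instead of creating a new page.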

Periodic lint cycle

After any wiki update, a verification pass checks: does every entity referenced in frontmatter actually exist? Are cross-references bidirectional? Does the graph remain connected? This runs automatically and flags issues before they cascade.

On the model collapse concern

This is real — we've seen it happen when the LLM starts rewriting existing pages instead of adding new information. Our mitigation is structural: the typed frontmatter and entity registry provide "hard rails" that are harder to corrupt than freeform prose. The wiki can drift in narrative quality, but the structural invariants (entity relationships, bidirectional links, graph topology) remain verifiable programmatically.

MCP-based automation

The whole system runs as an MCP server, so any LLM agent (Hermes, Claude Code, etc.) can read/write the wiki through a standardized tool interface. A semantic search index (OpenViking) sits alongside the wiki for retrieval-augmented queries when the compiled knowledge isn't enough.

The key takeaway from our experience: don't just let the LLM write freely and hope for the best. Enforce structural invariants at the schema layer, and the wiki stays useful much longer. We've been running this daily for several months now and the quality has held up well.

gnusupport commented Apr 13, 2026

99% of comments are made by AI, I really don't know the value for reading these comments and ads, long and unreadable, good lood but no help, I call them trash.

What an irony that in a discussion thread about LLM/AI you are protesting against people who use that same AI/LLM to generate their text, while at the same time having nothing to contribute.

I find this random brainstorming powerful, and I do expect well-written and expanded text. This isn't coffee chat at breakfast. This is an empowering thread. I would ask you to contribute to the brainstorming instead of complaining about what tools people use.

Please don't post any ads, the true valuable things are thoughts.

Exaggerated.

gnusupport commented Apr 13, 2026

I am following principles from:

About Dynamic Knowledge Repositories (DKR):
https://www.dougengelbart.org/content/view/190/163/

Thus ANYTHING can become and should be an elementary object. Objects can be packed, shared, displayed, whatever.

Even a short note. Or number, or UUID, file, database based note, entries, remote files, PDFs, anything.

Those files should never be moved or copied for the sake of LLM/Wiki "ingestion", as that ingestion alone already generates embeddings and text snippets (which are sometimes larger than the file itself).

Use embedding types:

1 Elementary objects (body)
2 People
3 Files
4 LLM Responses
5 Speech
6 Org Mode Headings
7 Emacs Lisp
8 Images
10 M-x command
11 Hyperscope Query
12 Elementary object (name)
13 URL text
14 E-mail (Maildir)

Add any embedding type.

Generate embeddings for everything.

Use different retrieval methods for specific use cases; even grep works fine. Use PostgreSQL full text search, or mu find, or notmuch, you name it.

Use intersections. 120,000 documents can be intersected by their properties in unlimited ways:

different website pages;
different subjects;
languages, media types, sizes of documents, prices, etc.
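A minimal sketch of such intersections, assuming each document carries a flat dictionary of properties. The property names and the `build_index` and `intersect` helpers are hypothetical; in practice the same thing could be done with SQL `WHERE` clauses in PostgreSQL.

```python
from collections import defaultdict


def build_index(documents):
    """documents: dict of doc_id -> {property: value}. Returns (property, value) -> doc ids."""
    index = defaultdict(set)
    for doc_id, props in documents.items():
        for key, value in props.items():
            index[(key, value)].add(doc_id)
    return index


def intersect(index, **criteria):
    """Return the ids of documents matching every given property=value pair."""
    sets = [index.get((key, value), set()) for key, value in criteria.items()]
    return set.intersection(*sets) if sets else set()
```

For example, `intersect(index, language="en", media="pdf")` returns only the documents that are both English and PDF, and further criteria narrow the result the same way.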

Build your own DKR.

PurpleBanana-ai commented Apr 13, 2026

@gnusupport it makes it really hard to take any of the comments seriously if I feel like I'm talking to a modern version of ELIZA (with some self promotion thrown in—50 out of the 435 current comments are plugging their own projects).

It had to resonate with me if I am actually posting something for the bots, crawlers and other LLMs to analyze, but I thought this deserved a thumbs up at least. I wouldn't be looking into this entire concept if I didn't love AI and LLMs, but I agree with you on the comment issues. This type of work that Karpathy put out should complement our intelligence, yet when it's hit with what you felt and saw in the comments, then used AI to quantify, it raises a different curtain that some people are not going to like to see behind (especially if a mirror is there).

Using AI to analyze and measure "it" is exactly the right use case of blending our gray matter and silicon together, not in lieu of, but in tandem with. So, I agree with you, and personally, I would give it a name, and it's another piece of the broader enshattification of everything. If people cannot even write a comment without using an LLM to "fine tune it", or worse, just cut and paste a response, then this all just becomes bots talking with bots, who were trained by previous bots, trained by other earlier bots, who were then trained on data that was crawled out from one of us meatbags using an original thought...without that first step at the bottom of the chain, we become a synthetic echo chamber quickly moving towards catastrophic rot. You can love working with AI-LLMs, and still use it without becoming dependent on it for every word, and you can also use it to point out flaws or find the pattern you found, they are not mutually exclusive.

Now for my self promoting plug, "...brought to you by Carl's Jr., with support from Brawndo, it's got what plants crave!"

gnusupport commented Apr 13, 2026

@PurpleBanana-ai fine, though personally I do not get frustrated by text laid out about people's projects. The problem is that, IMHO, the majority of people, including me, cannot express ourselves in a way that is good by language standards and laid out for the intended audience. And what about non-native English speakers? I cannot. I have to correct the text. I welcome those project makers; this thread has become a treasure for finding similar projects. I see nothing wrong with it.

I find the resistance to LLM-generated text funny. Instead of reading the point of it, since someone did pay attention to provide ideas to you, people look at how it sounds: if there is an "Overall," at the end, it sounds LLM generated. But what matters is not the word or the tool used but the idea, and that is overlooked.

All the projects presented seem to be very good developments of the LLM/Wiki ideas.

I don't expect small talk on such technical subjects.

earaizapowerera commented Apr 13, 2026 via email

About the self-promoting projects: I think I will create some kind of directory. Many people in this group are thinking similarly and of course want to be discovered. But despite the ego “issue”, every project has a unique way of solving things, and if some of us want to share something with the community, not just impose “my tool is the best idea ever”, this directory could be a good place to share it. BTW, I’m not sure about Brawndo, now my plants have a drinking problem.


FBoschman commented Apr 13, 2026

I'm just not enough of a commercial guy, I think. But the whole discussion about 'self promotion, AI bot training' just does not resonate with me. I have added to this growing knowledge base, that's it. I'm curious about what others have to add to this idea. If bolstering your ego is your thing, then, well, do that on your own time. I am moving forward.

jurajskuska commented Apr 13, 2026 via email

🧠 NONO_AIAGENT: What Is This System?

❓ Problem

Claude's context window is finite. Raw tool output (logs, files, commands) floods it fast. Conversations forget past decisions. Every session starts blind.

⚙️ What Was Built

A 5-layer pipeline that protects context, persists knowledge, and compresses everything.

Claude Code → context-mode sandbox → Obsidian vault → BM25 search → JSONL transcripts

| Layer | Tool | Role |
|---|---|---|
| 🛡️ Sandbox | context-mode MCP | Intercepts big outputs, keeps them out of context |
| 📓 Vault | Obsidian MD | Stores decisions, sessions, concepts across time |
| 🔍 Search | qmd / BM25 | Query past sessions without loading them raw |
| 📼 Transcripts | JSONL indexing | Full conversation recall, blind-searchable |
| 🪨 Compression | caveman plugin | Strips prose fluff, cuts token cost ~32% |

📊 Measured Results (session 2026-04-13)

| Metric | Value |
|---|---|
| Data processed | 192.6 KB |
| Kept out of context | 122.4 KB (64%) |
| Tokens saved | ~31,325 |
| Context savings ratio | 2.7× |
| Startup context cost | ~6,550 tokens |

✅ What Is Solved

- 🚫 No more context floods: sandbox absorbs big outputs
- 🔁 No more amnesia: vault + JSONL = persistent memory across sessions
- 💬 No more re-explaining: startup hooks inject prior context automatically
- 🪨 No more bloat: caveman compress cuts prose files ~32%, code files untouched


meghm1007 commented Apr 13, 2026

How's the token usage for such a project? As I scale and give more memory context I assume each run would consume exponentially more tokens

abbacusgroup commented Apr 13, 2026

The solution we developed allows the AI you pay for to do the coding, and a local LLM to maintain the second brain.

The maintenance burden. That is the insight here. Not the reading, not the thinking; the bookkeeping. Cross-references that decay. Contradictions that accumulate silently. Summaries that stop reflecting reality the moment a new decision is made. Humans abandon knowledge systems because the cost of keeping them honest eventually exceeds the value of having them at all.

I have been building against this exact problem. Cortex is a persistent knowledge system that runs as an MCP server. It classifies knowledge objects with a formal OWL-RL ontology, stores them in a dual architecture (Oxigraph SPARQL graph + SQLite FTS5), and reasons over them deterministically.

The distinction from file-based approaches: Cortex traces transitive chains. If A supersedes B and B supersedes C, it infers that A supersedes C. It catches contradictions structurally. It detects systemic patterns. It surfaces stale decisions. All of this without LLM calls. The reasoning is formal logic, not statistical prediction.
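To make the transitive-chain point concrete, here is a small sketch of the inference in plain Python. It is only an illustration of the logic and does not reflect how Cortex implements it (the comment above describes an OWL-RL ontology over Oxigraph); the `transitive_closure` and `superseded` helpers are hypothetical names.

```python
def transitive_closure(edges):
    """edges: set of (newer, older) 'supersedes' pairs. Returns all implied pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # If a supersedes b and b supersedes d, then a supersedes d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def superseded(edges):
    """Return every item that some other item supersedes, directly or transitively."""
    return {older for _, older in transitive_closure(edges)}
```

Given `{("A", "B"), ("B", "C")}`, the closure also contains `("A", "C")`, and both B and C are reported as superseded, with no model call involved.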

It runs locally from ~/.cortex/, speaks MCP, and works with any model.

Your LLM Wiki framing with a formal knowledge graph and MCP underneath feels like the natural convergence. I would be curious to hear your take.

https://github.com/abbacusgroup/cortex

How's the token usage for such a project? As I scale and give more memory context I assume each run would consume exponentially more tokens

jurajskuska commented Apr 13, 2026 via email

Sandboxing, ctx context, indexing, and two levels of session md files. The first is prepared by the AI agent after each session closes. The full detailed chat is saved by Claude automatically in JSONL files. The AI agent also indexes the JSONL automatically, and when it needs a detailed response that isn't in the session md files, it can search the JSONL quickly without overspending tokens. When doing research with bigger source files, they are not pulled in as in Karpathy's solution; they are pulled using ctx and the sandbox, which saves tokens too. The context size used is always checked by a human in the session md files. When working with big files, they are indexed and not included in the context window, so context is saved any time you return to that source file. Juraj


gitdexgit commented Apr 13, 2026

little QoL feature: Read less; get the meaning; move on

Description:
Add TL;DR to your ~/wiki. Caveman communication is the way <- Abstracting(less words; most meaning) the Answer/.md for the human. But you can click button to read details if you need to.

goal1: Read the gist of the ~/wiki or LLM answer. But you have option to read detailed answer. <- Abstracting the Answer/.md for the human. but you can click button to read details if you need

goal2: Read less words; get the most meaning; decide to read the whole detailed .md or move on. The less you read the better; because you focus on output more -> writing. Always communicate (write, read in IDE) simply first, but you have the option to go into detail (the main .md <- the source code; a very detailed, almost research-like .md paper or article with all context for the LLM and analytical reading).

Solution:

https://github.com/JuliusBrussee/caveman

Call this summary version. or readable version. But the main .md This is for the LLM. While the other .md is the summary version. the TL;DR version that both are accessible to the same knowledge. This can be added in the editing layer where you ask Q&A as well.

Again the goal is less words; more meaning.

Details:

Problem1: The longer, the more /raw data you have. The more .md files you have to read. The longer it takes for you to read. The less your brain remembers. You keep asking the same Q&A again and again. Potentially useful Q&A might not be asked. You misunderstand the information contained in the ~/wiki. You dump bad /raw. LLM compiles. Asking Q to delete bad .md because of bad /raw. You waste time. LLM can't carry for everything.

Problem2: You read .md 1week ago. You don't really remember what it's all about. You ask Q&A, find it with llm help. You reread the .md that has the same detailed words. Your goal is only memory refresher not to re-read the whole thing again <-- too much scrolling down and too much eye scanning for many words. You take longer to kick the engine(producing actual human output) to feed /raw ingest.

idea1(something like this): Instead of LLM giving you 1 answer to your 1 question. It gives you 2 answers. You read the gist answer(very less words; keep most meaning) But you decide to read the fully detailed answer if you want.

Idea2 (something like this): in compiling or producing .md, make article1.md for the LLM that is detailed. And make a 2nd version of the same file, article1-human.md, for the human, keeping as much meaning as possible using as few words or as little data as possible. But the user can decide to read further <-- saves time. Because there are always 2 files of the same .md: one for the LLM and your system 2 brain (long reading sessions or reasoning), the other for your system 1 brain and long-term retention of the mental model (the logic) through frequent repetition of the logic, because you read less to get the logic; build mental connections faster.

explaining:
There is a problem with the language used for edits. English or any LLM output language contains a lot of fluff that is baked in by the way the model was trained -> Lots of words; low meaning. This is not helpful when your goal is to look for personal information as effectively as possible.

This creates a problem where it gets harder to read after the 2k file in the wiki. <- It's a human problem. You solve it with Q&A sure. Don't read the ~/wiki just ask questions and LLM goes there and brings it up, gives links or sources at the very end.

I believe there should be 2 versions of each .md file in the ~/wiki. One is the "detailed" or compiled .md output in all of its glory, the source code .md. The 2nd is the same version but the focus is fewer words, more meaning.

Where fast access is needed, fast communication. These ideas might help you over time to develop the gist in your brain just from the ~/wiki and the repeated process of fast Q&A and fast reading of the logic of the .md first before going into detail. Your brain in theory should retain more of the answer so you don't have to ask questions that aren't needed. Also, an LLM is really good at compressing data into as few words as possible while preserving as much meaning as possible.

Hopefully with less words you can work more efficiently as you specify or ask LLM for further questions to clarify. But first you need your brain to detect it first. Using big words when very few words get the job done saves brain power to focus on the words that matter the most.

Further abstractions are: Q&A but instead of getting a detailed answer or the option to pick the less wordy answer, you are provided with questions to which if answered you don't need to read.

So "A lot of details" --> "less words; preserve as much meaning" --> "1 or 2 questions or a series of questions to which answered by you in your brain then no need to read"

payneio commented Apr 13, 2026

I built https://github.com/payneio/prism last year to provide tooling for LLMs to write wikis. Prism, similarly, handles the fiddly bits of wiki maintenance... mostly through front-matter. I went pretty deep into the structure of knowledge bases because I wanted to allow the LLM to be able to break up large pages, combine pages, deep link, symlink, summarize, tag, etc. etc... and the big one, make a different page the root and have all the navigation/links/urls updated accordingly. Making a new node the root models two common scenarios: (1) as the wiki is growing, realizing that you've evolved a new focus, and (2) being able to grab any page and its n-deep neighbor walk (a sub-wiki) and share it with someone else (or another agent).

When I got that far, though, I just realized I was making a graphdb, and that the wiki is just a view for humans... which will have limited utility as agent fleets scale (we just don't have enough attention to read everything)... so we might as well just give the agents their own graphdbs/triple-stores/whatever along with some agentic knowledge management rules.

Down with the hierarchy! Knowledge wants to be free! 😆

jurajskuska commented Apr 13, 2026 via email

Maybe a simplified explanation would be better for users:

Why This System Exists: Simple Explanation

*The problem with AI assistants by default:* Every conversation starts blank. Claude remembers nothing. You explain context every time. Alternatively, you dump everything into the context (past chats, docs, notes) and burn through the 200k token limit fast.

*The token budget analogy:* Think of 200k tokens as a whiteboard. Once full, old stuff gets erased. You want to start with the whiteboard mostly empty, so there's room to actually work.

*What this system does:* At startup, hooks inject two small curated files:

- SYSTEM_EXPLAINED.md: how the system works
- CLAUDE.md: rules and vault structure

Total cost: ~6,500 tokens. That's *~3% of 200k*. Whiteboard still nearly empty. Claude knows enough to start.

*During session, the key insight:* Past sessions exist as JSONL files (raw conversation transcripts). A single session JSONL = 50–200KB. Loading one whole = 12,000–50,000 tokens. Loading all of them = fills the whiteboard instantly.

Instead: *BM25 search*. You ask "what did we decide about X?" → system searches all indexed JSONLs → returns only the 3-5 matching paragraphs → maybe 500 tokens. Same answer. 1% of the cost.

*The architecture in one sentence:* Small curated startup + on-demand search = full knowledge access at 15% token usage instead of 80%+.

*Why not just always search everything?* Startup context loads *instantly*, before your first message. Search requires a tool call. Core facts (vault structure, rules, file locations) you need immediately every session, so those stay in startup. Details you need rarely (exact quote from a session 3 weeks ago) stay searchable, never preloaded. *Right thing in right place. Nothing wasted.*

Handling Large External Sources

*The problem:* You find a webpage or doc, 100k tokens raw. Loading it directly = half your whiteboard gone instantly. One source, before you've done any work.

*What happens instead:*

*Step 1: Fetch and index, never load.* ctx_fetch_and_index(url) runs. It fetches the page, processes it, stores it in a local search database (SQLite). *None of it enters the context window.* Cost: ~0 tokens. A small pointer file gets saved to OBSIDIAN/indexed/<name>.md: just frontmatter, source URL, date. Maybe 50 tokens.

*Step 2: Search, don't read.* Need something from that 100k doc? ctx_search("your question") → BM25 finds the relevant paragraphs → returns maybe 800 tokens of actual answer. You queried 100k. You received 800. Whiteboard barely moved.

*Compared to standard approach:*

| Approach | Tokens used |
|---|---|
| Load whole page | ~25,000–50,000 |
| WebFetch (cleaned) | ~15,000–30,000 |
| ctx_fetch_and_index + ctx_search | ~800–2,000 |

*The mental model:* The 100k doc lives in a library. You don't bring the whole library into the room. You send a librarian with a specific question. Librarian returns one page with the answer. Context window is the room. Sandbox database is the library. ctx_search is the librarian. *Whiteboard stays clean. Full knowledge still accessible.*

What If You Need More From That File Later?

Same mechanic. Another ctx_search call. The indexed file stays in the database permanently, not in context. You can query it 20 times in one session. Each query costs ~800 tokens of results, not 100k.

*Example flow:*

1. Morning: fetch + index 100k doc → 0 tokens used
2. "What does it say about authentication?" → search → 800 tokens
3. Later: "And what about rate limits?" → search again → 800 tokens
4. Later: "Find the error codes section" → search again → 600 tokens

*Total cost for 3 queries into 100k doc: ~2,200 tokens.* Without this system: you'd load the doc once = 25,000–50,000 tokens, and it sits there consuming whiteboard for the whole session whether you're actively using it or not.

*Key point:* the index persists across sessions too. Index it today, search it next week. Never fetch again. The librarian remembers every book that's ever been shelved.

Why We Implemented Caveman

*The startup context problem:* Even optimised startup files cost tokens. SYSTEM_EXPLAINED.md = ~4,800 tokens every single session. That's prose: sentences explaining things, articles, connective words. Most of it fluff carrying a small core of actual facts.

*What Caveman does:* Compresses prose files into stripped, dense format. Drops articles, filler words, pleasantries. Keeps every technical fact, every rule, every decision intact.

Example: "The reason we are using this approach is that it allows Claude to access the information it needs without having to load the entire file into context." Becomes: "Approach: access needed info without full file load." Same information. ~60% fewer tokens.

*How it works practically:* /caveman:compress SYSTEM_EXPLAINED.md

- Compresses file in place (~2–32% smaller depending on content type)
- Backs up original as SYSTEM_EXPLAINED.original.md
- Both get indexed: original for nuance, compressed for speed

*Two-track rule:*

| Version | When to search |
|---|---|
| .original.md | Complex question, nuance matters |
| compressed .md | Fast lookup, routine question |

*Net effect on startup:* Startup files already small. Caveman makes them smaller. Session might open at 8% token usage instead of 15%. More room. More sessions before whiteboard fills. Cheaper per conversation.

*The compounding math:* Every session loads startup context. If startup shrinks by 30%, that saving repeats every single session forever. One compression run, permanent per-session dividend. *Small file + caveman = smallest possible informed start.*

The Full Picture: Summary

You work. Whiteboard stays clean.

- Startup: only essentials loaded (~3–15% of 200k)
- Big files: indexed once, never in context
- Need something from them: one search → one answer → ~800 tokens
- Next question from same file: another search → another 800 tokens
- File itself never sits on whiteboard burning space between questions

*The boat analogy:* Standard approach = drag the whole boat into the room every time you need an anchor. This system = boat stays in the harbour. You radio for exactly what you need. Anchor arrives. Boat stays outside.

*And caveman on top:* Even the small files that DO load at startup are compressed further. Less weight from the start. More room for actual work.

*Result:* You can run long complex sessions on big projects, reference large docs, search past conversations, and still have 70–80% of whiteboard available for the actual thinking and output.

*The context window is a workspace, not a storage room.*

*Written 2026-04-13. Target: humans unfamiliar with token budgets and context windows. One more session planned to refine and simplify further.*
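For readers who want to see what "search, don't read" looks like mechanically, here is a self-contained sketch of BM25 ranking over paragraphs. It is not the ctx_search tool described above, whose internals are not shown in this thread; the `BM25` class, the `tokenize` helper, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the paragraph list is assumed to be non-empty.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Tiny in-memory BM25 index over a non-empty list of paragraphs."""

    def __init__(self, paragraphs, k1=1.5, b=0.75):
        self.paragraphs = paragraphs
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(p)) for p in paragraphs]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        # Document frequency: in how many paragraphs each term occurs.
        self.df = Counter(term for d in self.docs for term in d)

    def score(self, query, i):
        doc, n = self.docs[i], len(self.docs)
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            norm = tf + self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avg_len)
            total += idf * tf * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        """Return up to top_k paragraphs with a positive score, best first."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [self.paragraphs[i] for s, i in scored[:top_k] if s > 0]
```

Only the returned paragraphs would be placed in the context window, which is where the token savings described above come from; the rest of the indexed text stays outside.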


gnusupport commented Apr 13, 2026 via email

@gitdexgit commented on this gist:

A little QoL feature: Read less; get the meaning; move on <-- Add TL;DR to your ~/wiki

Problem1: The longer, the more /raw data you have. The more .md files you have to read. The longer it takes for you to read. The less your brain remembers. You keep asking the same Q&A again and again. Potentially useful Q&A might not be asked. You misunderstand the information contained in the ~/wiki. You dump bad /raw. LLM can't carry for everything.

Yes, that type of situations come from practical situations, that is how it comes in real life. On my side, I may need to remember some tags, some words, some people related to documents in order to find those pieces of information.

Problem2: You read .md 1week ago. You don't really remember what it's all about. You ask Q&A, find it with llm help. You reread the .md that has the same detailed words. Your goal is only memory refresher not to re-read the whole thing again <-- too much scrolling down and too much eye scanning for many wrods

Maybe, sounds like real life situation. Though, other issue: Markdown files people take more or less as "text" these days, while markdown is basically un-htmled HTML, the method to convert everything to HTML. And it is definitely not the bast markup language out there. All users should be free to use any kind of markup language. Why in first place polute everything with .md files and assume that is super foundation for future? I keep using any kind of markup. It should be irrelevant.

Idea: a article1.md for LLM that is detailed. A 2nd version of the same article1-human.md file for the human to keep as much meaning as possible using as little words or data as possible. But user can decide to read further <-- saves time.

That is one among variety and unlimited practical situations in life. Do people use those tools to advance life and improve? Or just for pleasure? Just for notes? But no real results? I wonder. On my side, LLM tools are mainly used to correct and present practical ideas that clients can use, win out of it, profit, so that we can all grow together. I have almost little or no use of the LLM in private life. I can verify some emotional, or societal issues, but not more than that, I can generate movie genres and stories for fun and private. So dwelling in the documents and information isn't really ideal life.

There is a problem with the language used for edits. English or any LLM output language contains a lot of fluff that is baked in the Model's way of training -> Lot's of words; low meaning. This is not helpful when your goal is to look for personal information as effectively as possible.

Too many varieties of situation out there exist, so your situation could easily be remedied. Looking at coding agents, they act precise and fast, that is one example how it could be remedied.

I find that approach good. I am having elementary objects, so there is name, description, text, internal report, report, and unlimited other properties. There can be text, plus text for TTS, which would be better prepared for talking. I find that layered representation very useful.

Hopefully with less words you can detect bad data so you specify or ask LLM for further questions to clarify. But first you need your brain to detect it first. Using big words when very few words get the job done saves brain power to focus on the words that matter the most.

Instead of asking each time, a deterministic evaluation could be done and information marked or tagged for future decision making.
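One way such a deterministic check could look, combined with the two-version idea quoted above: list every detailed page that does not yet have a short companion file. This is a minimal sketch under stated assumptions; the `-human` suffix follows the `article1-human.md` naming from the quoted comment, and the function name `missing_summaries` is invented for the example.

```python
def missing_summaries(filenames, suffix="-human"):
    """Return detailed .md pages (without extension) that lack a `<name><suffix>.md` companion.

    `filenames` is any iterable of file names; non-.md files are ignored.
    """
    stems = {name[:-3] for name in filenames if name.endswith(".md")}
    detailed = {s for s in stems if not s.endswith(suffix)}
    return sorted(s for s in detailed if s + suffix not in stems)

# Example listing, invented for illustration.
files = ["article1.md", "article1-human.md", "article2.md", "notes.txt"]
todo = missing_summaries(files)
```

Here `todo` contains only `article2`, so an agent would be asked to write a short version for that page alone instead of re-evaluating the whole wiki on every question.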

jurajskuska commented Apr 13, 2026 via email

I think my solution solved all your problems.

Juraj

On Mon, Apr 13, 2026, 21:30 GNU Support ***@***.***> wrote:

…


hectordww-alt commented Apr 13, 2026

I wrote a tiny add-on prompt for this pattern focused on taste logs: music, films, books, etc.

The idea is to keep plain markdown logs plus small curator instructions, so an agent can avoid repeats, use misses as negative signal, and make recommendations from actual taste history rather than starting from zero each time.

https://gist.github.com/hectordww-alt/30c3e6af4ec77001f21b8b103e0115ff
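To make the "avoid repeats, use misses as negative signal" idea concrete, here is a minimal sketch. The log format (`- [hit] Title` / `- [miss] Title` lines in a markdown file) and both function names are assumptions invented for this example; the linked gist may structure its logs differently.

```python
import re

# Hypothetical log line format: "- [hit] Some Title" or "- [miss] Some Title".
LINE = re.compile(r"^- \[(hit|miss)\] (.+)$")

def parse_log(text):
    """Split a markdown taste log into (hits, misses) lists of titles."""
    hits, misses = [], []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            (hits if m.group(1) == "hit" else misses).append(m.group(2).strip())
    return hits, misses

def filter_candidates(candidates, hits, misses):
    """Drop candidates that already appear in the log (case-insensitive)."""
    seen = {title.lower() for title in hits + misses}
    return [c for c in candidates if c.lower() not in seen]

log = "# Films\n- [hit] Stalker\n- [miss] Tenet\nsome free-form note\n"
hits, misses = parse_log(log)
fresh = filter_candidates(["Solaris", "tenet", "Stalker"], hits, misses)
```

The `misses` list is kept separate so it can be passed to the agent as explicit negative examples; this sketch only uses it to avoid recommending the same title twice.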

ilya-epifanov commented Apr 13, 2026

I wrote a couple of tools augmenting LLM-wiki:

- https://github.com/ilya-epifanov/llmwiki-tooling: a CLI utility that simplifies linting, checking and fixing links, and optionally enforcing frontmatter fields, sections in markdown, etc. It's meant to be used by the agent for consistency and to save some tokens.
- https://github.com/ilya-epifanov/wikidesk:
  - a client binary that syncs a copy of wiki/ locally and can talk to the server to initiate a research request
  - a server that spawns a Claude (or any other agent) instance whenever it receives a research request (with an adjustable additional prompt)

Both tools are as unopinionated as possible. They should work with any reasonably non-disfigured LLM-wiki setup.

Works great for me!
My use case: Claude on a DGX Spark (actually an ASUS thingy) is busy designing an ML training pipeline while having access to my ML wiki. The couple of research requests it has sent so far have incrementally updated the wiki properly and pulled in relevant papers.
🎆
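For readers wondering what a link check of this kind involves, here is a minimal sketch. It is not the llmwiki-tooling implementation or its CLI; the `pages` mapping (file name to markdown text), the flat single-directory layout, and the function name `broken_links` are assumptions for the example.

```python
import re

# Matches [text](target.md) and [text](target.md#anchor); captures "target.md".
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")

def broken_links(pages):
    """Return (page, target) pairs where a relative .md link points to a missing page.

    `pages` is a hypothetical in-memory wiki: file name -> markdown text.
    Assumes a flat wiki directory; external URLs are skipped.
    """
    problems = []
    for name, text in sorted(pages.items()):
        for target in MD_LINK.findall(text):
            if "://" in target:
                continue  # external link, not part of the wiki
            if target not in pages:
                problems.append((name, target))
    return problems

pages = {
    "index.md": "See [attention](attention.md) and [RNNs](rnn.md#history).",
    "attention.md": "Back to [index](index.md), or [spec](https://example.com/a.md).",
}
report = broken_links(pages)
```

Running a check like this outside the model is what saves tokens: the agent only sees the short `report` list instead of rereading every page to find dangling links.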

