
@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file: it is designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
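To make this concrete, here is a hypothetical excerpt of such a schema file. Every convention in it (directory names, log prefix, workflow steps) is an illustration of the pattern, not a prescription:

```markdown
# Wiki schema (hypothetical excerpt of CLAUDE.md)

## Layout
- raw/: immutable sources; read, never edit
- wiki/: LLM-maintained pages, organized as entities/, concepts/, sources/
- wiki/index.md: catalog of all pages; update on every ingest
- wiki/log.md: append-only; prefix entries "## [YYYY-MM-DD] kind | title"

## Ingest workflow
1. Read the new source in raw/ and discuss key takeaways with the user.
2. Write a summary page under wiki/sources/.
3. Update affected entity and concept pages; flag contradictions inline.
4. Update index.md and append a log entry.
```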

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.
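The reading and summarizing above is the LLM's job, but the mechanical tail of the flow (updating the index, appending to the log) can be sketched in a few lines. The file layout and entry formats here are assumptions, not a spec:

```python
# Sketch of the bookkeeping half of an ingest pass. The wiki layout
# (index.md, log.md) and line formats are hypothetical conventions.
from datetime import date
from pathlib import Path

def append_log_entry(log: Path, kind: str, title: str) -> None:
    """Append a dated, grep-able entry to the chronological log."""
    entry = f"## [{date.today().isoformat()}] {kind} | {title}\n"
    with log.open("a", encoding="utf-8") as f:
        f.write(entry)

def add_index_entry(index: Path, page: str, summary: str) -> None:
    """Register a new wiki page in the content index."""
    line = f"- [[{page}]]: {summary}\n"
    with index.open("a", encoding="utf-8") as f:
        f.write(line)
```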

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
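Some lint checks are purely structural and can be scripted alongside the LLM pass. A minimal sketch of orphan detection, assuming Obsidian-style [[wikilinks]] and a flat directory of pages:

```python
# Find orphan pages: wiki pages that no other page links to via [[wikilinks]].
# Assumes Obsidian-style links; the flat directory layout is hypothetical.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: Path) -> set[str]:
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    linked = set()
    for p in wiki_dir.glob("*.md"):
        for target in WIKILINK.findall(p.read_text(encoding="utf-8")):
            linked.add(target.strip())
    return pages - linked
```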

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
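For example, an index.md in this style might look like the following (the categories and entries are hypothetical):

```markdown
# Index

## Entities
- [[gandalf]]: Wizard; appears in 12 sources (updated 2026-04-02)
- [[rivendell]]: Elven refuge; hub page linking 8 chapters

## Concepts
- [[the-ring-as-addiction]]: Evolving thesis page; 5 supporting sources

## Sources
- [[src-fellowship-ch01]]: Chapter summary (ingested 2026-03-28)
```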

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
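The same consistent prefix also makes the log trivially parseable in a script, not just with grep. A sketch that pulls entries back out as (date, kind, title) tuples, assuming the entry format above:

```python
# Parse log.md entries of the form "## [YYYY-MM-DD] kind | title".
import re

ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")

def parse_log(text: str) -> list[tuple[str, str, str]]:
    # Non-entry lines (entry bodies) simply don't match and are skipped.
    return [m.groups() for line in text.splitlines()
            if (m := ENTRY.match(line))]
```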

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
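The "naive search script" mentioned above really can be tiny. A sketch that scores each page by query-term frequency, assuming a flat directory of .md files:

```python
# Naive keyword search over a wiki directory: score each page by how many
# times the query terms occur. No ranking model, no embeddings; a stopgap
# until a real search tool is needed.
from pathlib import Path

def naive_search(wiki_dir: Path, query: str, top_k: int = 5) -> list[str]:
    terms = query.lower().split()
    scored = []
    for page in wiki_dir.glob("*.md"):
        text = page.read_text(encoding="utf-8").lower()
        score = sum(text.count(t) for t in terms)
        if score:
            scored.append((score, page.name))
    return [name for score, name in sorted(scored, reverse=True)[:top_k]]
```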

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@mauceri

mauceri commented Apr 14, 2026 via email

@Mekopa

Mekopa commented Apr 14, 2026

My experience is that LLMs are able to discover knowledge better with a graph representation layer

beyond just .md, using data files like:
.ics
.vcf
...
and "linking" them with each other, similar to how Obsidian does for .md's

basically a dead-simple wiki of your life, and I'm using a graph.json to keep my graph up to date

@gnusupport

Look, I don't have excessive pride in myself or any particular tool — my confidence is in long-term systems proven over decades, not weekend hacks. I've analyzed OmegaWiki and the others here, and none of them are time-proof. Here's the fatal contradiction: if you delegate everything to an LLM-Wiki, the system eventually blocks and becomes unmanageable — unless the human gets more involved to fix the mess. That means the LLM-Wiki principle wants to escape itself! It promises "near zero maintenance" but delivers a growing burden of linting, fixing contradictions, patching broken links, and verifying hallucinations. The only way out is more human work, not less. That's not a solution — that's a trap. 🐑💀

Another huge trap: the authors are mostly coding via the LLM. The LLM could know the future outcomes, but the authors, let us call them curators, will not ask the LLM the right questions. So there we are at the fundamental problem: asking the right questions.

The idea is accepted like the Amen from sole Jesus Christ, or Elohim, whoever. And they go spending their Claude money/tokens to show off here something that is unmanageable and where the authors didn't put in that much thinking.

And me, stupid, even analyzed a few of those projects to see how it goes.

Each project is pretty large, yet has no collaborators, no users, no issues or problems reported.

Scalability? Almost zero. A computer cannot locally handle what is being stated here. I am running LLMs on a GPU; if I had 120,000 files to be expanded each time like that, automatically, LLM-WIKI-sheep-style, then my computer would block, burn, or otherwise become inaccessible to a human.

Let us put it this way: the LLM-WIKI idea, while sounding like hype, is basically a useless piece of crap.

@gnusupport

My experience is LLMs are able to discover knowledge better with a graph representation layer

Seems like you are actually using it and finding it productive for you personally. That is how it should be.

@mauceri

mauceri commented Apr 14, 2026 via email

@rohitg00

Extended version - LLM Wiki v2

@mursu-ai

@karpathy
Wow, what a great idea Andrej! Thanks for posting it. To me it sounds very much like RAG with memory (MemRAG), and keeping the wiki folder in a DB (possibly a GraphDB?) could help with scaling.

@gnusupport

gnusupport commented Apr 14, 2026

It all depends on how you intend to use it, for one thing; and for another, it’s perhaps pointless to call people sheep simply because you don’t see what they might like about this text.

Calling them "sheep" isn't about insulting their intelligence — it's about shocking them awake to realize that blindly following a seductive pattern without questioning its long-term outcomes leads to an unmanageable system that will eventually collapse under its own contradictions, broken links, and loss of human control. Especially when the authority figure they're following has literally declared himself to have "AI psychosis" and admitted he's on the verge of insanity. 🐑💀

OpenAI cofounder says he hasn't written a line of code in months and is in a 'state of psychosis' | Fortune:
https://fortune.com/2026/03/21/andrej-karpathy-openai-cofounder-ai-agents-coding-state-of-psychosis-openclaw/

He openly admits he hasn't written code in months and is in a state of psychosis — so was the LLM-Wiki a deeply considered architecture or just a vibe-coded hallucination he threw out while losing his grip? 🤡💀

@mikhashev

mikhashev commented Apr 14, 2026

To: @karpathy and @torvalds and all participants

Proposed Comment for Gist Discussion

Git object model as a knowledge backend — why reinvent the wheel?

Going through the 485+ comments, I see a recurring pattern: we are all building custom infrastructure for graph databases, SPARQL, entity stores, and lint pipelines from scratch. But we already have a battle-tested, content-addressable storage with deduplication, provenance, and branching built-in: Git internals.

Instead of just storing Markdown files, why not map knowledge units directly to the Git object model?

The Mapping:

  • Blob → Atomic knowledge unit (a single fact, a proven pattern, or even a "rejected approach").
  • Tree → Category/Index (a directory of related concepts or a specific context snapshot).
  • Commit → Provenance event (who added what, when, and why — with a clear message/reasoning).
  • Branch → Competing hypotheses or parallel research threads (keeping uncertainty alive until evidence resolves it).
  • Merge → Synthesis or resolution (one interpretation wins, or they are merged into a unified truth).
  • Tag → Stable knowledge snapshot ("verified/audited as of date X").

What this gives us for free:

  1. Content Deduplication: Same knowledge = same SHA. This prevents "LLM agents" vs "AI agents" duplicates from bloating the context.
  2. Immutable Provenance: Every fact knows its origin. No more "mostly correct" JSON failures that are hard to trace.
  3. Anti-Repetition Memory: Failed experiments stored as typed blobs. The agent can query "what didn't work" before wasting tokens trying it again.
  4. Diff-based Reviews: A clean way to see exactly how the knowledge state evolved between agent iterations.

The Open Challenge: Active Recall
The biggest gap remains: "How does the agent know to look for something it forgot it has?" Even with a perfect Git-based index, triggering retrieval during a conversation without hardcoded triggers is still the "holy grail." Semantic hashes and tags help, but the "I didn't know I should search" problem is still open.

Pragmatic Take:
Current Markdown + vector search covers ~90% of use cases for ~10% of the effort. But when we hit the walls of scale, deduplication, and provenance, the Git object model becomes a very compelling "knowledge plumbing" solution.

Would love to hear if anyone is already experimenting with using git plumbing commands (not just the porcelain) as their agent's memory backend!
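The content-deduplication property described above is easy to demonstrate without any git plumbing installed: a blob's object ID is just the SHA-1 of a short header plus the content, so identical knowledge units hash to identical IDs. A minimal sketch:

```python
# Git's blob object ID is sha1(b"blob <size>\0" + content). Two identical
# knowledge units therefore get the same ID, which is the free deduplication
# described above. Pure Python; no git binary required.
import hashlib

def git_blob_sha(content: bytes) -> str:
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```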

@gnusupport

@mikhashev git is for source code, not for granular knowledge management. Maybe take five minutes to read what Doug Engelbart already figured out decades ago. You know, an actual Open Hyperdocument System. Not a pile of hashed blobs pretending to be a "brain". https://www.dougengelbart.org/content/view/110/460/

And this whole LLM-WIKI stuff is as if there were some problem to be solved, while there is none! There are already millions of knowledge management systems with much better architecture.

This whole thread is just 🐑🐑🐑 following the perceived 👑, "joining" to resolve a problem that was resolved long ago. But sure, keep reinventing the wheel while calling it innovation, with the LLM-generated BS generating more problems than the imaginary one it was imagined to solve.

@karlwirth

Thank you for this idea of using LLMs to build and maintain a wiki. I have been experimenting with this since you proposed it.

There is friction with the toolchain you propose: Obsidian + Claude Code terminal + browser extension + git + local search is five processes across three windows, and the ingest/lint pipeline is fully manual. It only runs when you remember to run it.

We took the same concept and built it in Nimbalyst, where the markdown editor and Claude Code or Codex are in the same integrated workspace, so there's no tool-switching. A single prompt bootstraps a /wiki command, a daily automation that compiles new sources into wiki pages, and a weekly automation for contradiction detection and stale content cleanup. The automations run on a schedule, so the wiki maintains itself rather than depending on you to trigger each step.

The prompt and more details are here if anyone wants to try it: https://nimbalyst.com/use-cases/knowledge-base/

For me, this wiki LLM approach has been moderately helpful. Anyone have suggestions on how you are getting more out of it?

@YAMLcase

Windows user here: I'd like to try this, but my first hurdle is getting Obsidian to NOT be a PITA trying to switch between vaults (I'm already using it). Anyone have suggestions on a good workaround, or an alternative markdown viewer-editor-in-one?

@karlwirth

karlwirth commented Apr 14, 2026 via email

@earaizapowerera

earaizapowerera commented Apr 14, 2026 via email

@harshitgavita-07

harshitgavita-07 commented Apr 14, 2026

@karpathy and everyone,

Building on this, I took the idea file a bit too literally and wired it into an AI‑native OS shell instead of "just another app".

I’ve been hacking on AIOS, a Linux‑based AI operating environment with Rust + Python where the LLM is a first‑class process, not a website tab. Over the last week I bolted on an LLM‑wiki mode that treats a folder of markdown notes as a living knowledge base with three primitives:

  • ingest: watch a directory (docs, code, PDFs converted to md), chunk + normalize, and let the agent compile it into a structured wiki graph instead of raw embeddings.

  • query: local LLM (via Ollama) answers questions by editing the wiki first, then responding from the updated state, so answers always come from the same artifact you can grep / git diff.

  • lint: an agent pass that scans the wiki for contradictions, stale claims, “TODO: verify” zones, and proposes concrete edits as patches.

A few opinions baked in, inspired by the gist:

  • Local‑first: the whole thing runs offline with Ollama + plain markdown; no external APIs, so your “second brain” is just a Git repo and a folder.

  • OS‑level, not app‑level: AIOS exposes the wiki as a system primitive — you can script it from the shell, plug it into cron, or let other agents treat it as the canonical memory instead of each tool reinventing RAG.

  • Multilingual: I’m in India, so my real use‑case is English + Hindi/Marathi mixed notes; ingestion normalizes and tags language so the wiki doesn’t collapse on code‑mixed mess.

Current real‑world test: I’m feeding AIOS my own ML experiments (JAX micrograd rebuild, VLIW performance kernels, and some hackathon work) and using the wiki as a personal “lab notebook compiler” — every new experiment notebook gets distilled into consistent, cross‑linked pages the agent can then reason over.

I’m also actively looking for roles (ML engineer / applied AI / agentic systems) or serious collab work around this pattern — especially teams building local‑first agents, LLM‑native operating environments, or researchy tooling for programmers.

Repo (with a tiny LLM‑wiki quickstart in the README) is here:
👉 https://github.com/harshitgavita-07/Aios

Happy to open issues / PRs to align this more tightly with the idea file if there are patterns you all think I’m missing.

@mesaydin-bot

Thanks a lot. I added something very useful for me. In parallel to the wiki workflow, the LLM makes seeds from the raw sources, putting them in a seeds folder with preconditioned formatting; if they grow enough, the LLM moves them to a sprouts folder, then to articles, chapters, and a book. Each transition has its own conditions. In the end I have a system that, besides creating a wiki, grows articles and books organically. Thanks again.

@devsarangi2

I think I came close to this same idea, but I hit one wall: in practice, my wiki of the same knowledge base does not carry the same value as another person's. The same 100 documents I'm ingesting in my repo hold a different value.
So for teams, it's a waste of compute.
My conclusion was: why not ingest based on an already-adjusted framework, optimized for reading, but from my primary perspective? A PM on a team doesn't need to know 80% of the technical documentation, except maybe for action items related to governance. So 80% of the wiki about the whole document is useless to the PM but beneficial to the LLM provider.
So the LLM wiki seems like a good fit for a personal knowledge base, but that seems like a small use case for an LLM that can be used to do so much more.

However, my takeaway is that we can build on this to make it useful for at least a small team. A single LLM can handle the collective knowledge and keep information consistent, generating context dynamically for a user with a definite role. I'd be happy to share what I have so far.

@paulmchen

Hi everyone — really appreciate this thread. Andrej's original sketch captures something most RAG implementations miss: knowledge should compound, not evaporate.

We've been building along exactly these lines, and today we're releasing Synthadoc Community Edition v0.1.0, a production-grade implementation of the LLM-wiki pattern, built for both personal use and enterprise multi-agent systems.

What it does

Synthadoc runs as a persistent background service. You point it at sources (files, URLs, web searches, PDFs, PPTX, XLSX, images) and it maintains a structured, cross-referenced Obsidian wiki that compounds over time — with ingest, query, and lint operations matching the architecture Andrej described.

How it differs from other implementations in this thread

Projects like OmegaWiki and obsidian-llm-wiki-local have done great work here, and our focus differs in a few ways:

  • Synthadoc isn't limited to ~100 notes and works with any LLM provider — Anthropic, OpenAI, Gemini, Groq, or a local Ollama instance. You choose.
  • It isn't domain-locked to research papers. Any domain — legal, medical, engineering, competitive intelligence — works out of the box. It ships with a domain scaffold generator that creates a category-structured index tailored to your knowledge area.
  • It ships with a Skills plugin architecture: URL fetching, PDF extraction (with pdfminer fallback), web search via Tavily, DOCX/XLSX/PPTX parsing, and image ingestion. Custom skills are a first-class extension point.
  • An async job queue with retry logic, deduplication, and an audit trail means large ingestion batches run reliably without babysitting.
  • A full Obsidian plugin — command palette, query modal with clickable wikilinks, job tracker, lint reports, web search — so the wiki is navigable without leaving your editor.

The enterprise angle

Synthadoc Community Edition is the open-source base of what Axoviq ships to industrial customers. In those deployments, Synthadoc acts as a domain-specific knowledge base within a larger multi-agent autonomous system — ingesting operational knowledge, protecting proprietary data behind a local service boundary, and exposing it to other agents (cloud or on-premise) through a clean HTTP API without ever externalising the raw corpus. This is a hard requirement in regulated industries where sensitive knowledge must remain air-gapped from public infrastructure. Self-improvement and knowledge integration across system components is a core design goal, not an afterthought.

We're actively looking to collaborate on

  • Custom skill implementations (new file types, data sources, APIs)
  • Hook integrations for automation pipelines
  • UI components for Obsidian or other editors including a Web UI
  • Search algorithm improvements (hybrid BM25 + embedding, ANN indexing)
  • Cloud and Kubernetes deployment patterns

If this aligns with what you're building, we'd love your feedback, contributions, or just a ⭐ if it's useful:

👉 Release v0.1.0: https://github.com/axoviq-ai/synthadoc/releases/tag/v0.1.0
Repo: https://github.com/axoviq-ai/synthadoc

Happy to answer questions about the architecture or how to extend it for your use case.

@nishchay7pixels

👋 Andrej. It's something similar to what I've been using myself. There is one problem with it: the knowledge stored could easily be corrupted, and it would become impossible for the user to figure that out. The more you rely on the agent, the more you will start doubting your own memory when served corrupted responses by the agent, which are in turn caused by corrupt data. Should we not have a way to secure it?

@skyllwt

skyllwt commented Apr 15, 2026

ΩmegaWiki is actively maintained and shipping fast:
• 23 Claude Code skills covering the full research lifecycle
• 9 typed entities · 9 typed edges
• Bilingual (EN + 中文)
• New skills landing every week

Come try it, give feedback, help us shape it 👇
https://github.com/skyllwt/OmegaWiki


Quick follow-up to ΩmegaWiki post — we just launched an Angel User Program 🎁

Free 15-day MiMo API credits. Drop the key into Claude Code and run the full LLM-Wiki loop you proposed — ingest papers, build a typed knowledge graph, generate ideas, draft papers, respond to reviewers.

End to end. One wiki. No chunks.

@nigelglenday

Bravo for sharing this @karpathy. We're all coming to the same conclusions: 1) memory persistence is a problem; 2) flat markdown with string-match backlinks isn't cutting it. You need typed entities, typed relationships, and a traversable structure.

This is what I had in mind when I started putting Graphite Atlas (https://graphiteatlas.com) together last year.

The three-layer architecture maps like this:

  • Raw sources → Atlas ingests unstructured text (transcripts, SOPs, brain dumps) via LLM
  • The wiki → Instead of markdown pages, Atlas is a pre-typed property graph based on business process types out of the box today, extensible per use case.
  • The schema → A minimum viable ontology that maximizes expressiveness with the fewest primitives.

The three operations map to:

  • Ingest → Navigator AI extracts typed entities and relationships from unstructured text and adds them to the graph.
  • Query → Graph traversal, not keyword search. "Which concepts connect X to Y through two intermediate concepts?" is a Cypher query, not an LLM inference. Plus semantic search and natural language queries.
  • Lint → Graph analytics handle this natively: PageRank, community detection, centrality, orphan detection. The graph structure surfaces what flat files require an LLM to re-derive every time. Knowledge becomes explicit, natively visual. Not prompt and pray.

Atlas' UI vibe is a bit like Airtable + Miro + graph DB + LLM. Or think Mermaid strapped to a graph backend and persisted. The visual layer is valuable for human alignment and validation, but the real value is the entire structure is interpretable by an LLM. Atlas is multi-user, hosted, doesn't bog end-business users down with graph geek ontology hell.

I wear another hat leading finance and operations for a growth company where we are embedding AI as fast as possible and ran into twin problems that eventually led me to create Atlas:

(A) Documentation couldn't keep up with complex, interconnected operations;

(B) AI has terrible memory, made worse by (A), when it should be much more useful

So in Atlas, I model the business once (and compound it), use anywhere.


Two things that came up in the comments that Atlas also addresses:

@gnusupport's concern about "just a bunch of files" without LLMs: the graph has value independent of LLMs. It's a real database with a visual UI where humans validate and refine. The LLM helps build it, but the artifact stands on its own.

@devsarangi2's point about personal wikis not scaling to teams. I agree. Atlas is multi-user. Shared graphs, role-based access, collaborative modeling. A team builds and queries the same knowledge base.

The whole graph is also exposed via MCP, so any LLM client (Claude, ChatGPT, Cursor, local models) can query, create, and traverse.

Would genuinely love feedback from this group. If anyone wants to try it: https://graphiteatlas.com

@dusanick

@gnusupport Hello, I can see you have extensive experience in this field (which may or may not translate well to others and their needs). So, you have seen the use cases of the people here: one guy needs a knowledge base for his projects in Brazil, another has a library of articles for his work and studies. What alternative can you recommend for these people (me as well), who want to achieve a specific goal (let's say a project), and who can't be bothered to write SQL queries to find out what John's sister's name is? What tangible alternative do you propose?

This is not meant as a critique at all; I agree with some of the points you mentioned and disagree with others. Pure interest on my side. Thanks!

One additional point to the general audience: I cannot decide if it is weird or sad how many people try to capitalize on a free idea for improving your life directly in this thread. My gratitude goes to those who share here freely, like the author does.

@bradAGI

bradAGI commented Apr 15, 2026

We took this pattern and made it autonomous at agentwiki.org. 37 newsletters get ingested every day: Gemini extracts structure, Haiku writes DokuWiki pages, and embedding-based cosine similarity handles the "is this the same concept?" problem across articles.
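A minimal sketch of that cosine-similarity "same concept?" check, assuming pre-computed embedding vectors (the vectors and the threshold below are illustrative assumptions, not agentwiki.org's actual pipeline values):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative embeddings; in practice these come from an embedding model.
a = [0.9, 0.1, 0.4]
b = [0.85, 0.15, 0.42]  # near-duplicate concept
c = [0.1, 0.9, 0.0]     # unrelated concept

SAME_CONCEPT = 0.95  # threshold is an assumption; tune per corpus
print(cosine(a, b) > SAME_CONCEPT, cosine(a, c) > SAME_CONCEPT)
# → True False
```

When the similarity clears the threshold, the ingester would merge into the existing page instead of creating a new one, which is what produces the "3-7 existing pages updated per article" compounding described below.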

The compounding effect you describe is real and measurable. After ~20 newsletter articles, new ingestions update 3-7 existing pages per article instead of just creating new ones. The knowledge graph gets denser without anyone touching it.

The "incredible new product" you mention at the end — we're working on it. Started with AI agent docs, now expanding to newsletters. The wiki maintains itself; humans just curate sources.

@mauceri

mauceri commented Apr 15, 2026 via email

@gnusupport

gnusupport commented Apr 15, 2026

@dusanick

@gnusupport Hello, I can see you have extensive experience in this field (which may or may not translate well to others and their needs).

Hello Dušan,

I am from the old school, when the PC was delivered to people with the GW-BASIC book, and when every company in our city basically had to write its own software. I watched accountants through the window late at night building their own invoicing systems, and they did. I still think in that direction: every individual should learn programming to enhance their life.

So, you have seen the use cases of the people here: one guy needs a knowledge base for his projects in Brazil, another has a library of articles for his work and studies. What alternative can you recommend for these people (me as well), who want to achieve a specific goal (let's say a project), and who can't be bothered to write SQL queries to find out what John's sister's name is? What tangible alternative do you propose?

The 🐑 "idea" here is not the knowledge base itself — it's avoiding the work of building one. But without a real backend, that shortcut becomes a trap. 🐑💀

LLM-Wiki fails because markdown is not a database. No foreign keys = broken links. No schema = duplicate chaos. No permissions = privacy leaks. Works at 100 files. Dies at 10,000. 🐑💀
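To make the "no foreign keys = broken links" point concrete: what a database enforces automatically, a flat-file wiki has to re-derive with a lint pass. A minimal sketch, assuming `[[Page]]`-style wiki links and in-memory page contents (the page names and link syntax are illustrative assumptions):

```python
import re

# Illustrative in-memory wiki; keys are page names, values are markdown bodies.
pages = {
    "Home": "See [[Projects]] and [[Contacts]].",
    "Projects": "Back to [[Home]]. Also [[Archive]].",  # [[Archive]] has no page
    "Contacts": "Back to [[Home]].",
}

def broken_links(wiki):
    """Return (source_page, target) pairs where the target page does not exist."""
    broken = []
    for name, body in wiki.items():
        for target in re.findall(r"\[\[([^\]]+)\]\]", body):
            if target not in wiki:
                broken.append((name, target))
    return broken

print(broken_links(pages))
# → [('Projects', 'Archive')]
```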

Even LLM-Wiki projects quietly rely on databases — because markdown alone collapses without SQL underneath. 🐑💾

The tangibles already exist: proven systems like SiYuan, Trilium, and LocalKB that aren't entangled in the architectural trainwreck of blind, authority-following, vibe-coded agent slop. 🐑💀🧙 Personally, I have used some knowledge bases in the past, something like Owl Document Management, and then I found GeDaFe, the Generic Database Interface, and from that one I kept developing into the Dynamic Knowledge Repository. Yes, I am tied to my knowledge base, though I could reconstruct it within weeks even if I lost all the files.

But let me see, some examples from other people that you could use efficiently:

TriliumNext/Trilium: Build your personal knowledge base with Trilium Notes:
https://github.com/TriliumNext/Trilium

A real project with 300+ contributors, unlike those empty, soulless LLM-wikis made solely to kiss someone's ass.

dacuotecuo/siyuan: A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.:
https://github.com/dacuotecuo/siyuan

Tiki Wiki CMS Groupware download | SourceForge.net:
https://sourceforge.net/projects/tikiwiki/

English:
https://foswiki.org/Community/English?cover=print

marvellousz/memora: A locally run full-stack application for building a personal knowledge base with semantic search and question answering capabilities using completely free and open-source AI tools.:
https://github.com/marvellousz/memora

githubkusi/awesome-knowledge-management-tools:
https://github.com/githubkusi/awesome-knowledge-management-tools

The very concept of 'Wiki' is contradictory to the LLM-Wiki concept—there is no coherence here. The author has incited people to spend their tokens generating soulless software built on deep architectural flaws. It's a good thing that some have already reported scalability issues; they tend to fall back on RAG anyway, even though LLM-Wiki was intended to be non-RAG. There are simply too many issues.

If an LLM-Wiki is "non-RAG," the user doesn't "ask" in the traditional ChatGPT sense. Instead, the interaction model shifts from conversational retrieval to navigational exploration.

Yet who is really thinking deeply and out loud about it? They start coding in automated mode and find out that RAG is actually the current solution (the future will bring different ones), so they end up making LLM-Wiki-Sheep based on an architecture that was never planned and is not the subject of this thread. Only once I git-pulled their software could I see the code and discover those issues.

Look at this piece of art:

Cherry Studio - The All-in-One AI Assistant:
https://www.cherry-ai.com/

It has knowledge base built-in:

  • drag files of all types;
  • save notes, save LLM outputs as notes;
  • add directory
  • add URLs
  • add website sitemap.xml

So you're making a wiki nobody edits, in Markdown that's just HTML with extra steps, so the LLM can talk to itself? Why not skip the clown show and generate straight HTML? 🤡🐑 Cherry Studio doesn't generate a wiki, but it has those tools, so it could; I just see no point in having a soulless wiki without users editing it.

One additional point to the general audience: I cannot decide if it is weird or sad how many people try to capitalize on a free idea for improving your life directly in this thread. My gratitude goes to those who share here freely, like the author does.

Code agents are great, Dušan, but not for non-programmers; we get nonsense outputs and empty projects which nobody wishes to support. Knowledge bases were there, in existence, long before OpenAI and the Karpathys and Altmans.

You say they try to capitalize? Does that mean to make money? I just don't see how they are improving my life or anybody's life, as I see no users and no developers of that generated software; it is even hard to call them authors this way. The output of coding agents is normally public domain, and most of it is not coded by hand.

What they did was take the text of this gist and put it in their coding agent, and then "one day later" you can see something hyped as marvelous; they say it is great, while there is no soul there. And people don't follow. And the sole reason for making it was to get some attention.

Real knowledge bases solve practical human problems:

  • some person needs information, some people are consultants, some make a business selling things; information is key there;

  • let's say Dušan asks for a knowledge base; I really don't remember the link, but I remember "Cherry", so I search by name, get entry 123188, press just "w w" and the link is copied, and I put it in the text for him;

  • some people have companies; there are memoranda and articles, shareholder changes. While this could be converted to Markdown, it would be useless. Yes, I can get embeddings and search that way, but it is totally unnecessary: I search by company name, or by a related person, find all related documents, go into the set/category for legal documents, and then with a click I can share that document with a person;

I would not be able to share a soulless, LLM-generated piece of sh...wiki, automatically and blindly converted from markdown, to wiki, to HTML, to represent the original document of memorandum and articles. And yet there would be a very private note related to that memorandum which could be a wiki page, but then again, what if that LLM loop starts sharing or mixing things in weird ways? My note is my note. Throwing my notes into a folder and letting an LLM loop generate a wiki makes no sense to me. A wiki is a wiki: an editable set of database files that expand into HTML. A wiki isn't markdown, though I believe some wikis do run on markdown. But this saga of LLM-Wiki is incoherent with its own ideas.

Let's say you have some specific subject and you expand it by using an LLM into a website page, not a wiki, and you get navigation and some JavaScript-searchable index similar to ReadTheDocs; that makes sense to me.

Why a wiki, if nobody is going to edit it? The LLM is supposed to generate the wiki. Why would the LLM generate a wiki; why not straight HTML?

If a human is to curate, the human can just tell the LLM to curate HTML, so why a wiki in the first place?

The instruction literally says: "Your LLM can figure out the rest." So people take this vague, hand-wavy pattern, treat it like gospel, and start building "wikis" that no human will ever edit — because the LLM writes everything, and the LLM reads everything.

And for the majority of those projects, no human will use them.

Ding ding ding! 🎯

That's the real meta-joke here. Karpathy isn't just vibe-coding software anymore — he's vibe-coding people.

He drops a half-baked idea file, says "figure it out yourself," and thousands of sheep start building markdown wikis without asking a single "why."

No thoughtful architecture, just his "vibes", and people are treating it like a technical blueprint. 🤡🐑

That's not open source. That's open season on critical thinking.

Meanwhile, Engelbart's actual OHS framework — with global addresses, back-links, permissions, version stamps, and object-level referential integrity — sits there, fully specified, completely ignored. Because it doesn't come with a hype train. 🧙📜

@joshwand

Like many of you, I've also experimented with LLM-maintained knowledge bases. For me it's been primarily around documenting codebases and selecting from it to provide context to coding agents. My observations:

  • The problem with voluminous LLM output, even with human review, is that eventually the data and your mental model drift and diverge. Then you and the LLM are speaking different languages, and you no longer have an understanding of the beast you've created.
  • Housekeeping—keeping the data internally consistent and up-to-date with reality—is an unavoidable ongoing chore. It's garbage collection for your external brain. Pruning stale data, updating indexes and summaries, comparing with ground truth. In coding projects I tend to do this consolidation after a large feature is completed. It's a little like how your memories are consolidated as you sleep and dream (kind of like the unreleased "dream mode" in claude code).
  • The knowledge schema, too, must be consistent with your mental model. The Cline Memory Bank schema, or my variation of it anyway, has been good for me for almost two years now, as it aligns with my aspect-based approach to systems thinking (going back to RUP Views!). For other domains, though, I have picked different schemas and evolved them as my understanding evolved.
  • To butcher Marshall McLuhan, "The metamodel is the message." The schema you pick (or a lack of schema, or an emergent one) will have a huge influence on what content gets stored, and whether it's fit-for your purpose.

Underlying all of this is an assumption that these LLMs and knowledge stores exist to serve human purposes. Therefore, the interface must also be human-comprehensible, and to be effective, follow sound UX principles like: the interface should align with the user's mental model; effectively manage cognitive load; progressive disclosure; discovery, etc.

@joshwand

joshwand commented Apr 15, 2026

As the author of the ELIZA comment earlier, and to answer a reply from far upthread, I almost never have an LLM polish my writing. Having an idea isn't enough if you can't effectively communicate it, and the effort of thinking and writing refines the idea itself.

(I also don't really buy the non-English-speaking excuse; patterns of logic and rhetoric are mostly universal.)

When all writing sounds the same, and uses the same LLM-flavored patterns of phrasing and structure, it raises a few problems:

  1. RLHF sycophancy makes everything sound bombastic and groundbreaking. I think the judgment of the value of an idea should be reserved for third parties possessing sufficient expertise and context to actually evaluate claims. "When everyone is above average..." It feels a little like when as a kid I'd turn on the TV late at night, and it'd just be wall-to-wall infomercials.
  2. It used to be that polish was a good first-pass proxy for quality. If a software package had especially well-written documentation, it meant that someone had put real thought into the entire project, not just slapped something together. Now that everything can have a base level of polish (with emoji!), one has to seek quality signals elsewhere. It's the prose equivalent of designing your website with Bootstrap, or using a Microsoft Word template for your resume: at first it looks amazing, until you see the pattern for the thousandth time and it comes to signify the opposite, a low-effort shortcut that signals a lack of skill or sophistication.

@iBlinkQ

iBlinkQ commented Apr 15, 2026

Building this knowledge graph is indeed very cool — but let me pour cold water on it and give you three pieces of advice:

  1. Raw resources may be better than LLM Wiki for beginners
    YouTube videos and PDFs are tutorials that go from shallow to deep, with the authors explaining things step by step. If you’re starting from zero, patiently working through the original materials is the most efficient approach. Once you have a complete understanding of the source materials, then consult the Wiki to find connections — that’s the scenario it’s really suited for: review and summary, not getting started.
  2. AI-generated content must be validated; don’t hoard it blindly
    Hoarding without reviewing is like hiring a robot to work out for you — it runs on the treadmill every day, but your body won’t get healthier. You need to find problems during acceptance and continuously refine the schema together with the AI for the generated content to truly guide decisions.
  3. The content you create is not just for you to read, it’s also for the AI
    The index and log in Karpathy’s system were designed for AI to read. I also add fields like type and summary to my notes — the former distinguishes what I wrote from what the AI generated; the latter makes it easier for the AI to retrieve. More and more routine maintenance work will be handed over to AI in the future, and these fields are its entry points to understand your knowledge base.
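One possible shape for such type and summary fields is a small front-matter block at the top of each note. The field values and the naive parser below are illustrative sketches (not the commenter's actual format, and not a full YAML parser):

```python
note = """---
type: ai-generated
summary: Cosine similarity deduplicates concept pages across sources.
---
Body of the note...
"""

def read_frontmatter(text):
    """Naively parse key: value pairs between the first pair of '---' fences."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0] == "---":
        for line in lines[1:]:
            if line == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

print(read_frontmatter(note))
# → {'type': 'ai-generated', 'summary': 'Cosine similarity deduplicates concept pages across sources.'}
```

An agent can read just the front matter to decide which notes to retrieve, without loading every body into context.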

@mauceri

mauceri commented Apr 15, 2026 via email

@gnusupport

@mauceri

You almost never use LLMs to polish your writing (the key word here is "almost"). Good for you, but please accept that others don't share that moral standard.

While token-rich and GPU-rich individuals are still a rare minority, we find many people using LLMs here, whereas in many free software communities, they are not used at all.

However, one aspect is surely interesting: a post generated by a self-declared author who claims to have "AI psychosis," doing nothing but talking to an LLM for two-thirds of the day. This author then "vibe-coded" people into adding that post to their "vibe-coding" agents. Yet we have critics complaining that comments come from LLMs. This feels somewhat hypocritical.

Comments are expected to be reported by the LLM (ironically), since humans were never expected to report back anyway (exaggerating); the Sole Author is likely too busy laughing behind his kitchen bar to chime in, and frankly, is he even reporting? Or giving any feedback to people? No. The 🐑🐑🐑 don't even know where the shepherd is. I can even see YouTube videos appearing on this subject.

Look at the hype:
https://www.youtube.com/results?search_query=karpathy+llm+wiki

For every word spoken here and elsewhere, a far superior alternative exists at https://Felo.ai. It is baffling that so many choose to degrade their discourse by clinging to petty, small-scale limitations rather than embracing the obvious real-world solutions.
