Memory Layers ·Apr 4, 2026 ·10 min read

How Karpathy's LLM Wiki Changes How Engineers Think About AI Memory

The compiler analogy that inverts RAG, the four operations that make a knowledge base compound over time, and what it means for organizational-scale AI memory.

TL;DR

In April 2026, Andrej Karpathy, co-founder of OpenAI, former Tesla AI Director, now founder of Eureka Labs, published a GitHub gist describing a fundamentally different approach to AI memory: the LLM Wiki. The insight is simple and, in retrospect, obvious, don't query raw documents at inference time. Compile them into a structured, maintained knowledge base first, then query that. This inverts how most RAG pipelines work and validates a principle that has been quietly driving the best memory system architectures of 2025–2026. Every serious AI memory builder had already converged on this pattern before Karpathy named it. His post named the convergence.

The problem Karpathy named before most engineers noticed it

Every developer who has worked with AI agents in production has hit the same wall. You build something brilliant with Claude or GPT. You close the session. The next day, the model has no idea what you built, why you built it, or what decisions you made along the way.

The standard answer has been RAG: chunk documents, embed them, store in a vector database, retrieve the top-k chunks at query time. This works for document search. It does not work for knowledge that should compound over time.

RAG rereads the same books for every exam, never actually learning the material.

Andrej Karpathy, LLM Wiki gist, April 2026

The problem is not retrieval. The problem is that raw documents are like uncompiled source code, verbose, redundant, context-heavy, and not optimized for consumption. Every inference call forces the model to re-derive knowledge that should already be structured. Karpathy's compiler analogy: raw articles and papers are source code, the LLM is the compiler, and the wiki it produces is the compiled executable. You don't run source code. You compile it first.

The four operations of the LLM Wiki

Ingest, feed, read, weave

Feed a new document (paper, article, transcript, web page). The LLM reads it and weaves its insights into the existing wiki, potentially updating 10–15 existing pages simultaneously. New knowledge integrates immediately. Cross-references are created.

Query, ask from the compiled store, not the raw documents

Ask a question. The LLM answers from the wiki, not from raw documents. The key insight: good answers get filed back as new wiki pages. Explorations compound, each question and answer makes the wiki more comprehensive.

Lint, periodic health checks

Find contradictions between wiki pages, identify stale claims, detect orphan pages, spot missing cross-references. The LLM identifies what to investigate next. This is the operation that traditional wikis fail at: humans abandon maintenance because the bookkeeping cost grows faster than the value. LLMs do not get bored.

Fine-tune, bake knowledge in

Karpathy's stated next step: use the compiled wiki as fine-tuning data. Train the model so the knowledge is baked in, not retrieved at runtime. This is the difference between an AI that looks things up versus an AI that actually knows.

The Obsidian setup: IDE for your AI brain

Karpathy uses Obsidian as the viewer. His description: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." His current research wiki is approximately 100 articles at ~400,000 words. He rarely edits wiki files himself. The AI writes, updates, and maintains. He reads. The Karpathy golden rule: raw sources are sacred and never changed. The wiki is AI territory. When there is a contradiction between the wiki and a raw source, the raw source wins.

How the LLM Wiki addresses the six hard problems of AI memory

Problem	RAG approach	LLM Wiki approach
Relationship between facts	Near-miss chunk retrieval	Interlinked pages with explicit cross-references
Temporal awareness	No native concept of 'this was true then, not now'	Lint operation detects stale claims
Contradiction resolution	Returns both contradicting chunks with confidence	Lint + ingest resolve contradictions before query
Knowledge consolidation	Redundant chunks across sources	Synthesis during ingest removes redundancy
Maintenance burden	Scales with corpus; humans abandon	LLM maintenance cost near-zero
Proactive insight	Only responds to queries	Not solved, still reactive

That last row is the honest gap. The LLM Wiki is an excellent compiled knowledge store. It is still fundamentally query-driven. It does not notice that a critical fact changed and push that observation to agents proactively. For the architecture that closes this gap, see Why the Context Graph Is the Future of AI Memory.

The movement Karpathy sparked

The LLM Wiki idea landed in April 2026 and immediately generated several concrete implementations:

MemPalace, built on the verbatim storage variant of this principle; 41,200+ GitHub stars by week 2. See the full breakdown.
Graphify, a knowledge graph that applies Karpathy's pattern to codebases; delivers 71.5x fewer tokens per query vs reading raw files directly.
Mirror Memory, managed memory service built on the compiled-wiki principle for cross-device, cross-tool persistence.
The r/LocalLLaMA community: hundreds of custom implementations within weeks of the gist publishing.

The GitHub stars count across these implementations is not the relevant metric. The relevant signal is that the industry's best memory architects, Mem0, Zep, Cognee, Supermemory, had already converged on exactly this pattern before Karpathy named it. His post named the convergence. For the full comparison of open-source memory layers, see Open-Source Memory Layers 2026.

Why personal memory is the wrong frame, and organizational memory is the right one

The LLM Wiki is a personal system. One person, one corpus, one knowledge base. This is the right starting point. But the production problem is organizational. A company does not have one conversation history. It has thousands of conversations, documents, decisions, relationships, customer interactions, and process changes happening simultaneously across multiple people and multiple agents.

The organizational version of the LLM Wiki requires entity extraction at ingestion, an explicit relationship graph, temporal validity tracking (not just what is true, but what was true when), multi-agent concurrent access with conflict resolution, and proactive change detection, noticing when a new signal contradicts an existing fact and pushing that update without being asked.

GeniOS as the organizational LLM Wiki

GeniOS's Context Graph (Section A) is the production-grade version of Karpathy's compiled wiki: entity extraction at ingestion, interlinked relationships, temporal validity windows, and 5-axis scoring (confidence, freshness, consistency, signal, authority), the production-grade version of the lint operation. Context Intelligence (Section B) is the proactive reasoning step the LLM Wiki does not have: it runs continuously, detects change, and pushes recommendations to agents without a query trigger.

What is Karpathy's LLM Wiki?

A pattern for building AI memory where raw sources are compiled into a structured, interlinked wiki by an LLM before any query. The wiki, not the raw documents, is what the AI uses at inference time. This contrasts with RAG, which retrieves raw chunks at query time.

Why is the LLM Wiki better than RAG for personal knowledge management?

RAG retrieves raw documents on every query, knowledge never compounds. The LLM Wiki compiles knowledge once, maintains it continuously, and allows the AI to query pre-synthesized, cross-referenced knowledge. For research, project history, and personal context, the wiki approach delivers more useful answers.

Who is Andrej Karpathy?

Andrej Karpathy is a machine learning researcher, co-founding member of OpenAI, former Director of AI at Tesla Autopilot, and founder of Eureka Labs. He coined 'context engineering' as the accurate term for what serious production AI systems require.

What is context engineering according to Karpathy?

Karpathy's definition: 'Context engineering is the delicate art and science of filling the context window with just the right information for the next step.' It is the complete design of the information environment the model sees, memory, retrieved documents, tool outputs, examples, instructions, not just writing a good prompt.

What is Karpathy's cognitive core concept?

Karpathy argues that current LLMs are burdened with two jobs: being intelligent and memorizing the internet. He advocates separating these, offloading knowledge storage to an external memory system (the wiki or graph) so the model can focus on reasoning. This predicts smaller, more capable reasoning models paired with external persistent knowledge stores.

Book a call More writing