Memory Layers · Feb 26, 2026 · 8 min read

What Andrej Karpathy Is Saying About Memory, Context, and Why AI Agents Don’t Work Yet

Four quotes that reshaped the agent-infrastructure conversation, the LLM Wiki idea, and what context engineering actually means for builders in 2026.

TL;DR

Andrej Karpathy, OpenAI founding member and former director of AI at Tesla, spent 2025-2026 arguing three things publicly: (1) this is the decade of agents, not the year; (2) prompt engineering is the wrong term - the real discipline is context engineering; and (3) the right mental model is LLM-as-CPU and context-window-as-RAM, which makes memory systems an operating-system problem, not a database problem. His June 2025 tweet endorsing "context engineering" over "prompt engineering" has 14K likes and reshaped how the market talks about agent infrastructure. His October 2025 interview with Dwarkesh Patel laid out why agents remain brittle: they cannot plan, they cannot remember, and their knowledge is pre-compiled instead of compounding. This post distills his position and what it means for anyone building agents in 2026.

Quote 1 - "It’s the decade of agents"

From Karpathy’s October 2025 Dwarkesh Patel interview:

The quote you’ve just mentioned, 'It’s the decade of agents,' is actually a reaction to a pre-existing quote. I was triggered by that because there’s some over-prediction going on in the industry. In my mind, this is more accurately described as the decade of agents. We have some very early agents that are extremely impressive and that I use daily - Claude and Codex and so on - but I still feel there’s so much work to be done.

Andrej Karpathy - Dwarkesh Patel interview, Oct 2025

The subtext matters. The "pre-existing quote" Karpathy is reacting to is the claim that this is the year of agents. He is not dismissing agents; he uses them daily. He is pushing back against the AI-agent hype cycle, and he thinks the architecture is nowhere near where it needs to be for agents to work reliably in production.

Quote 2 - "Context engineering is the delicate art and science"

On June 25, 2025, Karpathy posted on X:

+1 for 'context engineering' over 'prompt engineering'. People associate prompts with short task descriptions you’d give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step.

@karpathy - June 25, 2025

The tweet was a reply to Shopify CEO Tobi Lutke, who had written about a week earlier: "I really like the term 'context engineering' over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."

Within two weeks, LangChain published a long-form breakdown of "Context Engineering for Agents." Zep rebranded to a "context engineering platform." The vocabulary shifted across the memory-layer market.

Quote 3 - "LLMs are like a new kind of operating system"

Karpathy’s mental model for LLMs is the one most cited by serious builders. LangChain’s blog summarizes it this way:

LLMs are like a new kind of operating system. The LLM is like the CPU, and its context window is like the RAM - serving as the model’s working memory. Just like RAM, the LLM context window has limited capacity to handle various sources of context. And just as an operating system curates what fits into a CPU’s RAM, we can think about 'context engineering' playing a similar role.

LangChain Blog - Context Engineering for Agents

The implication: memory is the disk. Context is the RAM. An agent needs both, and the job of "context engineering" is the OS-level work of deciding what gets paged into the context window for each step.
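The "paging" idea can be made concrete with a small sketch. The following is an illustration only, not Karpathy's or LangChain's implementation: `MemoryItem`, `estimate_tokens`, and `page_in` are hypothetical names, the relevance scores are assumed to come from some retriever, and the token count is a crude word count standing in for a real tokenizer. It greedily selects the most relevant items from long-term memory (the "disk") that fit a fixed context budget (the "RAM").

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    relevance: float  # hypothetical score in [0, 1], e.g. from a retriever


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def page_in(items: list[MemoryItem], budget: int) -> list[MemoryItem]:
    """Greedily pick the most relevant items that still fit in the token budget."""
    selected: list[MemoryItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            selected.append(item)
            used += cost
    return selected


# Long-term memory (the "disk"): everything the agent could know.
disk = [
    MemoryItem("User prefers concise answers", 0.9),
    MemoryItem("Project deadline is Friday", 0.7),
    MemoryItem("Long meeting transcript " + "word " * 50, 0.8),
    MemoryItem("Unrelated note about lunch", 0.1),
]

# Context window (the "RAM"): only what fits for the next step.
context = page_in(disk, budget=10)
for item in context:
    print(item.text)
```

With a budget of 10 tokens, the long transcript is skipped even though it scores well, because it does not fit. A production system would likely summarize or chunk such an item instead of dropping it; the greedy policy here is just the simplest choice.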

Quote 4 - "If they had less knowledge, maybe they would be better"

From the same Dwarkesh interview, Karpathy’s most counterintuitive take:

I feel agents, one thing they’re not very good at, is going off the data manifold of what exists on the internet. If they had less knowledge or less memory, maybe they would be better. What I think we have to do going forward is figure out ways to remove some of the knowledge and keep what I call this cognitive core.

Andrej Karpathy - Dwarkesh Patel interview

The distinction matters. Karpathy is not saying agents need bigger memory systems that store everything. He is saying the model needs a lean, intelligent core plus an external, structured memory that compounds. Memory for agents is not a bigger hard drive. It is a different architecture.

The LLM Wiki idea - Karpathy’s alternative to RAG

In October 2025, Karpathy published a GitHub Gist describing what he called the LLM Wiki pattern. Instead of retrieving raw document chunks on every query (RAG), the LLM agent proactively compiles the source documents into a persistent, interconnected knowledge base - a Wiki - that it queries later. The heavy cognitive work of understanding and structuring happens once, during ingestion. Retrieval becomes cheap.
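A toy sketch of the ingest-time versus query-time split might look like the following. It is not the code from Karpathy's Gist: `WikiPage`, `summarize`, `ingest`, and `query` are hypothetical names, and `summarize` is a placeholder that keeps the first sentence where a real system would call an LLM. The point is only that structuring and cross-linking happen once, at ingest, and a query becomes a lookup plus one hop of links.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    title: str
    summary: str
    links: set[str] = field(default_factory=set)


def summarize(text: str) -> str:
    # Placeholder for an LLM call (assumption): keep only the first sentence.
    return text.split(".")[0].strip() + "."


def ingest(docs: dict[str, str]) -> dict[str, WikiPage]:
    """Compile raw documents into linked pages once, at ingest time."""
    wiki = {title: WikiPage(title, summarize(body)) for title, body in docs.items()}
    for title, body in docs.items():
        for other in docs:
            # Link a page to every other page whose title its body mentions.
            pattern = rf"\b{re.escape(other)}\b"
            if other != title and re.search(pattern, body, re.IGNORECASE):
                wiki[title].links.add(other)
    return wiki


def query(wiki: dict[str, WikiPage], title: str) -> list[str]:
    """Query time is a lookup plus one hop of links; raw text is never re-read."""
    page = wiki[title]
    return [page.summary] + [wiki[t].summary for t in sorted(page.links)]


docs = {
    "Context window": "The context window is the working memory of an LLM. It is limited in size.",
    "RAG": "RAG retrieves raw chunks at query time. The chunks are placed in the context window.",
}
wiki = ingest(docs)
print(query(wiki, "RAG"))
```

Here the "RAG" page links to "Context window" because its body mentions that title, so a query for "RAG" returns both compiled summaries without touching the original documents.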

Epsilla’s engineering team called this "a clear-eyed indictment of the entire RAG paradigm and a blueprint for what comes next." Their framing: stateless retrieval was the bridge; stateful, agent-maintained knowledge is the destination. (Epsilla)

What Karpathy is (and isn’t) saying about agent startups

Karpathy’s position is not that agents don’t matter. He uses them daily. His position is that the current architecture - stateless LLM plus reactive vector retrieval plus custom prompt glue - is structurally incapable of producing reliable agents. Fixing that requires three things he names explicitly:

01 Removing knowledge from the base model to force it to rely on external, structured context.
02 Making memory persistent and compounding, so the agent gets better over time instead of resetting every session.
03 Treating context engineering as the core discipline, not an afterthought glued on top of prompt crafting.
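The second point, memory that persists and compounds across sessions, can be illustrated with a deliberately minimal sketch. The `PersistentMemory` class and the JSON file it writes are assumptions made for this example, not something Karpathy describes; a real system would add structure, conflict handling, and concurrency control. What it shows is that a new session starts from what earlier sessions learned instead of from zero.

```python
import json
import tempfile
from pathlib import Path


class PersistentMemory:
    """Hypothetical key-value memory that survives across agent sessions."""

    def __init__(self, path: Path):
        self.path = path
        # Load whatever earlier sessions stored; start empty on first run.
        self.facts = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts))

    def recall(self, key: str, default=None):
        return self.facts.get(key, default)


# A temporary directory stands in for durable storage in this example.
store = Path(tempfile.mkdtemp()) / "agent_memory.json"

session_1 = PersistentMemory(store)
session_1.remember("preferred_language", "Python")

# A later session opens the same store and builds on it.
session_2 = PersistentMemory(store)
session_2.remember("deploy_target", "staging")

print(session_2.recall("preferred_language"))
```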

What this means for builders in 2026

If you are building an AI agent company, Karpathy’s framework gives you three concrete architectural bets:

Treat your memory layer as OS-level infrastructure, not a database call.
Proactively compile and structure knowledge at ingest time, not at query time.
Separate the reasoning engine (what thinks) from the knowledge graph (what knows).

For the tactical implementation of context engineering, see Context Engineering Tactical Playbook.

How GeniOS maps to this framework

The Context Graph (Section A) is the compiled, structured substrate - entities, relationships, confidence scores, temporal validity. The Context Intelligence layer (Section B) is the continuous reasoner - it notices change, connects dots across time, and pushes proactive recommendations to agents. The split matches Karpathy’s CPU/RAM/OS mental model directly: Section A is the structured memory, Section B is the operating-system-level reasoning loop over it.
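To make "confidence scores" and "temporal validity" concrete, here is a generic sketch of a graph edge carrying both. It is not the GeniOS schema: `Edge`, `facts_at`, the field names, and the 0.5 confidence threshold are all assumptions chosen for illustration. Each relationship records how much it is trusted and the period over which it holds, so a reasoner can ask what was true at a given date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    confidence: float                # assumed range: 0.0 to 1.0
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact still holds


def facts_at(edges: list[Edge], when: date, min_confidence: float = 0.5) -> list[Edge]:
    """Return edges that are trusted enough and valid on the given date."""
    return [
        e for e in edges
        if e.confidence >= min_confidence
        and e.valid_from <= when
        and (e.valid_to is None or when < e.valid_to)
    ]


edges = [
    Edge("alice", "works_at", "Acme", 0.95, date(2023, 1, 1), date(2025, 6, 1)),
    Edge("alice", "works_at", "Globex", 0.90, date(2025, 6, 1)),
    Edge("alice", "lives_in", "Berlin", 0.30, date(2024, 1, 1)),
]

current = facts_at(edges, date(2026, 1, 1))
print([(e.relation, e.obj) for e in current])
```

Superseded facts are kept with a closed validity interval, not deleted, which is what lets a reasoning loop notice that something changed.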

What is context engineering?

Context engineering is the discipline of dynamically filling an LLM’s context window with the right information - instructions, retrieved facts, tools, memory, state - at each step of an agent’s trajectory. The term was popularized by Tobi Lutke and Andrej Karpathy in June 2025.

Why does Karpathy say it will take a decade for agents to work?

Because current agents lack planning, persistent memory, and multi-modal grounding - three architectural gaps that cannot be closed by scaling the base model alone.

What is the LLM Wiki pattern?

A pattern proposed by Karpathy in October 2025 where an LLM agent proactively compiles raw documents into a structured, persistent knowledge base (a "wiki") at ingest time, replacing stateless RAG retrieval with stateful, pre-compiled knowledge.

Did Karpathy coin "context engineering"?

Not alone. Tobi Lutke (Shopify CEO) and Andrej Karpathy popularized the term together in June 2025. Lutke’s post came first, and Karpathy’s reply gave it its definitional framing: "the delicate art and science of filling the context window with just the right information for the next step."
