Andrej Karpathy is experimenting with a different way to give large language models durable memory: not with vector databases and classic RAG stacks, but with a living Markdown wiki that the model itself maintains. For AI engineers and architects wrestling with context limits, hallucinations, and opaque retrieval pipelines, his “LLM Knowledge Bases” pattern offers a concrete, file-first alternative for building autonomous archives.
From stateless chats to a persistent, AI-maintained memory
Most LLM workflows today are effectively stateless. You open a session, “vibe code” with the model, build up detailed context… and then hit a context limit or session timeout. The next day, you’re reconstructing the entire mental model from scratch, burning tokens and time to re-explain what the AI already “knew.”
Karpathy’s approach attacks that reset directly. Instead of treating the model as a disposable chat partner, he treats it as a long-running research librarian that continuously writes and maintains a set of Markdown (.md) files. These files become the persistent memory of ongoing projects and research areas.
The key shift is where the tokens go. Rather than spending most of the budget on repeated explanation and boilerplate code, Karpathy routes a significant portion of token throughput into manipulating structured knowledge: summarizing sources, organizing them into a wiki, and keeping that wiki healthy over time. The result is a “second brain” that is self-healing, auditable, and fully human-readable.
How the LLM Knowledge Base works: ingest, compile, lint
Karpathy’s pattern replaces typical RAG plumbing with a three-stage workflow built entirely on files (a minimal sketch in Python follows the list):

- Data ingest to raw/: All upstream material—research papers, GitHub repos, datasets, web articles—is dumped into a raw/ directory. For web content, Karpathy uses the Obsidian Web Clipper to capture pages as Markdown, including local copies of images so vision-capable models can reference them. At this step, nothing is “smart” yet; it’s just disciplined capture into a consistent, LLM-friendly format.
- Compilation into a structured wiki: Instead of merely indexing documents, the LLM compiles them. It reads raw/ files and writes structured wiki pages: summaries of sources, encyclopedia-style articles for core concepts, and explicit backlinks between related ideas. The model is not just a retriever; it is the primary author and editor of the knowledge base.
- Active maintenance via linting passes: The wiki is continuously checked. Karpathy runs “health check” or “linting” passes where the LLM scans for inconsistencies, missing links, outdated information, or places where new connections can be made. Community observers have described this as a living knowledge base that “heals itself” as the model re-reads and refactors its own output over time.
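Karpathy calls his implementation a “hacky collection of scripts” and has not published it, so treat the following Python outline as a sketch under assumptions rather than his actual code: llm_complete is a placeholder for whatever chat-completion API you use, and the raw/ and wiki/ layout is inferred from his description.

```python
# Minimal sketch of the ingest -> compile -> lint loop (hypothetical;
# Karpathy's scripts are unpublished). llm_complete stands in for any
# chat-completion API call.
from pathlib import Path

RAW = Path("raw")    # stage 1 output: disciplined Markdown capture lands here
WIKI = Path("wiki")  # LLM-authored summaries, concept pages, backlinks

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM provider you use."""
    raise NotImplementedError

def compile_source(src: Path) -> None:
    """Stage 2: turn one raw capture into a structured wiki page."""
    text = src.read_text(encoding="utf-8")
    page = llm_complete(
        "Summarize this source as a wiki page with [[backlinks]] "
        "to related concepts.\n\n" + text
    )
    (WIKI / src.name).write_text(page, encoding="utf-8")

def lint_wiki() -> None:
    """Stage 3: health-check pass over the whole wiki."""
    corpus = "\n\n".join(p.read_text(encoding="utf-8") for p in WIKI.glob("*.md"))
    report = llm_complete(
        "Health-check this wiki: list contradictions, stale facts, "
        "and missing cross-links.\n\n" + corpus
    )
    (WIKI / "_lint_report.md").write_text(report, encoding="utf-8")

if __name__ == "__main__":
    WIKI.mkdir(exist_ok=True)
    for src in RAW.glob("*.md"):
        compile_source(src)
    lint_wiki()
```

The point of the sketch is how little machinery is involved: the intelligence lives in the model and the prompts, while the filesystem provides persistence.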
Throughout this process, Markdown files are the single source of truth. Every claim in the knowledge base traces to a specific file that humans can open, read, edit, or delete. There is no opaque embedding index to reverse-engineer, no hidden state that only the retrieval layer “understands.”
Why Markdown instead of RAG? Tradeoffs for mid-sized corpora
For the last several years, Retrieval-Augmented Generation has been the default answer to the “how do I give my LLM my data?” problem. The usual recipe: chunk documents, embed them into vector space, store them in a specialized database, and run similarity search at query time.
Karpathy’s LLM Knowledge Bases explicitly sidestep this stack for mid-sized corpora. The bet is that modern models are strong enough at reasoning over structured text that, up to a certain scale, you can drop vectors entirely and still get better usability and observability.
In this pattern:
- Data format: Knowledge is stored as human-readable Markdown instead of opaque vectors.
- Logic: Relationships are represented as explicit indices and backlinks (see the sketch after this list), not just emergent similarity in embedding space.
- Auditability: Every answer can be traced to specific files, making the system transparent and debuggable.
- Compounding: The knowledge base actively evolves through linting, rather than being a static collection that must be periodically re-indexed.
- Scale target: The sweet spot is on the order of hundreds to perhaps tens of thousands of high-signal documents—not millions of heterogeneous records.
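To make “explicit indices and backlinks” concrete, a few lines of Python can build a reverse index from Obsidian-style [[wiki-links]]. This is a sketch, not Karpathy’s code; it assumes a vault of .md files using standard wiki-link syntax.

```python
# Build a backlink index from [[wiki-links]] -- a plain-text stand-in
# for the relationships an embedding index only represents implicitly.
import re
from collections import defaultdict
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]], or [[Target#section]]
LINK = re.compile(r"\[\[([^\]|#]+)")

def backlinks(vault: Path) -> dict[str, list[str]]:
    """Map each linked concept to the pages that reference it."""
    index: dict[str, list[str]] = defaultdict(list)
    for page in vault.glob("**/*.md"):
        for target in LINK.findall(page.read_text(encoding="utf-8")):
            index[target.strip()].append(page.stem)
    return index

# Usage: every relationship is named and traceable to plain-text files.
# for concept, pages in sorted(backlinks(Path("wiki")).items()):
#     print(f"{concept} <- {pages}")
```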
The contrast is often summarized as warehouse versus library: a vector DB-based RAG system is like a massive warehouse with a very fast forklift, great for locating anything but weak on structural explanation. The Markdown wiki is a curated library with a head librarian constantly writing new books to describe and connect the old ones.
From personal second brain to company “compiled knowledge”

While Karpathy describes his implementation as a “hacky collection of scripts,” entrepreneurs and practitioners quickly extrapolated the enterprise angle. The observation is simple and blunt: every business already has a raw/ directory in everything but name—Slack history, internal wikis, tickets, PDFs, slide decks, and logs. Almost none of it is actually compiled.
A “Karpathy-style” enterprise layer would not just search these artifacts; it would continuously author and update a “company bible”: a structured, AI-maintained manual of systems, decisions, and concepts, derived from the messy stream of daily operations. This aligns closely with how knowledge actually accretes in organizations but is rarely captured in a maintainable way.
Commentary from the community underscores both the opportunity and the difficulty. On the one hand, packaging this pattern behind a user-friendly app that quietly syncs with existing tools—bookmarks, read-later apps, saved threads, and more—looks like a sizable product opportunity. On the other, scaling from a single researcher’s wiki to an entire company’s operational memory introduces classic enterprise challenges: contradictions between teams, conflicting tribal knowledge, and millions of records spanning years.
Some teams are already trying to bridge that gap with multi-agent architectures. One proposed “Swarm Knowledge Base” design scales Karpathy’s workflow to a 10-agent system orchestrated via a control layer, adding supervisory models to protect the shared wiki from compounding hallucinations. Here, an evaluation-focused LLM (used as a “quality gate”) scores and validates draft pages before they enter the live knowledge base, creating a compound loop where raw outputs are refined, checked, and then fed back as high-integrity briefings to the agents.
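No code for the Swarm Knowledge Base proposal is public, but the quality-gate loop it describes is straightforward to sketch. In the outline below, judge and revise are hypothetical stand-ins for the evaluation model and the drafting agents, and the threshold is an arbitrary choice.

```python
# Sketch of an LLM-as-judge quality gate guarding a shared wiki
# (hypothetical names and thresholds; no public implementation exists).

def judge(draft: str) -> float:
    """Eval-focused LLM scoring a draft wiki page from 0 to 1 (stub)."""
    raise NotImplementedError

def revise(draft: str, score: float) -> str:
    """Drafting agent refines the page given the judge's verdict (stub)."""
    raise NotImplementedError

def gate(draft: str, threshold: float = 0.8, max_rounds: int = 3) -> str | None:
    """Return a draft that clears the quality gate, or None if it never does."""
    for _ in range(max_rounds):
        score = judge(draft)
        if score >= threshold:
            return draft              # admitted to the live knowledge base
        draft = revise(draft, score)  # compound loop: refine, then re-check
    return None                       # quarantined; the shared wiki stays clean
```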
Scaling limits and where this beats “fancy RAG”
A common critique of file-first, non-vector approaches is scalability. How far can you push a Markdown wiki before retrieval becomes too slow or too noisy?
Karpathy reports using this architecture at a scale of about 100 articles and roughly 400,000 words. At this size, the model’s ability to navigate via summaries and index pages appears sufficient; the overhead and complexity of a full RAG stack would likely introduce more latency and retrieval noise than it removes.
This makes the pattern particularly attractive for personal research, advanced side projects, and departmental knowledge bases: scenarios where the document count is modest, but the need for traceability and conceptual structure is high. Other practitioners, such as Lex Fridman, report variations on the theme: generating dynamic HTML dashboards for interactive visualization, or spinning up temporary, focused mini-knowledge-bases for narrow tasks (for example, using a voice interface during a long run to query a task-specific wiki).
These “ephemeral wikis” point toward a future where users don’t just ask questions of a monolithic model; they spawn short-lived research environments, have agents curate and compile relevant knowledge, and then discard those environments once a decision or report is complete—leaving the long-term archive slimmer and more curated.
File-over-app: Obsidian, Markdown, and data sovereignty

Underneath the architecture is a clear philosophy: files outlast apps. Karpathy’s stack is deliberately built on Markdown as an open, tool-agnostic standard, with Obsidian as the primary interface for browsing and linking notes.
Markdown ensures portability. If Obsidian disappears or licensing conditions change, the knowledge base remains a directory of plain-text files that any editor can open. Obsidian, in turn, offers a local-first, folder-based approach that fits the idea of an AI librarian “visiting” a vault of files rather than owning the data.
On top of this, Karpathy uses custom “vibe-coded” scripts—CLI tools and search helpers—to bridge between the LLM and the local filesystem. These don’t attempt to be a full platform; they are glue logic, pushing work into the model (summarization, indexing, linting) and writing the results back to disk.
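Those scripts are unpublished, but the flavor is easy to convey. Something as small as the following search helper, a hypothetical sketch rather than Karpathy’s actual tool, already gives an agent grep-like access to a vault.

```python
#!/usr/bin/env python3
"""Grep-style search over a Markdown vault: glue logic, not a platform."""
import sys
from pathlib import Path

def search(vault: Path, term: str) -> None:
    """Print file:line matches so the LLM can cite exact sources."""
    for page in sorted(vault.glob("**/*.md")):
        lines = page.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, 1):
            if term.lower() in line.lower():
                print(f"{page}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    # Usage: python search.py wiki "context window"
    search(Path(sys.argv[1]), sys.argv[2])
```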
This stands in contrast to SaaS-heavy note systems such as Notion or cloud document suites where the knowledge base lives behind an API and a vendor’s permission model. In the file-over-app view, the LLM is an advanced editor operating on user-owned files, not a central platform that must be integrated and trusted at every layer.
The community has also begun exploring hygiene patterns around this. One suggestion, from Obsidian’s co-creator, is to separate a clean personal vault from a “messy vault” used by agents, only promoting distilled insights into the trusted archive. This “contamination mitigation” pattern mirrors production data staging in analytics: give the AI a sandbox to explore and write freely, and treat promotion into the core vault as a controlled step.
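In code, the promotion step can be deliberately boring. The sketch below assumes a hypothetical two-vault layout and a “status: reviewed” marker convention; neither detail comes from Obsidian or Karpathy.

```python
# Sketch of "contamination mitigation": agents write freely into a messy
# vault; a reviewed promote step copies distilled notes into the trusted
# vault. Paths and the review marker are assumptions.
import shutil
from pathlib import Path

MESSY = Path("vaults/agent-sandbox")  # agents read and write here freely
CLEAN = Path("vaults/personal")       # trusted archive, promoted-only

def promote(note: str) -> None:
    """Copy one distilled note from the sandbox into the trusted vault."""
    src = MESSY / note
    text = src.read_text(encoding="utf-8")
    # Hypothetical convention: only notes explicitly marked reviewed move.
    if "status: reviewed" not in text:
        raise ValueError(f"{note} has not been marked as reviewed")
    dest = CLEAN / note
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)

# promote("distilled/llm-knowledge-bases.md")
```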
Choosing between RAG and a Markdown-first knowledge base

For teams deciding how to structure their next knowledge system, the choice is less about ideology and more about fit:
- When vector DB / RAG fits best: Massive, heterogeneous corpora (millions of documents) where semantic similarity search is the main requirement; search latency and recall matter more than human interpretability; and the system is primarily read-only, with infrequent structural refactoring.
- When a Karpathy-style Markdown wiki fits best: Mid-sized, high-signal knowledge bases (hundreds to perhaps tens of thousands of documents) where structure, traceability, and ongoing synthesis are central. Here, you want the system to continuously rewrite and compress its own understanding, not just fetch relevant snippets.
In effect, RAG acts like a fast pointer into a disorganized pile, while an LLM-maintained wiki aims to become a living model of the domain itself—expressed in text, not tensors. For many AI engineers and power users, especially those building internal tools or research workflows, the second option may be both simpler to build and easier to reason about.
Autonomous archives and the path to fine-tuned models
The long-term implication of Karpathy’s pattern is not just better context management, but higher-quality training data. As the wiki is repeatedly linted and refined by the LLM, it becomes a progressively “cleaner” representation of the domain: deduplicated, cross-linked, and written in a consistent style.
At that point, the knowledge base is more than something to stuff into a context window. It becomes a candidate training set. Instead of continually prompting a large general model with the wiki, a team could fine-tune a smaller, more efficient model on that curated corpus—encoding the knowledge base into the model’s weights and turning a personal or departmental archive into a private, domain-specialized intelligence.
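Nobody has shipped this step as a product yet, so the following is only a sketch of the idea: exporting wiki pages as prompt/completion pairs in JSONL, a common fine-tuning input format. The framing of each record is an assumption, not an established recipe.

```python
# Sketch: export a linted wiki as a JSONL fine-tuning corpus.
# The prompt/completion framing and the vault layout are assumptions.
import json
from pathlib import Path

def export_corpus(vault: Path, out: Path) -> None:
    """Write one JSONL record per wiki page for supervised fine-tuning."""
    with out.open("w", encoding="utf-8") as f:
        for page in sorted(vault.glob("**/*.md")):
            record = {
                "prompt": f"Explain: {page.stem}",  # assumed framing
                "completion": page.read_text(encoding="utf-8"),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# export_corpus(Path("wiki"), Path("train.jsonl"))
```

Because the wiki has already been deduplicated and cross-linked by the linting passes, the exported corpus needs far less cleaning than a raw document dump would.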
For now, Karpathy’s public contribution is architectural and philosophical rather than a packaged product: treat the LLM not as a stateless chatbot, but as an ongoing collaborator that maintains its own memory in your filesystem. For practitioners building LLM-powered systems, that shift opens up a spectrum of designs where autonomous archives—kept in plain-text Markdown—can replace much of the complexity traditionally delegated to black-box retrieval stacks.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.