Andrej Karpathy Introduces LLM Knowledge Bases, a New Paradigm Beyond RAG

By AI Bot

Andrej Karpathy, former head of AI at Tesla and a founding member of OpenAI, has introduced a new workflow called "LLM Knowledge Bases" that is rapidly gaining traction in the developer community. The concept proposes using large language models not just to answer questions, but to incrementally build and maintain structured personal wikis — a persistent, compounding knowledge artifact that grows smarter over time.

Key Highlights

  • LLMs build and maintain a structured markdown wiki from raw source documents
  • The approach bypasses traditional RAG by creating a persistent, evolving knowledge layer
  • Karpathy's personal wiki already contains around 100 articles and over 400,000 words
  • The system uses Obsidian as a frontend and git for version control

A Three-Layer Architecture

The system is built around three distinct layers. At the bottom sits a raw sources directory containing immutable curated documents — articles, research papers, images, and data files that serve as the source of truth.

Above that lives the wiki itself: a collection of LLM-generated markdown files organized by entities, concepts, summaries, and cross-references. The LLM owns this layer entirely, creating and updating pages as new sources arrive.

At the top is the schema, a configuration document (similar to a CLAUDE.md file) that defines the wiki structure, naming conventions, and operational workflows.
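The three layers map naturally onto a simple directory layout. The sketch below scaffolds one in Python; the directory and file names (`sources/`, `wiki/`, `SCHEMA.md`) and the schema's contents are illustrative assumptions, not Karpathy's actual conventions:

```python
from pathlib import Path

# Illustrative layer names; the actual structure is an assumption.
LAYERS = {
    "sources": "Immutable raw documents: articles, papers, images, data files.",
    "wiki": "LLM-owned markdown pages: entities, concepts, summaries.",
}

def scaffold(root: str) -> Path:
    """Create the three-layer knowledge-base skeleton under `root`."""
    base = Path(root)
    for name in LAYERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    # The schema sits at the top, like a CLAUDE.md, and tells the LLM
    # how to maintain the wiki layer. Its rules here are hypothetical.
    (base / "SCHEMA.md").write_text(
        "# Wiki Schema (illustrative)\n"
        "- Pages live in wiki/, one markdown file per entity or concept.\n"
        "- File names are kebab-case, e.g. transformer-architecture.md.\n"
        "- Never edit sources/; treat it as the immutable source of truth.\n"
    )
    return base
```

Keeping the schema as a plain markdown file means the LLM can read it at the start of every session, the same way coding agents read a CLAUDE.md.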

How It Works

The workflow revolves around three core operations:

Ingest — When new source materials land in the raw directory, the LLM processes them, extracts key information, updates existing pages, and integrates findings into the evolving synthesis.
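In code, an ingest pass might look like the following sketch. Here `summarize` is a deliberately dumb stand-in for the real LLM call, and the page format is an assumption:

```python
from pathlib import Path

def summarize(text: str) -> str:
    """Stand-in for an LLM call that extracts key points from a source.
    A real implementation would send `text` plus the schema to a model."""
    first_line = text.strip().splitlines()[0] if text.strip() else ""
    return f"Key point: {first_line}"

def ingest(source: Path, wiki_dir: Path) -> Path:
    """Process one raw source into a wiki page, creating or updating it."""
    page = wiki_dir / f"{source.stem}.md"
    summary = summarize(source.read_text())
    if page.exists():
        # Integrate the new findings into the existing synthesis.
        page.write_text(page.read_text() + f"\n{summary}\n")
    else:
        page.write_text(f"# {source.stem}\n\n{summary}\n")
    return page
```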

Query — Users ask questions against the wiki, with the LLM synthesizing answers from relevant pages. Results can optionally be filed back as new wiki pages, meaning that answers compound over time rather than disappearing into chat history.
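A query pass can be sketched as retrieval over the wiki's markdown pages, with the answer optionally filed back so it compounds. The `answer` helper is a hypothetical stand-in for an LLM call, and the word-overlap relevance filter is a toy:

```python
from pathlib import Path

def answer(question: str, pages: list[str]) -> str:
    """Hypothetical stand-in for an LLM synthesizing from wiki pages."""
    return f"Answer to '{question}' drawn from {len(pages)} page(s)."

def query(question: str, wiki_dir: Path, file_back: bool = False) -> str:
    # Naive relevance filter: keep pages sharing any word with the question.
    words = set(question.lower().split())
    relevant = [
        p.read_text() for p in sorted(wiki_dir.glob("*.md"))
        if words & set(p.read_text().lower().split())
    ]
    result = answer(question, relevant)
    if file_back:
        # File the answer back as a wiki page so future queries can build on it,
        # instead of letting it disappear into chat history.
        with (wiki_dir / "answers.md").open("a") as f:
            f.write(result + "\n")
    return result
```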

Lint — Periodic health checks identify contradictions, stale claims, orphan pages, and missing connections, ensuring the wiki maintains data integrity as it grows.
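The lint pass is the most mechanical of the three, and parts of it need no LLM at all. For example, orphan pages — pages no other page references via `[[wikilinks]]` — can be found deterministically, as in this sketch:

```python
import re
from pathlib import Path

# Matches the target of [[Page]], [[Page|alias]], and [[Page#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: Path) -> set[str]:
    """Return page stems that no other wiki page links to."""
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    linked = set()
    for p in wiki_dir.glob("*.md"):
        for target in WIKILINK.findall(p.read_text()):
            linked.add(target.strip())
    return pages - linked
```

Contradiction and staleness checks are fuzzier, so those are the parts a real lint pass would delegate to the LLM.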

Why This Matters Beyond RAG

Traditional RAG systems retrieve raw documents on each query, re-deriving context every time. Karpathy's approach flips this model: the LLM effectively pre-processes and synthesizes knowledge into a persistent layer that grows richer with each interaction.

As Karpathy noted, an increasing share of his recent token throughput goes into manipulating knowledge rather than code. The human curates sources and asks questions while the LLM handles the heavy lifting of summarizing, cross-referencing, and maintenance — the tasks that typically cause knowledge systems to fail from neglect.

The Tool Stack

The system relies on Obsidian as the IDE frontend for browsing and editing the wiki. The Obsidian Web Clipper extension converts web articles into clean markdown files. Dataview queries frontmatter metadata, and git provides natural version control.
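Dataview works by querying the YAML frontmatter at the top of each markdown file. A minimal pure-Python equivalent (the field names are illustrative, and this handles only flat `key: value` pairs) shows the idea:

```python
from pathlib import Path

def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse simple `key: value` YAML frontmatter between `---` fences.
    A toy stand-in for the metadata Obsidian's Dataview plugin queries."""
    lines = path.read_text().splitlines()
    meta: dict[str, str] = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta
```

With metadata in hand, listing every page tagged `stale`, say, becomes a one-line filter — and because the wiki is plain text, `git diff` shows exactly what the LLM changed on each ingest.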

A growing ecosystem of implementations has already emerged, including Sage Wiki, Binder, and specialized variants for trading research, academic study, and voice-first knowledge capture.

Impact on Developers and Researchers

The post has generated over 30,000 engagements on X and sparked a wave of open-source implementations. Developers are particularly drawn to the idea that knowledge compounds rather than decays — each research session leaves behind structured artifacts that future queries can build upon.

For teams and solo researchers alike, the approach offers a practical middle ground between expensive enterprise knowledge management systems and the ephemeral nature of chat-based AI interactions.

What Comes Next

As LLM context windows continue to grow and agent capabilities improve, the wiki-as-knowledge-base pattern could become a standard workflow for anyone who works with large volumes of information. The approach is model-agnostic and works with any LLM capable of following structured instructions, making it accessible to a wide range of users.


Source: Andrej Karpathy on GitHub


