
How Karpathy’s LLM‑Wiki Turns LLMs into a Self‑Growing Personal Knowledge Base

The article critiques traditional RAG‑based knowledge bases for lacking persistence, then details Karpathy’s LLM‑wiki approach that incrementally builds a structured, cross‑linked Markdown wiki through three layers, three core operations, and lightweight indexing, enabling continuous, low‑cost knowledge accumulation.

AI Cyberspace

Apr 28, 2026

Problems with Traditional RAG

Mainstream document‑grounded tools such as ChatGPT file upload and NotebookLM rely on Retrieval‑Augmented Generation (RAG): the user uploads documents and asks questions, and the system retrieves relevant fragments from which the LLM generates an answer. Karpathy argues this approach has a fundamental flaw: nothing accumulates. Each query forces the LLM to re‑search and re‑assemble knowledge from the raw documents, and the synthesis it produces leaves no lasting record.

Karpathy LLM‑Wiki Idea

The core idea is to treat the LLM not as a search engine but as a programmer maintaining a Markdown wiki—a structured, inter‑linked collection of files that persists and compounds over time. When a new source is added, the LLM reads it, extracts key points, and integrates them into the existing wiki by updating entity pages, revising topic summaries, and flagging contradictions.

Three‑Layer Architecture

Raw Sources – Immutable collection of original papers, articles, images, and data files. The LLM only reads this layer.

The Wiki – LLM‑generated Markdown directory containing summaries, entity pages, concept pages, comparative analyses, and overviews. The LLM owns and writes this layer.

The Schema – Rule files that define the wiki’s organization, conventions, and workflows (e.g., CLAUDE.md for Claude Code, AGENTS.md for Codex). This configuration guides the LLM’s disciplined maintenance.
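The three layers map naturally onto a directory layout. The following is a minimal sketch of one way to scaffold it. The folder names raw/ and wiki/ and the starter schema text are illustrative assumptions, not names the source prescribes; only the schema file names CLAUDE.md and AGENTS.md come from the article.

```python
from pathlib import Path

# Assumed layout (hypothetical folder names, chosen for illustration):
#   raw/   immutable originals, read-only for the LLM
#   wiki/  Markdown pages owned and written by the LLM
LAYER_DIRS = ("raw", "wiki")

# Schema file name depends on the agent: CLAUDE.md for Claude Code,
# AGENTS.md for Codex (per the article).
SCHEMA_FILE = "CLAUDE.md"

# Placeholder schema text; a real schema would spell out conventions
# and workflows in far more detail.
STARTER_SCHEMA = (
    "# Wiki schema\n\n"
    "- raw/ holds original sources. Never modify files there.\n"
    "- wiki/ holds generated pages. Keep them cross-linked.\n"
)


def scaffold(root: Path, schema_file: str = SCHEMA_FILE) -> list[Path]:
    """Create the layer directories and a starter schema file.

    Safe to call repeatedly: existing directories are left alone and an
    existing schema file is never overwritten.
    """
    created = []
    for name in LAYER_DIRS:
        layer = root / name
        layer.mkdir(parents=True, exist_ok=True)
        created.append(layer)

    schema = root / schema_file
    if not schema.exists():
        schema.write_text(STARTER_SCHEMA, encoding="utf-8")
    created.append(schema)
    return created
```

Calling scaffold(Path("my-kb")) would create my-kb/raw, my-kb/wiki, and my-kb/CLAUDE.md. Keeping the schema in a file the agent reads on startup is what lets the same rules apply to every later ingest, query, and lint.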

Three Core Operations

Ingest – Add a new raw file, let the LLM read it, discuss key takeaways, write a summary page, update the index, and modify related entity and concept pages. One source may affect 10‑15 wiki pages. Karpathy prefers a step‑by‑step ingest with human guidance.

Query – Pose questions to the wiki. The LLM searches relevant pages, synthesizes a cited answer, and can output in various formats such as Markdown pages, comparison tables, Marp slides, or matplotlib charts. High‑quality answers can be stored back as new wiki pages, creating a compounding knowledge asset.

Lint – Periodic health checks where the LLM scans the wiki for contradictions, outdated statements, orphan pages, missing cross‑references, and suggests new research directions, ensuring the wiki remains coherent as it grows.
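Some lint checks are purely mechanical and need no LLM at all. The sketch below finds orphan pages and broken links in a flat directory of Markdown files. It assumes Obsidian‑style [[Page Name]] links and that a page's name is its file stem; both are assumptions for illustration, and the function name lint_links is hypothetical. Detecting contradictions or outdated statements still requires the LLM to read the pages.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#Section]],
# capturing only the target page name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def lint_links(wiki_dir: Path, exempt=("index", "log")):
    """Return (orphans, broken) for the Markdown pages in wiki_dir.

    orphans: page names that no other content page links to.
    broken:  (source_page, missing_target) pairs.

    Pages named in `exempt` are never reported as orphans, and links
    that originate from them do not count as inbound links. Otherwise
    an index that lists every page would hide every orphan.
    """
    pages = {
        path.stem: path.read_text(encoding="utf-8")
        for path in wiki_dir.glob("*.md")
    }
    inbound = {name: set() for name in pages}
    broken = []

    for source, text in pages.items():
        for raw_target in WIKILINK.findall(text):
            target = raw_target.strip()
            if target not in pages:
                broken.append((source, target))
            elif target != source and source not in exempt:
                inbound[target].add(source)

    orphans = sorted(
        name
        for name, sources in inbound.items()
        if not sources and name not in exempt
    )
    return orphans, sorted(broken)
```

A report like this could be handed to the LLM as the starting point of a lint pass, so it spends its effort on judgment calls rather than on link bookkeeping.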

Index and Log Files

index.md is a content‑oriented directory listing every wiki page with a link, a one‑sentence summary, and optional metadata (date, source count). It is updated on each ingest and enables efficient navigation without a vector‑based RAG infrastructure, performing well for ~100 sources and hundreds of pages.
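To make the shape of index.md concrete, here is a minimal sketch that renders one index line per page. The exact line format, the PageEntry fields, and the function name render_index are assumptions for illustration; the article specifies only that each entry carries a link, a one‑sentence summary, and optional metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageEntry:
    """One wiki page as it should appear in the index (hypothetical type)."""
    title: str
    path: str                           # path relative to the wiki root
    summary: str                        # one sentence
    updated: Optional[str] = None       # e.g. "2026-04-02"
    source_count: Optional[int] = None  # number of raw sources behind the page


def render_index(entries) -> str:
    """Render entries as a Markdown list, sorted by title."""
    lines = ["# Index", ""]
    for entry in sorted(entries, key=lambda e: e.title.lower()):
        meta = []
        if entry.updated:
            meta.append(f"updated {entry.updated}")
        if entry.source_count is not None:
            plural = "" if entry.source_count == 1 else "s"
            meta.append(f"{entry.source_count} source{plural}")
        suffix = f" ({', '.join(meta)})" if meta else ""
        lines.append(f"- [{entry.title}]({entry.path}): {entry.summary}{suffix}")
    return "\n".join(lines) + "\n"
```

Because every page is described in one line, the whole index stays small enough for the LLM to read in full and pick the pages worth opening, which is what stands in for vector retrieval at this scale.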

log.md is a time‑oriented append‑only record of all operations (ingest, query, lint). A typical entry looks like ## [2026-04-02] ingest | Article Title. Simple Unix tools can extract recent activity, e.g., grep "^## \[" log.md | tail -5, providing a chronological view of wiki evolution.
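The same append and read‑back pattern can be sketched in Python. The header format follows the example entry above; the function names append_entry and recent_entries and the optional note body are assumptions for illustration. The pattern here is slightly stricter than the grep expression, since it also requires a well‑formed date and operation name.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Header shape taken from the article's example:
#   ## [2026-04-02] ingest | Article Title
HEADER = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def append_entry(log_path: Path, operation: str, title: str,
                 when: Optional[date] = None, note: str = "") -> None:
    """Append one entry to the log; earlier entries are never rewritten."""
    when = when or date.today()
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"## [{when.isoformat()}] {operation} | {title}\n")
        if note:
            log.write(note.rstrip() + "\n")
        log.write("\n")


def recent_entries(log_path: Path, n: int = 5):
    """Return the last n entry headers as (date, operation, title) tuples,
    roughly what grep "^## \\[" log.md | tail -n would print."""
    headers = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        match = HEADER.match(line)
        if match:
            headers.append(match.groups())
    return headers[-n:]
```

Opening the file in append mode is what keeps the log append‑only in practice: a bug in the caller can add a bad entry but cannot silently rewrite history.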

Practical Construction

Install Claude Code and Obsidian, create a new directory, and let Claude Code automatically compile raw files into the wiki. While the LLM updates the wiki in real time, the user browses results in Obsidian, using the graph view to follow links and see updates.

Karpathy sums up the setup with an analogy: Obsidian is the IDE, the LLM is the programmer, and the wiki is the codebase.

Why This Works

The most labor‑intensive part of maintaining a knowledge base is not reading or thinking but the ongoing upkeep: updating cross‑references, keeping summaries current, and reconciling new data with old conclusions. Human‑maintained wikis often become too costly to sustain for exactly this reason. The LLM can update the 10 to 15 pages that a single source touches in one ingest pass, driving the maintenance cost toward zero.

Human effort is limited to source selection, guiding analysis, asking good questions, and interpreting results. The LLM performs the rest.

The concept echoes Vannevar Bush’s 1945 Memex vision of a personal, linked information system, with the LLM solving the historic maintenance problem.

Optional Tools

qmd – a local Markdown search engine supporting BM25/vector hybrid search and LLM re‑ranking, useful when the wiki outgrows simple file indexing.

Obsidian Web Clipper – converts web articles to Markdown for quick ingestion.

Local image download settings – configure Obsidian to store attachments in a fixed folder (e.g., raw/assets/) and bind a shortcut for bulk downloading.

Obsidian’s graph view – visualizes the overall wiki structure.

Marp – Markdown‑based slide format for turning wiki content into presentations.

Dataview plugin – queries YAML front‑matter added by the LLM to generate dynamic tables and lists.

References

Original article URL: https://mp.weixin.qq.com/s/ueCIydLLACyqGP5SrAhpjQ

Gist with example prompts: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
