How Karpathy’s Markdown Wiki Redefines LLM Knowledge Management
This article examines Karpathy's LLM Wiki concept: a Markdown wiki maintained outside the LLM's context window, so that the model's accumulated understanding can persist and evolve across sessions. It compares the approach with RAG, note‑taking tools, and traditional knowledge bases, and outlines its architectural components, risks, and practical guidelines.
Why LLM Wiki?
LLM workflows that repeatedly retrieve similar snippets and re‑summarise the same material produce useful answers, but they do not retain knowledge. LLMs are moving from single‑question answering to long‑term project partners that read code, documents, meeting notes, and papers and keep track of requirements. In that role, knowledge must be continuously organised, supplemented, and reused.
Karpathy’s Markdown Knowledge‑Base Model
Karpathy’s llm-wiki.md proposes keeping a Markdown wiki outside the model’s context window. Markdown is human‑readable, model‑friendly, can be version‑controlled with Git, and allows manual review. The key is a set of inter‑linked, hierarchically structured pages (entry, overview, source, entity, concept, query) rather than a single summary document.
The open‑source desktop implementation nashsu/llm_wiki materialises this idea by importing data, analysing content, generating wiki pages, and reusing them in subsequent queries.
Architecture Overview
Raw Sources
↓
Ingest / Knowledge Compile
↓
Markdown Wiki file tree
↓
Query / Update Loop
Raw Sources
Raw Sources are immutable original materials—papers, webpages, PDFs, meeting notes, code repositories, specifications, etc. They serve as the factual ground truth; the wiki is merely an organised, compressed interpretation of them.
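One way to keep raw sources in their ground‑truth role is to record a content hash for each file at import time and re‑check it before trusting the wiki. This is a minimal sketch, assuming the sources sit under a local raw/ directory. `build_manifest` and `modified_sources` are hypothetical helpers, not part of nashsu/llm_wiki.

```python
import hashlib
from pathlib import Path


def build_manifest(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (relative POSIX path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(raw_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(raw_dir).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def modified_sources(raw_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return recorded sources that were edited or deleted since the manifest was built."""
    current = build_manifest(raw_dir)
    return sorted(rel for rel, digest in manifest.items() if current.get(rel) != digest)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw"
        raw.mkdir()
        (raw / "paper.md").write_text("Original abstract.")
        manifest = build_manifest(raw)
        (raw / "paper.md").write_text("Silently edited abstract.")
        print(modified_sources(raw, manifest))  # ['paper.md']
```

Storing the manifest alongside the wiki (for example in Git) turns "sources are immutable" from a convention into something a script can verify.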
Ingest / Knowledge Compile
Implemented in nashsu/llm_wiki as two stages:
Stage 1 – Analysis: Read raw sources, extract entities, concepts, relationships, links to existing wiki pages, contradictions, and open questions. This determines how the material should enter the knowledge system.
Stage 2 – Generation: Based on the analysis, produce source summaries, entity pages, concept pages, index.md, log.md, and a list of items that require human review.
This split separates structural judgement from actual page writing, exposing gaps before committing to files. The process is nondeterministic—different runs may yield slightly different wiki structures.
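The two‑stage split can be pictured as a plan object handed from analysis to generation. In this minimal sketch, the `Analysis` record stands in for what an LLM would return in Stage 1; here it is filled in by hand. `generate` drafts pages in memory without touching disk, so the plan and the review list can be inspected first. All names and the page layout are hypothetical, updates to index.md and log.md are omitted for brevity, and nothing here reflects nashsu/llm_wiki's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Stage 1 output for one source (in practice produced by an LLM; hand-filled here)."""
    source_id: str
    summary: str
    entities: list[str] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)
    contradictions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def slug(name: str) -> str:
    """Hypothetical file-naming rule: lower-case words joined by hyphens."""
    return "-".join(name.lower().split())


def generate(analysis: Analysis, existing_pages: set[str]) -> tuple[dict[str, str], list[str]]:
    """Stage 2: turn an Analysis into page drafts plus items for human review.

    Returns (pages, review_items); nothing is written to disk here.
    """
    src = f"sources/{slug(analysis.source_id)}.md"
    pages = {src: f"# {analysis.source_id}\n\n{analysis.summary}\n"}
    review = []
    for folder, names in (("entities", analysis.entities), ("concepts", analysis.concepts)):
        for name in names:
            path = f"{folder}/{slug(name)}.md"
            if path in existing_pages:
                # Existing page: propose an update rather than overwriting it.
                review.append(f"update {path} with material from {src}")
            else:
                pages[path] = f"# {name}\n\nMentioned in [{analysis.source_id}](../{src}).\n"
    review += [f"contradiction: {c}" for c in analysis.contradictions]
    review += [f"open question: {q}" for q in analysis.open_questions]
    return pages, review
```

Returning drafts instead of writing files preserves the "expose gaps before committing" property: a caller can hold back the whole batch while the review list is non‑empty.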
Markdown File Tree
index.md – entry point and navigation index.
log.md – history of imports, updates, and queries.
overview.md – high‑level summary.
sources/ – raw source copies or references.
entities/ – pages for people, projects, and organisations.
concepts/ – pages for abstract ideas, methods, and topics.
queries/ – records of query processes: questions, answers, and intermediate results.
This structure turns knowledge into a set of editable files rather than transient conversation snippets.
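A minimal sketch of creating this skeleton with Python's pathlib follows, assuming a local wiki root directory. The seed contents of index.md, overview.md, and log.md are illustrative placeholders, and `scaffold_wiki` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path

# Sub-directories from the file tree described in the text.
SUBDIRS = ("sources", "entities", "concepts", "queries")


def scaffold_wiki(root: Path) -> None:
    """Create the wiki skeleton without overwriting files that already exist."""
    root.mkdir(parents=True, exist_ok=True)
    for name in SUBDIRS:
        (root / name).mkdir(exist_ok=True)
    # Placeholder seed pages; real content comes from ingestion and review.
    seeds = {
        "index.md": "# Index\n\n- [Overview](overview.md)\n",
        "overview.md": "# Overview\n\n_No sources ingested yet._\n",
        "log.md": f"# Log\n\n- {date.today().isoformat()}: wiki created\n",
    }
    for filename, text in seeds.items():
        path = root / filename
        if not path.exists():
            path.write_text(text, encoding="utf-8")
```

Because existing files are skipped, the function can be re‑run safely on a wiki that has already been edited by hand or by the model.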
Query / Update Loop
When a user asks a question, the system searches the wiki and, if needed, falls back to raw sources. Selected wiki pages, original fragments, and log entries are packed into the LLM’s context window for reasoning and answer generation. Valuable answers can be saved back to the wiki as reviewed updates, not as immutable truth.
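The loop can be sketched with plain keyword overlap standing in for whatever search the real system uses. That is an assumption: a production setup would more likely use full‑text or embedding search. Wiki pages are tried first, and raw sources are added only when the wiki yields fewer than `min_wiki_hits` matches. Everything is packed under a character budget that roughly approximates the context window. `answer_page` drafts a queries/ page tagged `status: pending-review`, a hypothetical front‑matter convention for marking an answer as unreviewed. All function names are illustrative.

```python
import re


def terms(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens; a naive stand-in for real tokenisation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(question: str, docs: dict[str, str]) -> list[str]:
    """Names of docs sharing at least one term with the question, best overlap first."""
    q = terms(question)
    scored = [(len(q & terms(body)), name) for name, body in docs.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


def build_context(question: str, wiki: dict[str, str], raw: dict[str, str],
                  budget_chars: int = 4000, min_wiki_hits: int = 2) -> str:
    """Pack matching wiki pages first, adding raw sources only if the wiki has too few hits."""
    picks = [("wiki", name) for name in rank(question, wiki)]
    if len(picks) < min_wiki_hits:
        picks += [("raw", name) for name in rank(question, raw)]
    parts, used = [], 0
    for kind, name in picks:
        body = wiki[name] if kind == "wiki" else raw[name]
        chunk = f"--- {kind}:{name} ---\n{body}\n"
        if used + len(chunk) > budget_chars:
            break  # crude character budget standing in for a token limit
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)


def answer_page(question: str, answer: str, cited: list[str]) -> str:
    """Draft a queries/ page; the front-matter flag marks it as awaiting human review."""
    sources = "\n".join(f"- {c}" for c in cited)
    return f"---\nstatus: pending-review\n---\n# {question}\n\n{answer}\n\n## Sources\n{sources}\n"
```

Labelling each chunk with `wiki:` or `raw:` lets the model, and a later reviewer, tell processed understanding apart from original evidence.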
Comparison with RAG, Note‑Taking Apps, and Traditional Knowledge Bases
RAG: Retrieves raw fragments at query time; low entry cost; preserves original evidence but does not create structured, reusable understanding.
Note‑taking software: Manual organisation; highly controllable and accurate; high maintenance cost and slow updates for large, rapidly changing data.
Traditional knowledge bases: Stable, governed documentation; good for final artefacts but ill‑suited for evolving hypotheses and intermediate analysis.
LLM Wiki: LLM‑assisted Markdown wiki; readable, editable, reusable; risks include summary drift, frozen errors, and nondeterminism.
In practice, the best architecture often combines RAG (for factual grounding) with an LLM Wiki (for preserving processed understanding).
Risks and Suitable Scenarios
Information loss: Summaries may omit details, limits, or exceptions, leading to oversimplified facts if the wiki is consulted without returning to raw sources.
Summary drift: Repeated edits across multiple imports can gradually shift emphasis or wording away from the original material.
Frozen errors: Incorrect summaries or ambiguous links become part of the context for future queries, propagating mistakes.
Nondeterminism: Different model versions, prompts, or temperatures can produce varying page structures, so the wiki should not be treated as a deterministic compiler.
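Summary drift is easier to catch if each page keeps a last human‑reviewed baseline, for example the version at a reviewed Git commit (this workflow is an assumption). The sketch compares current pages with those baselines using word‑level similarity from Python's difflib and flags pages that have moved beyond a threshold. The 0.3 default is arbitrary, and the check measures wording change only: a one‑word edit that flips a claim's meaning will pass, so it complements, rather than replaces, checking against raw sources. Function names are hypothetical.

```python
import difflib


def drift_ratio(reviewed: str, current: str) -> float:
    """Word-level dissimilarity: 0.0 means identical, 1.0 means nothing in common."""
    matcher = difflib.SequenceMatcher(a=reviewed.split(), b=current.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def pages_to_rereview(reviewed: dict[str, str], current: dict[str, str],
                      threshold: float = 0.3) -> list[str]:
    """Flag pages with no reviewed baseline or whose wording moved past `threshold`."""
    flagged = []
    for path, text in current.items():
        baseline = reviewed.get(path)
        if baseline is None or drift_ratio(baseline, text) > threshold:
            flagged.append(path)
    return sorted(flagged)
```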
The approach suits personal research organisation, long‑term project knowledge bases, code‑base comprehension, team documentation, technical writing assets, and long‑term memory for AI agents: any scenario where material is repeatedly read, compared, and updated. In high‑stakes domains (legal, medical, finance, compliance), use the wiki cautiously. Treat pages as entry points rather than definitive evidence, and treat model‑generated updates as suggestions pending human review.
Implementation Checklist
Retain a raw/ directory so original sources are never replaced by the wiki.
Create wiki/index.md and wiki/log.md to provide navigation and change history.
Include source links on every important page for easy back‑reference.
Require manual review for critical concept pages before accepting model‑generated content.
Add lint rules to detect broken links, orphaned pages, missing sources, and outdated statements.
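Three of these lint checks can be sketched over an in‑memory map of wiki pages (path relative to the wiki root → Markdown text). The sketch makes three assumptions:

- Pages link to each other with standard relative Markdown links such as `[RAG](../concepts/rag.md)`.
- index.md is the only page allowed to have no inbound links.
- Entity and concept pages should link into sources/.

Detecting outdated statements needs semantic judgement and is not attempted. The function names are hypothetical.

```python
import posixpath
import re

# Assumption: inter-page links are relative Markdown links to .md files, optionally with #anchors.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def links(page: str, text: str) -> set[str]:
    """Resolve the relative .md links in `text` to paths relative to the wiki root."""
    base = posixpath.dirname(page)
    return {posixpath.normpath(posixpath.join(base, target)) for target in LINK.findall(text)}


def lint(pages: dict[str, str]) -> dict[str, list[str]]:
    """Report broken links, orphaned pages, and entity/concept pages that cite no source."""
    outgoing = {p: links(p, t) for p, t in pages.items()}
    broken = sorted(f"{p} -> {t}" for p, ts in outgoing.items() for t in ts if t not in pages)
    linked = {t for p, ts in outgoing.items() for t in ts if t != p}
    orphans = sorted(p for p in pages if p != "index.md" and p not in linked)
    unsourced = sorted(
        p for p, ts in outgoing.items()
        if p.startswith(("entities/", "concepts/"))
        and not any(t.startswith("sources/") for t in ts)
    )
    return {"broken": broken, "orphans": orphans, "unsourced": unsourced}
```

Running a check like this after every ingest, or in CI on the wiki's Git repository, surfaces structural problems before they become context for future queries.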
References:
[1] llm-wiki.md: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
[2] nashsu/llm_wiki: https://github.com/nashsu/llm_wiki
