Why Modern AI Systems Should Compile Knowledge Instead of Just Retrieving It
Traditional RAG pipelines forget everything after each query. The LLM Wiki mode proposed by Andrej Karpathy instead compiles source material into a version‑controlled, cross‑referenced Markdown wiki, letting knowledge compound over time, cutting query costs, and giving AI engineers a transparent, human‑readable knowledge base.
Karpathy’s Insight
Karpathy compares the problem to software compilation: instead of re‑executing raw source code for every request, we should compile knowledge once and reuse the compiled artifact. Traditional Retrieval‑Augmented Generation (RAG) reads the same documents, re‑chunks them, and re‑synthesizes answers on every query, leading to a stateless system that never learns.
LLM Wiki Mode
The LLM Wiki mode treats knowledge as a compiled Wiki. An LLM reads raw sources, synthesizes them into structured, inter‑linked Markdown files, and stores these files in a Git‑backed repository. The Wiki replaces the vector store; it is a set of human‑readable, LLM‑maintained Markdown pages, each representing a single concept.
Three‑Layer Architecture
Layer 1 – Raw Sources (immutable): PDFs, web articles, YouTube transcripts, etc., stored under sources/. They are never modified and serve as the audit trail.
Layer 2 – The Wiki (LLM‑maintained): A collection of *.md files, each with YAML front‑matter (title, tags, source list, timestamp) and cross‑references using [[slug]]; an illustrative page appears after this list. Special files include index.md, log.md, and .meta/embeddings.json.
Layer 3 – Schema (governance): A JSON file defining the universe of pages (slug, title, description). Adding a PageSpec to the schema causes the next ingest to create or update the corresponding Wiki page.
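To make Layer 2 concrete, here is what a single compiled page might look like. The front‑matter fields mirror the ones listed above (title, tags, sources, timestamp), but the exact field names, filename, and layout are illustrative assumptions rather than the package's actual format.

```markdown
---
title: Prompt Caching
tags: [llm, cost]
sources:
  - sources/anthropic-prompt-caching.html
updated: 2025-01-15T09:30:00Z
---
Prompt caching stores the static prefix of a long prompt so that repeated
calls only pay for the new suffix. In this wiki it is the main reason
[[ingest-pipeline]] stays affordable; see also [[api-costs]].
```

The corresponding PageSpec in the schema carries only the slug, title, and description; adding that entry is what tells the next ingest to create or update this page.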
Four Core Operations
Init
Bootstrap the directory structure (wiki/, sources/, index, log, embeddings) from a schema template. After this, humans add sources and the rest runs automatically.
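As a rough sketch (not the package's actual code), init amounts to creating that skeleton and seeding the special files; the schema.json location below is an assumption.

```python
# Minimal sketch of what init sets up, assuming the layout described above.
import json
import shutil
from pathlib import Path

def init_wiki(root: str, schema_template: str) -> None:
    base = Path(root)
    (base / "sources").mkdir(parents=True, exist_ok=True)   # Layer 1: immutable raw sources
    meta = base / "wiki" / ".meta"
    meta.mkdir(parents=True, exist_ok=True)                  # Layer 2: LLM-maintained pages

    shutil.copy(schema_template, base / "schema.json")       # Layer 3: governance schema
    (base / "wiki" / "index.md").write_text("# Index\n")     # regenerated on every ingest
    (base / "wiki" / "log.md").write_text("# Change log\n")  # append-only audit log
    (meta / "embeddings.json").write_text(json.dumps({}))    # empty embedding index
```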
Ingest – Knowledge Compounding
The ingest pipeline consists of five ordered steps:
Parse source: Detect the source type (local file, YouTube URL, HTTP page) and extract raw text.
Route: The LLM reads the schema summary and the source text and returns a JSON array of relevant slugs, limiting downstream work to only the pertinent pages.
Synthesize: For each relevant slug, the LLM receives the existing page content and the new source, then rewrites the page while preserving prior information (the “keep and expand” invariant).
Embed: Updated pages are re‑embedded with OpenAI text-embedding-3-small (1536‑dimensional vectors) and the embedding index is updated in place.
Update index and log: index.md is regenerated to reflect new pages; log.md receives a timestamped entry describing which slugs were touched and which source triggered the update.
Prompt caching reduces repeated LLM calls by ~90% during ingest.
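Condensed into code, the pipeline looks roughly like this. It assumes step 1 (parsing the source into raw text) has already produced source_text, and every name and signature here is illustrative rather than taken from the package.

```python
# Sketch of ingest steps 2-5, with the LLM and embedding calls passed in as
# plain callables so the flow is visible without any API plumbing.
from datetime import datetime, timezone
from typing import Callable

def ingest(source_text: str, source_name: str, wiki, index,
           route: Callable[[str], list[str]],           # schema-aware LLM router
           synthesize: Callable[[str, str], str],       # keep-and-expand page rewrite
           embed: Callable[[str], list[float]]) -> list[str]:
    slugs = route(source_text)                          # 2. route: pick only the relevant pages
    for slug in slugs:
        old_page = wiki.read(slug)                      # existing content, possibly empty
        new_page = synthesize(old_page, source_text)    # 3. synthesize, preserving prior info
        wiki.write(slug, new_page)
        index.upsert(slug, embed(new_page))             # 4. re-embed only the touched pages
    wiki.regenerate_index()                             # 5. refresh index.md ...
    wiki.append_log(f"{datetime.now(timezone.utc).isoformat()} "
                    f"{source_name} -> {', '.join(slugs) or 'no matching pages'}")  # ... and log.md
    return slugs
```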
Query – RAG on the Compiled Wiki
Embed the user question with the same model used for the index.
Perform cosine similarity search on the embedding index to retrieve top‑k relevant Wiki pages.
Load the full bodies of those pages and assemble the context.
Stream the answer using Claude, feeding the assembled Wiki context as knowledge.
If the --save flag is set, the generated answer is archived as a new Wiki page whose slug is derived from the question, enabling the knowledge to compound.
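A minimal sketch of that flow, with the embedding and answering calls passed in as callables; names are illustrative, and the real package streams the answer rather than returning it in one piece.

```python
# The query path in miniature: embed the question, take the cosine top-k
# pages, feed their full bodies as context, and optionally archive the answer.
import re

def query(question: str, wiki, index, embed, answer, k: int = 5, save: bool = False) -> str:
    hits = index.top_k(embed(question), k)                 # [(slug, score), ...] by cosine similarity
    context = "\n\n".join(wiki.read(slug) for slug, _ in hits)
    result = answer(question, context)                     # Claude call with the wiki context
    if save:
        slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")  # slug derived from the question
        wiki.write(slug, result)                           # compounding: the answer becomes a page
    return result
```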
Lint – Health Checks
The lint operation validates the Wiki structure: orphan pages, missing pages, broken [[slug]] links, stale embeddings, and missing source citations. With --deep, the LLM also checks for factual contradictions between pages. --fix forces a full re‑embedding of all pages.
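The structural checks need no LLM at all; only the --deep contradiction check does. A rough sketch of the structural half, under assumed wiki accessors (slugs, read, front_matter):

```python
# Illustrative structural lint: broken [[slug]] links, orphan pages, and
# missing source citations can all be found with plain text processing.
import re

def lint_structure(wiki) -> list[str]:
    problems = []
    slugs = set(wiki.slugs())
    linked = set()
    for slug in slugs:
        body = wiki.read(slug)
        for target in re.findall(r"\[\[([^\]]+)\]\]", body):
            linked.add(target)
            if target not in slugs:
                problems.append(f"{slug}: broken link to [[{target}]]")
        if not wiki.front_matter(slug).get("sources"):
            problems.append(f"{slug}: no source citations")
    for slug in slugs - linked - {"index", "log"}:
        problems.append(f"{slug}: orphan page (nothing links to it)")
    return problems
```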
Query Templates (Named Prompts)
Six template categories extend the Wiki beyond simple Q&A:
Synthesis queries: “Give me the single most important insight that ties everything together.”
Gap‑finding queries: “What important topics are completely missing from my Wiki?”
Debate queries: “What are the biggest disagreements between my sources?”
Output queries: Generate study guides, FAQs, slide outlines, etc.
Health queries: Audit consistency, completeness, and integrity.
Personal‑application queries: “Based on everything I know, what mistake am I most likely to make now?”
These turn the Wiki into a thinking partner rather than a pure retrieval engine.
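Inside prompts.py these named prompts could live in a simple mapping; the article names the QUERY_TEMPLATES constant, but the keys and wording below are illustrative paraphrases of the categories above, not the package's actual contents.

```python
# Hypothetical shape of the named query templates.
QUERY_TEMPLATES = {
    "synthesis": "Give me the single most important insight that ties everything together.",
    "gaps": "What important topics are completely missing from my wiki?",
    "debate": "What are the biggest disagreements between my sources?",
    "output": "Generate a study guide covering every page in the wiki.",
    "health": "Audit the wiki for consistency, completeness, and integrity.",
    "personal": "Based on everything I know, what mistake am I most likely to make now?",
}
```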
Implementation Details (Python Package)
The open-source llm-wiki package is organized as follows:
embeddings.py: Wrapper around OpenAI's embedding API; returns normalized 1536-dim vectors and handles batch upserts.
index.py: Simple JSON-backed EmbeddingIndex with a top_k(query_vector, k) method that performs a linear NumPy cosine-similarity scan (fast for up to roughly 500 pages).
wiki.py: CRUD for Wiki pages, using python-frontmatter to read/write YAML front-matter and enforce the schema.
prompts.py: Central store of all system prompts (routing, synthesis, answer, contradiction-check, QUERY_TEMPLATES).
ingest.py: Coordinates the five-step ingest pipeline and implements resolve_source() to abstract over local files, YouTube URLs, and HTTP pages.
query.py: Implements the four-step query flow and optionally calls the --save archival logic.
lint.py: Performs structural and deep LLM-based consistency checks.
cli.py: Click-based command-line interface exposing init, ingest, query, lint, prompts, status, and other commands.
The modular design means swapping the embedding model, adding a BM25 hybrid, or supporting new source types only requires changes in a single module.
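For a sense of scale, the linear scan behind index.py's top_k needs only a few lines of NumPy: with normalized vectors, cosine similarity reduces to a dot product, so a single matrix-vector product ranks every page. This is a sketch of the idea, not the module's actual code.

```python
# Minimal in-memory embedding index with a linear cosine-similarity scan.
import numpy as np

class EmbeddingIndex:
    def __init__(self, dim: int = 1536):              # text-embedding-3-small dimension
        self.slugs: list[str] = []
        self.vectors = np.zeros((0, dim))

    def upsert(self, slug: str, vector: np.ndarray) -> None:
        v = vector / np.linalg.norm(vector)            # store normalized vectors
        if slug in self.slugs:
            self.vectors[self.slugs.index(slug)] = v
        else:
            self.slugs.append(slug)
            self.vectors = np.vstack([self.vectors, v])

    def top_k(self, query_vector: np.ndarray, k: int) -> list[tuple[str, float]]:
        q = query_vector / np.linalg.norm(query_vector)
        scores = self.vectors @ q                       # cosine similarity via dot product
        best = np.argsort(scores)[::-1][:k]
        return [(self.slugs[i], float(scores[i])) for i in best]
```

Because everything lives in one NumPy array, a full scan over a few hundred pages takes microseconds; swapping in FAISS or Chroma only becomes worthwhile well beyond that.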
Advantages
Knowledge compounds over time; each new source enriches existing pages rather than overwriting them.
Queries become cheaper and faster because heavy synthesis happens during ingest.
All artifacts are plain Markdown, version‑controlled with Git, and fully human‑readable.
Source provenance is explicit, enabling easy traceability and selective re‑ingest.
Source‑type agnostic ingestion (files, YouTube, web pages) via resolve_source().
Prompt caching and routing dramatically lower API costs.
Cross‑references ([[slug]]) create an implicit knowledge graph without a separate graph database.
Limitations
Ingest is expensive and slow for large corpora: each source requires a routing call plus a synthesis call for every relevant page, all served by Claude.
Quality depends on the LLM; hallucinations become permanent unless caught by lint --deep.
Designing a good schema is non‑trivial and requires iterative refinement.
Linear scan indexing scales only to a few hundred pages; larger wikis need FAISS, Chroma, or Weaviate.
Stale pages can persist if routing fails to recognize new relevance; regular linting and re‑ingest are required.
The approach is not suited to real‑time streams (e.g., Twitter, live stock data).
The --save loop can generate noisy pages; users must manually decide which answers merit archiving.
Implications for Today’s AI Engineers
The LLM Wiki mode arrives as the field shifts from chat‑based LLMs to autonomous agents that need persistent memory. It offers a concrete “memory layer” that agents can read from and write to via MCP‑style tool calls (wiki_search, wiki_ingest, wiki_lint, wiki_graph). This aligns with Anthropic's Model Context Protocol (MCP) and demonstrates how a simple Markdown‑based knowledge base can serve as a scalable, versioned memory for agents.
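As a hedged illustration of what that memory layer could look like to an agent, the four tools might be described with JSON-schema-style specs like the ones below; the tool names come from the article, but everything else is an assumption rather than the package's actual MCP server.

```python
# Hypothetical MCP-style tool specs for exposing the wiki to an agent.
WIKI_TOOLS = [
    {"name": "wiki_search",
     "description": "Embed a question and return the most relevant wiki pages.",
     "input_schema": {"type": "object",
                      "properties": {"question": {"type": "string"},
                                     "k": {"type": "integer", "default": 5}},
                      "required": ["question"]}},
    {"name": "wiki_ingest",
     "description": "Compile a new source (file, YouTube URL, or web page) into the wiki.",
     "input_schema": {"type": "object",
                      "properties": {"source": {"type": "string"}},
                      "required": ["source"]}},
    {"name": "wiki_lint",
     "description": "Run structural (and optionally deep) consistency checks.",
     "input_schema": {"type": "object",
                      "properties": {"deep": {"type": "boolean", "default": False}}}},
    {"name": "wiki_graph",
     "description": "Return the [[slug]] cross-reference graph of the wiki.",
     "input_schema": {"type": "object", "properties": {}}},
]
```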
By reframing RAG from “answer a question” to “maintain an ever‑growing knowledge representation”, the approach unlocks new applications: gap detection, contradiction surfacing, automated study‑guide generation, and cost‑effective long‑term knowledge retention.
Ultimately, the LLM Wiki pattern shows that a modest amount of engineering (Markdown files, a JSON schema, an embedding model) can yield a powerful, compound‑interest knowledge system without needing complex vector databases or heavyweight orchestration frameworks.