Building Karpathy’s LLM Wiki with Obsidian: Three‑Layer Architecture and Three Core Operations

This tutorial explains how to implement Andrej Karpathy’s LLM Wiki method using Obsidian, detailing a three‑layer schema‑raw‑wiki architecture, the Ingest‑Query‑Lint workflow, automatic bookkeeping that drives knowledge accumulation, and practical setup steps for personal or team use.

Shuge Unlimited

Jul 3, 2026

Why RAG Doesn’t Accumulate Knowledge

Typical Retrieval‑Augmented Generation (RAG) systems index uploaded files, retrieve a few chunks for each query, and then discard the results, so each interaction is a one‑off event with no persistent knowledge or contradiction tracking.

Karpathy’s LLM Wiki differs: when new material is added, the LLM reads the whole document, extracts key facts, updates existing wiki pages, creates or revises entity and summary pages, and flags contradictions. A single ingest may touch 10‑15 wiki pages, producing a persistent, compounding artifact in which cross‑references, contradictions, and summaries continuously evolve.

Why Maintenance Cost Approaches Zero

The tedious part of a knowledge base is bookkeeping—updating cross‑references, keeping summaries fresh, marking superseded statements, and maintaining consistency across dozens of pages. By externalizing this bookkeeping to the LLM, the user only sources material and asks questions, while the LLM handles all the “dirty work”.

Three‑Layer Architecture

Schema (Behavior Configuration): A file (recommended AGENTS.md) that tells the LLM how to organize the wiki, what page types exist, and how to write cross‑references. It makes the LLM a disciplined maintainer rather than a generic chatbot.

Raw Sources (Read‑Only Layer): The original articles, PDFs, images, and other assets you place in the raw/ folder. Only you write here; the LLM treats these files as the immutable source of truth.

Wiki (LLM‑Maintained Layer): Markdown pages that the LLM creates and updates (summaries, entity pages, concept pages, comparison pages). You only read this layer.
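
The write rules of the three layers can be expressed as a small path check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that index.md and log.md sit at the vault root and count as LLM‑maintained bookkeeping files, which the method itself does not prescribe.

```python
from pathlib import PurePosixPath

# Assumption: these two bookkeeping files live at the vault root.
BOOKKEEPING_FILES = {"index.md", "log.md"}


def layer_of(path: str) -> str:
    """Classify a vault-relative path as 'schema', 'raw', 'wiki', or 'other'."""
    parts = PurePosixPath(path).parts
    if not parts:
        return "other"
    if parts == ("AGENTS.md",):
        return "schema"
    if parts[0] == "raw":
        return "raw"
    if parts[0] == "wiki" or (len(parts) == 1 and parts[0] in BOOKKEEPING_FILES):
        return "wiki"
    return "other"


def llm_may_write(path: str) -> bool:
    """The LLM maintains the wiki layer only; raw/ stays read-only for it."""
    return layer_of(path) == "wiki"
```

A guard like this could run before an agent's file writes are accepted, for example in a Git pre‑commit hook that rejects changes under raw/.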

Core Operations

Ingest

Place new material in raw/.

Tell the LLM to “process this”.

The LLM reads the file, creates a summary page under wiki/来源/ (the sources folder), updates related entity pages, creates or updates concept pages, checks for comparison opportunities, and refreshes index.md and log.md.

Karpathy recommends ingesting one document at a time to retain control.
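
The file‑level bookkeeping of one ingest can be sketched as follows. In practice the LLM agent performs these writes itself; this is only a minimal illustration. The function names, the `type: source` value, the log message wording, and the root‑level log.md location are assumptions, while the frontmatter field names come from the page conventions described later in this article.

```python
import json
from datetime import date
from pathlib import Path


def render_frontmatter(page_type: str, tags: list[str], sources: list[str], day: date) -> str:
    """Render the YAML frontmatter block; lists use JSON syntax, which YAML accepts."""
    fields = {
        "type": page_type,
        "tags": json.dumps(tags, ensure_ascii=False),
        "sources": json.dumps(sources, ensure_ascii=False),
        "created": day.isoformat(),
        "updated": day.isoformat(),
    }
    body = "\n".join(f"{key}: {value}" for key, value in fields.items())
    return f"---\n{body}\n---\n"


def record_ingest(vault: Path, raw_rel: str, summary: str, day: date) -> Path:
    """Write a source summary page and append one entry to log.md."""
    page = vault / "wiki" / "来源" / (Path(raw_rel).stem + ".md")
    page.parent.mkdir(parents=True, exist_ok=True)
    text = render_frontmatter("source", [], [raw_rel], day) + f"\n{summary}\n"
    page.write_text(text, encoding="utf-8")
    # log.md is append-only, so open in append mode rather than rewriting it.
    with (vault / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"## [{day.isoformat()}] ingest | Processed {raw_rel}\n")
    return page
```

Updating entity and concept pages is left out here because it depends on the LLM's reading of the document.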

Query

The LLM reads index.md to locate relevant pages.

It follows wiki‑style links ([[...]]) to gather context, optionally revisiting raw sources.

It generates an answer and cites source pages.

If the answer yields a new insight, analysis, or comparison, the LLM writes it as a new page in wiki/对比/ (the comparisons folder) and appends an entry to log.md.
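
The link‑following step can be approximated with a breadth‑first walk over [[...]] links. This is a rough sketch under stated assumptions: index.md sits at the vault root, link targets resolve relative to wiki/, and every link is followed, whereas a real query lets the LLM pick only the relevant ones. `gather_context` and `max_pages` are hypothetical names.

```python
import re
from collections import deque
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] or [[target#heading]];
# the lookbehind skips image embeds such as ![[raw/...png]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def gather_context(vault: Path, start: str = "index.md", max_pages: int = 20) -> list[str]:
    """Return vault-relative page names reachable from the index, in visit order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([start])
    while queue and len(order) < max_pages:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        path = vault / name
        if not path.is_file():
            continue  # dangling link; the lint pass reports these
        order.append(name)
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target.startswith("raw/"):
                continue  # raw sources are revisited only on demand
            rel = target if target.endswith(".md") else target + ".md"
            queue.append("wiki/" + rel)
    return order
```

The `max_pages` cap keeps the gathered context bounded, which matters once the wiki grows past what fits in a single prompt.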

Lint (health check)

Detect contradictory statements across pages.

Find orphan pages without inbound links.

Identify missing target pages for existing links.

Spot outdated information that newer sources have superseded.

Check that index.md matches the actual file set.

Suggest new comparison pages when multiple sources disagree.

Report dead external URLs and unused images.

Issues that can be auto‑fixed (e.g., missing backlinks) are repaired directly by the LLM; issues requiring human judgment (e.g., contradictions) are only reported.
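
Two of these checks, orphan pages and links with missing targets, are mechanical enough to sketch without an LLM. The example below is a simplified illustration: `lint_links` is a hypothetical name, only links between pages under wiki/ count as inbound, and link targets are assumed to be paths relative to wiki/.

```python
import re
from pathlib import Path

# Same link pattern as Obsidian-style [[target|alias]]; image embeds are skipped.
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def lint_links(wiki: Path) -> dict[str, list[str]]:
    """Report pages with no inbound links and link targets that do not exist."""
    pages = {
        p.relative_to(wiki).with_suffix("").as_posix(): p
        for p in wiki.rglob("*.md")
    }
    inbound = {name: 0 for name in pages}
    missing: set[str] = set()
    for name, path in pages.items():
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip().removesuffix(".md")
            if target.startswith("raw/"):
                continue  # links into raw/ are outside this check
            if target in pages:
                if target != name:  # self-links do not rescue an orphan
                    inbound[target] += 1
            else:
                missing.add(target)
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return {"orphans": orphans, "missing": sorted(missing)}
```

Contradictions and outdated claims need the LLM's judgment, so they stay out of a script like this.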

Page Types and Frontmatter

Every wiki page starts with YAML frontmatter containing fields such as type, tags, sources, created, and updated. The type determines how Dataview renders dynamic tables.

Source: One page per raw document, created automatically.

Entity: Concrete objects (people, companies, tools). Created the first time they are discussed in depth.

Concept: Abstract ideas. Created only after they appear in at least two sources; otherwise they go to a pending list.

Comparison: Pages that juxtapose differing viewpoints from multiple sources or capture valuable analysis generated during a query.

Rule of thumb: “Don’t abstract too early.” Concepts wait for repeated mentions before becoming full pages.
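
The two‑source threshold for concept pages can be illustrated with a few lines of Python. This is a sketch of the rule only: `split_concepts` and the (concept, source) pair format are hypothetical, and in the actual workflow the LLM applies the rule from the schema file rather than from code.

```python
from collections import defaultdict


def split_concepts(
    mentions: list[tuple[str, str]], min_sources: int = 2
) -> tuple[list[str], list[str]]:
    """Split concepts into (promoted, pending) by number of distinct sources.

    mentions is a list of (concept, source) pairs; repeated mentions within
    the same source count once.
    """
    sources_by_concept: dict[str, set[str]] = defaultdict(set)
    for concept, source in mentions:
        sources_by_concept[concept].add(source)
    promoted = sorted(c for c, s in sources_by_concept.items() if len(s) >= min_sources)
    pending = sorted(c for c, s in sources_by_concept.items() if len(s) < min_sources)
    return promoted, pending
```

Counting distinct sources rather than raw mentions keeps one enthusiastic article from promoting a concept on its own.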

Cross‑Reference Syntax

- Reference an entity page: [[实体/Karpathy]]
- Reference a concept page: [[概念/LLM Wiki]]
- Reference a raw source: [[raw/素材/文章/2026/06/xxx.md]]
- Embed an image: ![[raw/素材/图片/xxx.png]]

Images are never copied into the wiki; they remain in raw/ and are referenced directly to avoid duplication.

Index.md and Log.md

index.md serves as the wiki’s homepage. The LLM reads it first to locate relevant pages. For medium‑scale wikis (≈100 sources, a few hundred pages), a simple markdown index combined with the Dataview plugin is sufficient—no vector search needed.

log.md is an append‑only timeline that records each operation, e.g.:

## [2026-07-02] ingest | Processed Karpathy’s LLM Wiki gist
## [2026-07-02] query | Compared RAG vs LLM Wiki accumulation
## [2026-07-03] lint | Found 3 orphan pages, cleaned 2

The LLM can read the recent log entries to understand the current state of the wiki.
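
Because each entry follows the same heading pattern, the tail of log.md is easy to parse. The sketch below assumes the exact format shown in the example entries (date in brackets, operation name, a pipe, then a free‑text note); `LogEntry` and `recent_entries` are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches lines such as: ## [2026-07-03] lint | Found 3 orphan pages, cleaned 2
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


class LogEntry(NamedTuple):
    date: str
    operation: str
    note: str


def recent_entries(log_text: str, limit: int = 5) -> list[LogEntry]:
    """Return the last `limit` well-formed entries, oldest first."""
    entries = [
        LogEntry(*match.groups())
        for line in log_text.splitlines()
        if (match := ENTRY.match(line.strip()))
    ]
    return entries[-limit:] if limit > 0 else []
```

Lines that do not match the pattern are skipped, so free‑form notes between entries do not break the parse.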

Obsidian’s Role

Obsidian acts as the IDE: the LLM edits markdown files while Obsidian instantly displays updates, graph view, and dynamic tables.

Web Clipper: One‑click capture of web articles into raw/.

Image Download: Configure the attachment folder (e.g., raw/assets/) and use the “Download attachments” hotkey so the LLM can read local images.

Graph View: Visualize connections, hubs, and orphan pages.

Dataview Plugin: Generates tables from frontmatter, keeping index listings in sync automatically.

The wiki is just a Git repository of markdown files, giving you version history, branching, and collaboration for free.

Design Trade‑offs (What Is Not Done)

No image duplication—images stay in raw/.

No copying raw documents into the wiki; only extracted knowledge is stored.

No vector‑search infrastructure for medium‑scale wikis; a simple index suffices.

No dedicated tags directory; tags are stored in frontmatter and used by Dataview.

These choices keep the system simple and robust until the knowledge base grows beyond a few hundred pages.

Suitable Scenarios

Personal knowledge management (goals, health, learning logs).

Research projects that require a living literature review.

Reading‑by‑chapter note‑taking for books.

Team knowledge bases that ingest Slack discussions, meeting notes, and project docs.

Hands‑On Setup

Create an empty Obsidian vault (e.g., my-wiki).

Open a terminal, cd into the vault, and launch your preferred LLM agent (Claude Code, OpenCode, Codex, Cursor, etc.).

Give the agent Karpathy’s gist (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) with a prompt such as:

Read this gist and, following the LLM Wiki method, initialize the full wiki structure. Use AGENTS.md as the schema file name (do not use CLAUDE.md).

The LLM will create the directory tree (raw/, wiki/来源/ for sources, wiki/实体/ for entities, wiki/概念/ for concepts, wiki/对比/ for comparisons) and generate AGENTS.md, index.md, and an empty log.md.

Optionally install the Dataview and Web Clipper plugins in Obsidian.
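
For reference, the layout the agent produces can also be created by hand. The sketch below only shows the resulting structure: `init_vault` is a hypothetical helper, it assumes the three markdown files sit at the vault root, and it leaves AGENTS.md empty because its contents are meant to be generated by the LLM from the gist.

```python
from pathlib import Path

# Directory and file names as listed in the setup steps above.
WIKI_DIRS = ["raw", "wiki/来源", "wiki/实体", "wiki/概念", "wiki/对比"]
STUB_FILES = ["AGENTS.md", "index.md", "log.md"]


def init_vault(root: Path) -> list[str]:
    """Create the folder tree and empty stub files; return the files created."""
    for rel in WIKI_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    created = []
    for name in STUB_FILES:
        path = root / name
        if not path.exists():  # never overwrite an existing schema, index, or log
            path.write_text("", encoding="utf-8")
            created.append(name)
    return created
```

Running it twice is safe: existing files are left untouched and the second call returns an empty list.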

Final Thoughts

The core division remains: humans source material, LLMs do the bookkeeping. This mirrors the Memex, Vannevar Bush’s 1945 vision of a personal knowledge store, with the LLM solving the maintenance problem. For domains that continuously grow and require ongoing integration, LLM Wiki offers a more sustainable alternative to pure RAG.
