Turn LLMs into Knowledge Engineers: Build a Self‑Growing Obsidian Wiki

This article explains how Andrej Karpathy's LLM‑plus‑Obsidian workflow transforms large language models into continuous knowledge engineers, detailing a three‑layer architecture, core operations, practical setup steps, and open‑source tools that enable a self‑maintaining, compounding personal wiki.


Background and Motivation

Many users collect hundreds of technical articles, papers, and other resources but struggle to reuse the knowledge: retrieval‑augmented generation (RAG) tools re‑derive every answer from the original documents on each query, so the knowledge stays scattered and unstructured.

Karpathy’s LLM+Obsidian Approach

Andrej Karpathy shared a personal knowledge‑base management method that combines large language models (LLMs), autonomous agents, and Obsidian. The LLM acts as a knowledge engineer that continuously maintains a structured Markdown wiki. The approach is documented in a public Gist that has attracted thousands of stars.

Gist URL: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Core Concept: LLM as Knowledge Engineer

Instead of a shallow "retrieve‑and‑answer" loop, users provide input (documents, valuable questions); the LLM handles all bookkeeping: summarizing core information, creating cross‑references, categorizing content, and ensuring consistency. Obsidian serves as the visual front‑end that displays the evolving wiki, allowing knowledge to compound like a rolling snowball.

Problems with Traditional RAG

Typical RAG pipelines (NotebookLM, ChatGPT file upload, most open‑source RAG systems) follow an "upload‑retrieve‑generate" pattern. Each query reprocesses the original documents, preventing any accumulation of knowledge and leading to duplicated effort.

Three‑Layer Architecture

1. Raw Sources – Immutable original files (papers, articles, images, data). The LLM only reads these files, preserving their integrity.

2. The Wiki – A structured collection of Markdown files generated and maintained entirely by the LLM: summary pages, entity pages, concept analyses, comparative analyses, and thematic overviews. Users read and ask questions; the LLM edits and expands the wiki.

3. The Schema – Rule files that tell the LLM how to organize content, how to run the ingestion workflow, and how to answer queries. Different agents can use different schema files (e.g., CLAUDE.md for Claude Code, AGENTS.md for Codex), and the schema can be refined iteratively.
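To make the three layers concrete, here is a minimal scaffold sketch in Python. The folder names, the CLAUDE.md filename, and the starter rules are illustrative assumptions; Karpathy's Gist defines its own conventions:

```python
from pathlib import Path

# Hypothetical vault layout; treat folder and file names as placeholders.
VAULT = Path.home() / "wiki-vault"

STARTER_SCHEMA = """\
# Schema: rules for the knowledge-engineer agent (illustrative)
- Never modify anything under sources/ (raw sources are immutable).
- Every ingested source gets a summary page under wiki/.
- Link entities with [[wikilinks]]; update backlinks when pages change.
- On lint, flag contradictions, orphan pages, and stale summaries.
"""

def scaffold(vault: Path = VAULT) -> None:
    """Create the three layers: raw sources, wiki, and schema."""
    (vault / "sources").mkdir(parents=True, exist_ok=True)  # layer 1: read-only inputs
    (vault / "wiki").mkdir(exist_ok=True)                   # layer 2: LLM-maintained pages
    schema = vault / "CLAUDE.md"                            # layer 3: rules for the agent
    if not schema.exists():
        schema.write_text(STARTER_SCHEMA, encoding="utf-8")

if __name__ == "__main__":
    scaffold()
```

Opening the same folder as an Obsidian vault gives the human-readable front end; the agent operates on the identical files from the command line.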

Core Operations

Ingest – Add new material to the Raw Sources layer. The LLM reads the material, extracts key points, generates a summary page, updates the index, and propagates changes to related entity and concept pages (often affecting 10‑15 pages per new source).

Query – Ask the wiki directly. The LLM retrieves relevant pages, synthesizes a precise answer, and can store high‑quality answers as new wiki pages, enriching the knowledge base.
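Ingest and Query both reduce to prompting the agent from the vault root. Below is a minimal sketch assuming Claude Code's non-interactive `claude -p` print mode (any agent CLI with a headless mode would do); the `run_agent` helper and the prompt wording are illustrative, not taken from Karpathy's Gist:

```python
import subprocess
from pathlib import Path

VAULT = Path.home() / "wiki-vault"  # assumed layout from the scaffold above

def run_agent(prompt: str) -> str:
    """Run the agent headlessly from the vault root so it sees the schema."""
    result = subprocess.run(
        ["claude", "-p", prompt],  # assumption: Claude Code's print mode
        cwd=VAULT, capture_output=True, text=True, check=True,
    )
    return result.stdout

def ingest(source_file: str) -> str:
    # The schema file tells the agent how to summarize, cross-reference,
    # and propagate changes to related entity and concept pages.
    return run_agent(
        f"Ingest sources/{source_file} into the wiki per the schema: "
        "write a summary page, update the index, and update related pages."
    )

def query(question: str) -> str:
    # High-quality syntheses are deposited back into the wiki as new pages.
    return run_agent(
        f"Answer from the wiki: {question} "
        "If the synthesis is novel and high quality, save it as a new wiki "
        "page and cross-link it."
    )
```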

Lint – Periodic health‑check of the wiki. The LLM scans for contradictions, outdated information, orphan pages, and missing cross‑references, then suggests additions or corrections, keeping the wiki consistent and up‑to‑date.
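The structural half of Lint needs no model at all. The sketch below is runnable as-is and flags broken [[wikilinks]] and orphan pages; contradiction and staleness checks still require the agent. The vault path is the same assumed layout as in the earlier sketches:

```python
import re
from pathlib import Path

WIKI = Path.home() / "wiki-vault" / "wiki"  # assumed wiki layer location
# Matches the target in [[Page]], [[Page|alias]], and [[Page#section]].
LINK = re.compile(r"\[\[([^\]|#]+)")

def lint(wiki: Path = WIKI) -> None:
    pages = {p.stem: p for p in wiki.rglob("*.md")}
    linked: set[str] = set()
    for page in pages.values():
        for target in LINK.findall(page.read_text(encoding="utf-8")):
            name = target.strip()
            linked.add(name)
            if name not in pages:
                print(f"broken link: [[{name}]] in {page.name}")
    for orphan in sorted(pages.keys() - linked):
        print(f"orphan page (no inbound links): {orphan}.md")

if __name__ == "__main__":
    lint()
```

A report like this can also be handed straight to the agent as the starting point for its own, deeper lint pass.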

Practical Setup

Only two tools are required: an LLM agent (Claude Code, Codex, OpenCode, etc.) and Obsidian (free for personal use). Steps:

1. Download and install Obsidian; create a new vault to hold the wiki and the raw sources.

2. Copy Karpathy's Gist content and feed it to the agent as the schema (the sketch after these steps automates this); the agent then configures the wiki structure accordingly.

3. Collect raw materials (papers, articles) and place them in the Raw Sources folder; run the Ingest operation.

4. In daily use, query the wiki, store valuable answers back into it, and run Lint periodically.
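Step 2 can be scripted. Here is a sketch that pulls the Gist's raw content and stores it as the vault's schema file, assuming GitHub's standard /raw redirect to a Gist's latest revision and the CLAUDE.md filename that Claude Code reads (Codex would use AGENTS.md):

```python
import urllib.request
from pathlib import Path

# GitHub serves a Gist's latest raw content at the /raw suffix.
GIST_RAW = ("https://gist.githubusercontent.com/karpathy/"
            "442a6bf555914893e9891c11519de94f/raw")
VAULT = Path.home() / "wiki-vault"  # assumed vault location from step 1

with urllib.request.urlopen(GIST_RAW) as resp:
    schema_text = resp.read().decode("utf-8")

# CLAUDE.md is the schema file Claude Code consults; Codex uses AGENTS.md.
(VAULT / "CLAUDE.md").write_text(schema_text, encoding="utf-8")
```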

Open‑Source Implementations

Community projects inspired by Karpathy’s idea include:

sage‑wiki – a Go binary that supports incremental compilation, fast search, and intelligent Q&A, and can expose an MCP server to any LLM agent.

Claude Code Skill – a one‑command plugin for Claude Code that enables direct ingestion without extra configuration.

Thinking‑Space – an IDE designed for the LLM+Wiki workflow, offering enhanced editing, retrieval, and visualization.

Why the Approach Works

Traditional personal wikis fail because maintaining cross‑references, updating summaries, and resolving contradictions is labor‑intensive. The LLM never tires or forgets; it can modify dozens of files and cross‑references in a single pass, reducing maintenance cost to near zero and enabling true knowledge compounding.

Optional Enhancements

When the wiki grows to hundreds of sources, additional tools can improve the experience:

qmd search engine – boosts retrieval efficiency for large knowledge bases.

Marp – converts Markdown pages into slide decks for presentations.

Dataview plugin – enables dynamic queries over page metadata.

Git version control – tracks changes, creates branches, and facilitates collaboration.

Key Insight

The shift is from one‑off LLM Q&A to treating the LLM as a tireless knowledge engineer that continuously refines a structured, reusable knowledge network: a feedback loop of input, processing, output, and deposition in which knowledge compounds like interest.

Open‑source repo: https://github.com/xoai/sage-wiki
Tags: LLM, Knowledge Engineering, Wiki, Obsidian, Personal Knowledge Management