How Karpathy Builds a Personal Knowledge Base with LLMs: A Step‑by‑Step Blueprint
Karpathy outlines a detailed workflow that uses large language models to collect personal research materials, organize them into an interlinked Markdown wiki, and continuously enrich the result, covering the tools, architecture, and future directions for a self‑improving AI‑powered second brain.
Karpathy recently shared two extensive posts describing how to use large language models (LLMs) to construct a personal knowledge base, or "second brain," and the best practices behind it.
Workflow Overview
Collect raw materials (articles, papers, code repositories, datasets, images) into a folder, then let the LLM automatically organize them into a wiki of Markdown files that include summaries, cross‑links, and thematic articles.
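A minimal sketch of that ingestion step is below. The `raw/` and `wiki/` folder names and the `call_llm` helper are illustrative assumptions, not something Karpathy's posts prescribe; a real pipeline would also handle binary sources (images, PDFs) more carefully than this text-only cap.

```python
from pathlib import Path

RAW = Path("raw")    # immutable source documents (folder name is an assumption)
WIKI = Path("wiki")  # LLM-generated Markdown pages

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client you use (Claude, GPT, a local model)."""
    raise NotImplementedError

def ingest_new_sources() -> None:
    WIKI.mkdir(exist_ok=True)
    for src in sorted(RAW.glob("*")):
        if not src.is_file():
            continue
        page = WIKI / f"{src.stem}.md"
        if page.exists():  # already summarized on a previous run
            continue
        text = src.read_text(errors="ignore")[:50_000]  # crude context cap
        summary = call_llm(
            "Summarize this source as a wiki page with a one-paragraph abstract, "
            "key claims, and [[wikilinks]] to related concepts:\n\n" + text
        )
        page.write_text(summary)

if __name__ == "__main__":
    ingest_new_sources()
```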
Use Obsidian as the front‑end interface; raw data, the generated wiki, and visualizations are all viewed in one place. The LLM writes and maintains the wiki while the user rarely edits directly.
When the wiki reaches a substantial size (e.g., ~100 papers, ~400k words), the user can query the LLM directly without a separate Retrieval‑Augmented Generation (RAG) system; the LLM maintains its own index files and retrieves relevant content on demand.
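One way this "no RAG stack" retrieval could work is sketched below: score entries in an LLM‑maintained `index.md` against the query and pull only the top pages into the context window. The index line format (`- [[Title]]: one-line description`) is an assumption for illustration.

```python
from pathlib import Path

WIKI = Path("wiki")

def load_relevant_pages(query: str, k: int = 5) -> str:
    """Naive keyword scoring over the LLM-maintained index: no embeddings, no vector DB."""
    terms = set(query.lower().split())
    scored = []
    for line in (WIKI / "index.md").read_text().splitlines():
        if "[[" not in line:
            continue
        overlap = len(terms & set(line.lower().split()))
        if overlap:
            scored.append((overlap, line))
    # Pull the full text of the top-scoring pages into the context window.
    pages = []
    for _, line in sorted(scored, reverse=True)[:k]:
        name = line.split("[[")[1].split("]]")[0]
        path = WIKI / f"{name}.md"
        if path.exists():
            pages.append(path.read_text())
    return "\n\n---\n\n".join(pages)
```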
The LLM not only returns plain text but also renders Markdown documents, slides, charts, and images, then archives the results back into the wiki, continuously enriching the knowledge base.
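The "archive results back into the wiki" step might look like the following sketch, which renders a chart with matplotlib and writes a Markdown page linking to it. The paths, the bar-chart choice, and the sample data are all illustrative.

```python
from pathlib import Path
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

WIKI = Path("wiki")

def archive_chart(slug: str, labels: list[str], values: list[float], question: str) -> None:
    """Render a chart the LLM produced as data, save it, and link it from a new wiki page."""
    assets = WIKI / "assets"
    assets.mkdir(parents=True, exist_ok=True)
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title(question)
    fig.savefig(assets / f"{slug}.png", bbox_inches="tight")
    plt.close(fig)
    (WIKI / f"{slug}.md").write_text(f"# {question}\n\n![chart](assets/{slug}.png)\n")

# Toy data, purely for illustration:
archive_chart("papers-per-year", ["2022", "2023", "2024"], [12, 40, 48],
              "How many papers did I collect per year?")
```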
Periodic "health checks" enable the LLM to detect inconsistencies, fill gaps via web search, and suggest new connections, allowing the wiki to self‑clean and evolve over time.
Karpathy also built a "vibe‑coded" search engine on top of the wiki that can be used directly in the browser or handed to the LLM as a tool when tackling larger problems.
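The summary doesn't specify how that search engine works internally; a minimal sketch of the idea, using a simple TF‑IDF ranking over the wiki's Markdown pages, might look like this:

```python
import math
import re
from collections import Counter
from pathlib import Path

WIKI = Path("wiki")

def search(query: str, k: int = 10) -> list[tuple[float, str]]:
    """Rank wiki pages by a simple TF-IDF score for the query terms."""
    docs = {p.name: Counter(re.findall(r"\w+", p.read_text().lower()))
            for p in WIKI.rglob("*.md")}
    terms = query.lower().split()
    df = {t: sum(1 for counts in docs.values() if t in counts) for t in terms}
    results = []
    for name, tf in docs.items():
        score = sum(tf[t] * math.log(len(docs) / df[t]) for t in terms if df[t])
        if score > 0:
            results.append((score, name))
    return sorted(results, reverse=True)[:k]

for score, name in search("attention scaling laws"):
    print(f"{score:6.2f}  {name}")
```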
Future work includes fine‑tuning a custom model on personal research data so that knowledge is stored not only in the context window but also baked into the model weights.
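Since this is only a stated future direction, any implementation is speculative; one plausible first step is exporting wiki pages into the chat‑style JSONL format that most fine‑tuning APIs accept, as in this hypothetical sketch (the question template is invented for illustration):

```python
import json
from pathlib import Path

WIKI = Path("wiki")

def export_finetune_set(out: str = "finetune.jsonl") -> None:
    """Turn each wiki page into a (question, answer) training pair."""
    with open(out, "w") as f:
        for page in WIKI.rglob("*.md"):
            body = page.read_text().strip()
            if not body:
                continue
            record = {"messages": [
                {"role": "user", "content": f"What do my notes say about {page.stem}?"},
                {"role": "assistant", "content": body},
            ]}
            f.write(json.dumps(record) + "\n")

export_finetune_set()
```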
Core Architecture
Raw Materials – Immutable source documents (articles, papers, images, data files) that the LLM reads but never modifies.
Wiki – A directory of Markdown files generated by the LLM, containing abstracts, entity pages, concept pages, comparative analyses, overviews, and synthesized conclusions. The LLM creates, updates, and cross‑links pages; the user only reads.
Schema – A configuration document (e.g., CLAUDE.md or AGENTS.md) that defines the wiki’s structure, conventions, and workflows for ingestion, answering, and maintenance, turning the LLM into a disciplined wiki curator rather than a generic chatbot.
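To make the schema concrete, here is a sketch that bootstraps a minimal curator document. Every convention in it is illustrative, not a reproduction of Karpathy's actual CLAUDE.md:

```python
from pathlib import Path

SCHEMA = """\
# CLAUDE.md — wiki curator instructions (illustrative)

## Layout
- raw/  : immutable sources. Read-only; never edit or delete.
- wiki/ : Markdown pages you create and maintain. One page per source,
          plus concept pages, comparisons, and an index.md you keep current.

## On ingestion
1. Summarize each new file in raw/ into wiki/<name>.md.
2. Add [[wikilinks]] to related pages; update index.md.

## On questions
Consult index.md first; open only the pages you need.

## On maintenance
Run periodic health checks: fix dangling links, flag contradictions,
fill gaps via web search, and propose new cross-links.
"""

Path("CLAUDE.md").write_text(SCHEMA)
```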
Swarm‑Enhanced Wiki
Community members have extended Karpathy's pattern into a multi‑agent "swarm" architecture in which the shared wiki patches the swarm approach's most critical weakness, yielding a self‑purifying, self‑iterating collective knowledge brain capable of long‑term stable operation.
Reference
https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f