How to Build a Personal Knowledge Base with My Custom web‑pack Skill
This article explains how to build a personal knowledge base with the author’s open‑source web‑pack Skill. The Skill automates raw‑material collection, image localization, link expansion, and structured output. It addresses the limitations of Obsidian’s Web Clipper and aligns with Karpathy’s three‑layer LLM Wiki architecture.
Karpathy’s LLM Wiki architecture
Three layers compose the personal knowledge base:
Raw – original articles, papers, and images; the LLM only reads this layer.
Wiki – LLM‑generated structured markdown with cross‑references.
Schema – rules that tell the LLM how to organise the Wiki.
The core idea is a “compiled knowledge base” that continuously integrates new material.
Raw‑layer bottleneck
Saving high‑quality web content locally is difficult. Using the Obsidian Web Clipper plugin reveals three hard limits:
Information loss: only the page text is saved; embedded links to papers, GitHub repos, or other resources remain as plain hyperlinks and are not expanded.
Image handling: images are kept as external links, which can break because of anti‑hotlinking, CDN changes, or authentication requirements.
No structure: each saved page becomes an isolated .md file without indexes, relationships, or source tracking.
web‑pack Skill
A custom Agent Skill named web-pack collects an entire thematic material pack instead of a single page.
Core design workflow
Read the main article’s body for each entry link.
Identify and expand related links (papers, repositories, official docs) found in the body.
Download every image to a local assets/ folder and replace links with relative paths.
Smartly filter out noise such as sidebars, ads, footers, and navigation menus.
Generate structured outputs: research brief, link inventory, image inventory, reading map, main entry markdown, and linked‑content markdown.
Output folder structure
YYYY-MM-DD-TopicName/
├── README.md # Overview of the material pack
├── 00-research-brief.md # Research brief
├── 01-link-inventory.md # Full list of links
├── 02-image-inventory.md # Image list
├── 03-reading-map.md # Relationship graph
├── MAIN-01-Entry.md # Entry page content
├── LINKED-02-Related.md # Expanded related links
└── assets/ # Local image resources
Capture strategy: multi‑layer fallback
Standard HTTP fetch for the main content (preferred).
If the link points to a GitHub repo, use the GitHub API or raw README for optimized retrieval.
If the resource is markdown, JSON, or plain text, save it directly.
If all of the above fail, fall back to Jina Reader. It is a last resort and should not be over‑used.
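The fallback order above can be sketched as a small dispatcher. This is a minimal illustration, not the Skill's actual code: the strategy names ("http", "github", "direct", "jina"), the function names, and the idea of passing fetchers in as plain callables are all assumptions made for the example. No network calls are made here; each fetcher is supplied by the caller.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# A fetcher takes a URL and returns text, or None/"" when it cannot help.
Fetcher = Callable[[str], Optional[str]]


def plan_strategies(url: str) -> List[str]:
    """Return the hypothetical strategy names to try, in fallback order."""
    parsed = urlparse(url)
    plan = ["http"]  # standard HTTP fetch is always tried first
    if parsed.netloc.lower() in {"github.com", "www.github.com"}:
        plan.append("github")  # GitHub API or raw README
    if parsed.path.lower().endswith((".md", ".json", ".txt")):
        plan.append("direct")  # save markdown/JSON/plain text as-is
    plan.append("jina")  # last resort only
    return plan


def fetch_with_fallback(
    url: str, fetchers: Dict[str, Fetcher]
) -> Optional[Tuple[str, str]]:
    """Try each planned strategy; return (strategy_name, text) or None."""
    for name in plan_strategies(url):
        fetcher = fetchers.get(name)
        if fetcher is None:
            continue  # strategy not configured, move on
        try:
            text = fetcher(url)
        except Exception:
            text = None  # treat any failure as "try the next strategy"
        if text:
            return name, text
    return None
```

Returning the strategy name alongside the text makes it easy to record in the link inventory how each resource was captured.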
Intelligent link filtering
Prioritized for expansion:
Official docs, blogs, papers, GitHub repos/README, benchmarks, data tables, example code, and any source that supports the core argument.
Skipped:
Navigation menus, footers, ads, recommendation sections, login/registration prompts, privacy policies, social‑share links, logos, favicons, and decorative images.
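A rule-based classifier is one way to approximate this filtering. The sketch below is illustrative only: the host list, the path hints, the "review" bucket, and the in_body flag are assumptions introduced for the example, and the real Skill may rely on the agent's judgment rather than fixed rules.

```python
from urllib.parse import urlparse

# Hypothetical hint lists; tune them for your own sources.
EXPAND_HOSTS = ("github.com", "arxiv.org")
EXPAND_PATH_HINTS = ("/docs", "/blog", "/paper", "/benchmark")
SKIP_PATH_HINTS = ("/login", "/signup", "/register", "/privacy", "/share")
SKIP_FILE_HINTS = ("favicon", "logo")


def classify_link(url: str, in_body: bool = True) -> str:
    """Return "expand", "skip", or "review" for a link found on a page."""
    if not in_body:
        return "skip"  # navigation, footer, and sidebar links are noise
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.lower()
    filename = path.rsplit("/", 1)[-1]
    if any(hint in path for hint in SKIP_PATH_HINTS):
        return "skip"
    if any(hint in filename for hint in SKIP_FILE_HINTS):
        return "skip"
    if host in EXPAND_HOSTS or host.startswith("docs."):
        return "expand"
    if any(hint in path for hint in EXPAND_PATH_HINTS):
        return "expand"
    return "review"  # undecided: leave for a human or the LLM to judge
```

Keeping an explicit "review" outcome avoids silently dropping links that the simple rules cannot place in either list.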
Image handling: full localisation
All body images are downloaded into assets/ and referenced via relative paths, eliminating broken external links, hotlink protection issues, and CDN changes. The Skill checks for any remaining non‑local images and automatically uploads them to a personal image‑hosting service.
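The link-rewriting step can be sketched as follows, assuming the captured body is markdown with standard image syntax. The function names, the hash-based file naming, and the download callback are hypothetical; the actual download is delegated to the caller so the sketch stays free of network code.

```python
import hashlib
import re
from pathlib import PurePosixPath
from typing import Callable, List
from urllib.parse import urlparse

# Matches ![alt](http(s)://...) without an optional title part.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def local_name(url: str) -> str:
    """Derive a stable file name from the URL (an assumed naming scheme)."""
    suffix = PurePosixPath(urlparse(url).path).suffix or ".img"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{digest}{suffix}"


def localize_images(markdown: str, download: Callable[[str, str], object]) -> str:
    """Download each remote image and point its link at assets/<name>."""

    def replace(match: "re.Match[str]") -> str:
        alt, url = match.group(1), match.group(2)
        name = local_name(url)
        try:
            download(url, name)  # caller saves the file under assets/
        except Exception:
            return match.group(0)  # keep the external link on failure
        return f"![{alt}](assets/{name})"

    return IMAGE_PATTERN.sub(replace, markdown)


def remaining_remote_images(markdown: str) -> List[str]:
    """List image URLs that are still remote after localization."""
    return [m.group(2) for m in IMAGE_PATTERN.finditer(markdown)]
```

Running remaining_remote_images on the rewritten text gives the list of images that could not be localized, which corresponds to the check for non-local images described above.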
Invocation parameters
--max-depth 1: Collect the entry page and its directly linked resources (default).
--max-depth 2: Deep‑dig for more thorough extraction.
--max-pages 80: Limit total pages to avoid infinite expansion.
--same-domain-only: Restrict collection to the same domain.
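One plausible reading of how these parameters interact is a breadth-first traversal with a depth limit and a page budget. The sketch below is an assumption about the behaviour, not the Skill's implementation: plan_crawl and get_links are hypothetical names, and treating 80 as the default page limit is a guess based on the example value above.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def plan_crawl(
    entry: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 1,
    max_pages: int = 80,
    same_domain_only: bool = False,
) -> List[str]:
    """Return the pages to collect, in breadth-first order from the entry."""
    entry_host = urlparse(entry).netloc
    visited = [entry]
    seen = {entry}
    queue = deque([(entry, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            if link in seen:
                continue
            if same_domain_only and urlparse(link).netloc != entry_host:
                continue
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited
```

Breadth-first order means the page budget is spent on links closest to the entry page first, so a low --max-pages value trims the deepest material rather than the most directly related.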
Comparison with Obsidian Web Clipper
Collection depth: Web Clipper captures a single page; web‑pack recursively expands entry + related links.
Image handling: Web Clipper keeps external links; web‑pack downloads all images locally.
Structure: Web Clipper provides no structure; web‑pack outputs research brief, link inventory, and reading map.
Noise filtering: Web Clipper offers limited filtering; web‑pack intelligently removes ads, navigation, and footers.
Failure fallback: Web Clipper has none; web‑pack falls back through HTTP → GitHub API → Jina Reader.
Suitable scenarios: Web Clipper is ideal for quick clipping of a single article; web‑pack excels at deep collection of a complete thematic material pack.
Integration with the LLM Wiki workflow
Run web-pack to gather a clean Raw material pack.
Organise and browse the pack in Obsidian.
Let an LLM compile the material into a structured Wiki.
Perform regular health checks to keep the Wiki consistent and up‑to‑date.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
