How to Build a Personal Knowledge Base with My Custom web‑pack Skill

This article explains how to build a personal knowledge base with the author’s open‑source web‑pack Skill. The Skill automates raw‑material collection, image localization, link expansion, and structured output. It addresses the limitations of Obsidian’s Web Clipper and aligns with Karpathy’s three‑layer LLM Wiki architecture.

Old Zhang's AI Learning

Karpathy’s LLM Wiki architecture

Three layers compose the personal knowledge base:

Raw – original articles, papers, images; the LLM only reads.

Wiki – LLM‑generated structured markdown with cross‑references.

Schema – rules that tell the LLM how to organise the Wiki.

The core idea is a “compiled knowledge base” that continuously integrates new material.

Raw‑layer bottleneck

Saving high‑quality web content locally is difficult. Using the Obsidian Web Clipper plugin reveals three hard limits:

Information loss: only the page text is saved; embedded links to papers, GitHub repos, or other resources remain as plain hyperlinks and are not expanded.

Image handling: images are kept as external links, which can break because of anti‑hotlinking, CDN changes, or authentication requirements.

No structure: each saved page becomes an isolated .md file without indexes, relationships, or source tracking.

web‑pack Skill

A custom Agent Skill named web-pack collects an entire thematic material pack instead of a single page.

Core design workflow

Read the main article’s body for each entry link.

Identify and expand related links (papers, repositories, official docs) found in the body.

Download every image to a local assets/ folder and replace links with relative paths.

Filter out noise such as sidebars, ads, footers, and navigation menus.

Generate structured outputs: research brief, link inventory, image inventory, reading map, main entry markdown, and linked‑content markdown.
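
The noise‑filtering step in the workflow above can be sketched with Python's standard html.parser. This is a minimal illustration, not web‑pack's actual implementation: the NOISE_TAGS set, the class name, and the extract_body_text helper are assumptions, and the sketch expects reasonably well‑formed HTML.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is treated as noise. This list is an assumption
# for illustration; the article does not state which tags web-pack drops.
NOISE_TAGS = {"nav", "aside", "footer", "script", "style", "form"}
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class BodyTextExtractor(HTMLParser):
    """Collects text that is not nested inside a noise element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a noise subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth or tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_body_text(html: str) -> str:
    """Return the visible body text, one chunk per line."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A real collector would likely also use class names or readability heuristics, since many sites mark sidebars and ads with generic div elements rather than semantic tags.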

Output folder structure

YYYY-MM-DD-TopicName/
├── README.md               # Overview of the material pack
├── 00-research-brief.md    # Research brief
├── 01-link-inventory.md    # Full list of links
├── 02-image-inventory.md   # Image list
├── 03-reading-map.md       # Relationship graph
├── MAIN-01-Entry.md        # Entry page content
├── LINKED-02-Related.md    # Expanded related links
└── assets/                 # Local image resources
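
As a rough sketch of how such a folder could be scaffolded, the snippet below creates the dated directory, the assets/ folder, and empty placeholder files for the fixed inventory documents. The function name scaffold_pack and the placeholder heading written into each file are assumptions; the MAIN‑ and LINKED‑ files are omitted because their names depend on the collected pages.

```python
from datetime import date
from pathlib import Path

# Fixed files from the folder layout above; MAIN-*/LINKED-* files are
# created later, once the collected pages are known.
PACK_FILES = [
    "README.md",
    "00-research-brief.md",
    "01-link-inventory.md",
    "02-image-inventory.md",
    "03-reading-map.md",
]


def scaffold_pack(root: Path, topic: str, day: date) -> Path:
    """Create YYYY-MM-DD-TopicName/ with placeholder files and assets/."""
    pack = root / f"{day.isoformat()}-{topic}"
    (pack / "assets").mkdir(parents=True, exist_ok=True)
    for name in PACK_FILES:
        path = pack / name
        if not path.exists():  # never overwrite an existing file
            path.write_text(f"# {Path(name).stem}\n", encoding="utf-8")
    return pack
```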

Capture strategy: multi‑layer fallback

Standard HTTP fetch for the main content (preferred).

If the link points to a GitHub repo, use the GitHub API or raw README for optimized retrieval.

If the resource is markdown, JSON, or plain text, save it directly.

If all of the above fail, fall back to Jina Reader as a last resort, so that it is not over‑used.
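
The fallback order can be expressed as a chain of fetchers tried in sequence. The sketch below is one possible shape under stated assumptions: fetch_with_fallback and the (name, fetcher) pairing are hypothetical, and the concrete fetchers (plain HTTP, GitHub, Jina Reader) are left to the caller so the example makes no network calls itself.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a URL and returns the content, or None/"" when it has nothing.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str, fetchers: Iterable[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each fetcher in order; return (strategy_name, content) of the first success."""
    errors = []
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception as exc:  # any failure just moves on to the next layer
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty response")
    raise RuntimeError(f"all fetchers failed for {url}: " + "; ".join(errors))
```

Returning the strategy name alongside the content makes it easy to record in the link inventory which layer produced each page, and to see how often the last‑resort layer is actually used.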

Intelligent link filtering

Prioritized for expansion:

Official docs, blogs, papers, GitHub repos/README, benchmarks, data tables, example code, and any source that supports the core argument.

Skipped:

Navigation menus, footers, ads, recommendation sections, login/registration prompts, privacy policies, social‑share links, logos, favicons, and decorative images.
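
A simple URL‑based heuristic gives a feel for this kind of filtering. The host list, path segments, and file extensions below are illustrative assumptions, not the rules web‑pack actually applies; a real filter would presumably also consider where the link sits in the page and its anchor text.

```python
from urllib.parse import urlparse

# Illustrative assumptions: hosts and path segments that usually carry
# supporting material, versus segments that usually indicate page chrome.
PRIORITY_HOSTS = {"github.com", "arxiv.org"}
CONTENT_SEGMENTS = {"docs", "blog", "papers", "benchmarks", "examples"}
SKIP_SEGMENTS = {"login", "signup", "register", "privacy", "terms", "share"}
SKIP_SUFFIXES = (".ico", ".svg")  # favicons and logos


def should_expand(url: str) -> bool:
    """Return True if the link looks worth collecting as related material."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # mailto:, javascript:, and similar
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.lower()
    segments = set(path.split("/"))
    if segments & SKIP_SEGMENTS or path.endswith(SKIP_SUFFIXES):
        return False
    if host in PRIORITY_HOSTS:
        return True
    return bool(segments & CONTENT_SEGMENTS) or path.endswith((".pdf", ".md"))
```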

Image handling: full localisation

All body images are downloaded into assets/ and referenced via relative paths, so they no longer break because of hotlink protection, CDN changes, or expired external links. The Skill checks for any remaining non‑local images and automatically uploads them to a personal image‑hosting service.
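
The link rewriting can be sketched as a regular‑expression pass over the Markdown. Everything here is an assumption for illustration: the hashed file naming, the download callback signature (url, destination), and the choice to leave a link untouched when its download fails so that a later check can report it.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches Markdown images that still point at a remote URL.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def local_image_name(url: str) -> str:
    """Derive a stable local file name from the URL (hash + original suffix)."""
    suffix = PurePosixPath(urlparse(url).path).suffix or ".img"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{digest}{suffix}"


def localize_images(markdown: str, download) -> str:
    """Download each remote image via download(url, dest) and rewrite its link."""

    def replace(match):
        alt, url = match.group(1), match.group(2)
        dest = f"assets/{local_image_name(url)}"
        try:
            download(url, dest)
        except Exception:
            return match.group(0)  # keep the remote link; it is reported below
        return f"![{alt}]({dest})"

    return IMAGE_LINK.sub(replace, markdown)


def remaining_remote_images(markdown: str) -> list:
    """List image URLs that are still remote after localization."""
    return [m.group(2) for m in IMAGE_LINK.finditer(markdown)]
```

Hashing the URL avoids name collisions between images that share a file name on different hosts, at the cost of less readable file names in assets/.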

Invocation parameters

--max-depth 1: Collect the entry page and its directly linked resources (default).

--max-depth 2: Deep‑dig for more thorough extraction.

--max-pages 80: Limit total pages to avoid infinite expansion.

--same-domain-only: Restrict collection to the same domain.
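
To make the interaction of these limits concrete, here is a minimal sketch of a bounded breadth‑first traversal with an argparse front end. The flag names come from the article; the positional urls argument, the default of 80 for --max-pages, and the crawl_order and get_links names are assumptions, and how the Skill actually receives its parameters is not specified in the source.

```python
import argparse
from collections import deque
from urllib.parse import urlparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the article; the --max-pages default is an assumption.
    parser = argparse.ArgumentParser(prog="web-pack")
    parser.add_argument("urls", nargs="+", help="entry links")
    parser.add_argument("--max-depth", type=int, default=1)
    parser.add_argument("--max-pages", type=int, default=80)
    parser.add_argument("--same-domain-only", action="store_true")
    return parser


def crawl_order(entry, get_links, max_depth=1, max_pages=80, same_domain_only=False):
    """Return pages in breadth-first order, honoring the depth/page/domain limits.

    get_links(url) is a caller-supplied function returning the links on a page.
    """
    entry_host = urlparse(entry).netloc
    seen = {entry}
    queue = deque([(entry, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            if same_domain_only and urlparse(link).netloc != entry_host:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

With depth 1 the traversal visits the entry page plus its direct links, matching the default described above; the page cap is what keeps a depth of 2 from growing without bound on link‑heavy sites.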

Comparison with Obsidian Web Clipper

Collection depth: Web Clipper captures a single page; web‑pack recursively expands entry + related links.

Image handling: Web Clipper keeps external links; web‑pack downloads all images locally.

Structure: Web Clipper provides no structure; web‑pack outputs research brief, link inventory, and reading map.

Noise filtering: Web Clipper offers limited filtering; web‑pack intelligently removes ads, navigation, and footers.

Failure fallback: Web Clipper has none; web‑pack falls back through HTTP → GitHub API → Jina Reader.

Suitable scenarios: Web Clipper is ideal for quick clipping of a single article; web‑pack excels at deep collection of a complete thematic material pack.

Integration with the LLM Wiki workflow

Run web-pack to gather a clean Raw material pack.

Organise and browse the pack in Obsidian.

Let an LLM compile the material into a structured Wiki.

Perform regular health checks to keep the Wiki consistent and up‑to‑date.
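
One concrete form a health check could take is a scan for cross‑references that point at notes which do not exist. The sketch below assumes the Wiki uses Obsidian‑style [[wikilinks]] and that notes are addressed by file name without a folder path; the article does not say which link syntax the Wiki layer uses, and the function name broken_wikilinks is hypothetical.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|label]], or [[Target#Heading]],
# skipping ![[...]] embeds such as images.
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def broken_wikilinks(wiki_dir: Path) -> dict:
    """Map each note file name to the sorted link targets that have no matching note."""
    notes = {path.stem for path in wiki_dir.rglob("*.md")}
    broken = {}
    for path in sorted(wiki_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        targets = {target.strip() for target in WIKILINK.findall(text)}
        missing = sorted(target for target in targets if target not in notes)
        if missing:
            broken[path.name] = missing
    return broken
```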

web‑pack vs Web Clipper comparison diagram
Written by

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
