How to Build a Private AI‑Powered RSS Reading Knowledge Base
The article details a fully automated workflow that fetches 92 top tech blogs via RSS, cleans each article into Markdown, generates concise Chinese summaries with the MiniMax‑M2.5 LLM, delivers them through Bark and a Telegram bot, and stores everything locally for seamless integration with Obsidian.
Overview: AI Meets High‑Quality RSS
The author creates a daily, hands‑free system that collects articles from 92 Hacker News‑curated tech blogs, extracts clean Markdown, summarizes them with an LLM, and pushes the digest to a phone.
Core Logic
A Python script runs as a launchd job on macOS, firing at 8 AM each day.
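The article does not show the launchd configuration, but a job like this is typically registered with a property list under ~/Library/LaunchAgents. A minimal sketch (the label, script path, and filenames here are assumptions, not the author's actual setup):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.rss-digest</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/me/rss/daily_digest.py</string>
    </array>
    <!-- Fire once a day at 08:00 local time -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>8</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
```

Loading it once with `launchctl load ~/Library/LaunchAgents/com.example.rss-digest.plist` makes the schedule persistent across reboots.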
Fetch: Parses an OPML file (blog_feeds.opml) supplied by AK, extracts each xmlUrl, and concurrently downloads the latest articles, with retry logic to tolerate occasional server failures.
Parse & Store: Strips ads and boilerplate, converts the main body to pure Markdown, and saves the files under articles/. The resulting summaries are placed in summaries/, ready for Obsidian.
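The article does not show its cleaning code (real pipelines usually lean on a readability or html-to-markdown library). A minimal standard-library sketch of the idea — keep headings, paragraphs, and links; drop script/style/nav boilerplate:

```python
from html.parser import HTMLParser

class MarkdownExtractor(HTMLParser):
    """Tiny HTML -> Markdown converter: keeps headings, paragraphs,
    and links; silently drops content inside boilerplate containers."""
    SKIP = {"script", "style", "nav", "aside", "footer"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0   # > 0 while inside a boilerplate tag
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.out.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self.href:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    return "".join(parser.out).strip()
```

This is deliberately simplistic; a production version would also handle lists, images, code blocks, and relative URLs.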
AI High‑Dimensional Refinement: Sends each article’s title and cleaned text to SiliconFlow’s MiniMax‑M2.5 model using a carefully crafted prompt that requests a macro overview, a detailed introduction for every article, and a formatted link, all in Chinese. The prompt explicitly forbids one‑sentence answers and requires a professional yet lively tone.
Notification & Delivery: When the pipeline finishes, Bark sends an iOS push saying “Today’s RSS briefing is ready.” Simultaneously a private Telegram bot posts the full Markdown digest, including direct links, author names, and timestamps.
Detailed Operation
1. OPML Batch Parsing – The script reads blog_feeds.opml, extracts every xmlUrl, and uses parallel requests with fault‑tolerant retries so that a single failing site does not stop the whole queue.
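This step can be sketched with the standard library. OPML stores each feed as an `<outline xmlUrl="…">` node; the retry count and worker count below are assumptions, since the article does not state them:

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def parse_opml(opml_text: str) -> list[str]:
    """Collect every feed URL (xmlUrl attribute) from an OPML document,
    including outlines nested inside folders."""
    root = ET.fromstring(opml_text)
    return [node.attrib["xmlUrl"]
            for node in root.iter("outline")
            if "xmlUrl" in node.attrib]

def fetch_all(urls, fetch_one, retries=3, workers=10):
    """Fetch every feed concurrently. A failing site is retried a few
    times and then skipped (mapped to None) so one broken feed never
    stalls the whole queue."""
    def fetch_with_retry(url):
        for attempt in range(retries):
            try:
                return url, fetch_one(url)
            except Exception:
                if attempt == retries - 1:
                    return url, None  # give up on this feed only
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_with_retry, urls))
```

`fetch_one` would be whatever HTTP call the pipeline uses; injecting it keeps the concurrency and fault-tolerance logic independently testable.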
2. LLM‑Driven Dimensionality Reduction – For each article pulled from the 92 feeds, the author packages the title and body into the API request. The prompt asks the model to produce:
A macro summary of today’s main themes.
A **very detailed introduction** for each article, without personal opinions.
The article link immediately after its summary, using the exact markdown format **[Title](URL)** followed by “Source:” and “Introduction:”.
A lively, professional tone without unnecessary markdown separators.
The MiniMax‑M2.5 model returns a logically coherent Chinese briefing in seconds.
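A sketch of the request, assuming SiliconFlow’s OpenAI-compatible chat endpoint. The model identifier, the batching (one request for all articles rather than one per article), and the system prompt below paraphrase the rules above — none of them are the author’s exact values:

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are a tech-news editor writing in Chinese. Produce: "
    "(1) a macro summary of today's main themes; "
    "(2) a very detailed, opinion-free introduction for every article; "
    "(3) after each introduction, the link in the exact form [Title](URL), "
    "with 'Source:' and 'Introduction:' labels. "
    "Never answer in one sentence; keep a professional yet lively tone "
    "and avoid unnecessary markdown separators."
)

def build_request(articles, model="MiniMaxAI/MiniMax-M2.5"):
    """Package title + cleaned body + URL of every article into one chat request."""
    corpus = "\n\n".join(f"# {a['title']}\n{a['body']}\nURL: {a['url']}"
                         for a in articles)
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": corpus},
        ],
    }

def summarize(articles, api_key):
    """POST the request and return the Chinese briefing text."""
    req = urllib.request.Request(
        "https://api.siliconflow.cn/v1/chat/completions",
        data=json.dumps(build_request(articles)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```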
3. Bark Alert + Telegram Structured Push – Bark pops up a notification, and the Telegram bot delivers a nicely formatted Markdown file containing all article links, author names, and publish dates, enabling quick skim‑reading on a phone or during a commute.
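The delivery step maps onto two public APIs: Bark’s simple GET of `/<device_key>/<title>/<body>`, and Telegram’s `sendMessage` bot method. A sketch (key names and the Chinese push title are illustrative, not the author’s):

```python
import json
import urllib.parse
import urllib.request

def bark_url(device_key: str, title: str, body: str) -> str:
    """Bark's simplest API is a GET with title and body in the path,
    so both must be percent-encoded."""
    return "https://api.day.app/{}/{}/{}".format(
        device_key,
        urllib.parse.quote(title, safe=""),
        urllib.parse.quote(body, safe=""),
    )

def telegram_request(token: str, chat_id: str, digest_md: str):
    """Build the sendMessage call for a private bot chat."""
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps({"chat_id": chat_id,
                         "text": digest_md,
                         "parse_mode": "Markdown"}).encode(),
        headers={"Content-Type": "application/json"},
    )

def notify(bark_key, tg_token, tg_chat, digest_md):
    urllib.request.urlopen(
        bark_url(bark_key, "RSS briefing", "Today's RSS briefing is ready"))
    urllib.request.urlopen(telegram_request(tg_token, tg_chat, digest_md))
```

One caveat: `sendMessage` caps text at 4096 characters, so a full daily digest would need to be chunked or sent as a file via `sendDocument` instead.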
4. Personal Knowledge Base in Obsidian – The raw articles are stored as Markdown in the local articles/ folder, while the daily digests reside in summaries/. This continuously grows a high‑quality, searchable knowledge vault that can later be used for local RAG (retrieval‑augmented generation) experiments.
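The storage layout above suggests a small persistence helper. A sketch, assuming the vault root and date-stamped digest filenames (the article shows only the folder names, not the naming scheme):

```python
import datetime
import pathlib

def save_digest(vault: str, digest_md: str, articles: dict[str, str]):
    """Store raw articles under articles/ and the daily digest under
    summaries/ inside the Obsidian vault, named by date."""
    root = pathlib.Path(vault)
    art_dir = root / "articles"
    sum_dir = root / "summaries"
    art_dir.mkdir(parents=True, exist_ok=True)
    sum_dir.mkdir(parents=True, exist_ok=True)
    for title, body in articles.items():
        # Replace filesystem-hostile characters in titles.
        safe = "".join(c if c.isalnum() or c in " -_" else "_" for c in title)
        (art_dir / f"{safe}.md").write_text(body, encoding="utf-8")
    today = datetime.date.today().isoformat()
    digest_path = sum_dir / f"{today}-digest.md"
    digest_path.write_text(digest_md, encoding="utf-8")
    return digest_path
```

Because both folders live inside the vault, Obsidian indexes every file automatically, which is what makes later local-RAG experiments over the corpus straightforward.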
Real‑World Experience
The author reports that on a typical day the system extracts about 28 new articles, each summarized with clear key points, such as a veteran hacker’s decade‑long code retrospective, a high‑engagement post on front‑end engineering shifts in the LLM era, and insights on emerging framework features. Reading the digest takes roughly three minutes, dramatically reducing information overload.
Limitations
Some blogs employ anti‑scraping measures or malformed RSS feeds, causing occasional missed articles.
Scaling beyond a few hundred feeds raises concurrency demands and inflates LLM token costs.
For the current set of 92 curated blogs, the author finds an optimal balance between quality and cost.
Conclusion
Building a well‑defended personal information filter from a curated RSS list and a strong LLM creates a “knowledge moat” that shields the reader from noisy, low‑value content. The automated pipeline turns raw web noise into a structured daily briefing that feeds directly into an Obsidian vault, enabling deeper, self‑directed learning.
Original source: https://gist.github.com/emschwartz/e6d2bf860ccc367fe37ff953ba6de66b
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
