How to Build a Private AI‑Powered RSS Reading Knowledge Base

The article details a fully automated workflow that fetches 92 top tech blogs via RSS, cleans the content into Markdown, uses the MiniMax‑M2.5 LLM to generate concise Chinese summaries, and delivers them through Bark and a Telegram bot, all stored for seamless integration with Obsidian.


Overview: AI Meets High‑Quality RSS

The author creates a daily, hands‑free system that collects articles from 92 Hacker News‑curated tech blogs, extracts clean Markdown, summarizes them with an LLM, and pushes the digest to a phone.

Core Logic

A Python script runs as a launchd job on macOS, triggered at 8:00 each morning.
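The schedule can be expressed as a launchd property list dropped into ~/Library/LaunchAgents. The label and script path below are illustrative placeholders, not the author's actual values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical label and paths; adjust to your setup -->
    <key>Label</key>
    <string>com.example.rss-digest</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/me/rss/daily_digest.py</string>
    </array>
    <!-- Fire at 08:00 every day -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>8</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
```

Loading it once with `launchctl load` registers the daily wake-up.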

Fetch: Parses an OPML file (blog_feeds.opml) supplied by AK, extracts each xmlUrl, and concurrently downloads the latest articles with retry logic to tolerate occasional server failures.

Parse & Store: Strips ads and boilerplate, converts the main body to pure Markdown, and saves the files under articles/. The resulting summaries are placed in summaries/, ready for Obsidian.

AI Distillation: Sends each article's title and cleaned text to SiliconFlow's MiniMax‑M2.5 model using a carefully crafted prompt that requests a macro overview, a detailed introduction for every article, and a formatted link, all in Chinese. The prompt explicitly forbids one‑sentence answers and requires a professional yet lively tone.

Notification & Delivery: When the pipeline finishes, Bark sends an iOS push saying “Today’s RSS briefing is ready.” Simultaneously a private Telegram bot posts the full Markdown digest, including direct links, author names, and timestamps.

Detailed Operation

1. OPML Batch Parsing – The script reads blog_feeds.opml, extracts every xmlUrl, and uses parallel requests with fault‑tolerant retries so that a single failing site does not stop the whole queue.
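A minimal sketch of this step, using only the standard library (function names and the retry/backoff parameters are my own, not from the original script):

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def parse_opml(path):
    """Extract every xmlUrl attribute from an OPML feed list."""
    tree = ET.parse(path)
    return [o.get("xmlUrl") for o in tree.iter("outline") if o.get("xmlUrl")]

def fetch_feed(url, retries=3, backoff=2.0):
    """Download one feed, retrying so a flaky server doesn't stall the queue."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=15) as resp:
                return resp.read()
        except Exception:
            if attempt == retries - 1:
                return None  # give up on this feed, keep the rest going
            time.sleep(backoff * (attempt + 1))

def fetch_all(opml_path, workers=10):
    """Fetch every feed in parallel; failed feeds map to None."""
    urls = parse_opml(opml_path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch_feed, urls)))
```

Because each failure is swallowed into a `None` entry rather than an exception, one dead site never aborts the whole run.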

2. LLM‑Driven Distillation – For each new article pulled from the 92 feeds, the author packages the title and body into the API request. The prompt asks the model to produce:

- A macro summary of today’s main themes.

- A **very detailed introduction** for each article, without personal opinions.

- The article link immediately after its summary, using the exact markdown format **[Title](URL)** followed by “Source:” and “Introduction:”.

- A lively, professional tone without unnecessary markdown separators.

The MiniMax‑M2.5 model returns a logically coherent Chinese briefing in seconds.
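A sketch of the request packaging, assuming SiliconFlow's OpenAI-compatible chat endpoint; the endpoint URL, model identifier, and the English paraphrase of the Chinese prompt are all assumptions, not the author's exact values:

```python
import json
import urllib.request

API_URL = "https://api.siliconflow.cn/v1/chat/completions"  # assumed endpoint
MODEL = "MiniMaxAI/MiniMax-M2.5"  # illustrative model identifier

# English paraphrase of the prompt requirements described above
SYSTEM_PROMPT = (
    "You are a tech-news editor writing in Chinese. Produce: (1) a macro summary "
    "of today's main themes; (2) a very detailed, opinion-free introduction for "
    "each article; (3) each article's link right after its summary as [Title](URL) "
    "followed by 'Source:' and 'Introduction:'. Be lively and professional; "
    "no one-sentence answers, no unnecessary markdown separators."
)

def build_request(articles, api_key):
    """Package title + cleaned body of each article into one chat request."""
    user_content = "\n\n".join(
        f"# {a['title']}\nURL: {a['url']}\n{a['body']}" for a in articles
    )
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

def summarize(articles, api_key):
    """Send the request and return the model's Chinese briefing."""
    with urllib.request.urlopen(build_request(articles, api_key)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Batching all articles into a single user message keeps the macro summary coherent, at the cost of a large prompt.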

3. Bark Alert + Telegram Structured Push – Bark pops up a notification, and the Telegram bot delivers a nicely formatted Markdown file containing all article links, author names, and publish dates, enabling quick skim‑reading on a phone or during a commute.
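Both channels are plain HTTPS calls. A minimal sketch (the function names are mine; Bark's GET-style URL and Telegram's sendMessage method are the public APIs of those services):

```python
import urllib.parse
import urllib.request

def bark_url(device_key, title, body):
    """Bark push is a simple GET: https://api.day.app/<key>/<title>/<body>."""
    return "https://api.day.app/{}/{}/{}".format(
        device_key, urllib.parse.quote(title), urllib.parse.quote(body))

def telegram_payload(chat_id, markdown_text):
    """Form body for the Telegram Bot API sendMessage method."""
    return urllib.parse.urlencode({
        "chat_id": chat_id,
        "text": markdown_text,
        "parse_mode": "Markdown",
    }).encode()

def notify(bark_key, bot_token, chat_id, digest_md):
    # iOS push via Bark, then the full digest via the Telegram bot.
    urllib.request.urlopen(
        bark_url(bark_key, "RSS Briefing", "Today's RSS briefing is ready"),
        timeout=10)
    urllib.request.urlopen(
        f"https://api.telegram.org/bot{bot_token}/sendMessage",
        data=telegram_payload(chat_id, digest_md), timeout=10)
```

Note that Telegram caps a single message at 4096 characters, so a long digest may need to be split or sent as a document.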

4. Personal Knowledge Base in Obsidian – The raw articles are stored as Markdown in the local articles/ folder, while the daily digests reside in summaries/. This continuously grows a high‑quality, searchable knowledge vault that can later be used for local RAG (retrieval‑augmented generation) experiments.
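Persisting into the vault is just careful file naming. A sketch under the folder layout described above (the sanitization rule and filename scheme are illustrative):

```python
import re
from datetime import date
from pathlib import Path

def safe_name(title):
    """Turn an article title into a filesystem-safe Markdown filename."""
    return re.sub(r'[\\/:*?"<>|]+', "-", title).strip()[:80] + ".md"

def save_article(title, markdown_body, root="articles"):
    """One Markdown file per article, directly readable by Obsidian."""
    path = Path(root) / safe_name(title)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown_body, encoding="utf-8")
    return path

def save_digest(markdown_digest, root="summaries"):
    """One dated digest per day, e.g. summaries/2024-05-01-digest.md."""
    path = Path(root) / f"{date.today().isoformat()}-digest.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown_digest, encoding="utf-8")
    return path
```

Pointing an Obsidian vault at the parent directory makes both folders searchable immediately, and the plain-Markdown files are ready for later chunking in a local RAG experiment.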

Real‑World Experience

The author reports that on a typical day the system extracts about 28 new articles, each summarized with clear key points such as a veteran hacker’s decade‑long code retrospection, a high‑engagement post on front‑end engineering shifts in the LLM era, and insights on emerging framework features. Reading the digest takes roughly three minutes, dramatically reducing information overload.

Limitations

Some blogs employ anti‑scraping measures or malformed RSS feeds, causing occasional missed articles.

Scaling beyond a few hundred feeds raises concurrency demands and inflates LLM token costs.

For the current set of 92 curated blogs, the author finds an optimal balance between quality and cost.

Conclusion

Building a personal, well‑defended information filter from a curated RSS list and a strong LLM creates a “knowledge moat” that shields the reader from noisy, low‑value content. The automated pipeline turns raw web noise into a structured, daily briefing that feeds directly into an Obsidian vault, enabling deeper, self‑directed learning.

Original source: https://gist.github.com/emschwartz/e6d2bf860ccc367fe37ff953ba6de66b

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Python, AI, Automation, LLM, RSS, Bark, Telegram, Obsidian
Written by

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
