Designing a High‑Availability, Scalable Feed Stream System
This article introduces feed streams, explains their evolution from RSS to modern social feeds, classifies them by aggregation logic and display, discusses challenges such as real‑time performance and massive data, and presents a backend architecture with data models, pagination, write/read diffusion, and core publishing/reading workflows.
A feed stream is a continuously updated information flow that aggregates multiple subscribed sources—such as friends’ posts, recommended videos, or followed blogs—into a personalized list, enabling users to scroll and receive relevant content without manually searching.
The need for feed streams arises from the limitations of traditional media (TV, newspapers, magazines) where users must actively seek information; feed streams provide passive, personalized aggregation, improving user experience through relevance and convenience.
Feed streams can be classified in two dimensions: (1) by source‑aggregation logic—no dependency (e.g., TikTok recommendations), single‑direction dependency (following on Weibo), and bidirectional dependency (friend relationships on WeChat Moments); and (2) by display logic—interest‑based weighted ranking versus time‑ordered relationship feeds.
Historically, feed streams evolved from RSS aggregators, which combined user‑subscribed URLs, to social‑network News Feeds (e.g., Facebook) that aggregate content from people rather than sites, dramatically increasing richness and social relevance.
Key terminology includes: Feed (a single status or message), Feed Stream (the overall data flow), Timeline (time‑ordered feed), Inbox (the receiver’s stored Feed IDs), and diffusion methods such as write diffusion (push to followers) and read diffusion (pull on demand).
Challenges for a time‑ordered, relationship‑based feed include strict real‑time performance, massive message volume, a read‑heavy/write‑light workload, and the need for eventual consistency without message loss.
Essential functionalities for a Feed system are: publishing messages, deleting messages, viewing a user’s own posts, subscribing/unsubscribing to sources, reading the aggregated feed with pagination, and optional blacklist/whitelist controls and comment management.
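Message propagation can use three strategies: read diffusion (pull from each followee’s outbox at read time), write diffusion (push to each follower’s inbox at publish time), and a hybrid approach that applies write diffusion for normal users and read diffusion for high‑traffic “big V” users (verified accounts with very large followings), whose posts would otherwise trigger a very large number of inbox writes. Deletion combines soft delete (marking the message as deleted) with lazy delete (removing the ID from a follower’s inbox only when that inbox is accessed). This avoids fanning the delete out to every inbox while keeping reads consistent and pagination stable.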
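The soft‑delete plus lazy‑delete combination can be sketched as follows. This is a minimal illustration, not the article's implementation: the Message class, the in‑memory messages and inboxes dictionaries, and the function names are all hypothetical stand‑ins for the message table and per‑follower inboxes.

```python
from dataclasses import dataclass


@dataclass
class Message:
    feed_id: int
    author: str
    text: str
    deleted: bool = False  # soft-delete flag; the row itself is kept


# Hypothetical in-memory stores standing in for the message table and inboxes.
messages: dict[int, Message] = {}
inboxes: dict[str, list[int]] = {}  # follower -> feed IDs, newest first


def soft_delete(feed_id: int) -> None:
    """Mark the message as deleted without touching any follower inbox."""
    messages[feed_id].deleted = True


def read_inbox(uid: str, limit: int) -> list[Message]:
    """Return up to `limit` live messages, pruning deleted IDs as they are met."""
    live: list[Message] = []
    kept: list[int] = []
    inbox = inboxes.get(uid, [])
    for pos, feed_id in enumerate(inbox):
        if len(live) == limit:
            kept.extend(inbox[pos:])  # tail not visited yet stays untouched
            break
        msg = messages.get(feed_id)
        if msg is None or msg.deleted:
            continue  # lazy delete: drop the stale ID on access
        live.append(msg)
        kept.append(feed_id)
    inboxes[uid] = kept
    return live
```

Because the reader skips deleted entries and keeps scanning until the page is full, a deleted message does not leave a short page, and the cost of cleaning an inbox is paid only by followers who actually read it.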
Pagination avoids traditional page numbers; instead, the client supplies the last_id of the previously retrieved item, and the server returns the next page_size items after that ID, ensuring stable ordering even as new items appear.
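A small sketch of last_id pagination is shown below. It assumes feed IDs increase with publish time and that the feed is held newest first; the fetch_page function and its parameters are illustrative names, not an API from the article.

```python
from typing import Optional


def fetch_page(feed_ids: list[int], last_id: Optional[int], page_size: int) -> list[int]:
    """Return the next page from a feed sorted newest first (descending IDs).

    Assumes IDs grow with publish time, so "after last_id" means "older than it".
    """
    if last_id is None:
        return feed_ids[:page_size]  # first page: no cursor yet
    older = [fid for fid in feed_ids if fid < last_id]
    return older[:page_size]


feed = [50, 40, 30, 20, 10]
first = fetch_page(feed, None, 2)       # [50, 40]
feed.insert(0, 60)                      # a new item arrives between requests
second = fetch_page(feed, first[-1], 2) # [30, 20]
```

With offset‑style paging, the second request would have read positions 2 to 3 of the shifted list and returned 40 again. The cursor depends only on the last ID seen, so it also keeps working if that item is deleted in the meantime.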
The proposed architecture consists of a message queue, a fan‑out service that writes each message to the author’s own timeline and pushes its ID to followers, and a Redis ZSET‑based inbox for each follower (keyed by uid+channel, score = timestamp). Persistent storage includes a message table, inbox table, publish‑configuration table, and follow‑relationship table, each with fields for IDs, timestamps, status, and extensible JSON blobs.
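To make the inbox layout concrete, here is an in‑memory stand‑in for the per‑follower sorted set. It is only a sketch: the InboxStore class, the inbox:{uid}:{channel} key layout, and the method names are assumptions. Against a real Redis instance, add and newest would correspond roughly to ZADD and ZREVRANGE.

```python
class InboxStore:
    """In-memory stand-in for per-follower Redis sorted sets (illustrative only)."""

    def __init__(self) -> None:
        # key -> {feed_id: score}; a dict gives ZSET-like member uniqueness
        self._sets: dict[str, dict[int, float]] = {}

    @staticmethod
    def key(uid: int, channel: str) -> str:
        return f"inbox:{uid}:{channel}"  # assumed layout for "uid+channel"

    def add(self, uid: int, channel: str, feed_id: int, timestamp: float) -> None:
        # Like ZADD: adding an existing member only updates its score.
        self._sets.setdefault(self.key(uid, channel), {})[feed_id] = timestamp

    def newest(self, uid: int, channel: str, count: int) -> list[int]:
        # Highest score (latest timestamp) first.
        members = self._sets.get(self.key(uid, channel), {})
        ranked = sorted(members.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return [feed_id for feed_id, _ in ranked[:count]]


store = InboxStore()
store.add(7, "home", 101, 1000.0)
store.add(7, "home", 102, 1005.0)
store.add(7, "home", 101, 1000.0)  # redelivered push: no duplicate entry
```

Storing only feed IDs with the timestamp as the score keeps each inbox small, and set semantics make a repeated push harmless, which matters when the message queue redelivers.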
Publishing flow: a user’s new Feed enters the queue, the system retrieves the author’s follower list (distinguishing big V users), writes the Feed to the author’s timeline, and pushes the Feed ID to each follower’s inbox (or uses read diffusion for cold followers).
Reading flow: the client checks if the user is active; if not, it pulls recent posts from followed big V users, reads the user’s inbox from the last known ID, merges and sorts results by time, and returns the combined list.
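The two flows can be sketched end to end with in‑memory dictionaries. This is a simplified illustration under stated assumptions: the BIG_V_THRESHOLD cutoff, the dictionary stores, and all function names are hypothetical, and the message queue, the active‑user check, and cursor handling are left out.

```python
BIG_V_THRESHOLD = 2  # assumed follower count above which an author is a "big V"

followers: dict[str, set[str]] = {}           # author -> followers
following: dict[str, set[str]] = {}           # follower -> authors
outbox: dict[str, list[tuple[int, int]]] = {} # author -> [(timestamp, feed_id)]
inbox: dict[str, list[tuple[int, int]]] = {}  # follower -> [(timestamp, feed_id)]


def follow(follower: str, author: str) -> None:
    followers.setdefault(author, set()).add(follower)
    following.setdefault(follower, set()).add(author)


def is_big_v(author: str) -> bool:
    return len(followers.get(author, ())) >= BIG_V_THRESHOLD


def publish(author: str, feed_id: int, timestamp: int) -> None:
    # Always write to the author's own timeline first.
    outbox.setdefault(author, []).append((timestamp, feed_id))
    if is_big_v(author):
        return  # read diffusion: followers pull this at read time
    for follower in followers.get(author, ()):
        inbox.setdefault(follower, []).append((timestamp, feed_id))  # write diffusion


def read_feed(uid: str, limit: int) -> list[int]:
    # Start from pushed items, then pull from followed big V authors.
    items = set(inbox.get(uid, []))
    for author in following.get(uid, ()):
        if is_big_v(author):
            items.update(outbox.get(author, []))
    newest_first = sorted(items, reverse=True)  # merge and sort by timestamp
    return [feed_id for _, feed_id in newest_first[:limit]]


follow("bob", "alice")
follow("bob", "star")
follow("carol", "star")
publish("alice", 1, 100)
publish("star", 2, 200)   # big V: stays in star's outbox only
publish("alice", 3, 300)
```

Collecting candidates in a set before sorting removes duplicates that could arise if an author crosses the threshold after some posts were already pushed to inboxes.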
In summary, the article provides a comprehensive guide to building a reliable, high‑performance Feed stream system, covering conceptual foundations, classification, challenges, data modeling, pagination, diffusion strategies, and end‑to‑end publishing and consumption processes.
High Availability Architecture
Official account for High Availability Architecture.