Designing a Scalable Feed Stream System for Billions of Users
This article explains how to design a high‑performance feed‑stream architecture—including product definition, data modeling, storage choices, synchronization modes, metadata handling, commenting, likes, sorting, search, and deletion—so that a system can support tens of millions to billions of users while remaining reliable and scalable.
Introduction
About ten years ago, the rise of smartphones moved the Internet into the mobile era and produced feed-stream products such as Weibo, WeChat Moments, Toutiao, and Kuaishou. These applications present continuously updated content units (feeds) in a list that scrolls from top to bottom, which suits mobile browsing well.
Feed Stream System Characteristics
A feed stream is essentially a data flow that delivers N publishers' content units to M receivers through follow relationships.
Data Model
The system handles three core data types:
Publisher data – the original posts or media generated by users.
Follow relationships – either one‑way (e.g., Weibo) or two‑way (e.g., WeChat friends).
Receiver data – the aggregated timeline for each user, usually ordered by recency.
These map to three storage concepts:
Repository: permanent storage of publisher data.
Follow table: permanent storage of relationship data.
Sync store: short‑term storage of receiver‑side, time‑sorted data.
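The three storage concepts can be sketched in a few lines of Python. This is only an in-memory illustration: the names Feed, FeedStores, and follow are hypothetical, and a real system would back each structure with a durable store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    feed_id: int
    publisher_id: str
    content: str
    created_at: float  # Unix timestamp of publication


class FeedStores:
    """In-memory stand-ins for the repository, follow table, and sync store."""

    def __init__(self):
        self.repository = {}                  # feed_id -> Feed (permanent)
        self.followers = defaultdict(set)     # publisher_id -> follower ids (permanent)
        self.sync_store = defaultdict(list)   # receiver_id -> feed ids, oldest first (short-term)

    def follow(self, follower_id, publisher_id, two_way=False):
        # One-way follow (Weibo style): only the follower receives the publisher's feeds.
        self.followers[publisher_id].add(follower_id)
        if two_way:
            # Two-way relationship (WeChat-friend style): both sides receive each other's feeds.
            self.followers[follower_id].add(publisher_id)


stores = FeedStores()
stores.follow("bob", "alice")                  # one-way
stores.follow("carol", "dave", two_way=True)   # two-way
```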
Product Definition
Typical feed products fall into four categories: Weibo-style, Moments-style, short-video (Douyin/TikTok) style, and private-message style. The category determines the follow relationship type (one-way vs. two-way) and the sort order (by time vs. by recommendation).
Storage Selection
For reliable, horizontally scalable storage, a distributed NoSQL store (e.g., Alibaba Cloud Tablestore or Bigtable) is preferred in large-scale systems; MySQL is adequate for small prototypes. The repository must guarantee durability and scale linearly as data grows.
Synchronization Modes
Push mode (write fan-out): the publisher's message is immediately pushed to the sync stores of all followers; requires extremely high write throughput.
Pull mode (read fan-out): followers read from publishers' outboxes on demand; demands strong read capacity and per-follower offset tracking.
Push-pull hybrid: most users' posts are pushed, while posts from "big V" accounts (verified accounts with very large followings) are pulled on demand, which avoids wasteful pushes to inactive followers.
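Avoid relying on pull mode alone in a large system: every timeline read must query the outbox of each publisher the reader follows, so read load grows with the follow count.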
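A minimal sketch of the hybrid mode follows, using in-memory dictionaries in place of real stores. The threshold BIG_V_THRESHOLD, the function names, and the use of a global counter as a stand-in for time-ordered IDs are all assumptions made for illustration. Because an account may cross the threshold after some of its posts were already pushed, the read path deduplicates by feed ID.

```python
import heapq
from collections import defaultdict
from itertools import count

BIG_V_THRESHOLD = 3   # hypothetical cutoff; a real system would use a far larger value

followers = defaultdict(set)   # publisher -> followers (follow table)
following = defaultdict(set)   # follower -> publishers (reverse index used by pulls)
outbox = defaultdict(list)     # publisher -> [(feed_id, content)], ascending feed_id
inbox = defaultdict(list)      # receiver -> [(feed_id, content)], ascending feed_id (sync store)
_next_id = count(1)            # increasing IDs stand in for publication time


def follow(follower, publisher):
    followers[publisher].add(follower)
    following[follower].add(publisher)


def is_big_v(publisher):
    return len(followers[publisher]) >= BIG_V_THRESHOLD


def publish(publisher, content):
    feed = (next(_next_id), content)
    outbox[publisher].append(feed)           # every post lands in the publisher's outbox
    if not is_big_v(publisher):
        for follower in followers[publisher]:
            inbox[follower].append(feed)     # push: write fan-out to each follower
    return feed[0]


def read_timeline(user, limit=10):
    # Pushed feeds come from the user's inbox; big-V feeds are pulled at read time.
    sources = [reversed(inbox[user])]
    sources += [reversed(outbox[p]) for p in following[user] if is_big_v(p)]
    merged = heapq.merge(*sources, key=lambda feed: feed[0], reverse=True)
    seen, timeline = set(), []
    for feed in merged:
        if feed[0] in seen:
            continue                         # pushed earlier and also present in an outbox
        seen.add(feed[0])
        timeline.append(feed)
        if len(timeline) == limit:
            break
    return timeline


follow("bob", "alice")
for fan in ("bob", "carol", "dave"):
    follow(fan, "star")

publish("alice", "hello")      # pushed to bob
publish("star", "big news")    # star is a big V here, so nothing is pushed
publish("alice", "again")      # pushed to bob

bob_timeline = [content for _, content in read_timeline("bob")]
```

In this sketch bob_timeline contains both the pushed posts from alice and the pulled post from star, newest first, while bob's inbox holds only the two pushed entries.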
Metadata Services
Additional metadata includes user profiles, follow/friend lists, and a push‑session pool that tracks online users to avoid query storms caused by periodic client polling.
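One way to picture the push-session pool is as a table of recent client heartbeats: the server notifies only followers who are currently online, so clients do not need to poll. The sketch below is an assumption about how such a pool might work; the class PushSessionPool, its TTL-based expiry, and the injectable clock are hypothetical and not taken from the source.

```python
import time


class PushSessionPool:
    """Tracks which users have a live session, based on their latest heartbeat."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the example can use a fake clock
        self._last_seen = {}         # user_id -> time of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def disconnect(self, user_id):
        self._last_seen.pop(user_id, None)

    def is_online(self, user_id):
        seen = self._last_seen.get(user_id)
        return seen is not None and self._clock() - seen <= self._ttl

    def online_among(self, user_ids):
        # Filter a follower list down to users worth notifying right now.
        return [u for u in user_ids if self.is_online(u)]


now = [0.0]                                   # fake clock for a deterministic example
pool = PushSessionPool(ttl_seconds=60, clock=lambda: now[0])
pool.heartbeat("bob")
now[0] = 30.0
pool.heartbeat("carol")
now[0] = 70.0                                 # bob's heartbeat is now 70 s old
to_notify = pool.online_among(["bob", "carol", "dave"])
```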
Comments and Likes
Both are stored similarly to feed content, with an extra reference to the parent message. Distributed NoSQL is suitable; relational databases can be used for smaller deployments.
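As a rough illustration, a comment record can reuse the shape of a feed record with one extra field pointing at the parent message, and likes can be kept as a set of user IDs per message. The names Comment, add_comment, like, and unlike are hypothetical, and the dictionaries stand in for a real store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    comment_id: int
    parent_feed_id: int   # the extra reference back to the message being commented on
    author_id: str
    text: str


comments_by_feed = defaultdict(list)   # parent_feed_id -> comments in arrival order
likes_by_feed = defaultdict(set)       # parent_feed_id -> ids of users who liked it


def add_comment(comment):
    comments_by_feed[comment.parent_feed_id].append(comment)


def like(feed_id, user_id):
    likes_by_feed[feed_id].add(user_id)       # a set makes repeated likes idempotent


def unlike(feed_id, user_id):
    likes_by_feed[feed_id].discard(user_id)   # no error if the user never liked it


add_comment(Comment(1, parent_feed_id=100, author_id="bob", text="Nice post"))
like(100, "bob")
like(100, "bob")      # duplicate like has no effect
like(100, "carol")
unlike(100, "carol")
```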
Sorting
Two common sorting strategies are time‑based (used by Weibo, Moments, private messages) and score‑based (used by recommendation‑driven feeds). This article focuses on time‑based sorting.
Deletion and Update
Deletion can be physical (removing the record from the repository) or logical (marking it as deleted). Updates follow the same path; versioned stores like Tablestore can keep edit histories.
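The difference between the two deletion styles can be sketched as follows. The example assumes the sync store holds only feed IDs, so a deleted feed is filtered out when the timeline is rendered rather than being removed from every follower's list; that layout and all names here are illustrative assumptions.

```python
repository = {
    1: {"content": "first post", "deleted": False},
    2: {"content": "second post", "deleted": False},
    3: {"content": "third post", "deleted": False},
}
sync_store = {"bob": [1, 2, 3]}   # receiver -> feed ids, oldest first


def logical_delete(feed_id):
    # Keep the record but mark it, so it can still be audited or restored.
    repository[feed_id]["deleted"] = True


def physical_delete(feed_id):
    # Remove the record from the repository entirely.
    repository.pop(feed_id, None)


def render_timeline(user):
    timeline = []
    for feed_id in reversed(sync_store.get(user, [])):   # newest first
        record = repository.get(feed_id)
        if record is None or record["deleted"]:
            continue                                     # skip both kinds of deleted feed
        timeline.append(record["content"])
    return timeline


logical_delete(2)
physical_delete(3)
visible = render_timeline("bob")
```

After both deletions, only the first post remains visible to bob, while the logically deleted record is still present in the repository.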
Search
Simple keyword search for users, posts, or friends can be implemented with a search engine or a full‑text capable database. Multi‑field indexes are added to the repository and user tables as needed.
System Architecture Overview
The complete architecture combines the core feed pipeline with metadata services, comment/like stores, search, and sorting modules. Two implementation paths are presented:
Open‑source stack (MySQL, Redis, HBase, etc.) for teams comfortable with operating multiple components.
Single‑system solution using Alibaba Cloud Tablestore, which provides built‑in support for all required features and automatic horizontal scaling.
Practical Scenarios
Specific feed types—Moments, Weibo, Toutiao, and private messages—are briefly described, each with its own relationship model and scaling considerations. Future articles will dive deeper into each variant.
Conclusion
The article outlines the essential building blocks for designing a billion‑user feed stream system, emphasizing product definition, storage, synchronization, metadata, interaction features, sorting, and search, and offers guidance on choosing between open‑source composites and a managed NoSQL service.
21CTO