Design and Analysis of a Weibo Feed System: Storage, Push vs. Pull Models, and Scaling Strategies
This article examines the architectural design of a Weibo-like feed system, covering storage schema, the trade‑offs between push (write‑amplification) and pull (read‑amplification) delivery models, caching techniques, and a hybrid approach for handling both small‑scale and large‑scale follower scenarios.
Weibo, WeChat Moments, Douyin and similar applications are typical feed‑driven products where users consume content posted by others; this article analyses the design of a Weibo feed system and discusses possible improvements.
1. Storage design – The system is divided into three main tables:
Feed storage (permanent):
create table `t_feed`(
`feedId` bigint not null PRIMARY KEY,
`userId` bigint not null COMMENT 'creator ID',
`content` text,
`createTime` datetime not null COMMENT 'creation time',
`recordStatus` tinyint not null default 0 comment 'record status'
)ENGINE=InnoDB;
Follow relationship (permanent):
CREATE TABLE `t_like`(
`id` int(11) NOT NULL PRIMARY KEY,
`userId` bigint NOT NULL COMMENT 'follower ID',
`likerId` bigint NOT NULL COMMENT 'followed user ID',
KEY `userId` (`userId`),
KEY `likerId` (`likerId`)
)ENGINE=InnoDB;
Inbox (temporary, acts as a mailbox for each user):
create table `t_inbox`(
`id` bigint not null AUTO_INCREMENT PRIMARY KEY,
`userId` bigint not null comment 'recipient ID',
`feedId` bigint not null comment 'content ID',
`createTime` datetime not null
)ENGINE=InnoDB;
2. Scenario characteristics – The workload is read‑heavy (many more reads than writes) and requires ordered display based on timestamps or ranking scores.
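The read-heavy, time-ordered access pattern can be illustrated with a single inbox read. The following is a minimal sketch, not the system's actual query: it uses Python's built-in `sqlite3` module in place of MySQL, a reduced version of the tables above, and assumes that `recordStatus = 0` marks a visible feed. The function name `read_inbox` and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t_feed (
    feedId integer primary key,
    userId integer not null,
    content text,
    createTime text not null,
    recordStatus integer not null default 0
);
create table t_inbox (
    id integer primary key autoincrement,
    userId integer not null,
    feedId integer not null,
    createTime text not null
);
""")
conn.executemany("insert into t_feed values (?, ?, ?, ?, ?)", [
    (10001, 4, "hello", "2021-10-31 17:00:00", 0),
    (10002, 5, "deleted post", "2021-10-31 18:00:00", 1),
    (10003, 6, "world", "2021-10-31 19:00:00", 0),
])
conn.executemany(
    "insert into t_inbox (userId, feedId, createTime) values (?, ?, ?)",
    [
        (1, 10001, "2021-10-31 17:00:00"),
        (1, 10002, "2021-10-31 18:00:00"),
        (1, 10003, "2021-10-31 19:00:00"),
    ],
)


def read_inbox(user_id, limit=20):
    """Return the newest visible feeds in one user's inbox, newest first."""
    return conn.execute(
        """
        select f.feedId, f.content
        from t_inbox i join t_feed f on f.feedId = i.feedId
        where i.userId = ? and f.recordStatus = 0
        order by i.createTime desc
        limit ?
        """,
        (user_id, limit),
    ).fetchall()
```

Calling `read_inbox(1)` returns feeds 10003 and 10001 in that order. Feed 10002 is skipped because its `recordStatus` is not 0, which shows one way a deleted feed can be hidden at read time without touching every inbox row.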
3. Push (write‑amplification) model – When a user posts a feed, the system pushes the content to all followers' inboxes:
/* Insert one feed record */
insert into t_feed (`feedId`,`userId`,`content`,`createTime`) values (10001,4,'content','2021-10-31 17:00:00');
/* Look up all followers of user 4 */
select userId from t_like where likerId = 4;
/* Insert the feed into each follower's inbox */
insert into t_inbox (`userId`,`feedId`,`createTime`) values (1,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (2,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (3,10001,'2021-10-31 17:00:00');
Problems: delivery is slow for popular users, because one post triggers one inbox write per follower; storage cost is high, because every feed is duplicated across inboxes; and it is difficult to keep data state consistent when a feed is deleted or a user unfollows.
Suggested mitigations include using a message queue for asynchronous insertion, employing high‑performance databases, and separating hot and cold data with periodic cleanup.
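The message-queue mitigation can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: `queue.Queue` stands in for a real message queue, the dictionaries `followers_of` and `inbox` stand in for `t_like` and `t_inbox`, and the names `publish_feed`, `fanout_worker`, `write_inbox_batch`, and `BATCH_SIZE` are all hypothetical.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical in-memory stand-ins for t_like and t_inbox.
followers_of = {4: [1, 2, 3]}     # followed user id -> follower ids
inbox = defaultdict(list)         # follower id -> [(createTime, feedId)]
fanout_queue = queue.Queue()      # stand-in for a real message queue
BATCH_SIZE = 2                    # tiny on purpose so batching is visible
batches_written = []              # records the size of each batch write


def write_inbox_batch(rows):
    """Stand-in for one multi-row insert into t_inbox."""
    for fan_id, feed_id, create_time in rows:
        inbox[fan_id].append((create_time, feed_id))
    batches_written.append(len(rows))


def publish_feed(feed_id, author_id, create_time):
    """Author-facing write path: enqueue the fan-out task and return."""
    fanout_queue.put((feed_id, author_id, create_time))


def fanout_worker():
    """Consumer: copy each queued feed into follower inboxes, batch by batch."""
    while True:
        task = fanout_queue.get()
        if task is None:          # sentinel that stops the worker
            break
        feed_id, author_id, create_time = task
        fans = followers_of.get(author_id, [])
        for start in range(0, len(fans), BATCH_SIZE):
            chunk = fans[start:start + BATCH_SIZE]
            write_inbox_batch([(fan, feed_id, create_time) for fan in chunk])


worker = threading.Thread(target=fanout_worker)
worker.start()
publish_feed(10001, 4, "2021-10-31 17:00:00")
fanout_queue.put(None)            # stop after the queued work is done
worker.join()
```

The author's request returns as soon as the task is enqueued, and the per-follower writes happen in the background in batches. The total write volume is unchanged, so this improves the author's latency but not the storage cost.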
4. Pull (read‑amplification) model – Users retrieve feeds by pulling content from the sources they follow:
/* Get the IDs of all users that user 1 follows */
select likerId from t_like where userId = 1;
/* Pull feeds of those users */
select * from t_feed where userId in (4,5,6) and recordStatus = 0;
After fetching, the results are merged and sorted by timeline. This eliminates write‑amplification but introduces heavy read pressure, especially when a user follows many accounts.
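The merge-and-sort step can be sketched as a k-way merge. The example below is a minimal illustration that assumes each author's feeds are already available as a list sorted newest first. The dictionaries `feeds_by_author` and `following` and the function `pull_timeline` are hypothetical stand-ins for `t_feed`, `t_like`, and the application's read path.

```python
import heapq
from itertools import islice

# Hypothetical per-author timelines, each already sorted newest first.
# Entries are (createTime, feedId); timestamps in this format sort
# correctly as plain strings.
feeds_by_author = {
    4: [("2021-10-31 17:00:00", 10001), ("2021-10-30 09:00:00", 9990)],
    5: [("2021-10-31 12:30:00", 10000)],
    6: [("2021-10-29 08:00:00", 9950)],
}
following = {1: [4, 5, 6]}        # follower id -> followed user ids


def pull_timeline(user_id, limit=20):
    """Merge the followed authors' timelines and return the newest feed IDs."""
    sources = [feeds_by_author.get(a, []) for a in following.get(user_id, [])]
    # heapq.merge is lazy, so only the first `limit` entries are consumed.
    merged = heapq.merge(*sources, reverse=True)
    return [feed_id for _, feed_id in islice(merged, limit)]
```

With the sample data, `pull_timeline(1, limit=3)` returns `[10001, 10000, 9990]`. The cost of each read grows with the number of followed accounts, which is the read amplification described above.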
To reduce latency, a caching layer is introduced:
Cache the follow‑list per user (key: userId, value: set of followed IDs).
Cache each author’s recent feeds (key: authorId, value: feed list).
When a user requests a feed, retrieve the relevant cached entries from multiple shard nodes, merge, and sort.
Even with caching, read pressure can be high; horizontal scaling of cache shards (e.g., three‑master‑three‑slave) is recommended.
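The two cache structures and the sharded read can be sketched like this. It is a simplified illustration: plain dictionaries stand in for cache shards, keys are routed with a modulo on the ID, and feeds are assumed to be cached in the order they are created. `NUM_SHARDS`, `RECENT_LIMIT`, and all function names are hypothetical.

```python
import heapq
from itertools import islice

NUM_SHARDS = 3
RECENT_LIMIT = 100   # hypothetical cap on cached feeds per author

# Each dict stands in for one cache shard.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key_id):
    """Route an ID to a shard with a simple modulo."""
    return shards[key_id % NUM_SHARDS]


def cache_follow_list(user_id, followed_ids):
    """Cache the set of accounts a user follows."""
    shard_for(user_id)[("follow", user_id)] = set(followed_ids)


def cache_feed(author_id, create_time, feed_id):
    """Prepend the newest feed and trim the author's cached list.

    Assumes feeds arrive in creation order, so the list stays newest first.
    """
    recent = shard_for(author_id).setdefault(("feeds", author_id), [])
    recent.insert(0, (create_time, feed_id))
    del recent[RECENT_LIMIT:]


def read_timeline(user_id, limit=20):
    """Fetch cached lists from their shards, then merge newest first."""
    followed = shard_for(user_id).get(("follow", user_id), set())
    lists = [shard_for(a).get(("feeds", a), []) for a in followed]
    merged = heapq.merge(*lists, reverse=True)
    return [feed_id for _, feed_id in islice(merged, limit)]


cache_follow_list(1, [4, 5, 6])
cache_feed(4, "2021-10-30 09:00:00", 9990)
cache_feed(5, "2021-10-31 12:30:00", 10000)
cache_feed(4, "2021-10-31 17:00:00", 10001)
```

Capping each author's list keeps memory bounded. A real deployment would fall back to the database for older feeds or on a cache miss, which this sketch omits.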
5. Hybrid approach – Define a follower‑count threshold. Below the threshold, use the push model (low duplication, good immediacy). Above the threshold, switch to the pull model with cached hot data, while still optionally pushing to a small “active‑fan” subset for real‑time updates.
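The routing decision can be sketched as below. The threshold value, the treatment of an author exactly at the threshold (handled as pull here), and the names `delivery_plan` and `read_hybrid` are assumptions made for illustration. The source does not specify them.

```python
FOLLOWER_THRESHOLD = 10_000   # hypothetical cutoff; tune per workload


def delivery_plan(follower_ids, active_fan_ids, threshold=FOLLOWER_THRESHOLD):
    """Decide how one new feed is delivered.

    Returns (mode, push_targets). Below the threshold every follower gets
    an inbox write. Otherwise the feed is served by pull, and only the
    followers listed as active fans receive a push.
    """
    if len(follower_ids) < threshold:
        return "push", list(follower_ids)
    active = set(active_fan_ids)
    return "pull", [f for f in follower_ids if f in active]


def read_hybrid(inbox_entries, pulled_entries, limit=20):
    """Combine pushed and pulled (createTime, feedId) entries.

    The set union removes feeds that arrived through both paths.
    """
    merged = sorted(set(inbox_entries) | set(pulled_entries), reverse=True)
    return [feed_id for _, feed_id in merged[:limit]]
```

For example, `delivery_plan([1, 2, 3], [2], threshold=3)` returns `("pull", [2])`, while the same call with `threshold=5` returns `("push", [1, 2, 3])`. On the read side, the reader's inbox is merged with feeds pulled from the large accounts they follow.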
In summary, push mode suits small‑fan‑count scenarios (e.g., personal moments), pull mode fits large‑scale users (e.g., Weibo celebrities), and a combined strategy can balance immediacy, storage cost, and performance.
Architecture Digest
