Design and Implementation of a Weibo Feed Stream: Storage Architecture, Push vs Pull Models, and Performance Optimizations
This article examines the design of a Weibo‑style feed system, detailing storage tables, the trade‑offs between push (write‑fan‑out) and pull (read‑fan‑out) delivery models, their scalability challenges, and hybrid solutions using caching to achieve low latency and manageable storage costs.
Background: Products such as Weibo, WeChat Moments, and Douyin are built around feeds, streams of user‑generated posts delivered to the users who follow the author.
The sections below cover storage design, scenario characteristics, the push (write‑fan‑out) and pull (read‑fan‑out) models, their advantages and drawbacks, and possible optimizations.
1. Storage Design
Three tables are proposed:
create table `t_feed`(
  `feedId` bigint not null PRIMARY KEY,
  `userId` bigint not null COMMENT 'creator ID',
  `content` text,
  `recordStatus` tinyint not null default 0 COMMENT 'record status',
  `createTime` datetime not null
)ENGINE=InnoDB;

create table `t_like`(
  `id` bigint not null AUTO_INCREMENT PRIMARY KEY,
  `userId` bigint not null COMMENT 'follower ID',
  `likerId` bigint not null COMMENT 'followed user ID',
  KEY `userId` (`userId`),
  KEY `likerId` (`likerId`)
)ENGINE=InnoDB;

create table `t_inbox`(
  `id` bigint not null AUTO_INCREMENT PRIMARY KEY,
  `userId` bigint not null COMMENT 'recipient ID',
  `feedId` bigint not null COMMENT 'content ID',
  `createTime` datetime not null
)ENGINE=InnoDB;

Scenario characteristics: read‑heavy, write‑light, and feeds must be displayed in order (typically newest first).
2. Push (Write‑Fan‑Out) Model
When a user posts, the system inserts the feed into every follower’s inbox.
-- Insert one feed record
insert into t_feed (`feedId`,`userId`,`content`,`createTime`) values (10001,4,'content','2021-10-31 17:00:00');

-- Query all followers of user 4
select userId from t_like where likerId = 4;

-- Insert the feed into each follower's inbox
insert into t_inbox (`userId`,`feedId`,`createTime`) values (1,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (2,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (3,10001,'2021-10-31 17:00:00');

Problems: posts from high‑profile users reach followers slowly, because each post triggers one inbox write per follower; storage cost is high, since every post is referenced once per follower; and inboxes fall out of sync when a post is deleted or a follower unfollows the author.
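The push flow above can be sketched end to end with Python's standard `sqlite3` module. This is a minimal illustration rather than the article's implementation: the SQLite schema is a simplified stand-in for the MySQL tables, it assumes `t_feed` carries a `createTime` column and that `t_inbox.id` is auto-generated, and the helper names `publish_push` and `read_inbox` are hypothetical. Filtering on `recordStatus` at read time is one possible way to hide deleted posts without rewriting every inbox.

```python
import sqlite3

def create_schema(conn):
    # Simplified SQLite stand-ins for t_feed, t_like, and t_inbox.
    conn.executescript("""
        CREATE TABLE t_feed (feedId INTEGER PRIMARY KEY, userId INTEGER NOT NULL,
                             content TEXT, recordStatus INTEGER NOT NULL DEFAULT 0,
                             createTime TEXT NOT NULL);
        CREATE TABLE t_like (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             userId INTEGER NOT NULL, likerId INTEGER NOT NULL);
        CREATE TABLE t_inbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                              userId INTEGER NOT NULL, feedId INTEGER NOT NULL,
                              createTime TEXT NOT NULL);
        CREATE INDEX idx_inbox_user_time ON t_inbox (userId, createTime);
    """)

def publish_push(conn, feed_id, author_id, content, create_time):
    """Store the feed, then write one inbox row per follower (write fan-out)."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO t_feed (feedId, userId, content, createTime) "
            "VALUES (?, ?, ?, ?)",
            (feed_id, author_id, content, create_time))
        followers = [row[0] for row in conn.execute(
            "SELECT userId FROM t_like WHERE likerId = ?", (author_id,))]
        conn.executemany(
            "INSERT INTO t_inbox (userId, feedId, createTime) VALUES (?, ?, ?)",
            [(uid, feed_id, create_time) for uid in followers])
    return len(followers)

def read_inbox(conn, user_id, limit=20):
    """Newest-first timeline; the join skips feeds whose recordStatus is not 0."""
    return conn.execute(
        """SELECT f.feedId, f.userId, f.content
           FROM t_inbox i JOIN t_feed f ON f.feedId = i.feedId
           WHERE i.userId = ? AND f.recordStatus = 0
           ORDER BY i.createTime DESC LIMIT ?""",
        (user_id, limit)).fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
# Users 1, 2, and 3 follow user 4.
conn.executemany("INSERT INTO t_like (userId, likerId) VALUES (?, ?)",
                 [(1, 4), (2, 4), (3, 4)])
fanned_out = publish_push(conn, 10001, 4, "hello", "2021-10-31 17:00:00")
timeline = read_inbox(conn, 1)
```

The write cost grows linearly with the follower count, which is why this path becomes slow for accounts with very large audiences.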
3. Pull (Read‑Fan‑Out) Model
Followers retrieve feeds on demand by first fetching the list of followed user IDs and then querying their posts.
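-- Fetch the IDs of the users that user 1 follows
select likerId from t_like where userId = 1;

-- Query their posts (4, 5, 6 stand for the IDs returned above)
select * from t_feed where userId in (4,5,6) and recordStatus = 0;

This reduces write amplification but shifts the cost to reads: every timeline request triggers a follow‑list lookup and a multi‑author query, which adds load and latency. Both can be mitigated with caching layers (a follow‑list cache and a content cache) and sharding.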
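The pull path with a follow‑list cache might look like the sketch below. It is only an illustration under assumptions: the cache is a plain in‑process dictionary standing in for whatever external cache a real deployment would use, `FollowCache` and `pull_timeline` are hypothetical names, and `t_feed` is assumed to have a `createTime` column so results can be ordered newest first.

```python
import sqlite3

class FollowCache:
    """In-process stand-in for a follow-list cache (hypothetical helper)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # userId -> list of followed user IDs
        self.db_hits = 0   # how often we fell through to the database

    def followees(self, user_id):
        if user_id not in self.cache:
            self.db_hits += 1
            rows = self.conn.execute(
                "SELECT likerId FROM t_like WHERE userId = ?", (user_id,))
            self.cache[user_id] = [row[0] for row in rows]
        return self.cache[user_id]

    def invalidate(self, user_id):
        # Call after a follow or unfollow so the next read reloads the list.
        self.cache.pop(user_id, None)

def pull_timeline(conn, cache, user_id, limit=20):
    """Read fan-out: look up followees, then query their visible posts."""
    followees = cache.followees(user_id)
    if not followees:
        return []
    placeholders = ",".join("?" * len(followees))
    sql = (f"SELECT feedId, userId, content FROM t_feed "
           f"WHERE userId IN ({placeholders}) AND recordStatus = 0 "
           f"ORDER BY createTime DESC LIMIT ?")
    return conn.execute(sql, (*followees, limit)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_feed (feedId INTEGER PRIMARY KEY, userId INTEGER NOT NULL,
                         content TEXT, recordStatus INTEGER NOT NULL DEFAULT 0,
                         createTime TEXT NOT NULL);
    CREATE TABLE t_like (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         userId INTEGER NOT NULL, likerId INTEGER NOT NULL);
""")
# User 1 follows users 4, 5, and 6.
conn.executemany("INSERT INTO t_like (userId, likerId) VALUES (?, ?)",
                 [(1, 4), (1, 5), (1, 6)])
conn.executemany(
    "INSERT INTO t_feed (feedId, userId, content, recordStatus, createTime) "
    "VALUES (?, ?, ?, ?, ?)",
    [(10001, 4, "a", 0, "2021-10-31 17:00:00"),
     (10002, 5, "b", 0, "2021-10-31 18:00:00"),
     (10003, 6, "c", 1, "2021-10-31 16:00:00"),   # deleted post
     (10004, 7, "d", 0, "2021-10-31 19:00:00")])  # author not followed

cache = FollowCache(conn)
first = pull_timeline(conn, cache, 1)
second = pull_timeline(conn, cache, 1)  # follow list now served from cache
```

Because deletions and unfollows are evaluated at read time, the pull model avoids the inbox synchronization problem, at the price of more work per request.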
4. Summary
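Push mode suits scenarios with small fan‑out, where each user has a limited number of followers (e.g., WeChat Moments), while pull mode is better for “big‑V” accounts, that is, users with very large follower counts; a hybrid approach that chooses between the two by thresholds (for example on follower count) and caches selectively can balance immediacy and storage efficiency.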
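One way to picture the hybrid approach is the in‑memory sketch below. It assumes the threshold is applied to follower count, which the article does not specify, and the class `HybridFeed`, its methods, and the threshold value are all hypothetical. Ordinary authors push into follower inboxes; authors above the threshold only write to their own outbox, and readers merge those outboxes in at read time.

```python
from collections import defaultdict

class HybridFeed:
    """Push for ordinary authors, pull for authors above a follower threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.outbox = defaultdict(list)    # author -> [(timestamp, feed_id)]
        self.inbox = defaultdict(list)     # user -> [(timestamp, feed_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_big_v(self, author):
        return len(self.followers[author]) > self.threshold

    def publish(self, author, feed_id, timestamp):
        """Return the number of inbox writes performed for this post."""
        self.outbox[author].append((timestamp, feed_id))
        if self.is_big_v(author):
            return 0  # no fan-out; followers pull from the outbox
        for user in self.followers[author]:
            self.inbox[user].append((timestamp, feed_id))
        return len(self.followers[author])

    def timeline(self, user, limit=20):
        # A set removes duplicates if an author crossed the threshold after
        # some of their posts had already been pushed.
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_big_v(author):
                entries.update(self.outbox[author])
        newest_first = sorted(entries, reverse=True)
        return [feed_id for _, feed_id in newest_first[:limit]]

feed = HybridFeed(threshold=2)
for user in ("a", "b", "c"):
    feed.follow(user, "v")   # "v" ends up with 3 followers, above the threshold
feed.follow("a", "x")        # "x" has a single follower

pushed_small = feed.publish("x", feed_id=1, timestamp=100)
pushed_big = feed.publish("v", feed_id=2, timestamp=200)
timeline_a = feed.timeline("a")
timeline_b = feed.timeline("b")
```

In this sketch the post from "v" costs no inbox writes regardless of audience size, while followers of "x" still get their timeline entry precomputed.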
IT Architects Alliance