Designing a Scalable Weibo Feed: Push vs Pull Strategies and Storage Schemes
This article examines the architecture of a Weibo-like feed system, detailing storage design, read‑heavy scenarios, push (write‑diffusion) and pull (read‑diffusion) models, their trade‑offs, and a hybrid approach for handling both small and massive follower counts.
Background
Weibo, WeChat Moments, Douyin and similar products are typical feed applications, in which users consume content published by the accounts they follow.
How to Design a Weibo Feed
1. Storage Design
1) Feed Storage
Stores user‑generated posts permanently so that a user can always view their own timeline.
create table `t_feed`(
`feedId` bigint not null PRIMARY KEY,
`userId` bigint not null COMMENT 'creator ID',
`content` text,
`createTime` datetime not null COMMENT 'publish time',
`recordStatus` tinyint not null default 0 comment 'record status'
)ENGINE=InnoDB;
2) Follow Relationship Storage
Records the follower‑followee relationships, controlling the visibility range of feeds.
CREATE TABLE `t_like`(
`id` int(11) NOT NULL PRIMARY KEY,
`userId` bigint NOT NULL COMMENT 'follower ID',
`likerId` bigint NOT NULL COMMENT 'followed user ID',
KEY `userId` (`userId`),
KEY `likerId` (`likerId`)
)ENGINE=InnoDB;
3) Feed Inbox Storage
Acts as a per‑user inbox where feeds from followed users are pushed for quick retrieval.
create table `t_inbox`(
`id` bigint not null AUTO_INCREMENT PRIMARY KEY,
`userId` bigint not null comment 'receiver ID',
`feedId` bigint not null comment 'content ID',
`createTime` datetime not null,
KEY `idx_user_time` (`userId`,`createTime`)
)ENGINE=InnoDB;
2. Scenario Characteristics
Read‑heavy, write‑light workload.
Ordered display based on timeline or scoring.
3. Push Mode (Write Diffusion)
When a followed user publishes a post, the system pushes the post into every follower’s inbox.
Procedure
/** Insert a feed **/
insert into t_feed (`feedId`,`userId`,`content`,`createTime`) values (10001,4,'content','2021-10-31 17:00:00');
/** Query all fans of user 4 **/
select userId from t_like where likerId = 4;
/** Push feed into fans' inboxes **/
insert into t_inbox (`userId`,`feedId`,`createTime`) values (1,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (2,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (3,10001,'2021-10-31 17:00:00');
When a user reads their feed, the system simply selects that user's entries from t_inbox and sorts them, newest first.
select feedId from t_inbox where userId = 1 order by createTime desc;
Problems
Latency: Inserting into many inboxes for a popular user can be slow.
Storage cost: Each post is copied into every follower's inbox, so inbox storage grows with the number of posts multiplied by the number of followers.
Data‑state sync: Deleting a post or unfollowing requires cleaning up many inbox rows.
Possible Solutions
Push tasks to a message queue and consume them in parallel.
Use a database with high write throughput and good compression.
Separate hot and cold data: keep recent data in a hot store, archive older data.
At read time, filter out posts that have been deleted or whose authors the reader has unfollowed.
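The read-time filter in the last item can be illustrated with a minimal Python sketch. It assumes each inbox entry also carries the author's ID (the t_inbox table above does not store it, so in practice it would have to be looked up from t_feed), and that the sets of deleted feed IDs and currently followed user IDs are available. The names InboxEntry and visible_entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboxEntry:
    feed_id: int
    author_id: int      # assumption: not a t_inbox column, looked up from t_feed
    create_time: str    # 'YYYY-MM-DD HH:MM:SS', which sorts correctly as text


def visible_entries(entries, deleted_feed_ids, followed_ids):
    """Drop inbox rows whose post was deleted or whose author was unfollowed."""
    kept = [
        e for e in entries
        if e.feed_id not in deleted_feed_ids and e.author_id in followed_ids
    ]
    # Newest first, matching the timeline order used for display.
    return sorted(kept, key=lambda e: e.create_time, reverse=True)


inbox = [
    InboxEntry(10001, 4, "2021-10-31 17:00:00"),
    InboxEntry(10002, 5, "2021-10-31 18:00:00"),  # post deleted later
    InboxEntry(10003, 6, "2021-10-31 19:00:00"),  # author unfollowed later
]
result = visible_entries(inbox, deleted_feed_ids={10002}, followed_ids={4, 5})
```

Filtering lazily like this leaves stale rows in the inbox, which trades some wasted storage for not having to clean up every follower's inbox on each delete or unfollow.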
Summary of Push Mode
Push mode works well when the follower count is modest (e.g., WeChat Moments), providing low read latency and simple retrieval.
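As a rough end-to-end illustration of the push procedure, the following Python sketch runs the same three steps against an in-memory SQLite database. SQLite stands in for MySQL purely so the example is self-contained, the table definitions are simplified, and the function names publish and chunks and the batch_size parameter are hypothetical. Each batch could equally be one message on a queue handled by a parallel consumer.

```python
import sqlite3


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def publish(conn, feed_id, author_id, content, create_time, batch_size=2):
    """Store the post, then copy a reference into each follower's inbox."""
    conn.execute(
        "INSERT INTO t_feed (feedId, userId, content, createTime) VALUES (?, ?, ?, ?)",
        (feed_id, author_id, content, create_time),
    )
    fans = [row[0] for row in conn.execute(
        "SELECT userId FROM t_like WHERE likerId = ?", (author_id,))]
    # Write inbox rows in batches instead of one statement per follower.
    for batch in chunks(fans, batch_size):
        conn.executemany(
            "INSERT INTO t_inbox (userId, feedId, createTime) VALUES (?, ?, ?)",
            [(fan, feed_id, create_time) for fan in batch],
        )
    conn.commit()
    return len(fans)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_feed (feedId INTEGER PRIMARY KEY, userId INTEGER, content TEXT, createTime TEXT);
CREATE TABLE t_like (id INTEGER PRIMARY KEY, userId INTEGER, likerId INTEGER);
CREATE TABLE t_inbox (id INTEGER PRIMARY KEY, userId INTEGER, feedId INTEGER, createTime TEXT);
INSERT INTO t_like (userId, likerId) VALUES (1, 4), (2, 4), (3, 4);
""")

pushed = publish(conn, 10001, 4, "content", "2021-10-31 17:00:00")
inbox_of_1 = [row[0] for row in conn.execute(
    "SELECT feedId FROM t_inbox WHERE userId = ? ORDER BY createTime DESC", (1,))]
```

The read path is a single indexed lookup per user, which is why push mode gives low read latency; the cost is paid in the loop over followers at publish time.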
4. Pull Mode (Read Diffusion)
Instead of writing to each follower’s inbox, the system reads feeds on demand.
Procedure
Fetch the list of followed user IDs.
Pull posts from those users.
Merge and sort the results by timeline.
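The three steps above can be sketched in Python as a k-way merge. The sketch assumes each author's posts are already stored newest first as (create_time, feed_id) tuples; the dictionaries follows and posts_by_author are hypothetical in-memory stand-ins for t_like and t_feed.

```python
import heapq
from itertools import islice


def pull_feed(viewer_id, follows, posts_by_author, limit=10):
    """Build a timeline on demand by merging the followed authors' post lists."""
    # Step 1: fetch the list of followed user IDs.
    followed = follows.get(viewer_id, [])
    # Step 2: pull each followed author's posts (each list is newest first).
    per_author = [posts_by_author.get(author, []) for author in followed]
    # Step 3: merge the sorted lists lazily and keep only the first `limit` posts.
    merged = heapq.merge(*per_author, key=lambda post: post[0], reverse=True)
    return [feed_id for _, feed_id in islice(merged, limit)]


follows = {1: [4, 5]}
posts_by_author = {
    4: [("2021-10-31 19:00:00", 10003), ("2021-10-31 17:00:00", 10001)],
    5: [("2021-10-31 18:00:00", 10002)],
}
timeline = pull_feed(1, follows, posts_by_author)
```

Because heapq.merge is lazy, taking only the first page avoids sorting every post from every followed author, though each author's list still has to be fetched.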
Problems
When a user follows many accounts, pulling all posts and sorting can cause high latency and heavy read pressure on the database.
Mitigation via Caching
Cache the follow list per user.
Cache each author’s recent posts in a distributed cache (sharded).
During feed retrieval, fetch cached posts for all followed IDs from multiple cache shards, then merge and sort.
Horizontal scaling of cache nodes (e.g., three masters with replicas) distributes read load.
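A minimal Python sketch of the sharded lookup follows, with plain dictionaries standing in for cache nodes. The class ShardedPostCache, the modulo sharding rule, and the function cached_timeline are illustrative assumptions, not a specific cache product's API; a real deployment would more likely use consistent hashing and a networked cache.

```python
from collections import defaultdict


class ShardedPostCache:
    """Each shard holds the recent posts of a subset of authors."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def shard_of(self, author_id):
        # Assumption: simple modulo sharding on the author ID.
        return author_id % len(self.shards)

    def put(self, author_id, posts):
        self.shards[self.shard_of(author_id)][author_id] = posts

    def multi_get(self, author_ids):
        # Group IDs by shard so each shard is visited once per request.
        by_shard = defaultdict(list)
        for author_id in author_ids:
            by_shard[self.shard_of(author_id)].append(author_id)
        found = {}
        for shard_index, ids in by_shard.items():
            shard = self.shards[shard_index]
            for author_id in ids:
                if author_id in shard:
                    found[author_id] = shard[author_id]
        return found


def cached_timeline(cache, followed_ids, limit=10):
    """Return (feed IDs newest first, author IDs that missed the cache)."""
    hits = cache.multi_get(followed_ids)
    misses = [a for a in followed_ids if a not in hits]
    posts = [post for author_posts in hits.values() for post in author_posts]
    posts.sort(reverse=True)  # tuples sort by create_time first
    return [feed_id for _, feed_id in posts[:limit]], misses


cache = ShardedPostCache(shard_count=3)
cache.put(4, [("2021-10-31 19:00:00", 10003), ("2021-10-31 17:00:00", 10001)])
cache.put(5, [("2021-10-31 18:00:00", 10002)])
feed_ids, misses = cached_timeline(cache, [4, 5, 7])
```

Authors reported in misses would be loaded from the database and written back to the cache; that fallback is omitted here.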
Summary of Pull Mode
Pull mode eliminates write‑diffusion latency and storage explosion, but introduces significant read load that must be mitigated with caching and sharding.
5. Overall Summary and Hybrid Approach
Combine both models by defining a follower‑count threshold. Authors whose follower count is below the threshold use push mode; authors above it switch to pull mode, with their recent posts served from cache. For authors above the threshold, a list of active fans can still receive pushed updates so that those fans see new posts in near real time.
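The routing decision can be sketched in Python as follows. The threshold value, the in-memory dictionaries, and the function names on_publish and read_feed are all hypothetical. The sketch also assumes that feed IDs increase over time so they can be used for ordering, and it treats an author exactly at the threshold as pull mode, a boundary the text does not specify.

```python
PUSH_THRESHOLD = 3  # hypothetical value; real systems would use a far larger one


def on_publish(author_id, feed_id, followers, active_fans, inboxes, outbox):
    """Record the post, then push it to all fans or only to active fans."""
    outbox.setdefault(author_id, []).append(feed_id)
    fans = followers.get(author_id, set())
    if len(fans) < PUSH_THRESHOLD:
        mode, targets = "push", fans
    else:
        # Large account: only active fans get a pushed copy.
        mode, targets = "pull", fans & active_fans
    for fan in targets:
        inboxes.setdefault(fan, []).append(feed_id)
    return mode


def read_feed(viewer_id, following, followers, inboxes, outbox):
    """Combine pushed inbox entries with posts pulled from large accounts."""
    feed_ids = set(inboxes.get(viewer_id, []))
    for author_id in following.get(viewer_id, set()):
        if len(followers.get(author_id, set())) >= PUSH_THRESHOLD:
            feed_ids.update(outbox.get(author_id, []))
    # Assumption: larger feed ID means newer post.
    return sorted(feed_ids, reverse=True)


followers = {4: {1, 2}, 9: {1, 2, 3}}
following = {1: {4, 9}, 2: {4, 9}, 3: {9}}
active_fans = {1}
inboxes, outbox = {}, {}

mode_small = on_publish(4, 10001, followers, active_fans, inboxes, outbox)
mode_large = on_publish(9, 10002, followers, active_fans, inboxes, outbox)
feed_of_2 = read_feed(2, following, followers, inboxes, outbox)
```

Using a set on the read path removes duplicates for active fans, who receive a large account's post both by push and by pull.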
Java High-Performance Architecture
