Designing a Scalable Weibo Feed: Push vs Pull Strategies and Storage Schemes

This article examines the architecture of a Weibo-like feed system, detailing storage design, read‑heavy scenarios, push (write‑diffusion) and pull (read‑diffusion) models, their trade‑offs, and a hybrid approach for handling both small and massive follower counts.

Java High-Performance Architecture

Background

Weibo, WeChat Moments, Douyin and similar products are typical feed applications, in which users consume a stream of content generated by others.

How to Design a Weibo Feed

1. Storage Design

1) Feed Storage

Stores user‑generated posts permanently so that a user can always view their own timeline.

create table `t_feed`(
  `feedId` bigint not null PRIMARY KEY,
  `userId` bigint not null COMMENT 'creator ID',
  `content` text,
  `createTime` datetime not null COMMENT 'publish time',
  `recordStatus` tinyint not null default 0 comment 'record status',
  KEY `idx_user_time` (`userId`,`createTime`)
)ENGINE=InnoDB;

2) Follow Relationship Storage

Records the follower‑followee relationships, controlling the visibility range of feeds.

CREATE TABLE `t_like`(
    `id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `userId` bigint NOT NULL COMMENT 'follower ID',
    `likerId` bigint NOT NULL COMMENT 'followed user ID',
    KEY `userId` (`userId`),
    KEY `likerId` (`likerId`)
)ENGINE=InnoDB;

3) Feed Inbox Storage

Acts as a per‑user inbox where feeds from followed users are pushed for quick retrieval.

create table `t_inbox`(
  `id` bigint not null AUTO_INCREMENT PRIMARY KEY,
  `userId` bigint not null comment 'receiver ID',
  `feedId` bigint not null comment 'content ID',
  `createTime` datetime not null,
  KEY `idx_user_time` (`userId`,`createTime`)
)ENGINE=InnoDB;

2. Scenario Characteristics

Read‑heavy, write‑light workload.

Ordered display based on timeline or scoring.

3. Push Mode (Write Diffusion)

When a followed user publishes a post, the system pushes the post into every follower’s inbox.

Procedure

/** Insert a feed **/
insert into t_feed (`feedId`,`userId`,`content`,`createTime`) values (10001,4,'content','2021-10-31 17:00:00');
/** Query all fans **/
select userId from t_like where likerId = 4;
/** Push feed into fans' inboxes **/
insert into t_inbox (`userId`,`feedId`,`createTime`) values (1,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (2,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (3,10001,'2021-10-31 17:00:00');

When a user reads their feed, the system simply selects all entries from t_inbox for that user and sorts them.

select feedId from t_inbox where userId = 1 order by createTime desc;
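The push flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the article's implementation: the dictionaries `feeds`, `fans_of` and `inboxes` are hypothetical stand-ins for `t_feed`, `t_like` and `t_inbox`, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for t_feed, t_like and t_inbox.
feeds = {}                    # feedId -> (authorId, content, createTime)
fans_of = defaultdict(set)    # followed userId -> set of follower userIds
inboxes = defaultdict(list)   # receiver userId -> list of (createTime, feedId)


def follow(follower_id, followee_id):
    """Record that follower_id follows followee_id."""
    fans_of[followee_id].add(follower_id)


def publish_push(feed_id, author_id, content, create_time):
    """Write diffusion: store the post once, then add a row to every fan's inbox."""
    feeds[feed_id] = (author_id, content, create_time)
    for fan_id in fans_of[author_id]:
        inboxes[fan_id].append((create_time, feed_id))


def read_inbox(user_id, limit=20):
    """Reading touches only the reader's own inbox, newest first."""
    newest_first = sorted(inboxes[user_id], reverse=True)
    return [feed_id for _, feed_id in newest_first[:limit]]


for fan in (1, 2, 3):
    follow(fan, 4)
publish_push(10001, 4, "content", "2021-10-31 17:00:00")
publish_push(10002, 4, "more content", "2021-10-31 18:00:00")
print(read_inbox(1))  # [10002, 10001]
```

The cost profile is visible in the loop inside `publish_push`: one publish does work proportional to the author's fan count, while a read is a single lookup.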

Problems

Latency: Inserting into many inboxes for a popular user can be slow.

Storage cost: Every post is replicated once per follower, so inbox storage grows in proportion to the follower count (one row per follower per post).

Data‑state sync: Deleting a post or unfollowing requires cleaning up many inbox rows.

Possible Solutions

Push tasks to a message queue and consume them in parallel.

Store inboxes in a database that offers high write throughput and a good compression ratio.

Separate hot and cold data: keep recent data in a hot store, archive older data.

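At read time, filter out posts that have been deleted or whose author the reader has since unfollowed, rather than eagerly cleaning up every affected inbox row.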
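The read-time filtering idea can be illustrated as follows. This is a sketch under stated assumptions: the function name `visible_feed_ids` and the data shapes are hypothetical, and it assumes that a `recordStatus` of 0 means "live" and any other value means "deleted", which the schema does not spell out.

```python
def visible_feed_ids(inbox_rows, feeds, following):
    """Drop inbox entries whose post was deleted or whose author was unfollowed.

    inbox_rows: list of (createTime, feedId) stored for one reader.
    feeds:      feedId -> {"userId": author ID, "recordStatus": status flag}.
    following:  set of author IDs the reader currently follows.
    """
    visible = []
    for _, feed_id in sorted(inbox_rows, reverse=True):
        feed = feeds.get(feed_id)
        if feed is None or feed["recordStatus"] != 0:
            continue  # assumed: missing row or non-zero status means deleted
        if feed["userId"] not in following:
            continue  # the reader has unfollowed this author
        visible.append(feed_id)
    return visible


sample_feeds = {
    10001: {"userId": 4, "recordStatus": 0},
    10002: {"userId": 4, "recordStatus": 1},  # deleted by its author
    10003: {"userId": 5, "recordStatus": 0},  # author 5 was unfollowed
}
sample_inbox = [
    ("2021-10-31 17:00:00", 10001),
    ("2021-10-31 18:00:00", 10002),
    ("2021-10-31 19:00:00", 10003),
]
print(visible_feed_ids(sample_inbox, sample_feeds, following={4}))  # [10001]
```

Stale inbox rows stay in storage with this approach, so a background cleanup job may still be worthwhile, but deletes and unfollows no longer require a synchronous fan-out.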

Summary of Push Mode

Push mode works well when the follower count is modest (e.g., WeChat Moments), providing low read latency and simple retrieval.

4. Pull Mode (Read Diffusion)

Instead of writing to each follower’s inbox, the system reads feeds on demand.

Procedure

Fetch the list of followed user IDs.

Pull posts from those users.

Merge and sort the results by timeline.
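The three steps above can be sketched as a k-way merge. The example below is illustrative only: `posts_by_author` and `follow_list` are hypothetical in-memory stand-ins for queries against `t_feed` and `t_like`, and each author's timeline is assumed to be already sorted newest first.

```python
import heapq
from itertools import islice

# Hypothetical per-author timelines, each already sorted newest first.
posts_by_author = {
    4: [("2021-10-31 18:00:00", 10002), ("2021-10-31 17:00:00", 10001)],
    5: [("2021-10-31 17:30:00", 20001)],
    6: [],
}
follow_list = {1: [4, 5, 6]}  # reader userId -> followed userIds


def pull_feed(user_id, limit=20):
    """Read diffusion: gather each followed author's posts, then merge by time."""
    followed = follow_list.get(user_id, [])                     # step 1
    timelines = [posts_by_author.get(a, []) for a in followed]  # step 2
    merged = heapq.merge(*timelines, reverse=True)              # step 3
    return [feed_id for _, feed_id in islice(merged, limit)]


print(pull_feed(1))  # [10002, 20001, 10001]
```

Because `heapq.merge` is lazy, only the first `limit` entries are materialized, but the number of timelines that must be fetched still grows with the number of accounts the reader follows.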

Problems

When a user follows many accounts, pulling all posts and sorting can cause high latency and heavy read pressure on the database.

Mitigation via Caching

Cache the follow list per user.

Cache each author’s recent posts in a distributed cache (sharded).

During feed retrieval, fetch cached posts for all followed IDs from multiple cache shards, then merge and sort.

Horizontal scaling of cache nodes (e.g., three masters with replicas) distributes read load.
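A sharded cache read might look like the sketch below. Everything here is an assumption for illustration: plain dictionaries stand in for cache nodes, the modulo-based `shard_for` function is one simple placement scheme among many, and `SHARD_COUNT = 3` merely mirrors the "three masters" example.

```python
import heapq
from collections import defaultdict

SHARD_COUNT = 3  # assumed, mirroring the "three masters" example

# Each dict stands in for one cache shard: authorId -> recent posts, newest first.
cache_shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(author_id):
    """Pick a shard by simple modulo hashing (an assumption, not a requirement)."""
    return author_id % SHARD_COUNT


def cache_post(author_id, create_time, feed_id, keep=100):
    """Prepend a new post to the author's cached timeline and trim old entries."""
    timeline = cache_shards[shard_for(author_id)].setdefault(author_id, [])
    timeline.insert(0, (create_time, feed_id))  # assumes posts arrive in time order
    del timeline[keep:]


def read_feed_from_cache(followed_ids, limit=20):
    """Group followed authors by shard, fetch per shard, then merge by time."""
    authors_by_shard = defaultdict(list)
    for author_id in followed_ids:
        authors_by_shard[shard_for(author_id)].append(author_id)
    timelines = []
    for shard_index, author_ids in authors_by_shard.items():
        shard = cache_shards[shard_index]  # one batched lookup per shard
        timelines.extend(shard.get(a, []) for a in author_ids)
    merged = heapq.merge(*timelines, reverse=True)
    return [feed_id for _, feed_id in list(merged)[:limit]]


cache_post(4, "2021-10-31 17:00:00", 10001)
cache_post(5, "2021-10-31 17:30:00", 20001)
cache_post(4, "2021-10-31 18:00:00", 10002)
print(read_feed_from_cache([4, 5, 6]))  # [10002, 20001, 10001]
```

Grouping author IDs by shard before fetching keeps the number of cache round trips bounded by the shard count rather than by the number of followed accounts.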

Summary of Pull Mode

Pull mode eliminates write‑diffusion latency and storage explosion, but introduces significant read load that must be mitigated with caching and sharding.

5. Overall Summary and Hybrid Approach

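Combine both models by defining a follower‑count threshold. Posts from authors whose follower count is below the threshold are delivered in push mode; posts from authors above it are served in pull mode from cached content. Even for an author above the threshold, a list of active fans can still receive push updates to preserve a real‑time experience.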
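A minimal sketch of the threshold routing follows. It is hypothetical throughout: `PUSH_THRESHOLD` is set to a tiny value purely for demonstration, the dictionaries are in-memory stand-ins for the tables and caches, and the active-fan push path is left out to keep the example short.

```python
from collections import defaultdict

PUSH_THRESHOLD = 2  # hypothetical; a real system would use a far larger value

fan_sets = defaultdict(set)       # author -> follower IDs
followed_sets = defaultdict(set)  # reader -> author IDs
pushed = defaultdict(list)        # reader -> [(createTime, feedId)] inbox rows
outboxes = defaultdict(list)      # author -> [(createTime, feedId)] own posts


def add_follow(reader, author):
    fan_sets[author].add(reader)
    followed_sets[reader].add(author)


def is_big(author):
    """Authors above the threshold are served by pull instead of push."""
    return len(fan_sets[author]) > PUSH_THRESHOLD


def publish_hybrid(author, create_time, feed_id):
    outboxes[author].append((create_time, feed_id))
    if not is_big(author):
        for fan in fan_sets[author]:  # push only for small accounts
            pushed[fan].append((create_time, feed_id))


def read_hybrid(reader, limit=20):
    """Union of pushed inbox rows and posts pulled from big accounts."""
    entries = set(pushed[reader])
    for author in followed_sets[reader]:
        if is_big(author):
            entries.update(outboxes[author])  # a set avoids duplicate entries
    return [feed_id for _, feed_id in sorted(entries, reverse=True)[:limit]]


for reader in (1, 2):
    add_follow(reader, 4)  # author 4 has 2 fans: push
for reader in (1, 2, 3):
    add_follow(reader, 9)  # author 9 has 3 fans: pull
publish_hybrid(4, "2021-10-31 17:00:00", 10001)
publish_hybrid(9, "2021-10-31 18:00:00", 90001)
print(read_hybrid(1))  # [90001, 10001]
```

Collecting entries in a set guards against duplicates when an author crosses the threshold between posts, since earlier posts may already sit in followers' inboxes.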
