Optimizing Social Media Feeds: Push vs Pull and Time‑Partitioned Pull Strategy

This article examines the push and pull models used by micro‑blogging platforms such as Twitter and Sina Weibo, analyzes their scalability challenges, and proposes a time‑partitioned pull approach that reduces database load while maintaining fast feed retrieval for active users.

21CTO

Social networking services such as Twitter, Sina Weibo, and Renren use a feed system where each post (a “feed”) must be delivered to followers. This article discusses the traditional push and pull architectures and introduces a time‑partitioned pull model.

Weibo feed example

In the push model, when a user posts a micro‑blog, the system writes a copy of that post into the feed table of every follower. For a celebrity with millions of followers, each post creates millions of rows, which means heavy storage use and write amplification.
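
The fan-out described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the platforms' actual implementation: the dictionaries `followers` and `inboxes` and the functions `add_follower`, `publish_push`, and `read_timeline_push` are hypothetical names standing in for the follower graph and the per-user feed tables.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and per-user feed tables.
followers = defaultdict(set)   # author_id -> set of follower ids
inboxes = defaultdict(list)    # user_id -> list of (timestamp, author_id, post_id)


def add_follower(follower_id, author_id):
    """Record that follower_id follows author_id."""
    followers[author_id].add(follower_id)


def publish_push(author_id, post_id, timestamp):
    """Fan out on write: copy the post reference into every follower's inbox.

    Returns the number of rows written, which equals the follower count.
    """
    rows_written = 0
    for follower_id in followers[author_id]:
        inboxes[follower_id].append((timestamp, author_id, post_id))
        rows_written += 1
    return rows_written


def read_timeline_push(user_id, limit=20):
    """Reading is cheap: the inbox already holds the user's timeline, newest first."""
    return sorted(inboxes[user_id], reverse=True)[:limit]
```

The return value of `publish_push` makes the cost visible: one post by an author with N followers produces N writes.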

Overall feed architecture

The pull model stores each new post only once in a central feed table. When a user requests their timeline, the system queries the feed table for the IDs of posts from the users they follow, often using a cache such as Memcached. While this reduces write load, the feed table can become a bottleneck under heavy read traffic, especially for users with many followees.
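
For contrast, here is a minimal sketch of the pull model under the same in-memory assumptions. The names `following`, `feed_table`, `subscribe`, `publish_pull`, and `read_timeline_pull` are hypothetical, and the cache layer mentioned above is left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins: the follow graph and one central feed table.
following = defaultdict(set)   # user_id -> set of followee ids
feed_table = []                # rows of (timestamp, author_id, post_id)


def subscribe(user_id, author_id):
    """Record that user_id follows author_id."""
    following[user_id].add(author_id)


def publish_pull(author_id, post_id, timestamp):
    """Write once: a post is stored a single time, however many followers exist."""
    feed_table.append((timestamp, author_id, post_id))


def read_timeline_pull(user_id, limit=20):
    """Fan out on read: scan the whole feed table for posts by followees."""
    followees = following[user_id]
    matches = [row for row in feed_table if row[1] in followees]
    return sorted(matches, reverse=True)[:limit]
```

The write is now a single row, but every timeline request scans the full table, which is the read bottleneck the paragraph describes.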

Pull mode diagram

To improve the pull model, the article proposes a time‑partitioned pull strategy. The feed table is divided into partitions based on time intervals (e.g., the last hour, the last day, longer periods). When a user logs in, the system checks the most recent partition first; subsequent requests only need to query the partitions that cover the time since the user's last fetch, which sharply reduces the amount of data scanned.
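
A minimal sketch of the idea follows. It simplifies the scheme to fixed one-hour partitions, which is an assumption made for brevity rather than the layout the original article prescribes; `PARTITION_SECONDS`, `partitions`, `publish_partitioned`, and `read_since` are hypothetical names.

```python
PARTITION_SECONDS = 3600  # assumed partition width: one hour

# Hypothetical stand-in for a time-partitioned feed table:
# partition key -> rows of (timestamp, author_id, post_id)
partitions = {}


def partition_key(timestamp):
    """Map a Unix-style timestamp to the partition that holds it."""
    return timestamp // PARTITION_SECONDS


def publish_partitioned(author_id, post_id, timestamp):
    """Write once, into the partition that covers the post's timestamp."""
    partitions.setdefault(partition_key(timestamp), []).append(
        (timestamp, author_id, post_id)
    )


def read_since(followees, last_seen, now):
    """Return followees' posts newer than last_seen, plus the partitions scanned.

    Only partitions overlapping the window (last_seen, now] are touched;
    older partitions are never read.
    """
    low, high = partition_key(last_seen), partition_key(now)
    rows = []
    scanned = 0
    for key in sorted(k for k in partitions if low <= k <= high):
        scanned += 1
        for row in partitions[key]:
            if row[0] > last_seen and row[1] in followees:
                rows.append(row)
    return sorted(rows, reverse=True), scanned
```

A user who refreshed a few minutes ago touches a single partition, while a user returning after a long absence scans more of them, which matches the behavior described above.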

Time‑partitioned pull improvement

This approach leverages the observation that active users frequently access recent data, so most queries hit small, recent partitions, while infrequent users may need to scan larger, older partitions only occasionally. The partitioning scheme can be tuned based on data volume and access patterns.
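
One way to picture the tuning knob is a table of tiers ordered from newest to oldest, where the gap since a user's last visit decides how many tiers a query must touch. The sketch below assumes three tiers matching the example intervals mentioned earlier (last hour, last day, everything older); the names `TIERS` and `tiers_to_scan` and the exact boundaries are illustrative assumptions, not values from the original article.

```python
HOUR = 3600
DAY = 24 * HOUR

# Assumed tiers, newest first: (name, maximum post age in seconds the tier covers).
# The boundaries are illustrative and would be tuned to data volume and access patterns.
TIERS = [
    ("last_hour", HOUR),
    ("last_day", DAY),
    ("archive", float("inf")),
]


def tiers_to_scan(seconds_since_last_visit):
    """Return the tier names needed to cover the gap since the last visit."""
    needed = []
    for name, max_age in TIERS:
        needed.append(name)
        if seconds_since_last_visit <= max_age:
            break  # this tier reaches back far enough; older tiers are skipped
    return needed
```

With these boundaries, a user who was last seen ten minutes ago only reads `last_hour`, while a user returning after a month reads all three tiers.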

The time‑partitioned pull model can be combined with push techniques for certain scenarios, offering a cost‑effective solution that balances write and read performance.
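
The text does not say how the two models would be combined. One common arrangement, shown here purely as an assumption, is to push posts from authors with few followers and leave posts from heavily followed authors in a shared table to be pulled at read time. All names below (`PUSH_FOLLOWER_LIMIT`, `follow_author`, `publish_hybrid`, `read_hybrid`) are hypothetical, and the threshold is set very low only so the example is easy to follow.

```python
from collections import defaultdict

PUSH_FOLLOWER_LIMIT = 2  # assumed threshold, kept tiny for illustration

author_followers = defaultdict(set)  # author_id -> set of follower ids
user_inboxes = defaultdict(list)     # user_id -> pushed rows
shared_feed = []                     # rows left for pull: (timestamp, author_id, post_id)


def follow_author(user_id, author_id):
    """Record that user_id follows author_id."""
    author_followers[author_id].add(user_id)


def publish_hybrid(author_id, post_id, timestamp):
    """Push for small audiences, store once for large ones. Returns the mode used."""
    row = (timestamp, author_id, post_id)
    audience = author_followers[author_id]
    if len(audience) <= PUSH_FOLLOWER_LIMIT:
        for user_id in audience:
            user_inboxes[user_id].append(row)
        return "push"
    shared_feed.append(row)
    return "pull"


def read_hybrid(user_id, limit=20):
    """Merge the pushed inbox with rows pulled from the shared table, newest first."""
    pulled = [row for row in shared_feed if user_id in author_followers[row[1]]]
    return sorted(user_inboxes[user_id] + pulled, reverse=True)[:limit]
```

In a fuller version, the shared table would itself be time‑partitioned so that the pull side only scans recent partitions.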

Original article: http://www.cnblogs.com/sunli/archive/2010/08/24/twitter_feeds_push_pull.html


Tags: scalability, feed architecture, push-pull, microblogging, time partition