How Weibo Scales to Billions: Inside Its Multi‑Layer Cache Architecture

This article explains how Weibo handles massive daily traffic of over a hundred billion requests by employing a five‑layer feed system and a six‑layer cache architecture that evolved from simple KV storage to sophisticated counter and existence‑check services, highlighting design choices, performance optimizations, and future directions.

dbaplus Community
dbaplus Community
dbaplus Community
How Weibo Scales to Billions: Inside Its Multi‑Layer Cache Architecture

Data Volume and Latency Challenges

Weibo handles more than 1.6 billion daily active users and processes hundreds of billions of requests per day. Generating a user feed requires assembling thousands of items per request within tens of milliseconds.

Feed Platform Architecture

The platform is organized into five logical tiers:

Client layer (web, iOS, Android, open APIs).

Platform‑access layer that aggregates resources for core interfaces and provides elasticity during traffic spikes.

Service layer that runs feed algorithms and relationship logic.

Middle layer offering auxiliary services.

Storage layer.

When a user refreshes the feed, the system performs the following steps:

Retrieve the user's follow list (e.g., 1 000 followees).

Fetch recent posts from each followee.

Merge inbox messages (group posts, group‑chat posts, etc.).

Sort and filter the combined list according to user preferences.

Fetch the full post bodies for the selected IDs.

Enrich each post with metadata such as like, comment, and repost flags.

Return the final set (typically 10‑15 posts) to the client.

Cache Architecture

Six logical cache layers

Inbox – group‑specific and push‑based posts.

Outbox – regular user posts, split into multiple caches based on size (e.g., 200 items vs. 2 000 items).

Relationship – follow/follower mappings.

Content – raw post bodies.

Existence checks – binary flags such as “liked” or “read”.

Counters – comment, repost, like counts and user statistics.

Cache Evolution

Stage 1 – Simple KV on Memcached

Initial deployment stored data as hash‑sharded key‑value pairs in Memcached. Node failures caused cache‑to‑DB penetration, degrading latency.

Stage 2 – High‑Availability (HA) layer

An HA cache tier was added to absorb traffic when the main tier lost nodes, preventing DB overload.

Stage 3 – L1 “hot” cache

A small “L1” cache (≈1/6–1/10 of the main memory) was introduced. Hot requests are served from L1; misses fall back to the HA tier. Multiple L1 instances (4‑8) are deployed to handle traffic spikes.

Stage 4 – Relationship data on Redis

Follow lists (e.g., 2 000 IDs) were moved to Redis with hash distribution and read‑write separation. Active users stay in memory; inactive users are evicted and lazily re‑loaded.

Stage 5 – Custom long‑array for massive ID sets

Because Redis is single‑threaded, a custom open‑addressed long array using double‑hash addressing was built to store tens of thousands of IDs efficiently, reducing contention and latency.

Counter Service

Counting metrics (likes, comments, reposts) require compact storage. Direct use of Memcached or Redis wastes memory (≈65 B per KV). Weibo built a proprietary Counter Service with the following properties:

Memory usage 1/5–1/15 of Redis.

Hot‑cold separation: hot counters reside in RAM; cold counters are dumped to SSD.

Incremental persistence using RDB snapshots and rolling AOF files, enabling full‑incremental replication.

Auxiliary dictionary for overflow counters that exceed 4‑byte limits.

Storage hierarchy: multiple tables in memory, each covering a range of IDs. When a table exceeds memory, it is dumped to SSD and a new table is allocated.

Existence Checks – Phantom Bloom‑filter

Binary flags (e.g., “has the user read this post?”) are stored using a Bloom‑filter‑like structure called Phantom . Implementation details:

// Pseudocode
hashes = [h1(key), h2(key), h3(key)]
if all bits at hashes are 1:
    return MAY_EXIST   // false positive possible
else:
    return NOT_EXIST

Phantom uses a shared‑memory bitmap (~120 GB) and three independent hash functions, achieving high compression while keeping lookup latency in microseconds.

Serviceization and Operations

The cache stack is exposed as a service with centralized configuration management, providing:

Zero‑downtime configuration updates via a configuration service.

Automatic scaling of cache nodes based on load metrics.

Cluster Manager UI for health monitoring, SLA enforcement, and manual scaling.

Unified Redis‑compatible protocol for external access.

Summary

The multi‑layer cache design—KV → HA → L1 → Redis → Counter Service → Phantom—delivers high availability, low latency, and cost‑effective storage for billions of daily feed requests. Ongoing work focuses on further performance tuning, richer operational tooling, and extending the service‑oriented architecture.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed SystemsCachestorageWeibo
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.