How Weibo Scales to Billions: Inside Its Multi‑Layer Cache Architecture
This article explains how Weibo handles massive daily traffic of over a hundred billion requests by employing a five‑layer feed system and a six‑layer cache architecture that evolved from simple KV storage to sophisticated counter and existence‑check services, highlighting design choices, performance optimizations, and future directions.
Data Volume and Latency Challenges
Weibo handles more than 1.6 billion daily active users and processes hundreds of billions of requests per day. Generating a user feed requires assembling thousands of items per request within tens of milliseconds.
Feed Platform Architecture
The platform is organized into five logical tiers:
Client layer (web, iOS, Android, open APIs).
Platform‑access layer that aggregates resources for core interfaces and provides elasticity during traffic spikes.
Service layer that runs feed algorithms and relationship logic.
Middle layer offering auxiliary services.
Storage layer.
When a user refreshes the feed, the system performs the following steps:
Retrieve the user's follow list (e.g., 1 000 followees).
Fetch recent posts from each followee.
Merge inbox messages (group posts, group‑chat posts, etc.).
Sort and filter the combined list according to user preferences.
Fetch the full post bodies for the selected IDs.
Enrich each post with metadata such as like, comment, and repost flags.
Return the final set (typically 10‑15 posts) to the client.
Cache Architecture
Six logical cache layers
Inbox – group‑specific and push‑based posts.
Outbox – regular user posts, split into multiple caches based on size (e.g., 200 items vs. 2 000 items).
Relationship – follow/follower mappings.
Content – raw post bodies.
Existence checks – binary flags such as “liked” or “read”.
Counters – comment, repost, like counts and user statistics.
Cache Evolution
Stage 1 – Simple KV on Memcached
Initial deployment stored data as hash‑sharded key‑value pairs in Memcached. Node failures caused cache‑to‑DB penetration, degrading latency.
Stage 2 – High‑Availability (HA) layer
An HA cache tier was added to absorb traffic when the main tier lost nodes, preventing DB overload.
Stage 3 – L1 “hot” cache
A small “L1” cache (≈1/6–1/10 of the main memory) was introduced. Hot requests are served from L1; misses fall back to the HA tier. Multiple L1 instances (4‑8) are deployed to handle traffic spikes.
Stage 4 – Relationship data on Redis
Follow lists (e.g., 2 000 IDs) were moved to Redis with hash distribution and read‑write separation. Active users stay in memory; inactive users are evicted and lazily re‑loaded.
Stage 5 – Custom long‑array for massive ID sets
Because Redis is single‑threaded, a custom open‑addressed long array using double‑hash addressing was built to store tens of thousands of IDs efficiently, reducing contention and latency.
Counter Service
Counting metrics (likes, comments, reposts) require compact storage. Direct use of Memcached or Redis wastes memory (≈65 B per KV). Weibo built a proprietary Counter Service with the following properties:
Memory usage 1/5–1/15 of Redis.
Hot‑cold separation: hot counters reside in RAM; cold counters are dumped to SSD.
Incremental persistence using RDB snapshots and rolling AOF files, enabling full‑incremental replication.
Auxiliary dictionary for overflow counters that exceed 4‑byte limits.
Storage hierarchy: multiple tables in memory, each covering a range of IDs. When a table exceeds memory, it is dumped to SSD and a new table is allocated.
Existence Checks – Phantom Bloom‑filter
Binary flags (e.g., “has the user read this post?”) are stored using a Bloom‑filter‑like structure called Phantom . Implementation details:
// Pseudocode
hashes = [h1(key), h2(key), h3(key)]
if all bits at hashes are 1:
return MAY_EXIST // false positive possible
else:
return NOT_EXISTPhantom uses a shared‑memory bitmap (~120 GB) and three independent hash functions, achieving high compression while keeping lookup latency in microseconds.
Serviceization and Operations
The cache stack is exposed as a service with centralized configuration management, providing:
Zero‑downtime configuration updates via a configuration service.
Automatic scaling of cache nodes based on load metrics.
Cluster Manager UI for health monitoring, SLA enforcement, and manual scaling.
Unified Redis‑compatible protocol for external access.
Summary
The multi‑layer cache design—KV → HA → L1 → Redis → Counter Service → Phantom—delivers high availability, low latency, and cost‑effective storage for billions of daily feed requests. Ongoing work focuses on further performance tuning, richer operational tooling, and extending the service‑oriented architecture.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
