How Weibo Scales Redis: Architecture, Optimizations, and Future Plans
This article details how Weibo runs Redis at massive scale. It describes the size of the deployment, the challenges of trillion-level daily reads and writes, and the technical choices and customizations the team made, including LongSet, an in-house HA solution, multi-level caching, and RocksDB integration. It also outlines ongoing capacity work and future development plans.
Redis in Weibo: Application Scenarios
Redis is embedded in many Weibo features such as the "Red Packet" event, fan counts, read counts, likes, comment threading, ad recommendation, negative feedback handling, music charts, and more, serving both real‑time and batch workloads.
Scale and Challenges
Weibo stores over 100 TB of data on more than 1,000 physical machines, running upwards of 10,000 Redis instances. The platform handles trillions of read/write operations per day, with a target response time of about 20 ms and four-nines (99.99%) availability. Because of the sheer volume of data, memory cost is a major pressure.
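To put those figures in perspective, the arithmetic below converts them into average rates. It assumes exactly one trillion operations per day, spread evenly across the day and across 10,000 instances; both simplifications are assumptions for illustration, not figures from the talk.

```python
# Back-of-the-envelope rates from the scale figures above.
# Assumptions: exactly 1 trillion ops/day, spread evenly over time and instances.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60

daily_ops = 1_000_000_000_000
instances = 10_000
availability = 0.9999  # "four nines"

avg_ops_per_sec = daily_ops / SECONDS_PER_DAY
avg_ops_per_instance = avg_ops_per_sec / instances
downtime_budget_min = (1 - availability) * MINUTES_PER_YEAR

print(f"cluster average:  {avg_ops_per_sec:,.0f} ops/s")       # 11,574,074 ops/s
print(f"per instance:     {avg_ops_per_instance:,.0f} ops/s")  # 1,157 ops/s
print(f"downtime budget:  {downtime_budget_min:.1f} min/year")  # 52.6 min/year
```

Real traffic is bursty, so peak rates are higher than these averages; the last figure shows that four nines leaves under an hour of total outage per year.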
Technical Selection
The overall database stack includes Redis, other NoSQL stores, queues, and persistent storage. The talk focuses on the Redis layer, which has been in production since 2010 (based on Redis 2.0) and has undergone extensive customizations.
Key Optimizations Implemented
Custom encoding reduces storage by ~30% in special scenarios.
Introduced a fixed-length open-addressing hash array called LongSet to cut pointer overhead.
Moved master-slave replication onto a dedicated thread and combined it with full plus incremental replication, so a replica can resync quickly after a network interruption.
Full‑snapshot RDB combined with incremental AOF for persistence.
AOF write/flush handled by a BIO thread to avoid blocking the main thread.
Controlled cronsave timing for predictable persistence.
Adjusted AOF buffer size to prevent disk‑full situations under high write rates.
Developed an in‑house HA solution rather than relying on the official Redis HA.
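The source does not show LongSet's code. The sketch below is a hypothetical illustration of the general idea: 64-bit IDs kept in one flat, fixed-length array with linear probing, so each element costs 8 bytes and no pointer. Python's `array('q')` stands in for a C array, 0 is assumed to be the unused-slot marker, and deletion (which open addressing would handle with tombstones) is left out.

```python
from array import array

EMPTY = 0  # sentinel for an unused slot; assumes stored IDs are never 0


class LongSetSketch:
    """Hypothetical LongSet-style set: fixed capacity, one flat int64 array,
    linear probing, no per-element objects or pointers."""

    def __init__(self, capacity: int):
        self.slots = array("q", [EMPTY]) * capacity
        self.size = 0

    def _probe(self, value: int):
        # Yield slot indexes starting at the value's home position.
        n = len(self.slots)
        start = hash(value) % n
        for i in range(n):
            yield (start + i) % n

    def add(self, value: int) -> bool:
        if value == EMPTY:
            raise ValueError("0 is reserved as the empty marker")
        for idx in self._probe(value):
            slot = self.slots[idx]
            if slot == value:
                return False  # already present
            if slot == EMPTY:
                self.slots[idx] = value
                self.size += 1
                return True
        raise OverflowError("LongSetSketch is full")

    def __contains__(self, value: int) -> bool:
        for idx in self._probe(value):
            slot = self.slots[idx]
            if slot == value:
                return True
            if slot == EMPTY:
                return False  # hit a gap: the value was never inserted
        return False


s = LongSetSketch(capacity=8)
for uid in (1001, 1002, 1003):
    s.add(uid)
print(1002 in s, 4242 in s)              # True False
print(len(s.slots) * s.slots.itemsize)   # 64 (bytes for 8 slots)
```

Because the capacity is fixed up front, memory use is known in advance; the trade-off is that a full table must be rebuilt larger rather than grown incrementally.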
Custom Counter Service
Standard Redis keys for counters (user‑id → numeric value) consumed excessive memory. A lightweight RedisCounter service was built, storing only a hash table of long values, reducing memory usage by up to 90% for simple counting use‑cases, though it initially supported only single‑column tables.
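A sketch of the single-column idea follows, assuming the table maps a 64-bit user ID to a 64-bit count. The class name, the 16-byte `(key, count)` entry layout, linear probing, and the use of 0 as the empty key are illustrative assumptions rather than details from the source; the point is that every entry occupies a fixed 16 bytes in one preallocated buffer.

```python
import struct

ENTRY = struct.Struct("<qq")  # (key, count) as two little-endian int64s = 16 bytes
EMPTY_KEY = 0  # assumption: user IDs are never 0


class CounterTableSketch:
    """Hypothetical single-column counter table: user ID -> 64-bit count,
    stored as fixed 16-byte entries in one buffer with linear probing."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buf = bytearray(ENTRY.size * capacity)  # zero-filled: all slots empty

    def _find(self, key: int):
        """Return (slot index, stored key) for key's slot or the first empty one."""
        start = hash(key) % self.capacity
        for i in range(self.capacity):
            idx = (start + i) % self.capacity
            stored, _ = ENTRY.unpack_from(self.buf, idx * ENTRY.size)
            if stored == key or stored == EMPTY_KEY:
                return idx, stored
        raise OverflowError("counter table is full")

    def incr(self, key: int, delta: int = 1) -> int:
        if key == EMPTY_KEY:
            raise ValueError("0 is reserved as the empty marker")
        idx, stored = self._find(key)
        offset = idx * ENTRY.size
        _, count = ENTRY.unpack_from(self.buf, offset)
        count = count + delta if stored == key else delta
        ENTRY.pack_into(self.buf, offset, key, count)
        return count

    def get(self, key: int) -> int:
        try:
            idx, stored = self._find(key)
        except OverflowError:
            return 0  # table full and key not present
        if stored != key:
            return 0
        _, count = ENTRY.unpack_from(self.buf, idx * ENTRY.size)
        return count


counters = CounterTableSketch(capacity=1024)
counters.incr(10001)
counters.incr(10001)
counters.incr(20002, 5)
print(counters.get(10001), counters.get(20002), counters.get(30003))  # 2 5 0
print(len(counters.buf))  # 16384 (bytes for 1,024 slots)
```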
Multi‑Column & Multi‑Table Extension
To handle several counters per key (for example reposts, comments, and likes), the service was extended to support multiple columns and multiple tables. When a table fills up, it is rolled to disk and its data becomes read-only, while new data goes into a fresh table.
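The rollover behavior can be sketched as follows. This is a hypothetical model: Python dicts stand in for the in-memory tables, marking a table `read_only` stands in for rolling it to disk, the column names are illustrative, and rejecting writes to keys in frozen tables is an assumed policy that the source only implies with "read-only".

```python
from dataclasses import dataclass, field

COLUMNS = ("repost", "comment", "like")  # hypothetical column set


@dataclass
class CounterTable:
    capacity: int
    read_only: bool = False
    rows: dict = field(default_factory=dict)  # key -> list of per-column counts


class MultiTableCounterSketch:
    """Hypothetical multi-column counter store with table rollover.

    New keys go into the single writable table; when it reaches capacity it
    is frozen (standing in for 'rolled to disk') and a fresh table is opened.
    Frozen tables still serve reads but reject writes.
    """

    def __init__(self, table_capacity: int):
        self.table_capacity = table_capacity
        self.tables = [CounterTable(table_capacity)]

    @property
    def active(self) -> CounterTable:
        return self.tables[-1]

    def _locate(self, key: int):
        # Check the writable table first, then older frozen tables.
        for table in reversed(self.tables):
            if key in table.rows:
                return table
        return None

    def incr(self, key: int, column: str, delta: int = 1) -> int:
        col = COLUMNS.index(column)
        table = self._locate(key)
        if table is None:
            if len(self.active.rows) >= self.table_capacity:
                self.active.read_only = True  # roll over: old table becomes read-only
                self.tables.append(CounterTable(self.table_capacity))
            table = self.active
            table.rows[key] = [0] * len(COLUMNS)
        if table.read_only:
            raise PermissionError(f"key {key} lives in a read-only table")
        table.rows[key][col] += delta
        return table.rows[key][col]

    def get(self, key: int, column: str) -> int:
        table = self._locate(key)
        return 0 if table is None else table.rows[key][COLUMNS.index(column)]


store = MultiTableCounterSketch(table_capacity=2)
store.incr(1, "like")
store.incr(2, "comment", 3)
store.incr(3, "repost")  # table 0 is full, so key 3 opens table 1
print(len(store.tables), store.tables[0].read_only)  # 2 True
print(store.get(2, "comment"))  # 3
```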
Cold‑Hot Data Separation
Hot data stays in memory, while cold or historical data is persisted to disk through a custom phantom service built on Bloom filters and a RocksDB-backed storage layer. This cut memory usage by 75-90% while keeping Redis-like performance for hot keys.
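The source gives no internals for the phantom service. The sketch below shows the general pattern it names: a Bloom filter in front of the cold tier, so lookups for keys that were never demoted skip the disk read. A dict stands in for the RocksDB-backed store, and the filter size, hash count, and SHA-256-based double hashing are arbitrary assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class HotColdStoreSketch:
    """Hot keys in memory; cold keys in a slower store guarded by a Bloom filter."""

    def __init__(self):
        self.hot = {}
        self.cold = {}  # stand-in for a RocksDB-backed disk store
        self.cold_filter = BloomFilter(num_bits=8192, num_hashes=4)
        self.cold_reads = 0  # how often the cold store was actually touched

    def demote(self, key: str) -> None:
        """Move a key from memory to the cold tier."""
        self.cold[key] = self.hot.pop(key)
        self.cold_filter.add(key)

    def get(self, key: str):
        if key in self.hot:
            return self.hot[key]
        if not self.cold_filter.might_contain(key):
            return None  # definitely absent: skip the disk read entirely
        self.cold_reads += 1
        return self.cold.get(key)


store = HotColdStoreSketch()
store.hot["uid:1"] = 42
store.hot["uid:2"] = 7
store.demote("uid:2")
print(store.get("uid:1"), store.get("uid:2"))  # 42 7
```

A Bloom filter never gives false negatives, so a demoted key is always found; an occasional false positive only costs one unnecessary cold read.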
Cache Service Architecture
A multi-level cache was introduced with four roles: master, master-l1, slave, and slave-l1. Reads first hit master-l1, then master, and finally slave; together these layers serve about 99% of hot-data reads, and only the remaining 1% falls back to MySQL.
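The read order described above can be sketched as a read-through lookup. Dicts stand in for the Redis tiers and a callable stands in for MySQL; slave-l1 is omitted because the text does not place it in the read path, and the backfill policy on a full miss is an assumption.

```python
from typing import Callable, Optional


def read_through(key: str, master_l1: dict, master: dict, slave: dict,
                 load_from_db: Callable[[str], Optional[str]]):
    """Return (value, source), trying master-l1, master, slave, then the DB."""
    for name, tier in (("master-l1", master_l1), ("master", master), ("slave", slave)):
        if key in tier:
            return tier[key], name
    value = load_from_db(key)
    if value is not None:
        # Assumed backfill policy: populate master and master-l1; the slave
        # would normally catch up through replication from master.
        master[key] = value
        master_l1[key] = value
    return value, "mysql"


mysql = {"post:1": "hello"}
master_l1, master, slave = {}, {}, {"post:2": "cached on slave"}
print(read_through("post:2", master_l1, master, slave, mysql.get))  # ('cached on slave', 'slave')
print(read_through("post:1", master_l1, master, slave, mysql.get))  # ('hello', 'mysql')
print(read_through("post:1", master_l1, master, slave, mysql.get))  # ('hello', 'master-l1')
```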
The service supports both native Redis protocol and a custom mc protocol, automatic scaling based on traffic, and a configuration center that pushes updates to clients.
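Push-based configuration can be sketched with a simple subscribe/publish pattern. `ConfigCenterSketch`, `CacheClientSketch`, the key naming scheme, and the node addresses are all hypothetical; a real configuration center would deliver updates over the network rather than through in-process callbacks.

```python
from typing import Callable


class ConfigCenterSketch:
    """Hypothetical push-based config center: clients subscribe to a key and
    are called back with the new value whenever it changes."""

    def __init__(self):
        self._values: dict = {}
        self._subscribers: dict = {}  # key -> list of callbacks

    def subscribe(self, key: str, callback: Callable[[object], None]) -> None:
        self._subscribers.setdefault(key, []).append(callback)
        if key in self._values:
            callback(self._values[key])  # deliver the current value immediately

    def publish(self, key: str, value: object) -> None:
        self._values[key] = value
        for callback in self._subscribers.get(key, []):
            callback(value)


class CacheClientSketch:
    """Client that keeps its node list in sync with the config center."""

    def __init__(self, center: ConfigCenterSketch, cluster: str):
        self.nodes: list = []
        center.subscribe(f"cluster/{cluster}/nodes", self._on_nodes)

    def _on_nodes(self, nodes) -> None:
        self.nodes = list(nodes)


center = ConfigCenterSketch()
center.publish("cluster/feed/nodes", ["10.0.0.1:6379"])
client = CacheClientSketch(center, "feed")
center.publish("cluster/feed/nodes", ["10.0.0.1:6379", "10.0.0.2:6379"])  # scale-out
print(client.nodes)  # ['10.0.0.1:6379', '10.0.0.2:6379']
```

Pushing changes means clients pick up a scale-out without polling or restarting.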
Capacity‑Beyond‑Memory Solutions
Because not all data can fit in RAM, the team moved large-capacity data to disk while supporting hot-cold separation, full persistence, master-slave replication, online hot upgrades, and compatibility with existing Redis data types.
Integration of RocksDB provides a mature, high‑performance on‑disk store, avoiding the need to reinvent storage engines.
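One common way to keep Redis data types on top of an ordered key-value store like RocksDB is to encode each hash field as its own key, with the Redis key as a prefix, so HGETALL becomes a prefix scan. The layout below is a hypothetical illustration rather than Weibo's actual format, and a sorted Python list stands in for RocksDB, where an iterator's `Seek()` would play the role of `bisect_left`.

```python
import bisect
import struct


class SortedKVSketch:
    """In-memory stand-in for an ordered on-disk store such as RocksDB."""

    def __init__(self):
        self._keys: list = []  # kept sorted, like RocksDB's key order
        self._vals: dict = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_prefix(self, prefix: bytes):
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def hash_field_key(user_key: bytes, field: bytes) -> bytes:
    # Hypothetical layout: type tag 'h' + 4-byte big-endian key length + key + field.
    # The length prefix stops key "ab"/field "c" colliding with key "a"/field "bc".
    return b"h" + struct.pack(">I", len(user_key)) + user_key + field


def hset(db: SortedKVSketch, key: bytes, field: bytes, value: bytes) -> None:
    db.put(hash_field_key(key, field), value)


def hgetall(db: SortedKVSketch, key: bytes) -> dict:
    prefix = hash_field_key(key, b"")
    return {k[len(prefix):]: v for k, v in db.scan_prefix(prefix)}


db = SortedKVSketch()
hset(db, b"user:1", b"name", b"alice")
hset(db, b"user:1", b"city", b"beijing")
hset(db, b"user:10", b"name", b"bob")
print(hgetall(db, b"user:1"))  # {b'city': b'beijing', b'name': b'alice'}
```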
Operational Enhancements
Non‑blocking persistence (RDB + AOF).
Online hot upgrades without downtime.
Custom graph relationships for complex queries.
Memory footprint reduced to 1/10‑1/4 of the original.
Performance gains of 3‑5× for certain workloads.
Remaining Challenges and Future Directions
Despite the improvements, new requirements still surface, prompting exploration of open‑source proxy solutions (Twemproxy, Codis, Corvus, Redis‑Cluster) and plans to adopt Raft‑based consensus for stronger consistency.
Future work includes deeper integration of SQL databases, further automation of scaling, and continued enhancement of high‑availability mechanisms.
Conclusion
Weibo’s experience demonstrates that with careful engineering—custom data structures, multi‑level caching, disk‑backed storage, and automated operations—Redis can serve at massive scale while controlling cost and maintaining performance.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.