
Evolution of Weibo Cache Service: From Bare Memcache to Multi‑Layered Service Architecture

The article details how Weibo’s cache infrastructure progressed from simple Memcache deployments to a sophisticated, service‑oriented architecture featuring multi‑layer caching, proxy layers, dynamic configuration, monitoring, and automated scaling to meet massive read‑write demands and high availability requirements.

High Availability Architecture

Weibo’s cache service has been a critical component of its high‑availability architecture, supporting millions of users and requiring sub‑millisecond response times with four‑nines (99.99%) availability. The article recounts the evolution of this service, beginning with a bare‑metal Memcache deployment and advancing through several architectural phases.

1. Cache Business Scenarios – Every Weibo API request is assembled in real time, often requiring hundreds of data items from backend resources, leading to massive read amplification. To meet the performance and availability demands, Weibo relies heavily on Memcache and Redis, with QPS for core cache ports reaching millions per second.
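The read amplification described above can be sketched in a few lines. This is a minimal illustration, not Weibo's actual API: an in-memory dict stands in for Memcache, and the names `mget` and `fetch_feed` are ours.

```python
# One API request fans out into many cache reads; batching them into a
# single multi-get keeps round trips low. The dict stands in for Memcache.
cache = {f"status:{i}": f"status-body-{i}" for i in range(100)}

def mget(keys):
    """Batched read, analogous to Memcache's multi-get command."""
    return {k: cache.get(k) for k in keys}

def fetch_feed(status_ids):
    # A single feed request may need hundreds of objects (statuses,
    # users, counters ...); here, 50 items are fetched in one batch.
    keys = [f"status:{sid}" for sid in status_ids]
    return mget(keys)

feed = fetch_feed(range(50))   # one request -> 50 cache lookups
```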

2. Evolution of the Bare‑Resource Architecture – Initially, Memcache nodes were deployed as isolated pools per IDC, accessed via hash‑based routing and local caches. Rapid growth forced the addition of hundreds of nodes, exposing failures that caused cache misses and DB overload. A two‑layer Main‑HA architecture was introduced to improve hit rates and resilience. Later, hot‑data overload prompted the addition of multiple small L1 caches, forming a three‑tier L1‑Main‑HA structure that alleviated bandwidth and CPU bottlenecks.
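The three‑tier L1‑Main‑HA read path can be sketched as follows. This is a simplified model under our own assumptions: each tier is a plain dict, and on a miss in an upper tier the value found lower down is written back so hot data migrates upward.

```python
# Sketch of the L1 -> Main -> HA read path. Several small L1 groups
# absorb hot-key traffic; Main is the primary pool; HA is its backup.
l1_groups = [{}, {}]
main, ha = {}, {"k": "v"}

def pick_l1(key):
    # hash-based routing to one of the small L1 groups
    return l1_groups[hash(key) % len(l1_groups)]

def tiered_get(key):
    l1 = pick_l1(key)
    if key in l1:
        return l1[key]
    if key in main:
        l1[key] = main[key]            # populate L1 on a Main hit
        return main[key]
    if key in ha:
        main[key] = l1[key] = ha[key]  # backfill upper tiers on an HA hit
        return ha[key]
    return None                        # full miss falls through to the DB

value = tiered_get("k")   # served from HA, then cached in Main and L1
```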

3. Design and Practice of Cache Service – The bare‑resource model suffered from configuration complexity, manual scaling, language‑specific client libraries, and operational difficulty. To address these issues, Weibo introduced a proxy layer based on Twitter’s twemproxy, added a Memcache cluster with built‑in routing policies, and employed namespace prefixes for multi‑tenant isolation. The proxy also incorporated an LRU cache to reduce hot‑key penetration.
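Two of the proxy-layer ideas above can be illustrated together: namespace prefixes for multi-tenant isolation, and a small front-side LRU that shields hot keys from penetrating to the backend. The class and function names here are illustrative, not from twemproxy.

```python
from collections import OrderedDict

class ProxyFrontCache:
    """Small LRU held inside the proxy to absorb hot-key reads."""
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

def namespaced(tenant, key):
    # Prefixing keeps tenants from colliding in a shared cache pool.
    return f"{tenant}:{key}"

front = ProxyFrontCache(capacity=2)
front.put(namespaced("feed", "status:1"), "hot-status")
hit = front.get("feed:status:1")
```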

Key operational improvements include:

Integration with an internal configuration center (vintage) for dynamic registration of Memcache resources and cache proxies.

Monitoring via Graphite, with logtailer forwarding cache logs and metrics for dashboards and alerts.

A web‑based management component (clusterManager/captain) that provides UI‑driven lifecycle management, capacity planning, and automated scaling.

Extended client support through a Motan‑based Memcache protocol, enabling language‑agnostic access via Spring configuration.

Two deployment models for cacheProxy: localized Docker containers alongside business services or centralized shared proxies.

Scaling strategies include intra‑cluster expansion (adding/removing L1 groups or expanding the Main layer) and inter‑cluster growth using an updateServer component to replicate data across master and slave clusters. The updateServer records writes to an AOF file and ensures consistency between clusters, while also supporting Docker‑based deployments.
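The AOF idea behind the updateServer can be sketched as below. The entry format and function names are assumptions for illustration, not Weibo's actual replication protocol: every write is appended to a log, and a peer cluster replays the log in order to converge.

```python
# Minimal append-only-file sketch: record writes on the master side,
# replay them on the slave side. An in-memory StringIO stands in for
# the on-disk AOF; the JSON line format is our own assumption.
import io
import json

def record_write(aof, key, value):
    aof.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")

def replay(aof_text, target):
    # The slave cluster applies entries in order to match the master.
    for line in aof_text.splitlines():
        entry = json.loads(line)
        if entry["op"] == "set":
            target[entry["key"]] = entry["value"]

aof = io.StringIO()
record_write(aof, "status:1", "hello")
record_write(aof, "status:2", "world")

slave = {}
replay(aof.getvalue(), slave)
```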

Performance enhancements were achieved by request merging, moving from single‑process to multi‑process proxies, and upgrading the cache eviction algorithm from LRU to LS4LRU, which adds hierarchical expiration times and improves hit rates by 5‑7%.
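Request merging can be sketched as follows: duplicate lookups that arrive in the same window are collapsed so the backend is hit only once per distinct key. This is a simplified single-threaded model with illustrative names; Weibo's proxy implements this natively.

```python
# Collapse duplicate keys queued in one window into a single backend
# fetch each, then fan the results back out to every caller.
backend_calls = []

def backend_get(key):
    backend_calls.append(key)          # one entry per real backend hit
    return f"value-for-{key}"

def merge_and_fetch(requests):
    """Fetch each distinct key once; answer all requests from that."""
    unique = {}
    for key in requests:
        if key not in unique:
            unique[key] = backend_get(key)
    return [unique[key] for key in requests]

# Six requests for two hot keys produce only two backend hits.
results = merge_and_fetch(["hot", "hot", "warm", "hot", "warm", "hot"])
```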

LS4LRU Overview – LS4LRU extends S4LRU by assigning two expiration times (exp1, exp2) to each key, allowing fast responses for very recent data while asynchronously refreshing slightly older data, and discarding stale entries beyond exp2.
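The two-expiration behavior can be sketched as a lookup rule. This omits the underlying S4LRU segment machinery and uses our own names and time units; only the exp1/exp2 semantics follow the description above.

```python
# LS4LRU-style freshness check: data younger than exp1 is served
# directly; between exp1 and exp2 it is served while flagged for an
# asynchronous refresh; past exp2 it is discarded and treated as a miss.
import time

def lookup(entry, now, exp1=10, exp2=60):
    """entry = (value, written_at); returns (value_or_None, needs_refresh)."""
    value, written_at = entry
    age = now - written_at
    if age < exp1:
        return value, False        # fresh: answer immediately
    if age < exp2:
        return value, True         # slightly old: answer, refresh async
    return None, True              # stale beyond exp2: miss

t0 = time.time()
fresh = lookup(("v", t0), now=t0 + 5)     # ("v", False)
aging = lookup(("v", t0), now=t0 + 30)    # ("v", True)
dead  = lookup(("v", t0), now=t0 + 120)   # (None, True)
```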

The article concludes with a Q&A covering L1‑Main interaction, hot‑data placement, Redis memory fragmentation, and operational practices such as disaster recovery, proxy failure handling, and automated configuration updates.

Related reading links and a download link for the presentation slides are provided at the end of the article.

Tags: backend, cache, scalability, high availability, Redis, service architecture, Memcache
Written by

High Availability Architecture

Official account for High Availability Architecture.
