
How to Prevent and Recover from Cache‑Induced Service Overload

Service overload caused by cache failures can cripple dependent systems. By adopting careful cache read patterns, proactive client‑side checks, traffic throttling, service degradation, and dynamic scaling, developers can both prevent overload and recover gracefully when it occurs.

Meituan Technology Team

Jun 17, 2016


In modern distributed systems, a sudden surge of external requests can overload services, especially when a cache layer fails. Requests pile up, the service becomes unavailable, and the system may eventually crash. This article analyzes how introducing a cache can unintentionally cause overload and presents preventive and remedial strategies.

Service Overload Scenario

The example involves two systems: client A (a 60‑node cluster) and server B (a 6‑node cluster). A relies on B’s read service but first queries a cache; only when the cache entry expires does A call B. If the cache fails, all traffic is redirected to B, overwhelming it.

Three primary causes of overload are identified:

The front‑end proxy of B fails, making B temporarily unavailable; when B recovers, the accumulated traffic spikes.

The cache itself fails, causing all of A’s traffic to flow directly to B.

The cache recovers but comes back empty, so the hit rate is 0, every request penetrates the cache, and B sees a sudden surge.

Client‑Side Prevention Strategies

From A’s perspective, several approaches can mitigate overload:

1. Reasonable Cache Usage for B Failures

Each cache entry has a TTL (T). After expiration, five possible get‑operation patterns exist.

Simple timeout mode (naive): Every thread that finds an expired key immediately calls the remote service, leading to many concurrent requests.

Regular timeout mode: Threads first check whether another thread is already fetching the value; if so, they wait, reducing concurrent remote calls.

Simple refresh mode: After expiration, a thread triggers a refresh; other threads either wait for the refresh (synchronous) or return the stale value immediately (asynchronous).

Regular refresh mode: Similar to the simple refresh mode but with coordination so that only one refresh is performed while others return stale data.

Refresh‑renew mode: If a refresh fails, the stale value is treated as fresh for another TTL period, preventing a sudden traffic burst.

Guava’s local cache supports the regular timeout, regular refresh, and refresh‑renew modes. Distributed caches such as Redis only provide basic get/set operations, so developers must implement these patterns themselves.
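For a cache that offers only get/set, the refresh‑renew mode could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name RefreshRenewCache, the loader callable (standing in for the call to B), and the injectable clock are all hypothetical. Only one caller refreshes an expired key at a time, other callers receive the stale value, and a failed refresh renews the stale value for another TTL.

```python
import threading
import time


class RefreshRenewCache:
    """Sketch of the refresh-renew mode (all names are hypothetical)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # assumed callable: key -> value (the call to B)
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}           # key -> (value, expires_at)
        self._refreshing = set()     # keys with a refresh currently in flight

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and self._clock() < entry[1]:
                return entry[0]      # fresh hit
            if entry is not None and key in self._refreshing:
                return entry[0]      # another caller is refreshing: serve stale
            self._refreshing.add(key)
        try:
            value = self._loader(key)
        except Exception:
            with self._lock:
                self._refreshing.discard(key)
                if entry is None:
                    raise            # cold miss: no stale value to fall back on
                # Refresh failed: treat the stale value as fresh for another TTL.
                self._entries[key] = (entry[0], self._clock() + self._ttl)
            return entry[0]
        with self._lock:
            self._refreshing.discard(key)
            self._entries[key] = (value, self._clock() + self._ttl)
        return value
```

In this sketch the refresh runs in the calling thread, and cold misses (no stale value yet) are not coordinated. An asynchronous variant would hand the loader call to a background worker and return the stale value to every caller.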

2. Handling Distributed Cache Failures

If the distributed cache is down, A cannot read or write to it, and all traffic falls back to B. Three mitigation ideas are proposed:

Log the failure and return a default value without contacting B.

Probabilistically decide whether to call B, using a probability u = (average B traffic) / (peak A traffic).

Query a health‑check endpoint on B; only call B if it reports healthy.

Option 2 is the most practical, while option 3 requires a dedicated health‑assessment service.
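The probabilistic option can be sketched as below. The function names, the default value, and the convention that an unreachable cache raises ConnectionError are assumptions made for illustration; the only part taken from the text is the ratio u = (average B traffic) / (peak A traffic).

```python
import random


def fallback_probability(avg_b_traffic, peak_a_traffic):
    """u = (average B traffic) / (peak A traffic), clamped to [0, 1]."""
    if peak_a_traffic <= 0:
        raise ValueError("peak_a_traffic must be positive")
    return max(0.0, min(1.0, avg_b_traffic / peak_a_traffic))


def read_with_fallback(key, cache_get, call_b, default, u, rng=random.random):
    """Read through the cache; if the cache is down, call B with probability u.

    cache_get and call_b are hypothetical callables. cache_get is assumed to
    raise ConnectionError when the distributed cache is unreachable.
    """
    try:
        return cache_get(key)
    except ConnectionError:
        if rng() < u:
            return call_b(key)   # a small share of requests reaches B
        return default           # the rest are answered without touching B
```

For example, if B normally handles 100 requests per second and A peaks at 1,000 requests per second, u is 0.1, so roughly one request in ten is forwarded to B during a cache outage.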

3. Recovery After Cache Outage

When the cache comes back online, it is empty, so A must again decide whether to request B or wait. If the key space is small, the impact is limited; otherwise, overload may recur. Proper monitoring is essential to avoid serving stale data.

Server‑Side Overload Protection

Server B can employ three main techniques:

Traffic Control

Traffic control monitors request rates in real time and rejects excess traffic once a predefined threshold is exceeded. Two variants exist:

Threshold‑based control: a static limit per host.

Host‑status‑based control: reject requests if the host’s health metrics (CPU, memory, GC, latency, etc.) indicate overload.

Traffic control can be implemented in reverse proxies (e.g., Nginx), in service‑governance platforms, or directly in application code. The last option mixes rate limiting with business logic and is the least recommended.
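To make the threshold‑based variant concrete, here is a small token‑bucket sketch. The article does not name a specific algorithm, so the token bucket is one possible choice, and the class and parameter names are hypothetical. The code only illustrates the decision logic; in practice the same check would usually live in the proxy or governance layer rather than in application code.

```python
import time


class TokenBucket:
    """Static per-host limit: about rate_per_second requests, bursts up to burst.

    Not thread-safe; a real deployment would guard allow() with a lock.
    """

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = burst
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def allow(self):
        """Return True to accept the request, False to reject it."""
        now = self._clock()
        # Refill tokens for the elapsed time, capped at the burst capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

The rate and burst values would come from load testing, as the conclusion recommends for proxy‑level thresholds.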

Service Degradation

When overload occurs, non‑critical APIs are disabled while critical ones remain available, effectively shifting processing capacity to essential services.
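A degradation switch can be as simple as the sketch below. The class name and the API names in the usage example are hypothetical, and how the switch is flipped (by an operator, by a monitoring alarm) is left open.

```python
class DegradationSwitch:
    """While degraded, only the APIs listed as critical stay enabled."""

    def __init__(self, critical_apis):
        self._critical = frozenset(critical_apis)
        self._degraded = False

    def set_degraded(self, degraded):
        self._degraded = bool(degraded)

    def is_enabled(self, api_name):
        # Normal operation: everything is served.
        # Degraded operation: only critical APIs are served.
        return (not self._degraded) or api_name in self._critical


switch = DegradationSwitch(critical_apis={"get_order"})
switch.set_degraded(True)
print(switch.is_enabled("get_order"))        # critical API stays on
print(switch.is_enabled("recommendations"))  # non-critical API is turned off
```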

Dynamic Scaling

Automatically expand the cluster when traffic exceeds capacity and shrink it back after the surge, achieving elastic resource usage. This requires robust cloud‑native orchestration.

Crash Recovery

If overload leads to a crash, operators should gradually ramp up traffic (e.g., 10 % → 20 % → 50 % → 80 % → 100 %) while monitoring system health, allowing the cache to warm up safely.
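The ramp‑up schedule above can be sketched as a percentage gate plus a step function. The hashing scheme, the function names, and the rule of holding the current step while the system is unhealthy are assumptions for illustration; the article only specifies the percentages.

```python
import hashlib

RAMP_STEPS = (10, 20, 50, 80, 100)  # percentages from the schedule above


def admitted(request_id, percent):
    """Admit roughly `percent` % of requests, deterministically per request id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def next_step(current_percent, healthy):
    """Move to the next ramp step if the system is healthy, otherwise hold."""
    if not healthy:
        return current_percent  # assumption: hold rather than roll back
    for step in RAMP_STEPS:
        if step > current_percent:
            return step
    return RAMP_STEPS[-1]
```

An operator or a control loop would call next_step after each observation window and pass the resulting percentage to admitted for every incoming request.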

Conclusion

Prevention is the primary goal, with remediation as a backup. Key recommendations:

Clients should use the asynchronous refresh‑renew cache pattern and avoid simple timeout or simple refresh modes.

Clients must check cache availability and, when unavailable, access the backend with a conservative probability.

Servers should enforce traffic control at the reverse‑proxy level, with thresholds derived from load‑testing.

Effective overload handling combines careful cache strategy, intelligent client behavior, and robust server‑side protection, supported by coordinated efforts between developers and operations.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
