
How to Prevent Cache Avalanche and Thundering Herd in High‑Traffic Apps

This article examines various strategies—including cache warm‑up, staggered expirations, aggregated caching, queuing, locking, rate‑limiting, backup caches, client‑side caches, and default empty values—to mitigate cache breakdown and service avalanche during peak traffic periods.


1 Introduction

In a previous article we described cache avalanche, cache penetration, and thundering herd along with their general solutions; here we discuss handling cache breakdown and avalanche in a concrete business scenario.

2 Problem Background

A core application (e.g., WeChat, DingTalk, or Baidu App) experiences a peak QPS in the millions.

Analysis: traffic peaks around 9–10 am, following a roughly Gaussian (bell-shaped) distribution over the day.

The cache stores basic user information (name, gender, occupation, address) keyed by user ID.

For some reason the cache is lost (expiration, a failure, a bug, or a restart).

During the peak, requests miss the cache and hit the database directly.

Disk‑based databases cannot handle the load, leading to a service avalanche.

4 Candidate Answers (Compiled)

4.1 Cache Warm‑up

Since peak periods are predictable, pre‑warm the cache before the peak (e.g., fill cache between 7–9 am for the 9–10 am peak).
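A minimal warm-up sketch, assuming a hypothetical `fetch_user_from_db` data source and a plain dict standing in for the cache service; a real job would write to Redis or similar:

```python
import time

# Hypothetical data source: in a real system this is a database query.
def fetch_user_from_db(user_id):
    return {"id": user_id, "name": f"user-{user_id}"}

def warm_up_cache(cache, user_ids, ttl_seconds=4 * 3600):
    """Pre-fill the cache for the expected hot user set before the peak.

    Each entry stores the value together with its expiry timestamp, so the
    warm-up finishes before 9 am and entries survive through the peak.
    """
    now = time.time()
    for uid in user_ids:
        cache[uid] = (fetch_user_from_db(uid), now + ttl_seconds)

cache = {}
warm_up_cache(cache, range(100))
```

A scheduled job (cron, or a task queue) would run this between 7–9 am over the hot user IDs identified from access logs.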

Drawback: only works for predictable cache failures, not sudden loss during the peak.

4.2 Staggered Expiration Times

Uniform expiration times cause many keys to expire simultaneously, triggering a thundering herd. Apply a 3‑4‑3 distribution instead: Expire = 3 h + random()*4 h + 3 h, which spreads TTLs across a 6–10 hour range so expirations are staggered.
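The 3‑4‑3 jitter scheme above can be sketched directly; the function names here are illustrative:

```python
import random

HOUR = 3600  # seconds

def staggered_ttl():
    """3-4-3 scheme: fixed 3 h + random 0-4 h + fixed 3 h.

    Yields a TTL uniformly spread over [6 h, 10 h), so keys written in the
    same burst do not all expire in the same instant.
    """
    return 3 * HOUR + random.random() * 4 * HOUR + 3 * HOUR
```

Each `SET` would use `staggered_ttl()` as its expiry instead of a fixed constant.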

Drawback: same limitation as 4.1; cannot handle unexpected failures.

4.3 Aggregated Cache by User Type

Instead of caching each user individually, group users by type and cache aggregated data, reducing database hits during peaks.

Only suitable for data with very low update frequency; large aggregated values that change frequently are inefficient.
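A sketch of the grouping idea, assuming occupation is the aggregation key (any low-cardinality attribute would do); key names are illustrative:

```python
from collections import defaultdict

def build_aggregated_cache(users):
    """Group users by a coarse attribute and cache one aggregated entry
    per group instead of one entry per user.

    Far fewer keys means far fewer simultaneous expirations and rebuilds.
    """
    groups = defaultdict(list)
    for user in users:
        groups[user["occupation"]].append(user)
    return {f"users:occupation:{occ}": members for occ, members in groups.items()}

agg = build_aggregated_cache([
    {"id": 1, "occupation": "engineer"},
    {"id": 2, "occupation": "engineer"},
    {"id": 3, "occupation": "teacher"},
])
```

The trade-off noted above applies: one write to any member invalidates the whole group's entry.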

4.4 Spike‑Reduction, Locking, Rate‑Limiting

4.4.1 Spike‑Reduction

Introduce a message queue to enqueue requests and process them sequentially, avoiding request bursts.
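A minimal in-process sketch of the idea (a real deployment would use Kafka, RabbitMQ, or similar rather than `queue.Queue`):

```python
import queue

def handle_requests_sequentially(requests, handler):
    """Buffer incoming requests in a queue and drain them one at a time,
    so the database never sees a concurrent burst."""
    q = queue.Queue()
    for req in requests:
        q.put(req)
    results = []
    while not q.empty():
        results.append(handler(q.get()))
    return results

results = handle_requests_sequentially([1, 2, 3], lambda uid: f"user-{uid}")
```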

4.4.2 Locking

Allow only the first request for a given user to query the database and update the cache; subsequent requests wait for the lock to be released.
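This "single-flight" pattern can be sketched with a per-key lock and a double-check after acquiring it; `load_from_db` is a placeholder for the real query:

```python
import threading

_locks = {}
_locks_guard = threading.Lock()
cache = {}

def get_user(user_id, load_from_db):
    """Only the first cache-missing request for a key hits the database;
    concurrent requests for the same key wait on the same lock."""
    if user_id in cache:                      # fast path: cache hit
        return cache[user_id]
    with _locks_guard:                        # get or create the per-key lock
        lock = _locks.setdefault(user_id, threading.Lock())
    with lock:
        if user_id not in cache:              # double-check after acquiring
            cache[user_id] = load_from_db(user_id)
        return cache[user_id]
```

In a distributed deployment the `threading.Lock` would be replaced by a distributed lock (e.g., one built on Redis), but the double-check structure is the same.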

4.4.3 Rate‑Limiting

Conduct load testing without cache to determine the maximum sustainable load, then set a rate‑limit threshold to prevent overload.
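A token-bucket limiter is one common way to enforce that threshold; a minimal sketch, with `rate` and `capacity` set from the load-test results:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: capacity and refill rate come from
    load testing the system without its cache."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # reject (or queue) the request
```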

Drawbacks:

Locks and queuing significantly reduce throughput, leading to long wait times and poor user experience.

Rate‑limiting is coarse‑grained; fine‑grained per‑endpoint limits are possible but still degrade service for some users.

Note: databases also provide their own protection mechanisms, such as connection limits (e.g., MySQL's max_connections).

4.5 Temporary Degradation with Backup Cache

If the primary cache fails, fall back to a backup cache that syncs asynchronously, accepting slight data staleness to protect the database from overload.
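The read path can be sketched as a simple fallback chain; `primary` and `backup` here are dict stand-ins for two cache clusters:

```python
def read_with_fallback(key, primary, backup, default=None):
    """Read from the primary cache; if it is unavailable or missing the
    key, fall back to an asynchronously-synced backup, accepting that the
    backup's data may be slightly stale."""
    try:
        value = primary.get(key)
        if value is not None:
            return value
    except Exception:
        pass                        # primary cache is down: degrade
    return backup.get(key, default)

primary, backup = {}, {"u:1": {"name": "alice"}}
```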

4.6 Temporary Degradation with Client‑Side Cache (Redis 6.0)

Leverage Redis 6.0's client-side caching (client tracking): the application process keeps a local copy of hot keys, trading some freshness for reduced load on the cache service.
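A simplified in-process cache illustrating the idea (this sketch omits Redis's server-assisted invalidation and just uses a short local TTL; class and method names are illustrative):

```python
import time

class LocalTTLCache:
    """In-process cache held by the application, in the spirit of Redis 6.0
    client-side caching: hot reads never leave the process."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]     # stale: drop locally and report a miss
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)
```

A miss here would fall through to the Redis cluster; the short local TTL bounds how stale a served value can be.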

4.7 Temporary Degradation with Empty Default Value

Return an empty or default value during failure, sacrificing some requests to keep the database from being overwhelmed.
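A trivial sketch of this degradation, using the user-profile fields from the scenario above:

```python
# Default profile served while the cache is unavailable.
EMPTY_USER = {"name": "", "gender": "", "occupation": "", "address": ""}

def get_user_degraded(user_id, cache):
    """During a cache outage, serve an empty default profile instead of
    letting the request fall through to the database."""
    value = cache.get(user_id)
    return value if value is not None else EMPTY_USER
```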

5 Summary

Each method has its own advantages and disadvantages; the appropriate solution should be chosen based on the actual application scenario.

Tags: backend, distributed systems, performance, caching, thundering herd, cache avalanche
Written by

Architecture & Thinking

🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.
