How to Prevent Cache Avalanche in Distributed Systems: Strategies and Best Practices
This article explains what a cache avalanche is, why it occurs in distributed systems, and presents four practical mitigation techniques (randomized TTL, mutex locks, proactive pre‑warming, and rate limiting/circuit breaking) to keep backend services stable under sudden load spikes.
Cache Avalanche Overview
Cache avalanche is a severe failure mode in distributed systems where a large number of cached entries expire at the same moment or the cache becomes unavailable. The sudden loss of cache protection causes a massive burst of traffic to hit the underlying database or downstream services, often overwhelming them.
Typical Causes
Common triggers include:
Massive simultaneous expiration of high‑frequency keys (e.g., hot ranking lists, best‑selling items).
Cache server outage that forces all requests to bypass the cache.
A thundering herd on rebuild: many concurrent requests miss the cache for the same hot data at once, and every one of them queries the database to rebuild it.
Example: an e‑commerce homepage caches the list of hot products with a TTL of one hour aligned to the clock. At 10:00 the cache expires, and all users query MySQL at once, causing a sharp load increase.
Mitigation Strategies
Staggered Expiration (Randomized TTL)
Assign each cache entry a TTL that includes a random jitter so that expirations are spread over a time window. A typical implementation, shown here as a small Python sketch:

import random

baseTTL = 3600  # base TTL in seconds
jitter = random.uniform(-0.1, 0.1) * baseTTL  # random offset of +/-10%
ttl = int(baseTTL + jitter)  # entries expire at slightly different moments

The jitter range should be chosen based on business access patterns: too large a range may widen the window of stale data, while too small a range does not sufficiently disperse expirations.
Mutex or Request Queue (Cache‑Penetration Protection)
When a cache miss occurs, use a distributed lock to ensure that only one request fetches the data from the source while other concurrent requests wait or receive stale data. Example using Redis, as a Python-style sketch; SET with NX and EX acquires the lock and sets its expiry in one atomic step, avoiding the race between separate SETNX and EXPIRE calls:

import time

lockKey = "lock:" + key
if redis.set(lockKey, "1", nx=True, ex=lockTimeout):
    try:
        data = loadFromDB(key)       # only this request hits the source
        cache.set(key, data, ttl)
    finally:
        redis.delete(lockKey)        # release the lock even if the load fails
else:
    time.sleep(0.05)                 # wait ~50 ms, then retry the cache read
    data = cache.get(key)

The lock timeout must be longer than the expected data-fetch time so the lock does not expire mid-rebuild, yet short enough that a crashed holder does not block rebuilds for long; the implementation should also handle lock-acquisition failures gracefully.
Pre‑warming and Proactive Refresh (Lossless Update)
For hot data, load it into the cache before it expires and refresh it asynchronously. Two common patterns:
Background refresh: a scheduled job reads the source and updates the cache a few seconds before the current TTL expires.
Cache‑then‑source: write updates to the cache first, then persist to the database, ensuring the cache remains warm.
When designing the refresh job, limit concurrency (e.g., with a semaphore) to avoid a burst of simultaneous source reads, as in the sketch below.
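A minimal sketch of the background-refresh pattern, assuming a cache client that exposes set(key, value, ttl) and a hypothetical load_from_source(key) loader; the semaphore caps how many refreshes may hit the source at once:

import threading
import time

MAX_CONCURRENT_REFRESHES = 4
refresh_slots = threading.Semaphore(MAX_CONCURRENT_REFRESHES)

def refresh_key(cache, key, base_ttl):
    # The semaphore caps concurrent source reads so the refresh job
    # itself cannot stampede the database.
    with refresh_slots:
        data = load_from_source(key)   # hypothetical loader
        cache.set(key, data, base_ttl)

def refresh_loop(cache, hot_keys, base_ttl, lead_time=30):
    # Scheduled job: re-warm every hot key `lead_time` seconds before
    # its TTL would expire, so readers never observe a miss.
    while True:
        threads = [
            threading.Thread(target=refresh_key, args=(cache, k, base_ttl))
            for k in hot_keys
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        time.sleep(base_ttl - lead_time)

Because each refresh rewrites the entry before its TTL runs out, hot keys are updated without ever becoming cold, which is why this pattern is sometimes called a lossless update.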
Rate Limiting, Degradation, and Circuit Breaking (Backend Protection)
Apply traffic‑shaping mechanisms when backend load rises:
Rate limiting (token‑bucket or leaky‑bucket) to cap the number of requests forwarded to the database.
Circuit breaker that opens after a failure threshold, returning a fallback response (static page, cached stale data, or a friendly error).
Graceful degradation that serves reduced‑feature responses instead of full data when resources are constrained.
These controls preserve core functionality and prevent total system collapse during an avalanche; a minimal rate-limiting sketch follows below.
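To make the rate-limiting idea concrete, here is a minimal token-bucket sketch, assuming the same hypothetical load_from_db(key) loader as above and letting STALE_OR_FALLBACK stand for whatever degraded response the service can afford (a stale copy, a static page, or a friendly error):

import time

class TokenBucket:
    # Token-bucket limiter: `rate` tokens accrue per second, up to
    # `capacity`; each forwarded database request consumes one token.
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

db_limiter = TokenBucket(rate=100, capacity=200)  # assumed budget: ~100 DB reads/s

def get_with_protection(cache, key):
    data = cache.get(key)
    if data is not None:
        return data                   # cache hit: no limiter needed
    if db_limiter.allow():
        data = load_from_db(key)      # hypothetical loader
        cache.set(key, data, 3600)
        return data
    return STALE_OR_FALLBACK          # degrade instead of hitting the DB

A circuit breaker can wrap load_from_db at the same point, switching to the fallback branch after a failure threshold rather than a token shortage.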
Architect Chen
Sharing over a decade of architecture experience from Baidu, Alibaba, and Tencent.
