Cache Breakdown, Penetration, and Avalanche: Causes and Mitigation Strategies in Redis
The article explains why cache breakdown, penetration, and avalanche occur in high‑concurrency Redis environments, and presents practical mitigation techniques such as distributed locking, Bloom‑filter filtering, and careful expiration strategies to keep backend services stable.
One of the main bottlenecks in computer systems is I/O; to bridge the speed gap between memory and disk, caches store hot data in memory, reducing direct database access. However, when a hot key expires under high concurrency, cache breakdown can overload the database.
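The cache-aside pattern described here can be sketched as follows. This is an illustrative stand-in only: a plain dict plays the role of both Redis and the database, and the class and names (`CacheAside`, `db_hits`) are made up for this example, not from any library.

```python
import time

class CacheAside:
    """Minimal cache-aside sketch: an in-memory dict stands in for Redis,
    and another dict stands in for the database (illustration only)."""

    def __init__(self, db, ttl_seconds=60):
        self.db = db                # dict simulating the database
        self.cache = {}             # key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.db_hits = 0            # counts how often we fall through to the DB

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                      # cache hit: no DB access
        value = self.db.get(key)                 # cache miss: hit the "database"
        self.db_hits += 1
        if value is not None:
            self.cache[key] = (value, time.time() + self.ttl)
        return value

db = {"product:1": "Widget"}
c = CacheAside(db)
c.get("product:1")   # first call misses the cache and reads the DB
c.get("product:1")   # second call is served from memory
```

When the TTL on a hot key like `product:1` elapses under heavy concurrency, every in-flight request takes the miss path at once, which is exactly the breakdown scenario the article discusses next.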
Breakdown happens for two main reasons: the key expires (e.g., its TTL ends after a promotional event), or the key is evicted by the cache's replacement policy because memory is limited.
To handle breakdown, a common approach is to set a lock when a request finds an expired key. The process is roughly:
A request reaches Redis, finds the key expired, and checks for a lock; if another request already holds the lock, this one is re‑queued (or retries after a short wait).
Set the lock with setnx() (set-if-not-exists) rather than a plain set(), so concurrent requests cannot all acquire it at once.
Acquire the lock, fetch data from the database, return the response, and release the lock.
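The three steps above can be sketched as follows. `FakeRedis` is an in-memory stand-in for a Redis client (its `setnx` mirrors Redis's SET-if-not-exists semantics but is only atomic here because the example is single-threaded), and `get_with_rebuild` is a hypothetical helper name, not a real API.

```python
class FakeRedis:
    """In-memory stand-in for a Redis client (illustration only)."""
    def __init__(self):
        self.store = {}
    def setnx(self, key, value):
        # Atomic in real Redis; a plain check suffices in this single-threaded sketch.
        if key in self.store:
            return False
        self.store[key] = value
        return True
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value
    def delete(self, key):
        self.store.pop(key, None)

def get_with_rebuild(r, key, load_from_db, retries=3):
    """Cache-breakdown guard: only the lock holder rebuilds the key;
    other callers retry, standing in for 're-queueing' the request."""
    lock_key = f"lock:{key}"
    for _ in range(retries):
        value = r.get(key)
        if value is not None:
            return value                       # fast path: key is present
        if r.setnx(lock_key, "1"):             # won the rebuild lock
            try:
                value = load_from_db(key)      # expensive database fetch
                r.set(key, value)
                return value
            finally:
                r.delete(lock_key)             # always release the lock
        # Another request is rebuilding the key: loop and retry.
    return r.get(key)

r = FakeRedis()
v = get_with_rebuild(r, "hot:item", lambda k: f"db-value-for-{k}")
```

The `try`/`finally` ensures the lock is released even if the database fetch raises, which matters for the crash scenario discussed below.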
If the process holding the lock crashes, the lock may never be released, leaving other requests waiting indefinitely. A typical fix is to give the lock an expiration time, but that introduces a new problem: the database fetch may outlive the lock's TTL. One remedy is a watchdog that monitors the data‑fetching thread and extends the lock's TTL while the fetch is still in progress.
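The TTL-plus-extension idea can be sketched with a toy lock. Everything here is illustrative: `ExpiringLock` and its methods are invented for this example (production systems typically do this against Redis itself, e.g. Redisson's watchdog), and time is passed in explicitly so expiry is easy to follow.

```python
class ExpiringLock:
    """Toy lock with a TTL, mimicking the effect of SET key value NX EX ttl
    (illustration only; not a real Redis client)."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner, now):
        # Succeeds if the lock is free or its previous holder's TTL lapsed,
        # so a crashed holder cannot block others forever.
        if self.owner is None or now >= self.expires_at:
            self.owner = owner
            self.expires_at = now + self.ttl
            return True
        return False

    def extend(self, owner, now):
        # The watchdog calls this while the DB fetch is still running,
        # so a slow fetch does not lose the lock mid-flight.
        if self.owner == owner and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False

    def release(self, owner):
        if self.owner == owner:
            self.owner = None

lock = ExpiringLock(ttl=10)
assert lock.acquire("worker-1", now=0)       # lock held until t=10
assert lock.extend("worker-1", now=8)        # fetch still running: renew to t=18
assert not lock.acquire("worker-2", now=12)  # extension kept the lock safe
lock.release("worker-1")
```

Without the `extend` call at t=8, worker-2 would have stolen the lock at t=12 while worker-1's fetch was still in flight, and two processes would rebuild the same key.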
Cache penetration occurs when many requests query keys that exist in neither the cache nor the database, so every request misses the cache and falls through to the database. Adding a filter layer, such as a Bloom filter, an enhanced Bloom filter, or a Cuckoo filter, can block these invalid requests before they reach the cache.
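A minimal Bloom filter sketch, assuming k hash functions derived from salted SHA-256 over a fixed bit array (a real deployment would use a tuned library such as the RedisBloom module; sizes and names here are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted sha256 hashes over a bit array.
    'No' answers are definitive; 'yes' answers may be false positives."""
    def __init__(self, size=1024, k=3):
        self.size = size
        self.k = k
        self.bits = bytearray(size // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not in the set": the request can be
        # rejected without touching the cache or the database.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for key in ("user:1", "user:2", "user:3"):
    bf.add(key)
```

Every valid key must be inserted into the filter (e.g., at write time), since a key the filter has never seen will be rejected even if it later appears in the database.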
Cache avalanche is similar to breakdown but involves a large number of hot keys expiring simultaneously. Randomized expiration times are often suggested, but they are not always appropriate—for example, when a business rule changes at a specific time. The correct approach is to first determine whether the expiration is time‑dependent; if not, random TTL can help. If it is time‑dependent, use a strong‑dependency strategy: update all related keys in a coordinated way while temporarily throttling incoming requests.
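For the time-independent case, randomized TTLs are simple to add. The helper below is an illustrative sketch (the function name and jitter fraction are assumptions, not from any library): each key gets the base TTL plus up to ±20% random jitter, so keys written in the same batch expire at staggered times.

```python
import random

def ttl_with_jitter(base_ttl, jitter_fraction=0.2, rng=random):
    """Return base_ttl scaled by a random factor in [1 - j, 1 + j],
    spreading out expirations to avoid an avalanche (illustrative helper)."""
    jitter = rng.uniform(-jitter_fraction, jitter_fraction)
    return max(1, int(base_ttl * (1 + jitter)))

# Five keys cached together no longer share one expiration instant:
ttls = [ttl_with_jitter(3600) for _ in range(5)]
```

Note this only helps when expiry is not tied to a business deadline; as the article says, keys that must all change at a fixed time need the coordinated strong-dependency update instead.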
Overall, the article provides a comprehensive view of cache‑related problems in Redis and offers practical solutions to keep backend services reliable under heavy load.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies.