How to Prevent Redis Cache Avalanche, Penetration, and Breakdown

This article explains the concepts of Redis cache avalanche, penetration, and breakdown, illustrates real‑world incidents, and provides pre‑, during‑, and post‑failure strategies such as high‑availability setups, local caches with rate limiting, and defensive caching of empty values.

MaGe Linux Operations

Interview Questions

What are Redis avalanche, penetration, and breakdown? What happens when Redis crashes? How should a system respond? How to handle Redis penetration?

Interviewer Psychology Analysis

These are among the most common cache interview questions. Cache avalanche and cache penetration are the two most serious cache problems: when either occurs, the consequences can be fatal for the system, so interviewers almost always ask about them.

Interview Question Analysis

Cache Avalanche

Assume system A receives 5,000 requests per second at peak, and the cache normally absorbs 4,000 of them. If the cache servers go down entirely, all 5,000 requests per second fall through to the database, which cannot sustain the load and crashes. Without any fault-tolerance measures, the DBA restarts the database, only for it to be overwhelmed again immediately by the incoming traffic.

This situation is called a cache avalanche.

About three years ago, a well‑known Chinese internet company suffered a cache incident that caused a full avalanche, crashing all backend systems and resulting in losses of tens of millions of yuan.

Solutions before, during, and after a cache avalanche:

Pre‑incident: Use Redis high‑availability (master‑slave + Sentinel, Redis Cluster) to avoid total crash.

During incident: Use a local Ehcache cache together with Hystrix rate limiting and degradation (fallback responses) to keep MySQL from being overwhelmed.

Post‑incident: Enable Redis persistence so that after a restart it automatically reloads data from disk.

When a request arrives, system A first checks the local Ehcache; if missed, it checks Redis; if still missed, it queries the database, then writes the result back to both Ehcache and Redis.
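The lookup order described above can be sketched as follows. This is a minimal illustration rather than a production design: plain dictionaries stand in for Ehcache and Redis, and the names `TwoLevelCache`, `load_from_db`, and `db_reads` are hypothetical. Copying a Redis hit into the local cache is also an assumption; the text only specifies the write-back after a database read.

```python
from typing import Callable, Dict, Optional


class TwoLevelCache:
    """Local cache -> shared cache -> database, with write-back on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self.local: Dict[str, str] = {}    # stand-in for Ehcache (in-process)
        self.shared: Dict[str, str] = {}   # stand-in for Redis (remote)
        self.load_from_db = load_from_db
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        # 1. Local cache first: the cheapest lookup.
        if key in self.local:
            return self.local[key]
        # 2. Shared cache next; copy a hit into the local cache (assumption).
        if key in self.shared:
            value = self.shared[key]
            self.local[key] = value
            return value
        # 3. Database last; write the result back to both cache levels.
        self.db_reads += 1
        value = self.load_from_db(key)
        if value is not None:
            self.local[key] = value
            self.shared[key] = value
        return value


database = {"user:1": "alice"}
cache = TwoLevelCache(database.get)
cache.get("user:1")      # miss on both levels, read from the database
cache.shared.clear()     # simulate the shared cache losing its data
cache.get("user:1")      # still served from the local cache
```

In this sketch the local level keeps answering for keys it already holds even when the shared level is empty, which is the point of placing Ehcache in front of Redis.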

Rate‑limiting components can restrict the number of requests per second; excess requests are downgraded, returning default values, friendly messages, or empty responses.

Benefits:

The database stays alive, because the rate-limiting component ensures that only a bounded number of requests per second reach it.

As long as the database stays alive, at least a portion of user requests can still be processed. For example, if the limiter admits 2,000 of the 5,000 requests per second, 2/5 of requests succeed.

With 2/5 of requests handled, the system is still considered available; from the user's side, a page may fail to load on the first few attempts and then load after a retry.
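The rate-limiting and downgrade behavior can be illustrated with a simple fixed-window counter. This is a hedged sketch, not Hystrix itself: `PerSecondLimiter`, `handle`, and the fallback message are hypothetical names, and the limiter is not thread-safe as written.

```python
import time
from typing import Callable, Optional


class PerSecondLimiter:
    """Fixed-window limiter: admits at most max_per_second calls per second."""

    def __init__(self, max_per_second: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_second = max_per_second
        self.clock = clock                     # injectable so it can be tested
        self.window: Optional[int] = None      # the second currently being counted
        self.count = 0

    def try_acquire(self) -> bool:
        now = int(self.clock())
        if now != self.window:                 # a new second: reset the counter
            self.window = now
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False


def handle(key: str, limiter: PerSecondLimiter,
           query_db: Callable[[str], str],
           fallback: str = "busy, please retry") -> str:
    """Query the database only if the limiter admits the request."""
    if limiter.try_acquire():
        return query_db(key)
    return fallback                            # downgraded response


# Five requests within the same second, with a limit of two per second.
limiter = PerSecondLimiter(2, clock=lambda: 100.0)
results = [handle("k", limiter, lambda key: "row") for _ in range(5)]
```

With the clock frozen inside a single second, two of the five requests reach the database and three receive the fallback, mirroring the 2/5 figure above.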

Cache Penetration

Assume system A receives 5,000 requests per second, of which 4,000 are malicious attacks from hackers.

These attacks query keys that do not exist in the cache, forcing a database lookup each time, which also returns nothing.

Example: Database IDs start from 1, but attackers send negative IDs. Since such keys are absent in the cache, each request bypasses the cache and hits the database, potentially overwhelming it.

Simple solution: whenever a database lookup returns no result, write a placeholder value into the cache for that key (for example, set -999 UNKNOWN, which stores the value UNKNOWN under the key -999) and give it an expiration time. Subsequent requests for the same key then hit the cache instead of the database until the placeholder expires.
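A minimal sketch of caching empty results is shown below. A dictionary stands in for Redis, and `NullCachingReader`, `null_ttl`, and `db_reads` are hypothetical names. It also assumes that no real row is ever stored as the literal string `UNKNOWN`, and that only placeholders are given an expiration time.

```python
import time
from typing import Callable, Dict, Optional, Tuple

MISSING = "UNKNOWN"  # placeholder stored for keys that have no database row


class NullCachingReader:
    """Caches 'not found' results for a short time to shield the database."""

    def __init__(self, query_db: Callable[[int], Optional[str]],
                 null_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.query_db = query_db
        self.null_ttl = null_ttl               # lifetime of a placeholder, seconds
        self.clock = clock
        # key -> (value, expires_at); expires_at is None for real values here
        self.cache: Dict[int, Tuple[str, Optional[float]]] = {}
        self.db_reads = 0

    def get(self, key: int) -> Optional[str]:
        entry = self.cache.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self.clock() < expires_at:
                # Cache hit: translate the placeholder back into "no result".
                return None if value == MISSING else value
            del self.cache[key]                # placeholder expired
        self.db_reads += 1
        value = self.query_db(key)
        if value is None:
            # Remember the miss so repeated lookups skip the database.
            self.cache[key] = (MISSING, self.clock() + self.null_ttl)
        else:
            self.cache[key] = (value, None)
        return value


rows = {1: "alice"}
reader = NullCachingReader(rows.get, null_ttl=60.0)
reader.get(-999)   # reaches the database once and caches the placeholder
reader.get(-999)   # answered from the cache
```

The expiration time matters: if a row with that ID is created later, the placeholder ages out and the real value becomes visible again.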

Cache Breakdown

Cache breakdown occurs when a hot key expires at a moment of high concurrency, causing a massive surge of requests to bypass the cache and hit the database, effectively creating a hole in the protection layer.

Possible mitigation strategies:

If the cached data rarely changes, set the hot key to never expire.

If updates are infrequent and cache refresh is quick, use distributed locks (Redis, Zookeeper) or local mutexes to ensure only a few requests rebuild the cache while others wait for the new cache.

If updates are frequent or cache rebuild is slow, proactively refresh the cache before expiration using a scheduled thread or extend the expiration time.
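The mutex strategy can be sketched with a local lock and a second cache check inside the lock. This is an illustrative, single-process version: `HotKeyCache`, `rebuild`, and `rebuilds` are hypothetical names, one lock guards all keys for simplicity, and a distributed lock would be needed to coordinate several application instances.

```python
import threading
from typing import Callable, Dict


class HotKeyCache:
    """Lets only one thread rebuild a missing entry while the others wait."""

    def __init__(self, rebuild: Callable[[str], str]) -> None:
        self.rebuild = rebuild                 # loads the value from the database
        self.cache: Dict[str, str] = {}
        self.lock = threading.Lock()
        self.rebuilds = 0                      # counts actual database rebuilds

    def get(self, key: str) -> str:
        value = self.cache.get(key)
        if value is not None:
            return value                       # fast path: no locking on a hit
        with self.lock:
            # Re-check inside the lock: another thread may have rebuilt
            # the entry while this one was waiting.
            value = self.cache.get(key)
            if value is None:
                self.rebuilds += 1
                value = self.rebuild(key)
                self.cache[key] = value
            return value


hot = HotKeyCache(lambda key: "value-for-" + key)
threads = [threading.Thread(target=hot.get, args=("hot-item",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads miss at the same moment, the second check inside the lock means the rebuild function runs once per missing key; the waiting threads then read the freshly cached value.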

Reference: https://github.com/doocs/advanced-java
