Cache Penetration, Breakdown, and Avalanche: Causes and Solutions

This article explains three major cache issues: penetration, breakdown, and avalanche. It details their causes, such as malicious requests and simultaneous key expiration, and presents practical mitigation techniques including parameter validation, Bloom filters, caching empty values, locking, auto-renewal, randomized TTLs, high-availability setups, and graceful degradation.

Architect

Introduction

For backend developers, caching has become an indispensable technology in modern projects. While it can significantly improve system performance, improper use or lack of experience can introduce serious problems.

1. Cache Penetration

1.1 How We Use Cache

Typical workflow: a request first checks the cache; if the data exists, it is returned directly. If not, the database is queried, the result is stored in the cache, and then returned. If the database also lacks the data, the request fails.
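
As a rough illustration of this read path, here is a minimal Java sketch. It is not taken from any particular framework: the class name `CacheAsideReader` and the two in-memory maps are stand-ins assumed for illustration (one map plays the cache, the other the database), and the `databaseReads` counter exists only to make the effect visible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside read path. Both maps are in-memory stand-ins (assumptions):
// `cache` for a cache such as Redis, `database` for the real data store.
class CacheAsideReader {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final Map<String, String> database = new ConcurrentHashMap<>();
    int databaseReads = 0; // illustration only; not thread-safe

    Optional<String> get(String id) {
        String cached = cache.get(id);
        if (cached != null) {
            return Optional.of(cached);      // cache hit: return directly
        }
        databaseReads++;
        String fromDb = database.get(id);    // cache miss: query the database
        if (fromDb == null) {
            return Optional.empty();         // absent from the database too
        }
        cache.put(id, fromDb);               // populate the cache for later requests
        return Optional.of(fromDb);
    }
}
```

Note that an ID missing from the database is never written to the cache, so every request for it reaches the database again.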

[Figure: flowchart of the cache lookup workflow described above]

1.2 What Is Cache Penetration?

Cache penetration occurs when a requested ID exists in neither the cache nor the database, often because malicious users forge non-existent IDs. Since nothing is ever cached for such an ID, every request for it bypasses the cache and forces a database query.

This situation is called the cache penetration problem. If many such requests hit the database simultaneously, the database may become overloaded and crash.

1.3 Parameter Validation

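Validate user IDs before processing. For example, if all legitimate IDs start with 15, any ID with a different prefix, such as 16, can be rejected before it reaches the cache or the database. Validation only filters out malformed IDs; a well-formed ID that simply does not exist still gets through.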
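
A check of this kind might look like the sketch below. The exact ID format is an assumption made for the example ("15" followed by 1 to 16 more digits); the article only states the prefix, and `UserIdValidator` is a hypothetical name.

```java
import java.util.regex.Pattern;

class UserIdValidator {
    // Assumed format: the prefix "15" followed by 1 to 16 further digits.
    private static final Pattern VALID_ID = Pattern.compile("15\\d{1,16}");

    // Returns true only for IDs that match the assumed format.
    static boolean isValid(String id) {
        return id != null && VALID_ID.matcher(id).matches();
    }
}
```

Requests that fail `isValid` can be rejected immediately, before any cache or database lookup.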

1.4 Bloom Filter

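A Bloom filter can quickly determine whether a key might exist without holding the full key set in memory, which makes it suitable when the dataset is too large to load into a map. It stores a bit array and, for each key, sets the bits selected by multiple hash functions. When a request arrives, the same hash functions are applied; if all corresponding bits are 1, the key is assumed to exist. If any of those bits is 0, the key definitely does not exist, and the request can be rejected without touching the cache or the database.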

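Bloom filters introduce two issues: false positives, where a key that was never added is reported as possibly present because other keys happened to set the same bits, and synchronization, since the filter must be kept up to date as rows are added to or removed from the database.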
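
To make the bit-array mechanics concrete, here is a small self-contained sketch. It is an illustration under assumptions, not a production filter: the class name `SimpleBloomFilter` is hypothetical, and deriving the bit positions from `String.hashCode()` plus a CRC32 value (double hashing) is just one simple choice of hash functions. In practice a library implementation would normally be preferred.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits in the array
    private final int hashCount;  // number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives hashCount bit positions from two base hashes (double hashing).
    private int[] positions(String key) {
        int h1 = key.hashCode();
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int h2 = (int) crc.getValue() | 1; // force a non-zero step
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            result[i] = Math.floorMod(h1 + i * h2, size);
        }
        return result;
    }

    // Sets every bit selected for this key.
    void add(String key) {
        for (int p : positions(key)) {
            bits.set(p);
        }
    }

    // false: the key was definitely never added. true: it may have been added.
    boolean mightContain(String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }
}
```

A request whose key fails `mightContain` can be rejected immediately; a key that passes still needs the normal cache and database lookup, because the answer may be a false positive.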

1.5 Cache Empty Values

Instead of using a Bloom filter, a simpler solution is to cache empty results. When a key is found in neither the cache nor the database, store a placeholder (e.g., null) in the cache. Subsequent requests for that key retrieve the placeholder directly and never reach the database.
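
The idea can be sketched as follows. The sentinel string `__EMPTY__`, the class name `NullCachingReader`, and the in-memory maps standing in for the cache and the database are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullCachingReader {
    // Hypothetical sentinel meaning "the database has no row for this key".
    static final String EMPTY = "__EMPTY__";

    final Map<String, String> cache = new ConcurrentHashMap<>();    // stand-in for the cache
    final Map<String, String> database = new ConcurrentHashMap<>(); // stand-in for the database
    int databaseReads = 0; // illustration only; not thread-safe

    // Returns the value, or null when the key does not exist.
    String get(String id) {
        String cached = cache.get(id);
        if (cached != null) {
            return EMPTY.equals(cached) ? null : cached; // hit, possibly a cached "absent"
        }
        databaseReads++;
        String fromDb = database.get(id);
        cache.put(id, fromDb == null ? EMPTY : fromDb);  // cache the miss as well
        return fromDb;
    }
}
```

With a real cache, the placeholder would usually be given a short TTL and be overwritten or deleted when the record is later created, so that newly inserted data does not stay hidden behind a cached miss.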

2. Cache Breakdown

2.1 What Is Cache Breakdown?

When a hot key expires, a sudden surge of requests may all hit the database, causing a spike in load that can crash the database.

2.2 Locking

Use a distributed lock so that only one request can query the database for a given key at a time.

Pseudocode:

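// Acquire the lock first, so that finally only releases a lock this request holds.
// (Jedis 2.x style call; newer Jedis versions take a SetParams argument instead.)
String result = jedis.set(productId, requestId, "NX", "PX", expireTime);
if (!"OK".equals(result)) {
  // Another request holds the lock: let the caller wait briefly and re-read the cache.
  return null;
}
try {
  return queryProductFromDbById(productId);
} finally {
  // unlock should delete the key only if its value still equals requestId.
  unlock(productId, requestId);
}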

After retrieving the data, store it back into the cache.
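
The lock semantics relied on here (acquire only if absent, release only by the owner) can be illustrated without a Redis server. The sketch below is an in-process stand-in, not Jedis code: `OwnerTokenLock` is a hypothetical class, `putIfAbsent` plays the role of `SET ... NX`, and `remove(key, value)` plays the role of a compare-and-delete unlock. It has no expiry, unlike the `PX` option in the pseudocode.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OwnerTokenLock {
    // lock key -> requestId of the current holder
    private final Map<String, String> locks = new ConcurrentHashMap<>();

    // Like SET key value NX: succeeds only if nobody currently holds the lock.
    boolean tryLock(String key, String requestId) {
        return locks.putIfAbsent(key, requestId) == null;
    }

    // Releases the lock only if it is still held by requestId.
    boolean unlock(String key, String requestId) {
        return locks.remove(key, requestId);
    }
}
```

Against Redis, the compare-and-delete step is commonly done atomically (for example in a Lua script), so that a request whose lock has already expired cannot delete a lock that now belongs to someone else.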

2.3 Auto‑Renewal

Refresh keys before they expire using a scheduled job. For example, if a key has a 30-minute TTL, a job that runs every 20 minutes and rewrites the key (resetting its TTL) ensures that the key never expires while requests are arriving.
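
One way to schedule such a job in plain Java is sketched below. The names are hypothetical: `CacheRefresher` is an illustrative class, the map stands in for the real cache, and `loader` stands in for whatever reloads the value (a database query, a token request). With a real cache, the write in `refreshOnce` would also reset the TTL.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class CacheRefresher {
    final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for the cache
    private final String key;
    private final Supplier<String> loader; // reloads the fresh value

    CacheRefresher(String key, Supplier<String> loader) {
        this.key = key;
        this.loader = loader;
    }

    // One refresh pass: reload the value and overwrite the cache entry.
    void refreshOnce() {
        cache.put(key, loader.get());
    }

    // Runs a refresh immediately and then every periodMinutes minutes.
    // The caller is responsible for shutting the returned scheduler down.
    ScheduledExecutorService start(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refreshOnce();
            } catch (RuntimeException e) {
                // Swallow the error so later runs still happen; log it in real code.
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

For the example in the text, `start(20)` would be paired with a 30-minute TTL, leaving a 10-minute margin if one run is late. The try/catch matters because `scheduleAtFixedRate` stops scheduling further runs once a run throws.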

The same technique applies to tokens with limited lifetimes: cache the token and periodically refresh it.

2.4 No Expiration for Hot Keys

For extremely hot keys (e.g., flash‑sale items), keep them in cache permanently and pre‑warm the cache before the event.

When the event ends, manually delete the cached data.

3. Cache Avalanche

3.1 What Is Cache Avalanche?

An avalanche occurs when many hot keys expire simultaneously or when the cache server goes down, causing a massive influx of database requests.

3.2 Randomized Expiration

Set each key’s TTL with an added random offset (e.g., 1–60 seconds) to avoid synchronized expiration.

actualTTL = baseTTL + random(1, 60) seconds
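
In Java, the formula above could be written roughly as follows. `JitteredTtl` and `withJitter` are hypothetical names chosen for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

class JitteredTtl {
    // Returns baseSeconds plus a random offset in the range [1, maxJitterSeconds].
    static long withJitter(long baseSeconds, long maxJitterSeconds) {
        // nextLong's upper bound is exclusive, hence the + 1.
        return baseSeconds + ThreadLocalRandom.current().nextLong(1, maxJitterSeconds + 1);
    }
}
```

For instance, `JitteredTtl.withJitter(1800, 60)` yields a TTL between 1801 and 1860 seconds, so keys written in the same batch expire at slightly different times.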

3.3 High Availability

Deploy Redis in Sentinel or cluster mode so that if a master node fails, a replica (slave) is promoted automatically.

3.4 Service Degradation

If the cache is still unavailable, enable a fallback mechanism: after a threshold of cache failures, switch to default data from a configuration center, and periodically attempt to restore normal cache access.
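
The fallback logic can be sketched as a small failure counter. Everything here is an illustrative assumption: the class `DegradingCacheClient`, the use of consecutive failures as the threshold, and a single fixed default value in place of data pulled from a configuration center. The sketch is single-threaded; a real implementation would need thread-safe counters.

```java
import java.util.function.Supplier;

class DegradingCacheClient {
    private final int failureThreshold;
    private final String defaultValue; // stands in for data from a configuration center
    private int consecutiveFailures = 0;

    DegradingCacheClient(int failureThreshold, String defaultValue) {
        this.failureThreshold = failureThreshold;
        this.defaultValue = defaultValue;
    }

    boolean isDegraded() {
        return consecutiveFailures >= failureThreshold;
    }

    // Reads through cacheRead unless degraded; returns the default on failure.
    String get(Supplier<String> cacheRead) {
        if (isDegraded()) {
            return defaultValue;          // skip the cache entirely while degraded
        }
        try {
            String value = cacheRead.get();
            consecutiveFailures = 0;      // a success resets the counter
            return value;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return defaultValue;
        }
    }

    // Called periodically to test whether the cache has recovered.
    void probe(Supplier<String> cacheRead) {
        try {
            cacheRead.get();
            consecutiveFailures = 0;      // recovered: resume normal reads
        } catch (RuntimeException e) {
            // still failing: stay degraded
        }
    }
}
```

While degraded, `get` does not call the cache at all, which keeps load off a struggling cache; only the periodic `probe` decides when normal access resumes.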

These strategies should be chosen based on specific business scenarios.
