How to Prevent Cache Breakdown, Penetration, and Avalanche in High‑Traffic Systems

The article explains why cache breakdown, penetration, and avalanche occur under high concurrency, analyzes their root causes such as key expiration and eviction, and provides practical mitigation techniques including distributed locking, Bloom filters, and staggered key updates to keep services stable.

IT Architects Alliance

Cache Breakdown (Cache Stampede)

A cache breakdown occurs when a hot key expires and a large number of concurrent requests simultaneously miss the cache and query the database, overwhelming it. The two primary causes are:

Key expiration: a time‑based TTL expires at a specific moment (e.g., the start of a promotion).

Cache eviction: limited memory forces the cache to evict older entries, causing a hot key to disappear.

Distributed‑Lock Solution

When a request finds that a key is missing from Redis, whether it expired or was evicted, it should acquire a distributed lock before accessing the database. A typical flow is:

Request reaches Redis and detects that the key is missing.

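Attempt to create the lock with SET key value NX PX ttl, which checks for an existing lock and creates the new one with an expiry in a single atomic command. Plain SETNX also works, but its expiry must then be set in a separate, non‑atomic step; checking for the lock first and creating it afterwards would reintroduce the race condition the lock is meant to avoid.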

If the lock is acquired, read the data from the database, populate the cache, return the response, and finally release the lock.

If the lock cannot be acquired, the request can wait and retry after a short sleep, or return a stale cached value if one is available.
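The flow above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: `InMemoryRedis` is a hypothetical stand-in for a real Redis client so that the example runs on its own, and `get_with_lock`, `delete_if_equals`, and the `lock:` key prefix are names invented for this sketch. The lock value is a random token so that a caller only releases a lock it still owns.

```python
import time
import uuid


class InMemoryRedis:
    """Tiny stand-in for a Redis client (an assumption, for illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, nx=False, px=None):
        # Mimics SET key value [NX] [PX ms]: with nx=True the write only
        # succeeds when the key does not already exist.
        if nx and self.get(key) is not None:
            return None
        expires_at = None if px is None else time.monotonic() + px / 1000
        self._data[key] = (value, expires_at)
        return True

    def delete_if_equals(self, key, expected):
        # Hypothetical helper. Against a real Redis server this compare-and-delete
        # has to run atomically, typically as a small Lua script.
        if self.get(key) == expected:
            del self._data[key]
            return True
        return False


def get_with_lock(cache, key, load_from_db, cache_ttl_ms=60_000,
                  lock_ttl_ms=3_000, retries=50, retry_sleep_s=0.01):
    """Return the cached value, letting only one caller rebuild a missing key."""
    lock_key = "lock:" + key
    for _ in range(retries):
        value = cache.get(key)
        if value is not None:
            return value  # cache hit, possibly filled by another caller
        token = uuid.uuid4().hex  # identifies this caller as the lock owner
        if cache.set(lock_key, token, nx=True, px=lock_ttl_ms):
            try:
                value = load_from_db(key)
                cache.set(key, value, px=cache_ttl_ms)
                return value
            finally:
                # Release only if the lock still carries our token.
                cache.delete_if_equals(lock_key, token)
        time.sleep(retry_sleep_s)  # another caller is rebuilding; back off
    raise TimeoutError(f"could not load {key!r}")
```

With a real client the same shape should carry over, since the `nx` and `px` arguments here are modelled on the options of the Redis SET command. The retry count and sleep interval are arbitrary values chosen for the sketch.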

To prevent deadlocks when the process holding the lock crashes, set an expiration time on the lock (e.g., PX 3000 for 3 seconds). If the lock expires before the data is fetched, another request can acquire the lock and repeat the database query, so the TTL should comfortably exceed the expected fetch time. Advanced strategies include:

Dynamically extending the lock TTL while the data fetch is in progress.

Running a watchdog thread that monitors the lock and refreshes its expiration as needed.
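A watchdog of this kind might look like the sketch below. It is an illustration under assumptions: `LockWatchdog` is a name invented here, and the `renew` callable is a placeholder for whatever extends the lock's TTL in a real deployment (for example, a script that resets the expiry only if the lock still holds the caller's token). The callable returns a falsy value once the lock is lost, which stops the renewal loop.

```python
import threading


class LockWatchdog:
    """Calls `renew()` every `interval_s` seconds until stopped."""

    def __init__(self, renew, interval_s):
        self._renew = renew
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop is requested.
        while not self._stop.wait(self._interval_s):
            if not self._renew():
                break  # lock lost or already released; stop renewing

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        return False  # never swallow exceptions from the guarded block
```

The slow database read would then run inside `with LockWatchdog(renew, interval_s): ...`, with the interval chosen well below the lock TTL (one third of the TTL is a common rule of thumb, not something the source prescribes).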

Cache Penetration

Penetration happens when requests query keys that do not exist in the database. Because a lookup that finds nothing leaves no value to cache, every such request misses the cache and hits the database directly, potentially causing a denial‑of‑service scenario.

Mitigation techniques:

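Place a probabilistic filter (e.g., a Bloom filter, one of its variants, or a Cuckoo filter) in front of the cache lookup. The filter quickly rejects requests for non‑existent keys. A Bloom filter can return false positives but never false negatives, so a rejection is always safe, while an occasional request for a missing key may still get through.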

Validate request parameters (e.g., reject negative IDs, or IDs beyond the highest value issued so far when IDs increase monotonically).

Authenticate users to filter out obviously malicious traffic.

Reference implementation details can be found in Redis modules such as RedisBloom (GitHub: https://github.com/RedisBloom/RedisBloom).
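To make the filtering step concrete, here is a small self-contained Bloom filter sketch. It is not how RedisBloom is implemented; the class, the default sizes, and the `lookup` helper are assumptions chosen for illustration, and the cache is modelled as a plain dictionary.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def lookup(key, bloom, cache, load_from_db):
    """Consult the filter first, then the cache, then the database."""
    if not bloom.might_contain(key):
        return None  # definitely absent: skip both cache and database
    if key in cache:
        return cache[key]
    value = load_from_db(key)  # may still be None on a false positive
    if value is not None:
        cache[key] = value
    return value
```

The filter has to be populated with every existing key (and updated on inserts) for the rejection to be trustworthy. A plain Bloom filter also cannot delete entries, which is one reason variants such as Cuckoo filters are sometimes preferred.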

Cache Avalanche

An avalanche is similar to a breakdown but involves a massive number of hot keys expiring at the same moment, often triggered by a time‑sensitive event (e.g., a promotional campaign). Randomized TTLs are only effective when expirations are not tied to a specific point in time.

Recommended strategies:

If the expiration is not time‑specific, add a random offset to each key's TTL (e.g., 0 to 300 seconds on top of the base TTL) so that expirations are spread over time.

If the expiration is time‑specific, proactively refresh the affected keys shortly before the scheduled expiry.

Background workers can pre‑populate the cache with fresh data.

Incoming requests can be delayed briefly (e.g., with Thread.sleep(10), a 10 ms pause) to smooth the traffic spike.
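The first two strategies can be sketched in a few lines. The function names and the `lead_s` and `spread_s` parameters are assumptions made for this example: the first function adds a random offset to a base TTL, and the second picks a refresh time for each key inside a window that ends shortly before a fixed expiry, so that background workers do not all refresh at the same instant.

```python
import random


def ttl_with_jitter(base_ttl_s, max_jitter_s=300, rng=random):
    """Base TTL plus a uniform random offset of 0 to max_jitter_s seconds."""
    return base_ttl_s + rng.randint(0, max_jitter_s)


def refresh_times(keys, expiry_ts, lead_s=60, spread_s=30, rng=random):
    """Pick a refresh timestamp per key, between lead_s + spread_s and
    lead_s seconds before the scheduled expiry."""
    return {key: expiry_ts - lead_s - rng.uniform(0, spread_s) for key in keys}
```

For example, `ttl_with_jitter(3600)` yields a TTL between 3600 and 3900 seconds, matching the 0 to 300 second offset mentioned above. How early to refresh time‑specific keys depends on how long the reload takes, which the source does not specify.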
