
Cache Design and Optimization in High‑Concurrency Distributed Systems

This article explains the role of caching in high‑concurrency distributed systems, detailing its benefits, costs, update strategies, and advanced optimizations such as penetration protection, bottomless‑pit (batch‑operation fan‑out) mitigation, avalanche prevention, and hot‑key rebuild handling.


Preface

In high‑concurrency distributed systems, caching is indispensable for accelerating reads/writes and shielding the backend from massive request bursts. This article discusses cache design, its benefits and costs, and practical solutions to common problems.

Benefits and Costs of Caching

Benefits include:

Accelerated read/write performance (e.g., Redis or Memcached can achieve tens of thousands of QPS compared to a few thousand for MySQL).

Reduced backend load by caching expensive computations or results.

Costs include:

Data inconsistency during the time window between a storage update and the corresponding cache refresh.

Increased code maintenance effort.

Operational overhead for high‑availability setups (master‑slave, clustering).

When benefits outweigh costs, caching should be adopted.

Cache Update Strategies

Cache entries usually have a TTL; when they expire they are reloaded. Common update strategies:

1. LRU/LFU/FIFO – eviction algorithms used when the cache is full. LRU removes the least recently used entry, LFU the least frequently used, FIFO the oldest.

2. Timeout Expiration – set an explicit expiration time (e.g., Redis EXPIRE command). Consistency depends on the TTL; real‑time consistency is not guaranteed.

3. Active Update – proactively refresh the cache when the underlying data changes. Provides the highest consistency but couples business updates with cache updates, often requiring a message queue.
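As a concrete illustration of strategy 1, LRU eviction can be sketched in a few lines of Python using `OrderedDict` (the class name and capacity are illustrative; Redis implements an approximated LRU internally, selected via its `maxmemory-policy` setting):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)       # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```

LFU and FIFO differ only in which entry is chosen for eviction: a frequency counter per key, or simple insertion order.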

Best practice:

Low‑consistency workloads: combine strategy 1 with strategy 2.

High‑consistency workloads: combine strategies 2 and 3.
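The active-update strategy can be sketched as follows. An in-process `queue.Queue` stands in for a real message queue (e.g., Kafka), and the dictionaries stand in for the storage layer and cache; all names here are illustrative:

```python
import queue
import threading

cache = {}
db = {"user:1": "Alice"}        # stands in for the storage layer
update_queue = queue.Queue()    # stands in for a real message queue

def on_data_change(key, value):
    """Business code writes the store, then publishes a cache-update event."""
    db[key] = value
    update_queue.put(key)

def cache_updater():
    """Consumer that refreshes the cache whenever an update event arrives."""
    while True:
        key = update_queue.get()
        if key is None:             # shutdown sentinel
            break
        cache[key] = db[key]        # reload the fresh value from storage
        update_queue.task_done()

worker = threading.Thread(target=cache_updater, daemon=True)
worker.start()

on_data_change("user:1", "Bob")
update_queue.join()                 # wait until the cache has been refreshed
```

Routing updates through a queue decouples the business write path from the cache refresh, at the cost of the extra infrastructure the article mentions.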

Penetration Optimization

Cache penetration occurs when requests query non‑existent data, causing both cache and storage misses. Mitigation approaches:

1. Cache Empty Objects – store a placeholder for missing keys, typically with a short TTL so that newly created data is picked up; requests for keys outside the valid ID range can additionally be rejected up front.

2. Bloom Filter – a space‑efficient probabilistic data structure that quickly determines if a key is likely absent, reducing unnecessary storage hits.

Combining both methods yields effective protection.
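A Bloom filter can be sketched as below; it never reports a present key as absent, so an "absent" answer lets the request be rejected before touching storage. The sizes and hash scheme here are illustrative only (in production a Redis bitmap or the RedisBloom module is more typical than an in-process filter):

```python
import hashlib

class BloomFilter:
    """Simple Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes independent bit positions from salted MD5 digests.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

On a read, check `might_contain(key)` first; only when it returns True does the request fall through to the cache and storage layers.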

Bottomless‑Pit Optimization

The "bottomless pit" problem arises as a distributed cache cluster grows: adding nodes spreads a batch operation's keys across more of them, so a single batch command (e.g., MGET) fans out into more network round trips without improving latency. Solutions include:

Avoid batch operations when possible.

Isolate clusters per project/team.

Use hash‑tagging in Redis to force related keys onto the same node, reducing cross‑node requests.

Four batch‑operation approaches (serial, node‑aware serial, parallel, hash‑tag) are compared, with parallel I/O and hash‑tag offering the best performance.
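The hash-tag approach works because Redis Cluster assigns every key to one of 16384 slots via CRC16, and when a key contains a hash tag, only the substring inside the first `{...}` is hashed. The slot routine can be sketched as follows (CRC16/XMODEM is the variant Redis uses):

```python
def crc16(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for slot assignment."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Return the cluster slot (0-16383) for a key, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # non-empty tag found
            key = key[start + 1:end]         # hash only the tag
        # empty tag or no closing brace: hash the whole key
    return crc16(key.encode()) % 16384
```

Because `{user:1000}.followers` and `{user:1000}.following` share the tag `user:1000`, they land in the same slot, so a batch operation over them needs only one node. The trade-off is that hot tags can skew data onto a single node.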

Avalanche Optimization

Cache avalanche happens when the cache becomes unavailable, flooding the storage layer. Prevention measures:

Ensure high availability (e.g., master‑slave, Redis Sentinel).

Use circuit‑breaker or rate‑limiting components (e.g., Netflix Hystrix) to isolate failures.

Isolate resources per project to contain faults.
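Hystrix itself is a JVM library, but the circuit-breaker idea it embodies is language-agnostic. A minimal single-process sketch (class name, thresholds, and fallback convention are all illustrative):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    failures, fails fast while open, and retries after `reset_timeout`."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()      # fail fast: protect the storage layer
            self.opened_at = None      # half-open: allow one trial request
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            return fallback()
        self.failures = 0              # success resets the failure count
        return result
```

While the circuit is open, requests are answered from the fallback (e.g., stale data or a default) instead of hammering an already-failing backend.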

Hot‑Key Rebuild Optimization

When a hot key expires, many threads may simultaneously rebuild the cache, overwhelming the backend. Mitigation strategies:

Mutex lock – allow only one thread to rebuild while others wait or serve stale data.

Never‑expire – update the cache asynchronously via scheduled jobs or active pushes.

Backend rate limiting – limit the number of rebuild requests reaching the backend.

Combining these techniques helps maintain stability under high concurrency.
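The mutex-lock approach can be sketched as a single-process example using `threading.Lock`; the loader function and key names are illustrative, and in a distributed deployment the lock would typically be a Redis `SET key value NX EX` instead:

```python
import threading

cache = {}
rebuild_lock = threading.Lock()

def load_from_storage(key):
    """Stands in for an expensive database query or computation."""
    return f"value-for-{key}"

def get_with_rebuild(key):
    value = cache.get(key)
    if value is not None:
        return value                      # normal cache hit
    if rebuild_lock.acquire(blocking=False):
        try:
            # Double-check: another thread may have rebuilt it meanwhile.
            value = cache.get(key)
            if value is None:
                value = load_from_storage(key)
                cache[key] = value
            return value
        finally:
            rebuild_lock.release()
    # Lost the race: one thread is already rebuilding; serve degraded data
    # rather than stacking more load onto the backend.
    return cache.get(key, "stale-or-default")
```

Only the thread that wins the lock touches the backend; every concurrent miss on the same hot key gets a stale or default value until the rebuild completes.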

Conclusion

Effective cache design balances performance gains against consistency and operational costs, employing appropriate eviction, expiration, and protection mechanisms to ensure reliable high‑throughput services.

Tags: distributed systems, performance, Redis, caching, cache avalanche, cache penetration, cache eviction, hot key
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
