Cache Design and Optimization Strategies for High‑Concurrency Distributed Systems
The article explains why caching is essential in high‑concurrency distributed systems, analyzes its benefits and costs, and then details cache update strategies along with optimizations for cache penetration, the bottomless‑pit problem, cache avalanche, and hot‑key rebuilding, offering practical guidance for backend engineers.
Introduction – In high‑concurrency distributed systems, caching is indispensable for accelerating reads and shielding the backend from massive request bursts; proper cache design is therefore a critical component.
Cache Benefits and Costs – Benefits include accelerated read/write performance (e.g., Redis handling tens of thousands of QPS versus MySQL’s few thousand) and reduced backend load. Costs involve data inconsistency windows, added code‑maintenance effort, and operational overhead such as setting up master‑slave clusters for high availability.
Cache Update Strategies
LRU/LFU/FIFO – Eviction algorithms used when cache space is exhausted; LRU removes least‑recently used items, LFU removes least‑frequently used, FIFO follows insertion order. Consistency is limited, but implementation cost is low.
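As an illustration of the first of these policies, here is a minimal LRU eviction sketch (not code from the article) built on Python's `OrderedDict`; the class name and capacity parameter are my own choices:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: when capacity is exceeded, the
    least-recently-used key is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

LFU and FIFO differ only in the bookkeeping: LFU tracks an access counter per key, and FIFO evicts in plain insertion order without the `move_to_end` step.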
Expiration (Timeout Eviction) – Assign a TTL (e.g., via Redis EXPIRE) to cached data; after the TTL elapses, the entry is reloaded from the source. Consistency depends on the TTL length.
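A timeout-eviction sketch under the same idea, using lazy expiry on read (a stand-in for Redis's EXPIRE, not how Redis implements it internally):

```python
import time

class TTLCache:
    """Timeout-eviction sketch: each entry carries an expiry deadline;
    a read past the deadline misses, and the caller refreshes the
    entry from the source of truth."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict the stale entry
            return None
        return value
```

The inconsistency window is at most one TTL: a source update made right after a cache fill stays invisible until the entry expires.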
Active Update – When the source data changes, the cache is proactively refreshed, offering the highest consistency at the expense of higher development and operational complexity.
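One common shape of active updating is write-through: the code path that changes the source also refreshes the cache. A minimal sketch, where the `db` dict is a hypothetical stand-in for the real datastore:

```python
class WriteThroughStore:
    """Active-update (write-through) sketch: every write updates the
    source of truth and the cache in one code path, so readers do not
    observe a stale entry after a write completes."""

    def __init__(self):
        self.db = {}     # stand-in for the real datastore
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value      # update the source first
        self.cache[key] = value   # then refresh the cache in the same path

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)  # cache miss: backfill from the source
        if value is not None:
            self.cache[key] = value
        return value
```

The cost the article mentions shows up here: every write path in the codebase must remember to touch the cache, which is the added development and maintenance burden.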
Penetration Optimization
Cache Empty Objects – Store a placeholder for non‑existent keys to prevent repeated DB hits; combine with business‑level filtering and short TTLs.
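A sketch of the empty-object technique, with a plain dict as the cache and a callable standing in for the DB query (both are illustrative stand-ins, not the article's code):

```python
_MISSING = object()  # placeholder meaning "key does not exist in the DB"

class EmptyObjectCache:
    """Penetration sketch: cache a sentinel for keys the DB does not
    have, so repeated lookups for bogus keys stop reaching the
    datastore. `db_lookup` stands in for the real query."""

    def __init__(self, db_lookup):
        self.db_lookup = db_lookup
        self.cache = {}
        self.db_hits = 0  # instrumentation to show the effect

    def get(self, key):
        if key in self.cache:
            cached = self.cache[key]
            return None if cached is _MISSING else cached
        self.db_hits += 1
        value = self.db_lookup(key)
        # Store the placeholder too (with a short TTL in a real
        # deployment, so a later insert becomes visible quickly).
        self.cache[key] = _MISSING if value is None else value
        return value
```

The short TTL mentioned above bounds the memory spent on placeholders and the window during which a newly created key still reads as missing.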
Bloom Filter – Use a space‑efficient probabilistic data structure to pre‑filter requests; suitable for large key spaces and can be combined with empty‑object caching.
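A tiny Bloom-filter sketch to make the pre-filtering idea concrete (sizes and hash count are arbitrary illustrative choices):

```python
import hashlib

class BloomFilter:
    """Bloom-filter sketch: k hashed positions per key over an m-bit
    array. "Might contain" can be a false positive; a negative answer
    is exact, which is what makes it safe as a pre-filter."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))
```

Requests whose key fails `might_contain` are rejected before touching cache or DB; the occasional false positive simply falls through to the empty-object path, which is why the two techniques combine well.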
Bottomless‑Pit Optimization – Addresses the "bottomless pit" problem first observed in Facebook's large memcached deployment: as a cluster grows, batch operations such as mget spread their keys across more nodes, so network round trips multiply and adding nodes can actually degrade batch performance. Mitigations focus on reducing round trips: group keys by their owning node and issue one batch command per node (serially or with parallel per‑node IO), use hash tags to co‑locate related keys, keep batch sizes reasonable, and isolate clusters per project.
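The group-by-node mitigation can be sketched as follows; `fetch_from_node` is a hypothetical per-node batch fetch, and plain modulo hashing stands in for the cluster's real slot mapping:

```python
def mget_by_node(keys, num_nodes, fetch_from_node):
    """Bottomless-pit mitigation sketch: instead of one round trip per
    key, group keys by the node that owns them and issue one batch
    call per node, so round trips scale with nodes touched, not with
    the number of keys."""
    groups = {}
    for key in keys:
        node_id = hash(key) % num_nodes  # stand-in for real slot mapping
        groups.setdefault(node_id, []).append(key)
    result = {}
    for node_id, node_keys in groups.items():
        # One network call per node; could also be issued in parallel.
        result.update(fetch_from_node(node_id, node_keys))
    return result
```

With serial per-node IO as above, a 100-key mget against a 4-node cluster costs at most 4 round trips instead of up to 100.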
Avalanche Optimization – Prevents a sudden cache outage from overwhelming the datastore by ensuring high availability (e.g., Redis Sentinel), employing circuit‑breaker/limiters (e.g., Netflix Hystrix), and isolating resources per project.
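A minimal circuit-breaker sketch illustrating the fail-fast pattern that Hystrix popularized (this is not Hystrix itself; names and thresholds are illustrative):

```python
import time

class CircuitBreaker:
    """Circuit-breaker sketch: after `max_failures` consecutive
    failures the breaker opens and calls fail fast, shielding the
    datastore until `reset_after` seconds pass; then one trial call
    is allowed through (half-open)."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

During an outage the breaker converts a flood of slow, doomed datastore calls into immediate failures that the application can degrade gracefully around.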
Hot‑Key Rebuild Optimization
Mutex Lock – Allow only one thread to rebuild a hot key while others wait or serve stale data.
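A single-process sketch of the mutex pattern (a distributed deployment would use a distributed lock such as Redis SET NX instead of `threading.Lock`; the loader callable is a stand-in for the expensive rebuild):

```python
import threading

class MutexRebuildCache:
    """Hot-key rebuild sketch: only the thread that wins the lock
    queries the source; threads that lose the race serve the cached
    (possibly stale) value instead of piling onto the backend."""

    def __init__(self, loader):
        self.loader = loader  # expensive source-of-truth fetch
        self.cache = {}
        self.lock = threading.Lock()
        self.loads = 0        # instrumentation: rebuilds performed

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value
        if self.lock.acquire(blocking=False):  # winner rebuilds
            try:
                self.loads += 1
                value = self.loader(key)
                self.cache[key] = value
                return value
            finally:
                self.lock.release()
        # Losers: in a real system, retry after a short sleep or
        # return stale/degraded data rather than hitting the backend.
        return self.cache.get(key)
```

The key property is that concurrent misses on the same hot key collapse into a single backend load.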
Never‑Expire Strategy – Update cache via scheduled jobs or push updates from the source.
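One common variant of never-expire is logical expiration: entries carry a logical deadline but are never physically evicted, and a stale read signals that a background refresh (scheduled job or source-driven push) should run. A sketch under that interpretation:

```python
import time

class LogicalExpireCache:
    """Never-expire sketch: no physical TTL, so reads never miss once
    a key is populated. Each entry carries a *logical* deadline; a
    stale read returns the old value immediately and flags that an
    asynchronous refresh should be triggered."""

    def __init__(self):
        self._data = {}  # key -> (value, logical_deadline)

    def set(self, key, value, logical_ttl):
        self._data[key] = (value, time.monotonic() + logical_ttl)

    def get(self, key):
        """Returns (value, needs_refresh)."""
        entry = self._data.get(key)
        if entry is None:
            return None, True
        value, deadline = entry
        return value, time.monotonic() >= deadline
```

Readers never block on a rebuild, at the cost of serving stale data until the refresh lands.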
Backend Rate Limiting – Limit the number of rebuild attempts to protect the backend, assuming hot keys can be identified.
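A fixed-window limiter sketch for capping rebuild attempts (window size and quota are illustrative; production systems often prefer token buckets or sliding windows):

```python
import time

class RebuildLimiter:
    """Backend rate-limiting sketch: allow at most `max_calls`
    rebuild attempts per `window` seconds; callers over the quota
    should back off and serve stale or degraded data instead of
    hitting the datastore."""

    def __init__(self, max_calls, window):
        self.max_calls = max_calls
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.window_start = now  # fixed window rolls over
            self.count = 0
        if self.count < self.max_calls:
            self.count += 1
            return True
        return False
```

This pairs naturally with hot-key detection: only identified hot keys need to pass through the limiter before a rebuild is attempted.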
Conclusion – Summarizes practical cache design experiences, emphasizing that when benefits outweigh costs, appropriate eviction, update, and protection strategies should be selected based on consistency requirements and traffic characteristics.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with internet technologies. Architects who enjoy thinking and sharing are welcome to exchange ideas and learn together.