How to Tackle Common Cache Problems in Distributed Systems
This article explores typical cache challenges in distributed systems—including data consistency, high availability, cache avalanche, and cache penetration—explaining their causes, real‑world scenarios, and practical mitigation strategies to ensure reliable and efficient caching.
Outline
Data Consistency
Cache High Availability
Cache Avalanche
Cache Penetration
Reference Materials
Summary
Data Consistency
Cache sits in front of persistent storage, keeping hot data close to users for faster access and lower latency.
Because a cache holds a copy of persisted data, the two can diverge, which shows up as dirty reads or missing data. Typical triggers are network instability and node failures, and the order in which the cache and the database are written determines which kind of inconsistency appears.
2.1 Scenario Introduction
(1) Write cache first, then write database.
If the cache write succeeds but the database write fails or is delayed, subsequent concurrent reads from the cache may return dirty data.
(2) Write database first, then write cache.
If the database write succeeds but the cache write fails, subsequent reads either miss the cache or, if an older entry is still cached, return a stale value.
(3) Asynchronous cache refresh.
This scenario concerns the delay between a data write and the next cache refresh, for example how often the cache must be refreshed so that users do not notice stale data.
2.2 Solutions
Scenario 1: Writing the cache before persistence is incorrect; write to the persistent store first, then update the cache.
Scenario 2:
Roll back the database write if the cache write fails (adds complexity, not recommended).
On a cache miss, read from the database and then write the result back to the cache.
Scenario 3:
Identify which data suits asynchronous refresh.
Determine an acceptable inconsistency window based on experience and user‑visible refresh intervals.
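The recommendations for Scenario 1 and Scenario 2 can be sketched as follows. This is a minimal illustration, not a production client: the class name CacheAside is hypothetical, and plain Python dicts stand in for the database and the cache.

```python
class CacheAside:
    """Write the database first, then the cache; repopulate the cache on a miss."""

    def __init__(self, db, cache):
        self.db = db        # assumption: dict standing in for the persistent store
        self.cache = cache  # assumption: dict standing in for the cache client

    def write(self, key, value):
        # Persist first, so the cache never holds a value the database lacks.
        self.db[key] = value
        try:
            self.cache[key] = value
        except Exception:
            # Cache write failed: do not roll back the database. Drop any old
            # entry (best effort) so the next read repopulates it.
            try:
                self.cache.pop(key, None)
            except Exception:
                pass

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the database and write the result back.
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

With a real cache client the same flow applies, but the error handling would depend on that client's exceptions and timeouts.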
2.3 Other Methods
Set reasonable expiration times (TTLs) so that stale entries are eventually dropped.
Periodically refresh data within a defined range (by time or version).
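A time-based bound on staleness can be illustrated with a small in-process cache whose entries expire after a fixed interval. The class name TTLCache and the injectable clock parameter are assumptions made for this sketch; a real deployment would normally rely on the expiration support of its cache server.

```python
import time


class TTLCache:
    """In-process cache whose entries expire ttl_seconds after they are set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}     # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller reloads fresh data.
            del self._store[key]
            return default
        return value
```

The TTL then acts as the upper bound on how long a reader can see a stale value.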
In practice, consistency concerns appear at three levels: between cache and database, among multi‑level caches, and among cache replicas.
Cache High Availability
Industry opinions differ: some view cache as a temporary store that need not be highly available, while others treat it as a critical storage layer requiring high availability.
Whether the cache must be highly available depends on how much load a cache failure would shift onto the backend database.
Decision factors include cluster size, cost, and system performance metrics such as concurrency, throughput, and response time.
3.1 Solutions
High availability is typically achieved through distribution and replication. Distributed caching provides massive capacity; replication ensures node‑level availability.
Distribution often uses consistent hashing; replication can be asynchronous.
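A consistent hash ring can be sketched in a few lines. The class name HashRing, the node names, and the choice of 100 virtual points per node are assumptions for illustration; the sketch does not guard against adding the same node twice.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring; each node owns several virtual points on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node is removed, only the keys it owned move to other nodes; every other key keeps its owner, which is the property that makes this scheme attractive for cache clusters.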
3.2 Other Methods
Dual‑write replication: both replicas must succeed before the operation is considered successful.
Virtual layer: add a layer of virtual nodes in front of the hash ring so that a node failure redistributes its keys across the remaining nodes and data skew is avoided.
Multi‑level caching: e.g., local cache → distributed cache → distributed cache with local persistence.
Choose the approach based on specific business scenarios.
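The multi-level lookup order can be sketched as follows, assuming two levels only. The class name TwoLevelCache and the loader callback are hypothetical, and dicts stand in for the local and distributed caches.

```python
class TwoLevelCache:
    """Look up the local cache first, then the distributed cache, then the store."""

    def __init__(self, local, remote, loader):
        self.local = local    # assumption: in-process dict
        self.remote = remote  # assumption: dict standing in for a distributed cache
        self.loader = loader  # function that reads the backing store

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            value = self.remote[key]
        else:
            # Miss at every level: load once and fill the distributed cache.
            value = self.loader(key)
            self.remote[key] = value
        # Fill the local level so the next lookup stays in-process.
        self.local[key] = value
        return value
```

Each added level is another replica of the data, so the local level usually needs a shorter expiration time than the distributed one to limit inconsistency between levels.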
Cache Avalanche
An avalanche occurs when many cache entries expire simultaneously, flooding the database with requests and potentially overwhelming it.
Mitigation strategies include:
Plan cache expiration times so that entries do not all expire at the same moment.
Assess database load capacity.
Implement overload protection or rate limiting at the application layer.
Design multi‑level caches to improve availability.
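One common way to plan expiration times is to add random jitter to each TTL, so that entries written together do not expire together. The function name jittered_ttl and the default spread of 20% are assumptions chosen for this sketch.

```python
import random


def jittered_ttl(base_seconds, spread=0.2, rng=random):
    """Return base_seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    return base_seconds * (1 + rng.uniform(-spread, spread))
```

For example, jittered_ttl(600) yields a value between roughly 480 and 720 seconds, which spreads the resulting database reloads over a four-minute window instead of a single instant.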
Cache Penetration
When a non‑existent key is repeatedly queried, each miss hits the database, causing unnecessary load.
Solutions:
Cache empty results temporarily and purge them when data becomes available.
Use a Bloom filter or bitmap built from the keys that exist, and reject requests for keys it reports as absent before they reach the cache or the database.
Reference Materials
MemCache detailed analysis: http://www.mamicode.com/info-detail-1120932.html
Cache‑database consistency guarantees: http://www.36dsj.com/archives/43950
Hash ring and virtual nodes: http://www.111cn.net/sys/linux/58748.htm
Making memcached distributed: http://blog.csdn.net/cutesource/article/details/5848253
Summary
This article covered four common cache issues in distributed systems (data consistency, high availability, cache avalanche, and cache penetration) and outlined practical techniques for addressing each of them.
ITFLY8 Architecture Home