How to Ensure Cache‑Database Consistency: Strategies, Pitfalls & Best Practices
This article explains why caching improves performance, examines the trade‑offs between cache utilization and data consistency, analyzes concurrency‑induced inconsistency scenarios, compares update‑then‑delete versus delete‑then‑update approaches, and recommends asynchronous retry with message queues or binlog subscription to reliably keep cache and database in sync.
1. Why Introduce a Cache
When request volume grows, reading directly from the database becomes a bottleneck. Adding a high‑performance cache (e.g., Redis) moves most reads to the cache, which sharply reduces read latency and database load.
Typical deployment changes from a single‑node DB to a DB‑plus‑cache architecture.
2. Basic Cache‑Population Strategies
Full load: Load all data into the cache with no TTL. Write requests update only the DB, and a scheduled job periodically refreshes the cache.
Read‑through (TTL‑based): Write requests update only the DB. On a cache miss, the application reads from the DB, writes the result into the cache, and sets an expiration time.
Compared with full load, read‑through improves cache utilization, because only data that is actually requested stays cached. It still leaves a consistency gap: after a write, the cache keeps serving the old value until the entry expires.
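The read‑through flow and its consistency gap can be sketched as follows. This is a minimal illustration, not a production client: `TTLCache` is a hypothetical in‑memory stand‑in for a cache such as Redis, and `db` is a plain dictionary standing in for the database.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a real cache (illustrative assumption)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


db = {"user:1": "alice"}  # hypothetical database contents
cache = TTLCache()


def read(key, ttl_seconds=60):
    """Serve from the cache; on a miss, load from the DB and cache with a TTL."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db.get(key)
    if value is not None:
        cache.set(key, value, ttl_seconds)
    return value


def write(key, value):
    """Writes touch only the DB, so a cached copy may be stale until it expires."""
    db[key] = value
```

Calling `read("user:1")`, then `write("user:1", "bob")`, then `read("user:1")` again returns the old value on the second read, which is exactly the gap described above.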
3. Two‑Step Update Orders and Their Pitfalls
When an update must affect both DB and cache, two ordering options exist:
Update cache first, then DB.
Update DB first, then cache.
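If the second step fails, the two stores diverge. With cache‑first, the cache holds the new value while the DB keeps the old one; with DB‑first, the DB holds the new value while the cache keeps serving the old one.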
4. Concurrency‑Induced Inconsistency
Concurrent updates to the same record can interleave the two steps, producing divergent states. Example (DB‑first order):
Thread A: UPDATE DB SET X=1;
Thread B: UPDATE DB SET X=2;
Thread B: UPDATE CACHE X=2;
Thread A: UPDATE CACHE X=1;
// DB = 2, CACHE = 1
Both “cache‑first” and “DB‑first” orders are vulnerable to this race condition.
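The interleaving above can be replayed deterministically. The sketch below uses two dictionaries as stand‑ins for the DB and the cache and applies the four steps in the order listed; the function name `run_interleaving` is only illustrative.

```python
def run_interleaving():
    """Replay the DB-first race from the example, step by step."""
    db, cache = {}, {}
    steps = [
        (db, 1),     # Thread A: update DB, X = 1
        (db, 2),     # Thread B: update DB, X = 2
        (cache, 2),  # Thread B: update cache, X = 2
        (cache, 1),  # Thread A: update cache, X = 1 (arrives late)
    ]
    for store, value in steps:
        store["X"] = value
    return db["X"], cache["X"]


db_value, cache_value = run_interleaving()
print(f"DB = {db_value}, CACHE = {cache_value}")  # prints: DB = 2, CACHE = 1
```

Each thread performs its own two steps in the right order; the divergence comes purely from Thread A's cache write landing after Thread B's.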
5. Delete‑Then‑Update Variants
Delete cache, then update DB.
Update DB, then delete cache.
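A failed second step affects the two variants differently. If the cache is deleted first and the DB update then fails, the next read misses the cache and reloads the old value from the DB, so the two stay consistent. If the DB is updated first and the cache delete then fails, the DB holds the new value while the cache keeps serving the old one. Deletion alone therefore does not solve the problem: the second step still has to be guaranteed.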
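To see what a failed second step leaves behind in each variant, the following sketch simulates both orders with dictionaries as stand‑ins for the DB and the cache. The `fail_second` flag and the `StepFailed` exception are hypothetical devices for injecting the failure.

```python
class StepFailed(Exception):
    """Raised to simulate a failure of the second step."""


def delete_cache_then_update_db(db, cache, key, value, fail_second=False):
    cache.pop(key, None)
    if fail_second:
        raise StepFailed("DB update failed")
    db[key] = value


def update_db_then_delete_cache(db, cache, key, value, fail_second=False):
    db[key] = value
    if fail_second:
        raise StepFailed("cache delete failed")
    cache.pop(key, None)


def read(db, cache, key):
    """Cache-aside read: repopulate the cache from the DB on a miss."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]


def outcome(update):
    """Return (DB value, value a reader sees) after the second step fails."""
    db, cache = {"X": 1}, {"X": 1}
    try:
        update(db, cache, "X", 2, fail_second=True)
    except StepFailed:
        pass
    return db["X"], read(db, cache, "X")


print(outcome(delete_cache_then_update_db))  # prints: (1, 1)
print(outcome(update_db_then_delete_cache))  # prints: (2, 1)
```

In the first case the write is lost but reader and DB agree; in the second the DB is correct but readers keep seeing the old value until the cache entry is removed or expires.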
6. Guaranteeing Both Steps Succeed
The simplest remedy is to retry, but synchronous retries block the request thread and can exhaust resources. A more robust solution is asynchronous retry:
Push a retry task into a reliable message queue.
A dedicated consumer repeatedly processes the task until it succeeds.
Message queues provide durability (messages survive process restarts) and at‑least‑once delivery, decoupling retry logic from the request flow.
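The asynchronous retry flow might look like the sketch below. It rests on loud assumptions: `queue.Queue` is only an in‑process stand‑in for a durable message queue (it does not survive restarts), and `flaky_delete` is a hypothetical cache delete that fails on its first two attempts so the retry path is exercised.

```python
import queue
import threading

retry_queue = queue.Queue()  # stand-in for a durable message queue (assumption)
cache = {"X": 1}
attempts = {"count": 0}


def flaky_delete(key):
    """Hypothetical cache delete that fails on the first two attempts."""
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ConnectionError("cache unavailable")
    cache.pop(key, None)


def consumer():
    """Dedicated consumer: keeps retrying each task until the delete succeeds."""
    while True:
        key = retry_queue.get()
        if key is None:  # sentinel used to stop the worker
            retry_queue.task_done()
            break
        try:
            flaky_delete(key)
        except ConnectionError:
            retry_queue.put(key)  # re-enqueue; a real system would add backoff
        finally:
            retry_queue.task_done()


def update(db, key, value):
    """Update the DB, then delete the cache; hand failures to the queue."""
    db[key] = value
    try:
        flaky_delete(key)
    except ConnectionError:
        retry_queue.put(key)  # the request returns without blocking on retries


def run_demo():
    db = {"X": 1}
    worker = threading.Thread(target=consumer, daemon=True)
    worker.start()
    update(db, "X", 2)
    retry_queue.join()      # wait until every queued task has been handled
    retry_queue.put(None)
    worker.join(timeout=5)
    return db, dict(cache), attempts["count"]
```

Because delivery is at‑least‑once, the same delete may be processed more than once; deleting a cache key is idempotent, which is one reason deletion suits this pattern.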
7. Master‑Slave Lag and Delayed Double Delete
In read‑write‑split architectures, replication lag can cause a stale value to be written back to the cache. The common mitigation is the delayed double‑delete strategy:
Delete the cache and update the DB.
Pause for a short interval (typically 1–5 seconds) to allow replication to catch up.
Delete the cache a second time (or send a delayed delete message).
The delay must exceed both the replication lag and the time a concurrent reader needs to fetch the old value and write it back to the cache.
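A minimal sketch of the three steps above, assuming dictionaries as stand‑ins for the primary DB and the cache. It uses `threading.Timer` for the delayed second delete so the request thread does not sleep; a delayed message in a queue, as mentioned above, would be the more durable choice in practice. The function name and the default delay are illustrative.

```python
import threading

db = {"X": 1}      # stand-in for the primary database
cache = {"X": 1}   # stand-in for the cache


def delete_cache(key):
    cache.pop(key, None)


def update_with_delayed_double_delete(key, value, delay_seconds=1.0):
    """Delete the cache, update the DB, then delete the cache again later."""
    delete_cache(key)   # first delete
    db[key] = value     # update the primary
    # Second delete runs after the delay, without blocking the caller.
    timer = threading.Timer(delay_seconds, delete_cache, args=(key,))
    timer.daemon = True
    timer.start()
    return timer        # returned so callers (or tests) can wait on it
```

If a reader repopulates the cache from a lagging replica between the two deletes, the second delete removes that stale entry, provided the delay was long enough.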
8. Strong Consistency vs. Performance
Achieving strong consistency requires protocols such as 2PC, 3PC, Paxos, or Raft, which add significant latency and complexity. Because caching is primarily introduced for performance, most systems accept eventual consistency and focus on narrowing the inconsistency window.
9. Practical Recommendation
Prefer the update‑DB‑then‑delete‑cache pattern. Updating the DB first makes the authoritative store correct before the cache is touched; deleting the cache entry then forces the next read to reload the fresh value.
Combine this pattern with an asynchronous retry mechanism (message queue) or a binlog‑subscription system (e.g., Alibaba Canal) to guarantee the delete step eventually succeeds.
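The binlog‑subscription variant can be sketched as a consumer that turns row‑change events into cache deletes. Everything about the event shape here is an assumption made for illustration: the dictionaries with `type`, `table`, and `pk` fields and the `table:pk` key scheme are hypothetical and do not reflect Canal's actual message format or client API.

```python
def cache_key_for(event):
    """Hypothetical key scheme: '<table>:<primary key>'."""
    return f"{event['table']}:{event['pk']}"


def handle_change_events(events, delete_cache, max_attempts=3):
    """Invalidate the cache entry for each changed row.

    Returns the keys that could not be invalidated after max_attempts,
    so the caller can leave those events unacknowledged for redelivery.
    """
    failed = []
    for event in events:
        if event["type"] not in ("UPDATE", "DELETE"):
            continue  # newly inserted rows have no cached copy to invalidate
        key = cache_key_for(event)
        for _ in range(max_attempts):
            try:
                delete_cache(key)
                break
            except ConnectionError:
                continue
        else:
            failed.append(key)  # every attempt failed
    return failed


cache = {"user:1": "alice", "user:2": "bob"}
events = [
    {"type": "UPDATE", "table": "user", "pk": 1},
    {"type": "INSERT", "table": "user", "pk": 3},
]
failed_keys = handle_change_events(events, lambda key: cache.pop(key, None))
```

The appeal of this design is that application code only writes to the DB; cache invalidation is driven by the change log, so it cannot be skipped by a code path that forgets to delete the key.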
When using read‑write splitting, mitigate master‑slave lag with delayed double delete and keep replication delay as low as possible.
Set appropriate TTLs for cache entries to maximize cache utilization and automatically evict cold data.
These techniques together provide a pragmatic balance between high read performance and acceptable data consistency.
dbaplus Community