How Leading Tech Companies Elegantly Avoid the Delayed Double Delete Pitfall
The article dissects why the delayed double‑delete cache‑consistency pattern breaks under high traffic, illustrates Alibaba’s painful experience, and then details two production‑grade alternatives—lease‑based token control and version‑number comparison—explaining their principles, Redis‑Lua implementations, and trade‑offs.
In many system designs the “delayed double delete” (DDD) has become the de‑facto solution for cache consistency: after updating the database, delete the cache, wait a few seconds, then delete it again. This works for small services but can cause cache breakdown and database overload in high‑traffic systems.
Why DDD was invented
The root problem is the time gap between a database write and the cache reflecting the new value. If a user reads the cache during this window, stale data is returned. Engineers therefore proposed deleting the cache twice: once immediately after the write, and once more after a short delay (typically 1–2 seconds), so that a stale value written back by a concurrent read during the write is evicted as well. This covers most concurrent write-read scenarios.
However, two major drawbacks appear:
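The delay is hard to tune: if it is too short, the second delete fires before a slow concurrent read has written its stale value back, so the race is not covered; if it is too long, stale data stays in the cache for the whole delay.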
The second delete can trigger a cache‑miss storm, sending a burst of requests to the primary database and potentially causing a crash.
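The delayed double delete flow described above can be sketched as follows. This is a minimal illustration, not the article's code: the class name `DelayedDoubleDelete`, the `delay_seconds` parameter, and the plain dictionaries standing in for the database and the cache are all assumptions made for the example.

```python
import threading


class DelayedDoubleDelete:
    """Hypothetical in-memory stand-ins for the database and the cache."""

    def __init__(self, delay_seconds=1.0):
        self.db = {}
        self.cache = {}
        self.delay_seconds = delay_seconds  # the hard-to-tune delay

    def read(self, key):
        # Cache-aside read: serve from the cache, otherwise load and refill.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        self.db[key] = value           # 1. update the database
        self.cache.pop(key, None)      # 2. first delete
        # 3. second delete after the delay, to evict any stale refill
        timer = threading.Timer(
            self.delay_seconds, self.cache.pop, args=(key, None)
        )
        timer.daemon = True
        timer.start()
        return timer
```

Between step 2 and step 3 a racing reader can still put a stale value back into the cache, and that value is served until the timer fires. After the timer fires, the next reads all miss, which is the source of the second drawback.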
Fatal flaw: traffic‑induced cache breakdown
Assume a peak load of several thousand QPS. When DDD deletes a cached entry, every request that arrives before the cache is refilled misses and goes to the database at once, causing a sudden pressure spike. The database slows down, users refresh, traffic grows, and the primary database can collapse in a cascading failure. Alibaba encountered exactly this scenario and abandoned DDD for larger services.
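A rough back-of-the-envelope estimate shows the size of the spike. The function name and the sample figures below are illustrative assumptions, not measurements from the article; the model simply assumes that every read arriving between the delete and the first completed refill goes to the database.

```python
def requests_reaching_db(qps, refill_latency_ms):
    """Estimate reads that miss between a delete and the first cache refill.

    Assumes evenly spread traffic and no request coalescing.
    """
    return qps * refill_latency_ms // 1000


# Example with assumed numbers: 5000 QPS and a 50 ms database read.
burst = requests_reaching_db(5000, 50)
```

The burst grows with the refill latency, so a database that slows down under the spike receives an even larger burst on the next invalidation.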
Big‑company alternatives
1. Lease (token) mechanism
The lease approach adds a token that grants exclusive write rights to the cache. The workflow is:
Multiple requests query the cache and encounter a miss.
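Redis issues a lease token for the missing key; only the first requester receives a valid one.
The holder of the lease is allowed to write the new value into the cache.
Other requests either wait and retry the cache until the value appears or the lease expires, or have their write rejected because they hold no valid token.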
This prevents stale writes and acts like a lightweight write lock. A simple Redis‑Lua implementation uses three commands:
lease:get – generates a token when the key does not exist.
lease:set – verifies the token before permitting a cache write.
lease:del – removes both the cache entry and its lease to avoid old writes flowing back.
All three operations are executed atomically in Lua scripts, ensuring safety.
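The lease workflow can be sketched in plain Python. This is an in-memory model under stated assumptions, not the article's Redis-Lua code: the `LeaseCache` class and its method names are hypothetical, a `threading.Lock` stands in for the atomicity that a Lua script gets inside Redis, and lease expiry is left out for brevity.

```python
import secrets
import threading


class LeaseCache:
    """Hypothetical in-memory model of lease:get, lease:set, and lease:del."""

    def __init__(self):
        self._data = {}
        self._leases = {}
        self._lock = threading.Lock()  # mimics single-script atomicity

    def lease_get(self, key):
        """Return (value, token); a token is issued only to the first miss."""
        with self._lock:
            if key in self._data:
                return self._data[key], None
            if key in self._leases:
                return None, None  # another request is already refilling
            token = secrets.token_hex(8)
            self._leases[key] = token
            return None, token

    def lease_set(self, key, token, value):
        """Write the value only if the caller holds the current lease."""
        with self._lock:
            if token is None or self._leases.get(key) != token:
                return False
            self._data[key] = value
            del self._leases[key]
            return True

    def lease_del(self, key):
        """Invalidate the entry and its lease so pending refills are rejected."""
        with self._lock:
            self._data.pop(key, None)
            self._leases.pop(key, None)
```

A writer that updates the database calls `lease_del`; any reader that fetched the old value before that point then fails in `lease_set`, because its token no longer matches. A real deployment would also give each lease a TTL so that a crashed holder does not block refills indefinitely.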
2. Version‑number comparison mechanism
Each data item carries a version number (usually a timestamp). Before writing to the cache, the version stored in Redis is compared with the incoming version:
If new_version > old_version, the cache is updated.
Otherwise the write is discarded.
Implementation highlights:
Use mset in a Lua script to write both the version and the data atomically.
The application extracts the timestamp and passes it to Redis.
The script returns effect=true only when the version check passes.
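This method is even lighter than the lease approach, since it needs no token bookkeeping and no waiting, only one compare-and-write per update. The article cites this as the reason Alibaba prefers it for ultra‑large workloads, which it describes as reaching billions of QPS.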
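The version check can be sketched as a compare-and-write. As before, this is an in-memory model rather than the article's Lua script: `VersionedCache` and `set_if_newer` are hypothetical names, the lock stands in for script atomicity in Redis, and the boolean return value plays the role of the `effect` flag mentioned above.

```python
import threading


class VersionedCache:
    """Hypothetical in-memory model of a version-checked cache write."""

    def __init__(self):
        self._entries = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def set_if_newer(self, key, version, value):
        """Store the value only if version is strictly newer; return success."""
        with self._lock:
            current = self._entries.get(key)
            if current is not None and version <= current[0]:
                return False  # stale or duplicate write is discarded
            # Version and value are stored together, like the mset step.
            self._entries[key] = (version, value)
            return True

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            return None if entry is None else entry[1]
```

Because a late-arriving write with an older version is simply dropped, no delete or lock is needed to keep stale data out of the cache. The scheme relies on versions that only increase for a given key.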
3. Comparative analysis
Summarizing the three approaches:
Delayed double delete – Simple, suitable for small or cost‑sensitive services; risk of cache breakdown under high load.
Lease – Prevents concurrent stale writes; ideal for high‑concurrency distributed systems; more complex to implement.
Version comparison – No locks, highest performance; best for massive‑scale, high‑consistency requirements; depends on reliable, monotonically increasing versions (timestamps taken on different machines can suffer from clock skew).
The author likens DDD to a “Swiss army knife for small teams,” while the lease and version mechanisms are “precision instruments used by internet giants.”
Conclusion
Each solution has its place:
DDD fits low‑traffic, cost‑sensitive scenarios.
Lease suits distributed high‑concurrency write workloads.
Version comparison is the ultimate choice for ultra‑large, high‑consistency systems.
Do not blindly copy big‑company patterns; they are tuned for "hundreds of millions of QPS," whereas most projects need a balance of stability, controllability, and acceptable cost.
Truly mature system design is not about achieving perfect consistency, but about finding the right trade‑off among consistency, performance, and cost.
