Redis Distributed Lock Failure Causing Overselling and Safer Lock Solutions
This article analyzes a real‑world flash‑sale incident where a Redis distributed lock expired under high concurrency, leading to massive overselling, and presents safer lock implementations, atomic stock checks, and architectural improvements to prevent similar failures.
Using Redis for distributed locking is common, but a recent flash sale of a scarce product (100 bottles of premium liquor) ended in a P0-level incident: 100 more bottles were sold than were in stock.
The root cause was that the distributed lock had a 10-second expiration. Heavy load on the user service pushed request latency past this timeout, so the lock expired while the original thread was still processing, and a second thread acquired it. When the original thread finished, its unlock call deleted the lock now held by the second thread, which let a third thread in, and so on. With several threads inside the critical section at once, the stock check no longer protected anything.
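The failure sequence described above can be replayed deterministically. The following is a minimal sketch, not the production code: `ExpiringLockStore` is a hypothetical in-memory stand-in for a Redis key with a TTL, time is advanced by hand, and the key name `stock-lock` and the owners `A`, `B`, `C` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a Redis key with a TTL; time is advanced by hand. */
final class ExpiringLockStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private long nowMillis = 0;

    void advance(long millis) { nowMillis += millis; }

    /** Mimics SET key value NX PX ttl: succeeds only if no live lock exists. */
    boolean tryLock(String key, String owner, long ttlMillis) {
        expireIfDue(key);
        if (values.containsKey(key)) return false;
        values.put(key, owner);
        expiresAt.put(key, nowMillis + ttlMillis);
        return true;
    }

    /** Mimics a bare DEL: removes the lock no matter who holds it. */
    void unsafeUnlock(String key) {
        values.remove(key);
        expiresAt.remove(key);
    }

    private void expireIfDue(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowMillis >= deadline) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }
}

final class UnsafeReleaseDemo {
    /** Replays the timeline and returns how many callers end up inside the critical section together. */
    static int replay() {
        ExpiringLockStore store = new ExpiringLockStore();
        int inside = 0;
        if (store.tryLock("stock-lock", "A", 10_000)) inside++; // A enters
        store.advance(12_000);                                  // A waits on a slow call; the TTL passes
        if (store.tryLock("stock-lock", "B", 10_000)) inside++; // B enters while A is still working
        store.unsafeUnlock("stock-lock");                       // A finishes and deletes B's lock
        inside--;                                               // A leaves
        if (store.tryLock("stock-lock", "C", 10_000)) inside++; // C enters while B is still working
        return inside;
    }

    public static void main(String[] args) {
        System.out.println("callers inside critical section: " + replay());
    }
}
```

In this replay two callers (B and C) hold the "exclusive" section at the same time, and the pattern repeats each time an expired holder deletes its successor's lock.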
Three main problems were identified:
No fault‑tolerance or fallback handling for dependent services.
The distributed lock implementation was unsafe because lock release did not verify ownership.
Stock verification was non‑atomic, using a get‑and‑compare pattern.
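The article does not show how the first problem was handled. One common approach, sketched here under assumptions, is to bound every call to a dependent service with a timeout well below the lock's expiration and to return a fallback instead of blocking. `BoundedCall` and `callWithFallback` are hypothetical names, and the sketch uses only `java.util.concurrent.CompletableFuture` (Java 9 or later for `completeOnTimeout`).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BoundedCall {
    /**
     * Runs the call asynchronously and waits at most timeoutMillis for it.
     * On timeout or failure the fallback value is returned instead.
     * Note: the underlying task is not cancelled; it may keep running in the background.
     */
    static <T> T callWithFallback(Supplier<T> call, long timeoutMillis, T fallback) {
        return CompletableFuture.supplyAsync(call)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> fallback)
                .join();
    }
}
```

In a flash-sale path the fallback would typically be a fast rejection ("busy, try again"), so that a slow user service cannot hold a request past the lock's 10-second lifetime.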
To address these issues, a safer lock was implemented using a Lua script that atomically checks the lock value before deletion:
public void safedUnLock(String key, String val) {
    // Delete the lock only if it still holds this caller's value.
    // "in" is a reserved word in Lua, so the local variable is named "expected".
    String luaScript = "local expected = ARGV[1] local curr = redis.call('get', KEYS[1]) if expected == curr then redis.call('del', KEYS[1]) end return 'OK'";
    RedisScript<String> redisScript = RedisScript.of(luaScript, String.class);
    // Pass the value as a plain script argument, not wrapped in a collection.
    redisTemplate.execute(redisScript, Collections.singletonList(key), val);
}
For stock deduction, the native atomic increment operation of Redis was used, eliminating the need for a separate lock:
Long currStock = redisTemplate.opsForHash().increment("key", "stock", -1);
A new DistributedLocker class encapsulates lock acquisition and release. The updated business logic acquires the lock with a unique UUID, performs the atomic stock decrement, and releases the lock via the Lua script.
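The overall flow (unique token, atomic decrement, owner-checked release) can be sketched without a Redis server. This is an illustration under assumptions, not the article's `DistributedLocker`: `InMemoryLocker` and `FlashSaleService` are hypothetical names, a `ConcurrentHashMap` stands in for Redis, the lock TTL is omitted, and the "undo when the result is negative" step is an assumed way of handling a sold-out decrement.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for a lock service backed by Redis. */
final class InMemoryLocker {
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    /** Like SET key token NX: true only if the key was free. The TTL is omitted in this sketch. */
    boolean tryLock(String key, String token) {
        return locks.putIfAbsent(key, token) == null;
    }

    /** Like the Lua script: compare the stored token and delete in one atomic step. */
    boolean unlock(String key, String token) {
        return locks.remove(key, token);
    }
}

final class FlashSaleService {
    private final InMemoryLocker locker = new InMemoryLocker();
    private final AtomicLong stock; // stands in for the Redis hash field "stock"

    FlashSaleService(long initialStock) {
        this.stock = new AtomicLong(initialStock);
    }

    /** Returns true if one unit was sold to this caller. */
    boolean purchase(String lockKey) {
        String token = UUID.randomUUID().toString(); // unique per attempt
        if (!locker.tryLock(lockKey, token)) {
            return false; // another caller holds the lock
        }
        try {
            long remaining = stock.decrementAndGet(); // atomic, like an increment by -1
            if (remaining < 0) {
                stock.incrementAndGet(); // assumed compensation: sold out, restore the counter
                return false;
            }
            return true;
        } finally {
            locker.unlock(lockKey, token); // removes the lock only if it is still ours
        }
    }

    long remainingStock() {
        return stock.get();
    }
}
```

Because the decrement itself is atomic, the stock cannot go below zero even if the lock misbehaves; the lock only limits how many callers reach that step at once.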
Deep Thinking
Is a Distributed Lock Necessary?
Even though atomic stock decrement can prevent overselling, the lock still helps throttle traffic to downstream services, reducing load spikes and improving overall stability.
Lock Selection
RedLock offers higher reliability at the cost of performance; for this scenario, the simpler lock with value verification is more cost‑effective.
Further Optimizations
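Stock can be split across the cluster nodes up front, with the gateway using hash-based routing so that each request always reaches the same node. Each node then manages its share of the inventory in-process, which removes the Redis dependency from the hot path and further boosts performance, though it adds complexity when nodes are added or removed dynamically.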
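A minimal sketch of this idea follows, with everything in one process for illustration. `ShardedInventory` is a hypothetical name, each `AtomicInteger` stands in for the in-process stock of one node, and routing by `userId.hashCode()` is an assumption; the article does not say which key the gateway hashes on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ShardedInventory {
    private final List<AtomicInteger> shards = new ArrayList<>();

    /** Splits totalStock as evenly as possible across nodeCount nodes. */
    ShardedInventory(int totalStock, int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            int share = totalStock / nodeCount + (i < totalStock % nodeCount ? 1 : 0);
            shards.add(new AtomicInteger(share));
        }
    }

    /** Gateway-side routing: the same user always maps to the same node index. */
    int route(String userId) {
        return Math.floorMod(userId.hashCode(), shards.size());
    }

    /** Node-side deduction: succeeds only while this node's share is above zero. */
    boolean tryBuy(String userId) {
        AtomicInteger local = shards.get(route(userId));
        while (true) {
            int current = local.get();
            if (current <= 0) return false; // this node is sold out
            if (local.compareAndSet(current, current - 1)) return true;
        }
    }

    /** Total stock left across all nodes. */
    int remaining() {
        int sum = 0;
        for (AtomicInteger shard : shards) sum += shard.get();
        return sum;
    }
}
```

One consequence of this design is that a node can sell out while others still hold stock, so a user routed to it is rejected even though the sale is not over. Rebalancing shares when nodes join or leave is part of the added complexity.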
Conclusion
Overselling of scarce items is a critical failure that can damage a platform’s reputation. The incident highlights the importance of designing robust concurrency controls, using atomic operations, and continuously learning to improve system architecture.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.