Why Redis Distributed Locks Can Fail: A Real-World Flash Sale Disaster and Fix

An in‑depth post examines a P0 flash‑sale incident where a Redis‑based distributed lock caused severe overselling, analyzes root causes such as lock expiration and non‑atomic stock checks, and presents safer lock release via Lua scripts and alternative designs to prevent similar failures.

IT Architects Alliance

Background

The team built a flash‑sale (seckill) system that used a Redis distributed lock to protect the order‑creation flow. During a high‑profile "Flying Moutai" sale with only 100 bottles in stock, the system oversold dramatically, triggering a P0 incident and performance penalties.

Incident Details

When the sale started, a flood of user‑validation requests hit the user service, and the gateway began to respond slowly. Some requests took longer than the 10‑second lock timeout, so their lock expired while the business logic was still running. Other requests then acquired the now‑free lock. When the original thread finally finished, it released the lock unconditionally and thereby deleted a lock that belonged to a newer holder, which let yet another request in. The result was a race condition in which several threads believed they owned the lock at the same time.

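The stock check used a non‑atomic GET‑then‑compare pattern and relied on the lock for mutual exclusion. Once the lock no longer guaranteed a single holder, concurrent requests read the same stock value, all passed the check, and the sale oversold.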
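The timeline above can be replayed deterministically. The following is a minimal sketch, not the team's code: `NaiveLockReplay` is a hypothetical in‑memory stand‑in for a Redis key with a TTL, time is advanced by hand instead of by a real clock, and the key name `key:1` is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a Redis key with a TTL (hypothetical; time is advanced manually). */
class NaiveLockReplay {
    private final Map<String, Long> expiresAt = new HashMap<>(); // key -> expiry time in ms
    private long nowMs = 0;

    void advance(long ms) { nowMs += ms; }

    /** Models SETNX plus a fixed TTL: succeeds only if the key is absent or already expired. */
    boolean tryLock(String key, long ttlMs) {
        Long expiry = expiresAt.get(key);
        if (expiry != null && expiry > nowMs) {
            return false;
        }
        expiresAt.put(key, nowMs + ttlMs);
        return true;
    }

    /** Models a plain DEL: removes the key no matter who currently holds it. */
    void unlock(String key) { expiresAt.remove(key); }

    /** Replays the incident and returns how many requests are inside the critical section at the end. */
    static int replayIncident() {
        NaiveLockReplay redis = new NaiveLockReplay();
        int inside = 0;

        if (redis.tryLock("key:1", 10_000)) inside++; // request A enters
        redis.advance(11_000);                        // A stalls past the 10 s TTL
        if (redis.tryLock("key:1", 10_000)) inside++; // request B enters while A is still running
        redis.unlock("key:1");                        // A finishes and deletes B's lock
        inside--;                                     // A leaves
        if (redis.tryLock("key:1", 10_000)) inside++; // request C enters while B is still running
        return inside;                                // B and C hold the "lock" together
    }
}
```

Calling `NaiveLockReplay.replayIncident()` returns 2: requests B and C are both inside the section that the lock was supposed to serialize.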

Root‑Cause Analysis

No fallback or circuit‑breaker for the overloaded user service; gateway delays directly led to lock expiration.

The distributed lock implementation relied solely on SETNX with a fixed TTL, so a long‑running thread could lose the lock and later release a lock it no longer owned.

Stock verification was not atomic, making it vulnerable to concurrent modifications.
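To see why the non‑atomic check matters once mutual exclusion is gone, here is a small deterministic replay of two interleaved requests competing for the last bottle. It is an illustrative sketch: `StockCheckReplay` is a hypothetical class, and a local variable stands in for the stock value kept in Redis.

```java
/** Deterministic replay of a GET-then-compare stock check with two interleaved requests. */
class StockCheckReplay {
    /** Returns the number of orders created when two requests interleave on the last bottle. */
    static int ordersWithGetThenCompare() {
        long stock = 1;        // stands in for the stock value stored in Redis
        int orders = 0;

        long seenByA = stock;  // request A: GET stock returns 1
        long seenByB = stock;  // request B: GET stock returns 1, because A has not written yet

        if (seenByA > 0) {     // A passes the check, writes 0, creates an order
            stock = seenByA - 1;
            orders++;
        }
        if (seenByB > 0) {     // B passes the same check on its stale read, creates a second order
            stock = seenByB - 1;
            orders++;
        }
        return orders;
    }
}
```

`StockCheckReplay.ordersWithGetThenCompare()` returns 2 even though only one bottle was available, which is the overselling pattern described above.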

Solution Overview

To address the issues, the article proposes three main improvements:

Safer Distributed Lock: Store a unique value (e.g., a UUID) with the lock key and release the lock only if the stored value still matches. This is achieved with a Lua script that checks the value and deletes the key in one atomic step.

Atomic Stock Decrement: Use Redis' HINCRBY (or INCRBY) with a negative increment to decrement stock and read the new value in a single atomic operation, eliminating the need for a separate lock around stock checks.

Refactored Business Logic: Introduce a DistributedLocker utility class that encapsulates lock acquisition and safe release, and rewrite the seckill handler to use the new lock and the atomic stock decrement.
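The owner check behind the safer lock can be sketched without a Redis server. The class below is a hypothetical in‑memory model, not the article's DistributedLocker: `lock` plays the role of `SET key value NX PX ttl`, `safeUnlock` plays the role of the compare‑and‑delete Lua script, and time is advanced manually so the behavior is reproducible.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of an owner-checked lock (hypothetical stand-in for Redis). */
class OwnerCheckedLock {
    private static final class Entry {
        final String owner;
        final long expiresAtMs;
        Entry(String owner, long expiresAtMs) {
            this.owner = owner;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private long nowMs = 0;

    synchronized void advance(long ms) { nowMs += ms; }

    /** Analogue of SET key owner NX PX ttl: succeeds only if no unexpired entry exists. */
    synchronized boolean lock(String key, String owner, long ttlMs) {
        Entry e = entries.get(key);
        if (e != null && e.expiresAtMs > nowMs) {
            return false;
        }
        entries.put(key, new Entry(owner, nowMs + ttlMs));
        return true;
    }

    /** Analogue of the Lua script: delete only if the entry is unexpired and still owned by the caller. */
    synchronized boolean safeUnlock(String key, String owner) {
        Entry e = entries.get(key);
        if (e == null || e.expiresAtMs <= nowMs || !e.owner.equals(owner)) {
            return false; // expired, or held by someone else: leave it alone
        }
        entries.remove(key);
        return true;
    }
}
```

With this model, a request whose lock has expired can no longer release a newer holder's lock: its `safeUnlock` call returns false and the key stays in place. Note that the owner check does not stop the stalled request from continuing to run after its lock expired; it only prevents the wrongful release.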

Lua Script for Safe Unlock

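-- KEYS[1]: lock key
-- ARGV[1]: unique value the caller stored when it acquired the lock
local val = ARGV[1]
local curr = redis.call('get', KEYS[1])
-- Delete the key only if it still holds the caller's value;
-- an expired lock or one held by another caller is left untouched
if val == curr then
  redis.call('del', KEYS[1])
end
return 'OK'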

Refactored Seckill Handler (Java)

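import java.util.UUID;
import java.util.concurrent.TimeUnit;

public SeckillActivityRequestVO seckillHandle(SeckillActivityRequestVO request) {
    // Must be initialized: not every branch below assigns it
    SeckillActivityRequestVO response = null;
    String key = "key:" + request.getSeckillId();
    // Unique per request; identifies this caller as the lock owner
    String val = UUID.randomUUID().toString();
    try {
        boolean lockAcquired = distributedLocker.lock(key, val, 10, TimeUnit.SECONDS);
        if (!lockAcquired) {
            // business exception (placeholder type; substitute the project's own exception)
            throw new IllegalStateException("Seckill is busy, please retry");
        }
        // user validation omitted for brevity
        // HINCRBY with -1: decrement and read the new stock in one atomic step
        Long currStock = stringRedisTemplate.opsForHash()
            .increment(key + ":info", "stock", -1);
        if (currStock < 0) {
            // out of stock handling
        } else {
            // generate order, publish event, build response
        }
    } finally {
        // Safe even if the lock was never acquired or has already expired:
        // the key is deleted only when it still holds val
        distributedLocker.safedUnLock(key, val);
    }
    return response;
}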
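The decrement‑then‑check step in the handler can be exercised under real concurrency without Redis. The sketch below rests on stated assumptions: `AtomicStockDemo` is a hypothetical class, `AtomicLong.decrementAndGet()` stands in for `HINCRBY ... -1`, and the compensating increment on a negative result is an addition for illustration that the handler above leaves to its "out of stock handling" branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Decrement-then-check stock reservation, with AtomicLong standing in for a Redis counter. */
class AtomicStockDemo {
    private final AtomicLong stock;

    AtomicStockDemo(long initial) { stock = new AtomicLong(initial); }

    /** Decrement first and inspect the result; put the unit back if the counter went below zero. */
    boolean tryReserve() {
        long remaining = stock.decrementAndGet(); // analogue of HINCRBY key stock -1
        if (remaining < 0) {
            stock.incrementAndGet();              // compensate so the counter does not drift negative
            return false;
        }
        return true;
    }

    long remaining() { return stock.get(); }

    /** Fires the given number of concurrent reservations and returns how many succeeded. */
    static int simulate(AtomicStockDemo inventory, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                tasks.add(inventory::tryReserve);
            }
            int succeeded = 0;
            for (Future<Boolean> f : pool.invokeAll(tasks)) {
                if (f.get()) succeeded++;
            }
            return succeeded;
        } catch (Exception e) {
            throw new IllegalStateException("simulation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running `AtomicStockDemo.simulate(new AtomicStockDemo(100), 1000)` yields exactly 100 successful reservations regardless of thread interleaving, because each decrement and its result are a single atomic operation.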

Deep‑Dive Thoughts

While Redis atomic decrement can replace the lock for simple stock‑deduction, the lock still helps throttle traffic to downstream services, reducing overall system load.

RedLock, which acquires the lock on a majority of independent Redis nodes, offers stronger safety guarantees at the cost of performance; it may be justified for ultra‑critical scenarios.

After deploying the improved code and conducting load tests, the team observed higher throughput and no overselling, confirming the fix.

Further Optimizations

The article suggests distributing stock across cluster nodes using a hash‑based routing layer and broadcasting stock updates, which can further reduce Redis pressure. However, this approach adds complexity around dynamic scaling.
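One way to picture the hash‑based routing idea is to split the total stock into independent counters and send each user to one of them. This is only a sketch under assumptions: `ShardedStock` and its methods are hypothetical, the counters live in a local `AtomicLongArray` rather than on separate nodes, routing uses the user ID's `hashCode`, and the broadcasting of stock updates is not modeled.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Stock split across independent counters, with requests routed by a hash of the user ID. */
class ShardedStock {
    private final AtomicLongArray shards;

    ShardedStock(long total, int shardCount) {
        shards = new AtomicLongArray(shardCount);
        for (int i = 0; i < shardCount; i++) {
            // Even split; the first (total % shardCount) shards take one extra unit
            shards.set(i, total / shardCount + (i < total % shardCount ? 1 : 0));
        }
    }

    /** floorMod keeps the index non-negative even for negative hash codes. */
    int shardFor(String userId) {
        return Math.floorMod(userId.hashCode(), shards.length());
    }

    /** Atomic decrement on the user's shard only; compensates if that shard is already empty. */
    boolean tryReserve(String userId) {
        int i = shardFor(userId);
        if (shards.decrementAndGet(i) < 0) {
            shards.incrementAndGet(i);
            return false;
        }
        return true;
    }

    long remaining() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade‑off is visible in `tryReserve`: a user routed to an empty shard is rejected even if other shards still hold stock, and changing the shard count changes the routing, which is part of the dynamic‑scaling complexity mentioned above.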

Conclusion

Overselling of scarce items can cause severe business and reputational damage. A thorough analysis revealed that an unsafe distributed lock and non‑atomic stock checks were the culprits. By adopting a value‑based Lua unlock, leveraging Redis' atomic operations, and refactoring the code, the system regained correctness and improved performance.
