Why Can a Redis Lock Be Stolen by Another Thread Before Its TTL Expires?
In high-concurrency Java services, a Redis lock whose TTL looks comfortably long can still be taken by another thread. When a long Stop-The-World pause freezes every application thread in the JVM, the TTL keeps counting down on Redis and the lock expires while the original holder is still paused. Once the holder resumes, two threads both act as the owner, which leads to data races.
When building distributed systems, developers often rely on Redis locks with a TTL to avoid deadlocks caused by crashes. The common assumption is that a lock held for 10 seconds will remain exclusive for that period.
1. Reproducing the problem
The following pseudo‑code illustrates a typical lock acquisition, business execution (≈200 ms), and lock release:
// 1. Acquire lock with a 10-second TTL (pseudo-code for SET lock_key thread_A NX EX 10)
if (redis.setnx("lock_key", "thread_A", 10s)) {
    try {
        // 2. Execute business logic (≈200 ms)
        doBusiness();
    } finally {
        // 3. Release lock (production code should first check that the value is still "thread_A")
        redis.del("lock_key");
    }
}
In most cases this works, but the edge case appears when the JVM experiences a long Full GC pause.
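The comment in step 3 mentions an ownership check that the pseudo-code itself does not perform. The following is a minimal sketch of what such a check looks like, using an in-memory map as a stand-in for Redis. The class and method names (InMemoryLockStore, tryAcquire, releaseIfOwner) are hypothetical, and TTL handling is deliberately left out. On a real Redis server the compare-and-delete has to be atomic, which is commonly done with a small Lua script.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for Redis (assumption: illustration only, no TTL handling).
final class InMemoryLockStore {
    private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<>();

    // Analogue of SET key owner NX: succeeds only if the key is absent.
    boolean tryAcquire(String key, String owner) {
        return keys.putIfAbsent(key, owner) == null;
    }

    // Compare-and-delete: removes the key only if it still maps to this owner.
    boolean releaseIfOwner(String key, String owner) {
        return keys.remove(key, owner);
    }

    // Current owner of the key, or null if nobody holds it.
    String ownerOf(String key) {
        return keys.get(key);
    }
}
```

With this shape, a holder whose lock has already been taken over by someone else gets false back from releaseIfOwner and leaves the new owner's key untouched.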
2. Time freeze during a Full GC
Consider the timeline (seconds):
(0) Thread A acquires the lock; TTL = 10 s.
(0.1) Thread A starts doBusiness() and runs only a few lines.
(0.2) A lengthy Full GC is triggered; the JVM stops all application threads, including Thread A.
(10) While the JVM is paused, Redis keeps counting down; the 10-second TTL runs out and the key is deleted.
(10.3) Thread B sees no lock key and acquires it, modifying the same data.
(12) Full GC ends; Thread A resumes, unaware that almost 12 seconds have passed and that it no longer owns the lock. It continues writing to the database, overwriting or duplicating Thread B's updates.
This demonstrates the classic "time jump" problem: the lock’s logical expiration on Redis does not align with the paused JVM’s perception of time.
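The timeline can be replayed deterministically with a manually advanced clock. The sketch below is a hypothetical in-memory model (PauseTimeline and its methods are invented for illustration, not a Redis client API): only the "Redis" clock moves while Thread A is frozen, and A's unconditional delete at the end mirrors the DEL in the pseudo-code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a TTL lock with a manually advanced clock.
final class PauseTimeline {
    private long nowMs = 0;                                    // "Redis" clock
    private final Map<String, String> owner = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    void advance(long ms) { nowMs += ms; }

    // Drop the key once its TTL has elapsed.
    private void expireIfDue(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowMs >= deadline) {
            owner.remove(key);
            expiresAt.remove(key);
        }
    }

    // Succeeds only if the key is absent or already expired.
    boolean tryAcquire(String key, String who, long ttlMs) {
        expireIfDue(key);
        if (owner.containsKey(key)) return false;
        owner.put(key, who);
        expiresAt.put(key, nowMs + ttlMs);
        return true;
    }

    String ownerOf(String key) {
        expireIfDue(key);
        return owner.get(key);
    }

    // Unconditional delete, with no ownership check.
    void delete(String key) {
        owner.remove(key);
        expiresAt.remove(key);
    }

    public static void main(String[] args) {
        PauseTimeline redis = new PauseTimeline();
        System.out.println(redis.tryAcquire("lock_key", "thread_A", 10_000)); // t = 0
        redis.advance(10_300);   // A is frozen by GC; only Redis time moves
        System.out.println(redis.tryAcquire("lock_key", "thread_B", 10_000)); // t = 10.3
        redis.advance(1_700);    // t = 12: A resumes
        System.out.println(redis.ownerOf("lock_key"));
        redis.delete("lock_key"); // A's finally block runs
        System.out.println(redis.ownerOf("lock_key"));
    }
}
```

Running main prints true, true, thread_B, null: both threads acquired the lock, and Thread A's late release removed a key that belonged to Thread B.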
3. Extending the TTL is not a cure
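Raising the TTL only makes the race less likely, and it brings two problems of its own:
Side effects: If the service crashes while holding the lock, a longer TTL (e.g., 10 minutes) blocks every other instance for that entire period.
Uncontrollable: The duration of future STW pauses or network delays cannot be predicted, so no fixed TTL is guaranteed to be long enough.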
4. Watchdog (Redisson) renewal
Redisson implements a watchdog that periodically (by default every 1/3 of the TTL) checks whether the lock is still held and, if so, resets the TTL to its full value. This works only as long as the watchdog itself keeps running.
However, the watchdog runs inside the same JVM as the lock holder, so a Full GC that lasts longer than the TTL pauses it as well. The lock can then still expire, another thread may acquire it, and the same race condition follows.
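The effect of a pause on renewal can be checked with a small discrete-time model. This is a sketch under stated assumptions, not Redisson's implementation: WatchdogModel and firstExpiry are hypothetical names, the lock is acquired at t = 0, renewals are scheduled every ttl/3, and a renewal that falls inside the pause runs only when the pause ends.

```java
// Hypothetical discrete-time model of a renewing watchdog; all times in ms.
final class WatchdogModel {
    // Returns the instant at which the lock expired on the server,
    // or -1 if every renewal up to horizonMs arrived in time.
    static long firstExpiry(long ttlMs, long pauseStartMs, long pauseLenMs, long horizonMs) {
        long renewEveryMs = ttlMs / 3;               // renewal interval from the text
        if (renewEveryMs <= 0) {
            throw new IllegalArgumentException("ttlMs must be at least 3");
        }
        long expiresAt = ttlMs;                      // lock acquired at t = 0
        long pauseEndMs = pauseStartMs + pauseLenMs;
        for (long tick = renewEveryMs; tick <= horizonMs; tick += renewEveryMs) {
            boolean frozen = tick >= pauseStartMs && tick < pauseEndMs;
            long runsAt = frozen ? pauseEndMs : tick; // a frozen renewal runs late
            if (runsAt >= expiresAt) {
                return expiresAt;                    // too late: the key is already gone
            }
            expiresAt = runsAt + ttlMs;              // renewal resets the TTL
        }
        return -1;
    }
}
```

With a 30-second TTL, firstExpiry(30_000, 18_000, 5_000, 60_000) returns -1 because the 5-second pause only delays one renewal, while firstExpiry(30_000, 12_000, 45_000, 120_000) returns 40000: the last successful renewal was at 10 s, and nothing ran again until 57 s.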
5. Ultimate solution: fencing token / optimistic lock
To guarantee safety for critical financial operations, combine the Redis lock with a database-level check: a fencing token or an optimistic lock. With a fencing token, each lock acquisition also obtains a monotonically increasing number from Redis (e.g., 33). The application must include this token in its UPDATE statement, which also records the token on the row:
UPDATE account SET money = 100, current_token = 33
WHERE id = 1 AND current_token < 33;
If another thread acquires the lock with token 34 and writes first, the row's current_token becomes 34 and the first thread's UPDATE affects zero rows, preventing the stale write.
A simpler optimistic-lock version uses a version column:
UPDATE account SET money = 100, version = version + 1
WHERE id = 1 AND version = old_version;
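The fencing-token check can be sketched in plain Java with an in-memory stand-in for the account row. FencedAccount and its methods are hypothetical, and the sketch assumes the row stores the highest token that has written so far, so that a later write with a smaller token can be rejected.

```java
// Hypothetical in-memory stand-in for the account row and its conditional UPDATE.
final class FencedAccount {
    private long money;
    private long currentToken;   // highest fencing token that has written so far

    FencedAccount(long money, long currentToken) {
        this.money = money;
        this.currentToken = currentToken;
    }

    // Mirrors an UPDATE guarded by "current_token < token".
    // Returns the number of "rows" affected: 1 if applied, 0 if the token is stale.
    synchronized int update(long token, long newMoney) {
        if (currentToken >= token) {
            return 0;            // a newer (or the same) token has already written
        }
        money = newMoney;
        currentToken = token;    // remember the token so older holders are fenced off
        return 1;
    }

    synchronized long money() {
        return money;
    }
}
```

For a row created as new FencedAccount(50, 32), update(34, 200) returns 1 and a later update(33, 100) returns 0, leaving the balance at 200. Because the comparison is strict, each token can write only once; a holder that needs several updates under one lock would need a less-than-or-equal comparison instead.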
6. Takeaway
Even with a watchdog, distributed locks cannot guarantee 100 % mutual exclusion under extreme pauses. For money‑critical services, always add a database‑level optimistic lock as a final safeguard and remember that JVM time can diverge from Redis time.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
