Why Your Redis Distributed Lock May Fail and How to Fix It
This article examines common failure scenarios of Redis‑based distributed locks, compares a simple lock implementation with the Redlock algorithm, and provides practical solutions for single‑point failures, lock expiration issues, clock drift, and high‑concurrency pitfalls.
Redis‑based distributed locks are widely used, but they can fail in subtle ways. The article first asks whether a distributed lock is really needed, then outlines two typical use cases: improving efficiency by avoiding duplicate work and guaranteeing correctness where duplicate execution is unacceptable.
Simple Redis Lock Implementation
A basic lock uses SET key value NX EX seconds to acquire a lock and a Lua script to release it only if the stored unique ID matches:
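public static boolean tryLock(String key, String uniqueId, int seconds) {
    // SET key value NX EX seconds: write only if the key is absent, with a TTL.
    // (This string-flag overload comes from older Jedis releases.)
    return "OK".equals(jedis.set(key, uniqueId, "NX", "EX", seconds));
}

public static boolean releaseLock(String key, String uniqueId) {
    // Compare-and-delete runs atomically inside Redis, so a client can only delete its own lock.
    String luaScript = "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "return redis.call('del', KEYS[1]) else return 0 end";
    return jedis.eval(luaScript, Collections.singletonList(key), Collections.singletonList(uniqueId)).equals(1L);
}

The key points are using a unique identifier for each lock holder, so that a client never releases a lock it does not own, and setting an expiration so that a crashed client cannot hold the lock forever.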
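To see why the unique identifier matters, here is a minimal in-memory sketch of the same acquire and release semantics. It is not a Redis client: FakeRedisLock, its manually advanced clock, and the key and client names are hypothetical stand-ins chosen so the scenario runs deterministically.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a single Redis node (hypothetical, for illustration only).
class FakeRedisLock {
    private static final class Entry {
        final String owner;
        final long expiresAtMillis;
        Entry(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private long nowMillis = 0; // manually advanced clock instead of wall time

    synchronized void advance(long millis) { nowMillis += millis; }

    // Mimics SET key value NX EX seconds: succeeds only if the key is absent or expired.
    synchronized boolean tryLock(String key, String uniqueId, int seconds) {
        Entry e = store.get(key);
        if (e != null && e.expiresAtMillis > nowMillis) return false;
        store.put(key, new Entry(uniqueId, nowMillis + seconds * 1000L));
        return true;
    }

    // Mimics the Lua release script: delete only if the stored value is the caller's ID.
    synchronized boolean releaseLock(String key, String uniqueId) {
        Entry e = store.get(key);
        if (e == null || e.expiresAtMillis <= nowMillis || !e.owner.equals(uniqueId)) return false;
        store.remove(key);
        return true;
    }
}

class FakeRedisLockDemo {
    public static void main(String[] args) {
        FakeRedisLock redis = new FakeRedisLock();
        System.out.println(redis.tryLock("order:42", "client-A", 10)); // A acquires
        redis.advance(11_000);                                          // A's TTL runs out
        System.out.println(redis.tryLock("order:42", "client-B", 10)); // B acquires
        System.out.println(redis.releaseLock("order:42", "client-A")); // rejected: lock now belongs to B
        System.out.println(redis.releaseLock("order:42", "client-B")); // B releases its own lock
    }
}
```

Without the ownership check, client-A's late release would delete client-B's lock and let a third client in.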
Limitations of the Simple Lock
Single‑point failure: If the master Redis node crashes after the lock is set but before replication, multiple clients may acquire the same lock.
Expiration race: If the task exceeds the lock’s TTL (due to GC pauses, network latency, etc.), the lock expires while the client is still working, allowing another client to proceed and causing duplicate processing.
Redlock Algorithm
Redlock mitigates the single‑point issue by requiring a majority of independent Redis masters (N > 2). The algorithm proceeds as follows:
1. Record the current time.
2. Attempt to acquire the lock on each of the N nodes, adjusting each node’s TTL by the time already spent.
3. If the client obtains locks on at least ⌊N/2⌋ + 1 nodes (a strict majority) and all remaining TTLs are positive, the lock is considered acquired; otherwise, all acquired locks are released.
4. Release the lock on all nodes when done.
If the adjusted TTL becomes ≤ 0 at any step, the acquisition fails.
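The steps above can be sketched against in-memory nodes. This is an illustration of the quorum logic as listed here, not a production Redlock client: the Node class, the injected clock, and the down flag are assumptions made so the example is self-contained, and a real deployment would talk to independent Redis masters over the network.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

class RedlockSketch {
    // Hypothetical in-memory node standing in for one independent Redis master.
    static final class Node {
        private final Map<String, String> owners = new HashMap<>();
        private final Map<String, Long> expiries = new HashMap<>();
        private final LongSupplier clock;
        boolean down = false; // set to true to simulate an unreachable node

        Node(LongSupplier clock) { this.clock = clock; }

        boolean tryLock(String key, String id, long ttlMillis) {
            if (down) return false;
            Long exp = expiries.get(key);
            if (exp != null && exp > clock.getAsLong()) return false; // still held
            owners.put(key, id);
            expiries.put(key, clock.getAsLong() + ttlMillis);
            return true;
        }

        void release(String key, String id) {
            if (down) return;
            if (id.equals(owners.get(key))) { // only the owner may delete
                owners.remove(key);
                expiries.remove(key);
            }
        }
    }

    // Returns the remaining validity in milliseconds, or -1 if the lock was not acquired.
    static long acquire(List<Node> nodes, String key, String id, long ttlMillis, LongSupplier clock) {
        long start = clock.getAsLong();                        // step 1: record the time
        int acquired = 0;
        for (Node node : nodes) {                              // step 2: try every node
            long remaining = ttlMillis - (clock.getAsLong() - start);
            if (remaining <= 0) break;                         // TTL used up: give up
            if (node.tryLock(key, id, remaining)) acquired++;
        }
        long validity = ttlMillis - (clock.getAsLong() - start);
        int quorum = nodes.size() / 2 + 1;                     // strict majority
        if (acquired >= quorum && validity > 0) return validity; // step 3: success
        releaseAll(nodes, key, id);                            // step 3: failure, clean up
        return -1;
    }

    static void releaseAll(List<Node> nodes, String key, String id) { // step 4
        for (Node node : nodes) node.release(key, id);
    }
}
```

With five nodes, the sketch still acquires the lock when two nodes are down and fails (releasing any partial locks) when three are down.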
Practical Pitfalls in High‑Concurrency Scenarios
Performance overhead: Acquiring locks sequentially on many masters adds latency; parallel requests can reduce this, but the total lock‑acquisition time must still be less than the lock’s TTL, or the lock expires before it can be used.
Resource granularity: Large locked resources reduce concurrency. Splitting resources (e.g., per‑merchant or bucketed processing) can improve throughput.
Retry storms: Simultaneous retries can cause many clients to contend for the same locks. Adding random jitter to retry intervals helps mitigate this.
Node crashes: If a master fails after a client has acquired a majority, the lock is still considered held, but further node failures can break safety. Adding more masters improves resilience at a higher operational cost.
Clock drift: Redis uses the system’s realtime clock for expirations. Large clock drift or manual time changes can cause premature expiration. Using monotonic clocks would be safer, but Redis currently relies on realtime.
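Two of the mitigations above, finer lock granularity and jittered retries, are easy to show in code. The sketch below is illustrative only: the class name, the key prefix, the bucket count, and the "full jitter" backoff formula are assumptions, not details taken from the original article.

```java
import java.util.concurrent.ThreadLocalRandom;

class LockTuning {
    // Finer granularity: one lock key per merchant bucket instead of a single global lock.
    static String bucketedKey(String prefix, long merchantId, int buckets) {
        return prefix + ":" + Math.floorMod(merchantId, buckets);
    }

    // Exponential backoff with full jitter: a uniform random delay in
    // [0, min(capMillis, baseMillis * 2^attempt)], so retries from many clients spread out.
    static long backoffWithJitterMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(Math.max(attempt, 0), 20);       // clamp to avoid overflow
        long ceiling = Math.min(capMillis, baseMillis * (1L << shift));
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketedKey("lock:settle", 1234, 16)); // lock:settle:2
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("retry " + attempt + " after "
                    + backoffWithJitterMillis(attempt, 50, 1000) + " ms");
        }
    }
}
```

Clients that work on merchants in different buckets no longer contend for the same key, and clients that do collide retry at different moments instead of all at once.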
Renewal (Watchdog) Mechanisms
Redisson implements an automatic renewal: after a lock is acquired, a timer (default 30 s TTL, renewed every 10 s) extends the lock’s expiration as long as the client remains alive. The renewal logic runs in scheduleExpirationRenewal and repeatedly executes a Lua script that updates the TTL.
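The renewal idea can be sketched as follows. This is not Redisson's code: WatchdogSketch, its injected clock, and the single in-memory lock are hypothetical, and only the "renew every TTL/3 while the caller still owns the lock" rule is taken from the description above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical single-key lock with an owner-checked renewal, standing in for Redis.
class WatchdogSketch {
    private final LongSupplier clock;
    private String owner;
    private long expiresAtMillis;

    WatchdogSketch(LongSupplier clock) { this.clock = clock; }

    synchronized boolean tryLock(String id, long ttlMillis) {
        long now = clock.getAsLong();
        if (owner != null && expiresAtMillis > now) return false; // still held
        owner = id;
        expiresAtMillis = now + ttlMillis;
        return true;
    }

    // Extends the TTL only if the caller still owns an unexpired lock.
    synchronized boolean renew(String id, long ttlMillis) {
        long now = clock.getAsLong();
        if (owner == null || !owner.equals(id) || expiresAtMillis <= now) return false;
        expiresAtMillis = now + ttlMillis;
        return true;
    }

    synchronized long remainingMillis() {
        return Math.max(0, expiresAtMillis - clock.getAsLong());
    }

    // Renews every ttl/3 (30 s TTL -> every 10 s). Throwing from the task stops
    // further runs, which is how scheduleAtFixedRate handles a failed execution.
    ScheduledFuture<?> startWatchdog(ScheduledExecutorService scheduler, String id, long ttlMillis) {
        long period = Math.max(1, ttlMillis / 3);
        return scheduler.scheduleAtFixedRate(() -> {
            if (!renew(id, ttlMillis)) throw new IllegalStateException("lock lost: " + id);
        }, period, period, TimeUnit.MILLISECONDS);
    }
}
```

A caller would pass System::currentTimeMillis as the clock, start the watchdog on an executor such as Executors.newSingleThreadScheduledExecutor() after a successful tryLock, and cancel the returned future when the work is done. Note that a renewal missed for longer than the TTL cannot be repaired: renew returns false and another client may already hold the lock.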
If renewal fails (e.g., due to GC pauses or network loss), multiple clients may hold the lock simultaneously. The article suggests using a fencing token (monotonically increasing per‑resource identifier) to reject stale writes, but notes drawbacks such as lack of atomicity and reduced concurrency.
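A fencing check lives on the protected resource rather than in the lock service. The sketch below assumes a hypothetical TokenIssuer that hands out a strictly increasing number with each lock grant, and a FencedResource that remembers the highest token it has seen; neither name comes from the original article.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical token source: each successful lock acquisition gets the next number.
class TokenIssuer {
    private final AtomicLong counter = new AtomicLong();
    long nextToken() { return counter.incrementAndGet(); }
}

// The protected resource rejects writes that carry a token older than one it has already seen.
class FencedResource {
    private long highestTokenSeen = 0;
    private String value;

    synchronized boolean write(long token, String newValue) {
        if (token < highestTokenSeen) return false; // stale lock holder: reject
        highestTokenSeen = token;
        value = newValue;
        return true;
    }

    synchronized String read() { return value; }
}

class FencingDemo {
    public static void main(String[] args) {
        TokenIssuer issuer = new TokenIssuer();
        FencedResource resource = new FencedResource();

        long tokenA = issuer.nextToken(); // client A acquires, then stalls (e.g. a long GC pause)
        long tokenB = issuer.nextToken(); // A's lock expires, client B acquires

        System.out.println(resource.write(tokenB, "written by B")); // accepted
        System.out.println(resource.write(tokenA, "written by A")); // rejected as stale
        System.out.println(resource.read());                        // B's write survives
    }
}
```

The check only works if every write path goes through it, which is one reason fencing is harder to retrofit than a lock alone.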
Summary
The piece walks from a basic Redis lock to the more robust Redlock algorithm, highlighting real‑world pitfalls—single‑point failures, lock expiration races, clock drift, and high‑concurrency contention—and offers concrete mitigation strategies like multi‑master quorum, lock granularity, retry jitter, watchdog renewal, and optional fencing tokens.
dbaplus Community