Is Redis Distributed Lock Really Safe? Uncovering Redlock, Pitfalls, and Best Practices
This comprehensive guide explains why distributed locks are needed, walks through basic Redis lock implementations, reveals common dead‑lock and expiration issues, presents atomic solutions with SET EX NX and Lua scripts, evaluates the Redlock algorithm, examines expert debates, compares Zookeeper locks, and offers practical recommendations for safe usage.
In modern micro‑service architectures multiple processes often need exclusive access to a shared resource such as a database row. A distributed lock provides this mutual exclusion across processes.
Why Distributed Locks?
When several instances of an application try to modify the same MySQL record, a single‑process lock is insufficient; an external system like Redis or Zookeeper must be used to coordinate access.
How to Implement a Distributed Lock?
The simplest approach uses Redis SETNX (set if not exists). Two clients try to set the same key; the one that succeeds holds the lock.
127.0.0.1:6379> SETNX lock 1
(integer) 1 // client 1 acquires the lock
127.0.0.1:6379> SETNX lock 1
(integer) 0 // client 2 fails to acquire the lock
After the critical section the lock is released with DEL:
127.0.0.1:6379> DEL lock // release lock
(integer) 1
How to Avoid Deadlocks?
If a client crashes or the business logic throws an exception, the lock may never be released, causing a permanent deadlock.
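The exception half of this problem can be handled in application code. The sketch below assumes Python and uses a tiny in-memory stand-in (`FakeRedis`, a hypothetical helper, not a real client) that mimics only SETNX and DEL; `run_with_lock` and `failing_job` are likewise illustrative names. A `finally` block releases the lock when the business logic throws, but it cannot help when the process itself is killed, which is why an expiration is still needed.

```python
class FakeRedis:
    """Tiny in-memory stand-in for SETNX and DEL (hypothetical, not a real client)."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        # Mirrors SETNX: store the value only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        # Mirrors DEL: returns the number of keys removed.
        return 1 if self._data.pop(key, None) is not None else 0


def run_with_lock(client, key, work):
    """Run work() under the lock; return False if the lock is already held."""
    if not client.setnx(key, "1"):
        return False
    try:
        work()
    finally:
        # Runs on normal return and on exceptions, but not if the process dies.
        client.delete(key)
    return True


def failing_job():
    raise RuntimeError("business logic failed")


client = FakeRedis()
try:
    run_with_lock(client, "lock", failing_job)
except RuntimeError:
    pass

# The exception did not leak the lock, so the next caller can acquire it.
reacquired = client.setnx("lock", "1")
print(reacquired)  # True
```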
Lock Expiration
Adding an expiration time mitigates deadlocks. In Redis this is done by setting a TTL after acquiring the lock:
127.0.0.1:6379> SETNX lock 1 // acquire lock
(integer) 1
127.0.0.1:6379> EXPIRE lock 10 // lock expires after 10 s
(integer) 1
However, SETNX and EXPIRE are two separate commands. If EXPIRE never executes (network error, Redis crash, or client crash between the two calls), the key has no TTL and the lock is held forever.
Atomic Set with Expiration (Redis 2.6.12+)
Redis 2.6.12 introduced the extended SET syntax that combines the operations atomically:
# one command guarantees atomicity
127.0.0.1:6379> SET lock 1 EX 10 NX
OK
This solves the basic deadlock problem but introduces a new risk: the lock may expire before the client finishes its work, and the client may then delete a lock that another client has since acquired.
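In application code the same single-command acquisition might look like the sketch below. It assumes Python and a client whose `set(key, value, nx=True, ex=seconds)` call behaves like the one in redis-py (returning `True` on success and `None` when NX blocks the write). `FakeRedis`, `try_lock`, and the manual clock are hypothetical stand-ins so the example runs without a server.

```python
import time


class FakeRedis:
    """In-memory stand-in that mimics SET key value EX seconds NX (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, nx=False, ex=None):
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] <= now:
            current = None  # the previous entry has expired
        if nx and current is not None:
            return None  # NX blocks the write while the key is alive
        expires_at = now + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)
        return True


def try_lock(client, key, ttl_seconds):
    """Acquire the lock and set its TTL in one command."""
    return client.set(key, "1", nx=True, ex=ttl_seconds) is True


now = [0.0]  # manual clock so the demo needs no sleeping
client = FakeRedis(clock=lambda: now[0])

first = try_lock(client, "lock", 10)   # acquired
second = try_lock(client, "lock", 10)  # rejected, the key is still alive
now[0] = 11.0                          # the holder "crashed" and the TTL has passed
third = try_lock(client, "lock", 10)   # acquired again, so no permanent deadlock
print(first, second, third)  # True False True
```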
Ensuring the Lock Belongs to the Client
Store a unique identifier (e.g., a UUID) as the lock value. When releasing, first verify that the stored value matches the client’s identifier.
// lock value is a UUID
127.0.0.1:6379> SET lock $uuid EX 20 NX
OK
Assume the 20 s expiration is sufficient for the critical section.
Release logic (pseudo‑code):
// release only if the lock is still owned
if redis.get("lock") == $uuid:
    redis.del("lock")
Unfortunately, the GET + DEL pair is not atomic: the lock can expire and be acquired by another client between the two commands, and the DEL then removes that client's lock.
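To make the race concrete, the following sketch replays one unlucky interleaving deterministically. It is written in Python against a hypothetical in-memory `FakeRedis` with a manual clock; the client names `uuid-A`, `uuid-B`, and `uuid-C` are illustrative.

```python
class FakeRedis:
    """In-memory stand-in with TTL support (hypothetical, for illustration only)."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def _live(self, key):
        # Drop the entry if its TTL has passed, then return what is left.
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]
            return None
        return entry

    def set(self, key, value, nx=False, ex=None):
        if nx and self._live(key) is not None:
            return None
        expires_at = self._clock() + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry is not None else None

    def delete(self, key):
        if self._live(key) is None:
            return 0
        del self._data[key]
        return 1


now = [0.0]
client = FakeRedis(clock=lambda: now[0])

client.set("lock", "uuid-A", nx=True, ex=20)        # client A acquires
now[0] = 19.9
a_check_passed = client.get("lock") == "uuid-A"     # A's ownership check succeeds
now[0] = 20.1                                       # A stalls and its lock expires
client.set("lock", "uuid-B", nx=True, ex=20)        # client B acquires
deleted = client.delete("lock") if a_check_passed else 0  # A deletes B's lock
owner_after = client.get("lock")
c_acquired = client.set("lock", "uuid-C", nx=True, ex=20) is True
print(a_check_passed, deleted, owner_after, c_acquired)  # True 1 None True
```

At the end, B still believes it holds the lock while C has also acquired it, which is exactly the overlap the check was meant to prevent.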
Lua Script for Safe Release
Redis executes Lua scripts atomically, guaranteeing that the check and delete happen without interference:
-- atomic check‑and‑delete
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
With this script the full lock workflow becomes:
Acquire: SET lock_key $unique_id EX $expire_time NX
Perform the critical operation.
Release: run the Lua script above.
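The three steps above can be wrapped in a small helper. This is a minimal sketch in Python, assuming a client with redis-py style `set(key, value, nx=True, ex=seconds)` and `eval(script, numkeys, *keys_and_args)` methods. `redis_lock` and `LockNotAcquired` are hypothetical names, and `FakeRedis.eval` imitates the script's compare-and-delete rule in Python rather than running Lua, so the example runs without a server.

```python
import uuid
from contextlib import contextmanager

RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


class LockNotAcquired(Exception):
    pass


@contextmanager
def redis_lock(client, key, ttl_seconds):
    token = str(uuid.uuid4())  # unique value identifying this holder
    if not client.set(key, token, nx=True, ex=ttl_seconds):
        raise LockNotAcquired(key)
    try:
        yield token
    finally:
        # Deletes the key only if it still holds our token.
        client.eval(RELEASE_SCRIPT, 1, key, token)


class FakeRedis:
    """Stand-in client; eval() applies the script's rule in Python (hypothetical)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.data:
            return None
        self.data[key] = value
        return True

    def eval(self, script, numkeys, key, token):
        if self.data.get(key) == token:
            del self.data[key]
            return 1
        return 0


client = FakeRedis()
with redis_lock(client, "lock_key", 20) as token:
    held_inside = client.data.get("lock_key") == token
    try:
        with redis_lock(client, "lock_key", 20):
            nested = True
    except LockNotAcquired:
        nested = False  # a second acquisition fails while the lock is held
released = "lock_key" not in client.data
print(held_inside, nested, released)  # True False True
```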
When Expiration Time Is Hard to Estimate
If the operation may exceed the initial TTL, a watchdog thread can periodically extend the lock before it expires. The Java library Redisson implements this “watchdog” automatically, providing re‑entrant, fair, read‑write, and Redlock implementations.
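A hand-rolled watchdog could be sketched as follows. This is not Redisson's implementation; it is an illustration in Python under several assumptions: the client exposes a redis-py style `eval`, the renewal interval is one third of the TTL (a choice made for this sketch), and the extension is done by a Lua script that calls PEXPIRE only when the caller still owns the key. `Watchdog`, `EXTEND_SCRIPT`, and `FakeRedis` are hypothetical names, and `FakeRedis.eval` imitates the script in Python.

```python
import threading

EXTEND_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("PEXPIRE", KEYS[1], ARGV[2])
else
    return 0
end
"""


class Watchdog:
    """Periodically extends a lock's TTL while the critical section runs."""

    def __init__(self, client, key, token, ttl_ms):
        self._client = client
        self._key = key
        self._token = token
        self._ttl_ms = ttl_ms
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def renew_once(self):
        # True only if we still own the lock and its TTL was reset.
        result = self._client.eval(
            EXTEND_SCRIPT, 1, self._key, self._token, self._ttl_ms
        )
        return result == 1

    def _run(self):
        interval_seconds = self._ttl_ms / 3000  # one third of the TTL
        while not self._stop.wait(interval_seconds):
            if not self.renew_once():
                break  # ownership lost, stop renewing

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


class FakeRedis:
    """Stand-in client; eval() imitates EXTEND_SCRIPT in Python (hypothetical)."""

    def __init__(self):
        self.data = {}  # key -> token
        self.renewals = 0

    def eval(self, script, numkeys, key, token, ttl_ms):
        if self.data.get(key) == token:
            self.renewals += 1
            return 1
        return 0


client = FakeRedis()
client.data["lock"] = "uuid-A"  # pretend the lock was already acquired
dog = Watchdog(client, "lock", "uuid-A", ttl_ms=30_000)
dog.start()  # background renewals would run during the critical section
dog.stop()   # stop before releasing the lock

owner_renewed = dog.renew_once()   # still the owner, TTL extended
client.data["lock"] = "uuid-B"     # lock was lost and taken by another client
stale_renewed = dog.renew_once()   # renewal refused
print(owner_renewed, stale_renewed)  # True False
```

A watchdog only helps while the process is running. If the process pauses for longer than the TTL, no renewal is sent and the lock still expires.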
Redlock: Multi‑Node Consensus Lock
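Redlock was proposed to handle master‑slave failover: because replication is asynchronous, a lock written to the master can be lost if the master fails before the key reaches the replica, and a second client can then acquire the same lock on the promoted replica. Redlock runs against several independent Redis master instances with no replication between them; five is the commonly recommended number. The client attempts to lock every node, succeeds only if a majority (≥3 of 5) grant the lock, and checks that the total acquisition time is less than the lock TTL.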
1. Record start timestamp T1.
2. Send SET lock value EX ttl NX to each Redis instance with a short network timeout.
3. If ≥3 instances succeed, record end timestamp T2 and verify T2‑T1 < ttl. If true, the lock is considered acquired.
4. Execute the critical section.
5. If acquisition fails, release any partial locks on all nodes using the Lua script.
Key points:
Lock on multiple instances for fault tolerance.
Majority success is required.
Total acquisition time must be within the TTL.
All nodes must be released on failure.
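The acquisition steps can be sketched in a few lines. This is an illustration in Python, not a production Redlock client: `redlock_acquire` and `FakeNode` are hypothetical names, each node is assumed to expose redis-py style `set` and `eval` methods, and the short per-node network timeout is assumed to be configured on the real clients. `FakeNode.eval` imitates the release script in Python.

```python
import time
import uuid

RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


def redlock_acquire(nodes, key, ttl_seconds, clock=time.monotonic):
    """Return the lock token on success, or None after cleaning up partial locks."""
    token = str(uuid.uuid4())
    start = clock()  # T1
    granted = 0
    for node in nodes:
        try:
            if node.set(key, token, nx=True, ex=ttl_seconds):
                granted += 1
        except Exception:
            pass  # an unreachable node simply counts as a failed grant
    elapsed = clock() - start  # T2 - T1
    quorum = len(nodes) // 2 + 1
    if granted >= quorum and elapsed < ttl_seconds:
        return token
    for node in nodes:  # failure: release on every node, even those that refused
        try:
            node.eval(RELEASE_SCRIPT, 1, key, token)
        except Exception:
            pass
    return None


class FakeNode:
    """In-memory stand-in for one Redis master (hypothetical)."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def set(self, key, value, nx=False, ex=None):
        if not self.up:
            raise ConnectionError("node unreachable")
        if nx and key in self.data:
            return None
        self.data[key] = value
        return True

    def eval(self, script, numkeys, key, token):
        if not self.up:
            raise ConnectionError("node unreachable")
        if self.data.get(key) == token:
            del self.data[key]
            return 1
        return 0


nodes = [FakeNode(), FakeNode(), FakeNode(), FakeNode(up=False), FakeNode(up=False)]
token_a = redlock_acquire(nodes, "lock", 10)  # 3 of 5 grant: acquired
token_b = redlock_acquire(nodes, "lock", 10)  # 0 grants: rejected

minority = [FakeNode(), FakeNode(), FakeNode(up=False), FakeNode(up=False), FakeNode(up=False)]
token_c = redlock_acquire(minority, "lock", 10)  # 2 of 5: rejected and cleaned up
leftovers = [node.data for node in minority if node.data]
print(token_a is not None, token_b, token_c, leftovers)  # True None None []
```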
Debate: Is Redlock Safe?
Distributed‑systems researcher Martin Kleppmann (University of Cambridge) criticized Redlock on four grounds:
Purpose: For efficiency‑only use cases a single‑node Redis lock suffices; for correctness Redlock still suffers from safety issues.
System Issues (NPC): Network delay, process pause (GC), and clock drift can cause lock overlap.
Clock Assumptions: Redlock assumes synchronized clocks across nodes, which is unrealistic.
Missing Fencing Token: Without a monotonically increasing token, Redlock cannot guarantee that later operations are rejected.
Martin proposes a “fencing token” approach where the lock service issues an ever‑increasing number that the client includes in every write, allowing the resource to reject stale operations.
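The idea can be illustrated with a toy model. The sketch below is in Python; `LockService` and `Storage` are hypothetical classes, with the storage layer remembering the highest token it has accepted and refusing anything lower.

```python
class LockService:
    """Hypothetical lock service that issues an increasing token with every grant."""

    def __init__(self):
        self._counter = 0

    def acquire(self):
        self._counter += 1
        return self._counter


class Storage:
    """Hypothetical resource that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False  # a newer lock holder has already written
        self.highest_token = token
        self.value = value
        return True


service = LockService()
storage = Storage()

token_1 = service.acquire()  # client 1 gets the lock, then stalls (e.g. a long GC pause)
token_2 = service.acquire()  # the lock expires and client 2 acquires it

accepted_2 = storage.write(token_2, "from client 2")  # accepted
accepted_1 = storage.write(token_1, "from client 1")  # client 1 wakes up: rejected
print(accepted_2, accepted_1, storage.value)  # True False from client 2
```

The check lives in the resource, not in the lock service, so it still works when a client wrongly believes it holds the lock.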
Antirez’s Rebuttal
Redis creator Salvatore Sanfilippo (antirez) responded:
Clock synchronization only needs to be approximate; small drift is acceptable as long as it stays within the TTL margin.
Network delays and GC before step 3 are detected because step 3 measures the elapsed time; if it exceeds the TTL the lock is considered failed.
After step 3, any lock loss is a generic distributed‑lock problem (also affecting Zookeeper), not a Redlock‑specific flaw.
He argues that fencing tokens are unnecessary for most workloads and that Redlock’s design already provides sufficient safety when clocks are reasonably accurate.
Zookeeper‑Based Locks
Zookeeper locks are built on ephemeral znodes. A client creates an EPHEMERAL node; as long as the session stays alive (heartbeat), the lock is held. If the client crashes or the session times out, the node disappears and another client can acquire the lock.
However, Zookeeper suffers from the same GC/heartbeat loss problem: a long GC pause prevents heartbeats, the session expires, the lock is released, and another client may acquire it while the original client still believes it holds the lock.
Comparative Takeaways
Both Redis‑based locks (including Redlock) and Zookeeper locks can fail under extreme conditions such as network partitions, clock drift, or long GC pauses. Redis offers higher performance, while Zookeeper provides automatic lock release without TTL management but at the cost of higher latency and operational complexity.
Personal Recommendations
For most applications:
Use a simple Redis SET EX NX lock with a unique UUID and a Lua‑based release for atomicity.
If you need higher fault tolerance, consider Redlock only when you can guarantee reasonably synchronized clocks.
For critical data integrity, complement the lock with a fencing‑token‑like mechanism at the resource layer (e.g., version column in a database).
When latency is less of a concern and you already run Zookeeper, its EPHEMERAL‑node lock can be convenient, but be aware of session‑timeout edge cases.
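The version-column idea from the list above can be sketched as a conditional UPDATE. The example is in Python and uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs anywhere; the `account` table, its columns, and `update_balance` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 0)")
conn.commit()


def update_balance(conn, account_id, new_balance, expected_version):
    """Apply the write only if nobody has changed the row since it was read."""
    cursor = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows means the version check failed


# Two clients both read version 0, each believing it holds the lock.
first = update_balance(conn, 1, 150, expected_version=0)   # applied
second = update_balance(conn, 1, 90, expected_version=0)   # rejected, version is now 1
row = conn.execute("SELECT balance, version FROM account WHERE id = 1").fetchone()
print(first, second, row)  # True False (150, 1)
```

Even if the lock fails and both clients enter the critical section, only one write lands; the loser can re-read the row and retry.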
Conclusion
The article examined the safety of Redis distributed locks, explored dead‑lock avoidance, atomic lock acquisition, the Redlock algorithm, expert debates, and Zookeeper alternatives. It concluded that no distributed lock is 100 % safe under all failure scenarios; developers must understand the trade‑offs, use unique identifiers, atomic release scripts, and, when necessary, add higher‑level fencing to guarantee correctness.
Afterword
Understanding the limits of distributed‑lock designs teaches us to be cautious and to verify assumptions rigorously. The debate between Martin and Antirez illustrates the value of thorough scrutiny rather than simply accepting a design as "correct".
ITPUB
