Redis Distributed Locks: Safety Issues, Redlock Debate, and Best Practices
This article examines how Redis distributed locks work: the pitfalls of simple SETNX-based locks, such as deadlock and premature expiration; robust remedies using expiration times, unique identifiers, and Lua scripts; the Redlock algorithm and the debate surrounding it; a comparison with ZooKeeper locks; and practical guidance for using locks safely.
Distributed systems often need a mechanism to ensure that only one process modifies a shared resource at a time, which is why distributed locks are essential. While a single Redis instance can provide a basic lock using SETNX, this approach is prone to deadlock when a client crashes or otherwise fails to release the lock.
To avoid deadlocks, the lock must have a lease time. Before Redis 2.6.12 this required making SETNX and EXPIRE atomic, typically via a Lua script. From Redis 2.6.12 onward a single command guarantees atomicity: SET lock 1 EX 10 NX.
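The semantics of that single atomic command can be sketched as follows. This is a minimal in-memory stand-in, not a real client: the FakeRedis class and set_nx_ex method are illustrative assumptions; against a real server a client library would send SET key value EX ttl NX as one command.

```python
import time
import uuid

# Illustrative in-memory stand-in for the atomic "SET key value EX ttl NX"
# command. A real deployment would issue this to a Redis server instead.
class FakeRedis:
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl):
        now = time.monotonic()
        current = self.store.get(key)
        if current is not None and current[1] > now:
            return False  # lock is held and has not yet expired
        self.store[key] = (value, now + ttl)
        return True       # lock acquired with a lease of ttl seconds

r = FakeRedis()
token = str(uuid.uuid4())
r.set_nx_ex("lock", token, 10)    # True: lock acquired
r.set_nx_ex("lock", "other", 10)  # False: already held
```

Because the value is set and the lease starts in one step, there is no window in which a crash leaves a lock without an expiration.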
Even with expiration, two serious problems remain: the lock may expire while the client is still working, and a client may accidentally release a lock owned by another client. Both issues are solved by storing a unique identifier (e.g., a UUID) as the lock value and releasing the lock only after verifying ownership with a Lua script:
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end

When Redis is deployed with replication and failover, a lock can be lost during a master-to-slave promotion. The Redlock algorithm addresses this by attempting to acquire the lock on N independent Redis masters (typically five) and requiring a majority (N/2 + 1) to succeed. The client measures the total acquisition time (T2 - T1) and aborts if it exceeds the lock's TTL.
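The majority-and-timing check at the heart of Redlock can be sketched as follows. The masters here are simulated with plain dicts and the helper names (try_lock, redlock_acquire) are illustrative; a real deployment would use five separate Redis instances and a proper client library.

```python
import time
import uuid

N = 5                              # number of independent masters
masters = [dict() for _ in range(N)]
TTL = 10.0                         # lock validity in seconds

def try_lock(store, key, value, ttl, now):
    # Same semantics as SET key value EX ttl NX on one master.
    entry = store.get(key)
    if entry is not None and entry[1] > now:
        return False
    store[key] = (value, now + ttl)
    return True

def redlock_acquire(key):
    value = str(uuid.uuid4())
    t1 = time.monotonic()
    acquired = sum(
        try_lock(m, key, value, TTL, time.monotonic()) for m in masters
    )
    elapsed = time.monotonic() - t1  # the T2 - T1 measurement in the text
    # Succeed only with a majority AND while the acquisition time
    # has not eaten into the lock's validity window.
    if acquired >= N // 2 + 1 and elapsed < TTL:
        return value
    # On failure, a real client would release the lock on every master here.
    return None

token = redlock_acquire("resource")  # non-None on success
```

Note that the effective validity of the lock is the TTL minus the measured acquisition time, so a slow acquisition is treated as a failure.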
Redlock has sparked a heated debate. Martin Kleppmann, a distributed-systems researcher, argues that Redlock's safety relies on unrealistic clock assumptions and cannot guarantee correctness under network delays, process pauses, or clock drift. He proposes a fencing-token approach that uses a monotonically increasing token to enforce ordering at the resource level.
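The fencing-token idea can be sketched as follows. The FencedStorage class and acquire_lock function are illustrative assumptions: the point is only that the lock service hands out strictly increasing tokens and the storage layer rejects any write carrying a token older than the newest one it has seen.

```python
import itertools

# The lock service grants a strictly larger token on every acquisition.
token_counter = itertools.count(1)

def acquire_lock():
    return next(token_counter)

class FencedStorage:
    """Resource that refuses writes from stale lock holders."""
    def __init__(self):
        self.max_token_seen = 0
        self.data = None

    def write(self, token, value):
        if token <= self.max_token_seen:
            return False  # stale client: its lock has since been reacquired
        self.max_token_seen = token
        self.data = value
        return True

storage = FencedStorage()
t_old = acquire_lock()                    # client A gets a token, then pauses (e.g. GC)
t_new = acquire_lock()                    # lock expires; client B gets a newer token
storage.write(t_new, "B's update")        # accepted
storage.write(t_old, "A's late update")   # rejected: old token
```

Even if client A wakes up believing it still holds the lock, its stale token makes the late write harmless.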
Antirez, the creator of Redis, counters that modest clock drift is acceptable, that the algorithm already detects excessive acquisition latency in step 3, and that any lock service suffers from the same post‑acquisition failures (e.g., GC pauses). He also suggests that the unique lock value can serve a similar purpose to a fencing token when combined with conditional updates in the resource.
ZooKeeper offers an alternative lock implementation based on ephemeral nodes and session heartbeats, which removes the need for explicit expiration times. However, it still suffers from the same loss-of-lock scenario if the client's session expires due to long GC pauses or network partitions.
In practice, distributed locks are useful for reducing contention but should not be relied upon for absolute correctness. Combining a lock for mutual exclusion with application‑level safeguards (such as fencing tokens or idempotent operations) yields the most reliable results.
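One such application-level safeguard, close to the conditional-update idea mentioned above, can be sketched as follows. The GuardedResource class and its method names are illustrative assumptions: the resource accepts a write only if it arrives with the unique value of the lock it currently believes is held.

```python
# Illustrative resource-side safeguard: writes are conditioned on the
# unique lock value, so a client whose lock was taken over is refused.
class GuardedResource:
    def __init__(self):
        self.holder = None  # unique value of the current lock holder
        self.data = None

    def grant(self, lock_value):
        # Resource learns which unique lock value is currently valid.
        self.holder = lock_value

    def conditional_write(self, lock_value, value):
        if lock_value != self.holder:
            return False  # caller's lock is stale; reject the update
        self.data = value
        return True

res = GuardedResource()
res.grant("uuid-A")
res.conditional_write("uuid-A", "update from A")       # accepted
res.grant("uuid-B")                                    # lock moved to B
res.conditional_write("uuid-A", "late update from A")  # rejected
```

Like a fencing token, this pushes the final correctness check to the resource itself rather than trusting the lock alone.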
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.