Redlock vs Antirez: A Deep Dive into Distributed Lock Safety and Fencing Tokens
This article revisits the Redlock debate by examining Antirez's rebuttal to Martin Kleppmann, analyzing Hacker News discussions, comparing Redis, ZooKeeper, and Chubby lock implementations, and exploring timing assumptions, clock skew, and fencing token mechanisms to assess distributed lock safety.
Antirez’s Rebuttal
After Martin Kleppmann published his blog post "How to do distributed locking", Antirez quickly responded with a detailed article titled "Is Redlock safe?" (http://antirez.com/news/101). In it he challenges the two main criticisms raised by Martin:
Distributed locks with automatic expiration must provide a fencing mechanism to guarantee true mutual exclusion; Redlock lacks such a mechanism.
Redlock relies on strong timing assumptions that are hard to satisfy in real systems.
Antirez questions the need for a fencing token if a lock can already guarantee exclusive access, and argues that a random token generated by Redlock can serve as a unique identifier (a "unique token") for a "Check and Set" operation.
In his words: "When starting to work with a shared resource, we set its state to '<token>', then we operate the read‑modify‑write only if the token is still the same when we write."
He further claims that the most dangerous timing issue is a large clock jump, which can be avoided with sound operational practice, while long GC pauses or network delays are already mitigated by Redlock’s design.
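The check-and-set idea in that quote can be sketched in a few lines of Python. This is a minimal illustration, not Antirez's code: the `SharedResource` class, its `begin` and `check_and_set` methods, and the 20-byte token length are all assumptions made for the example.

```python
import secrets


class SharedResource:
    """Hypothetical in-memory resource that supports token-based check-and-set."""

    def __init__(self, value=0):
        self.value = value
        self.token = None  # token of the client currently working on the resource

    def begin(self, token):
        # Step 1: record which lock holder is starting work, then read the state.
        self.token = token
        return self.value

    def check_and_set(self, token, new_value):
        # Step 2: apply the write only if the stored token is unchanged.
        if self.token != token:
            return False
        self.value = new_value
        return True


def new_lock_token():
    # A random value per lock acquisition; the length is an assumption.
    return secrets.token_hex(20)


resource = SharedResource(value=10)

token_a = new_lock_token()
read_a = resource.begin(token_a)   # client A starts, then stalls (e.g. a GC pause)

token_b = new_lock_token()
read_b = resource.begin(token_b)   # A's lock has expired; client B takes over
ok_b = resource.check_and_set(token_b, read_b + 1)

ok_a = resource.check_and_set(token_a, read_a + 5)  # A wakes up with a stale token
```

In this run B's write is applied and A's late write is rejected. Note that a random token carries no ordering: the resource can only tell whether the token changed, not which holder is newer.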
Hacker News Discussions
Both Martin’s and Antirez’s posts sparked lively threads on Hacker News (https://news.ycombinator.com/item?id=11059738 and https://news.ycombinator.com/item?id=11065933). Antirez actively participated, defending the view that message delays between the client and the lock server can be tolerated, while Martin emphasized that delays between the client and the protected resource remain problematic.
Key exchange excerpts:
antirez: @martinkl so I wonder if after my reply, we can at least agree about unbound messages delay to don’t cause any harm.
Martin: @antirez Agree about message delay between app and lock server. Delay between app and resource being accessed is still problematic.
ZooKeeper Distributed Locks
Many practitioners consider ZooKeeper a safer alternative to Redis for distributed locking. A typical ZooKeeper lock works as follows:
Client creates an ephemeral znode (e.g., /lock). The first creator acquires the lock.
After finishing work, the client deletes the znode, allowing others to acquire the lock.
The ephemeral nature ensures automatic lock release if the client crashes.
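However, ZooKeeper also depends on session heartbeats: if the server receives no heartbeat within the session timeout, it expires the session and deletes the ephemeral znode automatically. A client stalled by a long pause can therefore lose the lock without knowing it while a second client acquires it, so two clients believe they hold the lock at the same time. This is the same failure that occurs when a Redlock lease expires under a paused client.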
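The session-expiry problem can be reproduced with a toy in-memory model. The sketch below does not use the real ZooKeeper client API; `FakeZooKeeper` and its methods are hypothetical stand-ins that only mimic the behaviour described above.

```python
class FakeZooKeeper:
    """Toy in-memory stand-in for ZooKeeper; not the real client API."""

    def __init__(self):
        self.znodes = {}  # path -> owning session id

    def create_ephemeral(self, path, session_id):
        # Only the first creator succeeds: that client acquires the lock.
        if path in self.znodes:
            return False
        self.znodes[path] = session_id
        return True

    def delete(self, path, session_id):
        # Explicit release by the owning session.
        if self.znodes.get(path) == session_id:
            del self.znodes[path]

    def expire_session(self, session_id):
        # Session expiry removes every ephemeral znode owned by that session.
        for path in [p for p, s in self.znodes.items() if s == session_id]:
            del self.znodes[path]


zk = FakeZooKeeper()

client1_has_lock = zk.create_ephemeral("/lock", session_id=1)
client2_first_try = zk.create_ephemeral("/lock", session_id=2)  # lock already held

# Client 1 pauses; its heartbeats stop and the server expires the session.
zk.expire_session(1)

client2_has_lock = zk.create_ephemeral("/lock", session_id=2)

# Client 1 has not noticed the expiry yet, so its local view is stale.
both_believe_they_hold_lock = client1_has_lock and client2_has_lock
```

At the end of the run the server says session 2 owns `/lock`, yet client 1 still acts on its earlier success. Without a check at the resource itself, nothing stops client 1's delayed writes.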
Chubby’s Approach to Fencing
Google’s Chubby service, described in the paper "The Chubby lock service for loosely‑coupled distributed systems" (https://research.google.com/archive/chubby.html), introduces a sequencer consisting of the lock name, lock mode, and a monotonically increasing 64‑bit generation number. Clients can request a sequencer and attach it to resource operations. The resource server validates the sequencer either via Chubby’s CheckSequencer() API or by comparing it to the latest known sequencer.
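The second validation option, comparing against the latest known sequencer, might look like the following sketch. It is an illustration under stated assumptions, not Chubby's implementation: the `Sequencer` and `ResourceServer` classes, the lock name, and the generation values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    lock_name: str
    mode: str        # "exclusive" or "shared"
    generation: int  # monotonically increasing generation number


class ResourceServer:
    """Hypothetical resource server that rejects requests with a stale sequencer."""

    def __init__(self):
        self.latest_generation = {}  # lock name -> highest generation seen so far
        self.log = []

    def write(self, seq, payload):
        latest = self.latest_generation.get(seq.lock_name, -1)
        if seq.generation < latest:
            return False  # request comes from an older lock holder; refuse it
        self.latest_generation[seq.lock_name] = seq.generation
        self.log.append(payload)
        return True


server = ResourceServer()
old = Sequencer("/ls/cell/db-lock", "exclusive", 33)  # previous holder
new = Sequencer("/ls/cell/db-lock", "exclusive", 34)  # current holder

first = server.write(new, "write from current holder")
stale = server.write(old, "delayed write from previous holder")
```

Because generation numbers are ordered, the server needs no contact with the lock service to reject the delayed write; it only has to remember the highest generation it has seen.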
Chubby also offers a lock‑delay feature: when a client’s session disappears, the lock remains unavailable for a configurable period, giving the previous holder time to drain pending requests.
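A toy model makes the lock-delay behaviour concrete. The `LockWithDelay` class, the 60-second delay, and the explicit `now` parameter are assumptions chosen for the example; time is passed in by hand so the run is deterministic.

```python
class LockWithDelay:
    """Toy lock that stays unavailable for lock_delay seconds after a lost session."""

    def __init__(self, lock_delay):
        self.lock_delay = lock_delay
        self.holder = None
        self.blocked_until = 0.0

    def acquire(self, client, now):
        if self.holder is not None or now < self.blocked_until:
            return False
        self.holder = client
        return True

    def release(self, client):
        # Normal release: the lock becomes available immediately.
        if self.holder == client:
            self.holder = None

    def session_lost(self, client, now):
        # Abnormal release: keep other clients out for lock_delay seconds.
        if self.holder == client:
            self.holder = None
            self.blocked_until = now + self.lock_delay


lock = LockWithDelay(lock_delay=60.0)
got_a = lock.acquire("A", now=0.0)
lock.session_lost("A", now=10.0)           # A's session disappears at t=10
too_early = lock.acquire("B", now=30.0)    # still inside the delay window
after_delay = lock.acquire("B", now=70.0)  # window ended at t=70
```

The delay narrows the window for stale requests but does not close it: a request delayed longer than the configured period can still arrive after the new holder has started work.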
Clock Assumptions and Skew
Martin argues that system clocks inevitably experience jumps, making Redlock’s timing assumptions unrealistic. Antirez counters that with proper NTP configuration and avoiding manual clock changes, large jumps can be prevented. Real‑world evidence (Julia Evans’s "TIL: clock skew exists" – http://jvns.ca/blog/2016/02/09/til-clock-skew-exists/) confirms that clock skew is a genuine concern.
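To see why a clock jump matters, consider a toy simulation of five lock nodes in which one node's clock leaps forward. This is a simplified sketch, not the Redlock algorithm itself: the `Node` class and `acquire` helper are hypothetical, and the elapsed-time and drift accounting that Redlock performs on the client side is omitted.

```python
class Node:
    """Toy Redis-like node whose key expiry depends on its own clock."""

    def __init__(self):
        self.clock = 0.0  # this node's notion of "now", in seconds
        self.locks = {}   # key -> (owner, expires_at)

    def set_nx_px(self, key, owner, ttl):
        # Mimics SET with NX and an expiry: succeed only if the key is absent or expired.
        current = self.locks.get(key)
        if current is not None and current[1] > self.clock:
            return False
        self.locks[key] = (owner, self.clock + ttl)
        return True


def acquire(reachable, total, key, owner, ttl):
    # The client holds the lock if a majority of all nodes accepted it.
    votes = sum(node.set_nx_px(key, owner, ttl) for node in reachable)
    return votes >= total // 2 + 1


a, b, c, d, e = (Node() for _ in range(5))

# Client 1 reaches only A, B, and C, and gets 3 of 5 votes.
client1_ok = acquire([a, b, c], 5, "lock", "client1", ttl=10.0)

# Node C's clock jumps forward past the TTL, so the key expires early on C.
c.clock += 60.0

# Client 2 reaches C, D, and E, and also gets 3 of 5 votes.
client2_ok = acquire([c, d, e], 5, "lock", "client2", ttl=10.0)
```

Both clients end up with a majority while client 1's lease is still running on A and B, which is the kind of violation the timing-assumption criticism is about.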
Martin’s Post‑Discussion Summary
Martin later collected the debate into a Storify timeline (https://storify.com/martinkl/redlock-discussion). He emphasizes that the goal is to learn from each other’s work rather than to win arguments, and that understanding the trade‑offs of different distributed‑lock designs is essential for building reliable systems.
Overall, the discussion highlights that:
Both Redis‑based Redlock and ZooKeeper/Chubby locks have strengths and weaknesses.
Fencing tokens (or equivalent monotonic identifiers) are crucial for preventing stale operations.
Clock reliability and network delays remain the primary sources of lock safety concerns.
Choosing a lock implementation therefore depends on the specific correctness versus efficiency requirements of the application.
dbaplus Community
