Redlock Debated: Deep Dive into Distributed Locks, Pitfalls, and Better Choices
This article examines the evolution of distributed locking—from simple MySQL table locks to Redis cache locks and the Redlock algorithm—highlighting their limitations, expert criticisms, and why Zookeeper often provides a more reliable solution for high‑availability systems.
Origin
Recently the Redis author (Salvatore Sanfilippo, known as antirez) published “Is Redlock safe?” in response to a critique by distributed‑systems researcher Martin Kleppmann titled “How to do distributed locking.” The two articles sparked a lively debate, which this piece analyzes.
Database Table Locks
The author’s first experience with distributed locks used a MySQL table named lockedOrder:
CREATE TABLE `lockedOrder` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `type` tinyint(8) unsigned NOT NULL DEFAULT '0' COMMENT 'operation type',
  `order_id` varchar(64) NOT NULL DEFAULT '' COMMENT 'locked order id',
  `memo` varchar(1024) NOT NULL DEFAULT '',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'record time',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uidx_order_id` (`order_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='orders under lock';
Locking relies on the UNIQUE KEY on order_id: a second insert for the same order fails. Simple lock/unlock pseudocode:
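def lock(type, order_id, memo):
    # The insert fails on the unique key if another client already holds the lock.
    exec sql: insert into lockedOrder(type, order_id, memo) values (type, order_id, memo)
    if result == true:
        return true
    else:
        return false
def unlock(order_id):
    # Bind the order_id parameter; the quoted literal 'order_id' would match no real order.
    exec sql: delete from lockedOrder where order_id = order_id
Issues identified: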
Implements only a non‑blocking tryLock; a true blocking lock would require repeated inserts.
No expiration, so a crashed service can leave a stale lock; a cleanup job must delete rows older than a timeout.
Not re‑entrant; the same client cannot reacquire the lock without additional logic such as storing a client identifier and a lock counter.
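The unique-key lock and the stale-lock cleanup it needs can be sketched as follows. This is a minimal illustration, not the author's implementation: it uses Python's built-in sqlite3 in place of MySQL, a trimmed-down locked_order table with an added owner column, and hypothetical helper names (open_db, try_lock, unlock, purge_stale).

```python
import sqlite3
import time

def open_db():
    # In-memory SQLite stands in for MySQL here (an assumption for illustration).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locked_order ("
        " order_id TEXT NOT NULL UNIQUE,"  # the unique key is the lock
        " owner TEXT NOT NULL,"
        " locked_at REAL NOT NULL)"
    )
    return conn

def try_lock(conn, order_id, owner, now=None):
    """Non-blocking acquire: the insert succeeds only if no row holds order_id."""
    now = time.time() if now is None else now
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locked_order (order_id, owner, locked_at) VALUES (?, ?, ?)",
                (order_id, owner, now),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def unlock(conn, order_id, owner):
    """Release only a lock that this owner holds."""
    with conn:
        cur = conn.execute(
            "DELETE FROM locked_order WHERE order_id = ? AND owner = ?",
            (order_id, owner),
        )
    return cur.rowcount == 1

def purge_stale(conn, max_age_seconds, now=None):
    """Cleanup job: drop locks older than the timeout, e.g. left by a crashed holder."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "DELETE FROM locked_order WHERE locked_at < ?",
            (now - max_age_seconds,),
        )
    return cur.rowcount
```

Storing an owner with each row is one way to stop a client from deleting a lock it does not hold; a re-entrant variant would additionally keep a counter per owner.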
Cache Locks
Redis provides a high‑performance lock using the SETNX command, which succeeds only when the key does not exist. Since Redis 2.6.12, SET supports atomic NX and EX options, combining lock acquisition and expiration in a single command.
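Advantages: low latency and high throughput. Drawbacks: if a single Redis instance crashes, the lock is lost; adding a replica does not fully solve this, because replication is asynchronous and a failover can promote a replica that never received the lock key, letting a second client acquire the same lock.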
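A single-node lock built on SET with NX and an expiry might look like the sketch below. To keep it runnable without a server, it talks to InMemoryRedis, a toy stand-in written for this example that models only the SET NX PX behavior; the delete_if_equal helper and the acquire/release names are hypothetical. Storing a random token as the value lets a client release only its own lock.

```python
import time
import uuid

class InMemoryRedis:
    """Toy stand-in for one Redis node; it models only what this sketch needs."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry time in seconds)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]  # the key has expired
            return None
        return entry

    def set(self, key, value, nx=False, px=None):
        """Mimics SET key value [NX] [PX ms]: returns True, or None if NX blocks it."""
        if nx and self._live(key) is not None:
            return None
        expiry = float("inf") if px is None else self._clock() + px / 1000.0
        self._data[key] = (value, expiry)
        return True

    def delete_if_equal(self, key, value):
        """Hypothetical helper; on real Redis this compare-and-delete needs a Lua script."""
        entry = self._live(key)
        if entry is not None and entry[0] == value:
            del self._data[key]
            return True
        return False

def acquire(node, key, ttl_ms):
    """Try once to take the lock; returns a random token on success, else None."""
    token = uuid.uuid4().hex
    return token if node.set(key, token, nx=True, px=ttl_ms) else None

def release(node, key, token):
    """Release only if the lock still carries our token."""
    return node.delete_if_equal(key, token)
```

Because acquisition and expiry happen in one SET call, a client that crashes right after acquiring cannot leave a lock without a TTL behind.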
Distributed Cache Lock – Redlock
To mitigate single‑node failure, the Redis author proposed the Redlock algorithm, which assumes N independent Redis nodes (commonly N=5). The steps are:
Client records the current time in milliseconds.
Client attempts to acquire the same key/value on all N nodes, setting a short network timeout (e.g., 5‑50 ms) far smaller than the lock’s TTL (e.g., 10 s).
Client measures the elapsed time; acquisition succeeds only if it holds the lock on a majority of nodes (at least three when N=5) and the elapsed time is less than the TTL.
The effective lock TTL is the original TTL minus the elapsed time.
If acquisition fails, the client releases any partial locks.
With N=5, Redlock can tolerate up to two node failures, offering higher availability than a single Redis lock.
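The acquisition steps above can be sketched as follows. This is a simplified model under stated assumptions, not the reference implementation: Node is a toy in-memory stand-in for an independent Redis node, the clock is an injected function returning milliseconds, and the names redlock_acquire and redlock_release are hypothetical. The Redis specification also subtracts a small clock-drift allowance from the validity time; the sketch follows the steps as listed and leaves that out.

```python
import uuid

class Node:
    """Toy stand-in for one independent Redis node (not a real client)."""

    def __init__(self, clock):
        self.clock = clock
        self.up = True
        self.data = {}  # key -> (value, expiry in ms)

    def set_nx_px(self, key, value, ttl_ms):
        if not self.up:
            raise ConnectionError("node unreachable")
        entry = self.data.get(key)
        if entry is not None and entry[1] > self.clock():
            return False  # key exists and has not expired
        self.data[key] = (value, self.clock() + ttl_ms)
        return True

    def delete_if_equal(self, key, value):
        if not self.up:
            raise ConnectionError("node unreachable")
        if key in self.data and self.data[key][0] == value:
            del self.data[key]

def redlock_release(nodes, key, token):
    """Best-effort release on every node, ignoring unreachable ones."""
    for node in nodes:
        try:
            node.delete_if_equal(key, token)
        except ConnectionError:
            pass

def redlock_acquire(nodes, key, ttl_ms, clock):
    """Returns (token, validity_ms) on success, or None."""
    token = uuid.uuid4().hex
    start = clock()                        # step 1: record the start time
    acquired = 0
    for node in nodes:                     # step 2: try every node
        try:
            if node.set_nx_px(key, token, ttl_ms):
                acquired += 1
        except ConnectionError:
            pass                           # an unreachable node counts as a miss
    elapsed = clock() - start              # step 3: measure the elapsed time
    validity = ttl_ms - elapsed            # step 4: effective TTL
    quorum = len(nodes) // 2 + 1           # 3 when N=5
    if acquired >= quorum and validity > 0:
        return token, validity
    redlock_release(nodes, key, token)     # step 5: undo any partial locks
    return None
```

With five nodes the quorum is three, so the call still succeeds with two nodes down and fails, releasing its partial locks, once a third node is unreachable.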
Expert Criticism of Redlock
The expert argued that Redlock fails to guarantee correctness because:
Long GC pauses (e.g., Java Full GC) can cause a client to lose its lock while still processing, allowing another client to acquire it.
Redlock relies on local clocks; clock skew can let two clients think they hold the lock simultaneously.
Illustrative diagrams (omitted) showed token‑based MVCC as a possible mitigation and highlighted scenarios where clock inaccuracies lead to duplicate lock acquisition.
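The token-based mitigation can be illustrated with a small sketch. It assumes the lock service hands out a strictly increasing number with each grant and that the protected storage checks it on every write; LockService and FencedStore are hypothetical names for this example, not part of Redis or Redlock.

```python
class LockService:
    """Hands out a strictly increasing token with every lock grant."""

    def __init__(self):
        self._next = 0

    def grant(self):
        self._next += 1
        return self._next

class FencedStore:
    """Toy storage service that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False  # an older lock holder woke up late; reject its write
        self.highest_token = token
        self.value = value
        return True
```

If client 1 is granted token 1, stalls in a long GC pause, and client 2 then writes with token 2, a late write from client 1 is rejected because its token is lower than the highest one the store has seen.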
Redis Author’s Rebuttal
The Redis author opened his reply with: “I asked for an analysis in the original Redlock specification here: http://redis.io/topics/distlock. So thank you Martin. However I don’t agree with the analysis.”
He then presented five counter‑points:
Distributed locks are used only when no better coordination mechanism exists; token‑based solutions may replace the need for a lock.
Generating reliable tokens still requires a coordination service.
Instead of sequential tokens, a UUID can serve as a unique lock identifier.
Ordered tokens do not solve the GC‑induced timeout problem.
Most use‑cases involve non‑transactional updates where a lock remains the simplest tool.
The author also clarified that Redlock checks the elapsed acquisition time and treats the effective lock time as the TTL minus that latency, which in his view limits the impact of the delay and clock‑skew scenarios described by the expert.
Further Analysis
While Redlock improves reliability, it incurs significant costs: deploying at least five nodes, extra network round‑trips, possible lock contention when only three of five nodes respond, and degraded performance during node failures or network partitions. A leader‑based approach (e.g., using a consensus service) could avoid many of these issues.
Better Distributed Locks – Zookeeper
Zookeeper uses ZAB (Zookeeper Atomic Broadcast), a Paxos‑like consensus protocol. Write requests go to the leader, which acknowledges success only after a majority of the ensemble has persisted the change, providing strong consistency for writes.
Key features for locking:
Watcher mechanism enables true blocking locks by notifying clients when a lock node is deleted.
Ephemeral nodes automatically disappear if the client session ends, eliminating the need for explicit TTLs.
Typical lock acquisition involves creating a znode such as /lock; the first client succeeds, others set a watch and wait for deletion.
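The interplay of ephemeral nodes and watches can be modeled with a short simulation. ToyZk below is an in-memory toy written for this example, not the Zookeeper client API: it only mimics "create fails if the node exists", "a watch fires once when the node is deleted", and "a session's nodes vanish when the session ends". All names are hypothetical.

```python
class ToyZk:
    """In-memory model of ephemeral nodes and one-shot delete watches."""

    def __init__(self):
        self.nodes = {}    # path -> owning session id
        self.watches = {}  # path -> callbacks to fire once on deletion

    def create_ephemeral(self, path, session):
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def watch_delete(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def delete(self, path):
        if self.nodes.pop(path, None) is not None:
            # Watches are one-shot: remove them before firing.
            for callback in self.watches.pop(path, []):
                callback()

    def close_session(self, session):
        # Ephemeral nodes vanish with their session, e.g. when the client crashes.
        for path in [p for p, s in self.nodes.items() if s == session]:
            self.delete(path)

def acquire(zk, session, on_acquired, path="/lock"):
    """Take the lock now, or wait for the holder's node to be deleted and retry."""
    if zk.create_ephemeral(path, session):
        on_acquired(session)
    else:
        zk.watch_delete(path, lambda: acquire(zk, session, on_acquired, path))
```

In this model, when session A ends, its /lock node disappears and the waiting client B is notified and acquires the lock without polling. Every waiter is woken on each release, and waiters that lose the race simply register a new watch.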
Java developers can use the Curator library (org.apache.curator.framework.recipes.locks) to simplify Zookeeper lock usage, as described in “Learning Zookeeper by Example: Distributed Locks”.
Conclusion
The article reviewed several distributed‑lock implementations—database locks, Redis cache locks, Redlock, and Zookeeper—highlighting their trade‑offs. For scenarios demanding the highest reliability, Zookeeper (or similar consensus services) is recommended over Redlock, while Redis remains a fast option for less stringent correctness requirements.
References:
Distributed locks with Redis
Is Redlock safe?
How to do distributed locking
Learning Zookeeper by Example: Distributed Locks
From Paxos to Zookeeper: Distributed Consistency Principles and Practice
ITFLY8 Architecture Home
