
Understanding the Safety of Redis Distributed Locks and the Redlock Debate

Redis distributed locks require unique identifiers, atomic Lua releases, and TTL refreshes to avoid deadlocks, while the Redlock algorithm adds majority quorum but remains vulnerable to clock drift and client pauses, so critical systems should combine it with fencing tokens or version checks for true safety.

Tencent Cloud Developer

In high‑concurrency scenarios, multiple processes on different machines need to coordinate access to a shared resource. A common solution is to use a distributed lock, and Redis is often chosen for its simplicity and performance.

The most basic Redis lock uses the `SETNX` command, which sets a key only if it does not already exist. If `SETNX lock 1` returns 1, the client has acquired the lock; a second client receives 0 and fails to acquire it.
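The set-if-not-exists semantics can be sketched in a few lines; here a plain dict stands in for Redis (with a real server you would issue the equivalent client command), so the names below are illustrative only:

```python
# Minimal in-memory sketch of SETNX semantics. A dict stands in for Redis;
# this is a simulation of the behavior, not a real client call.
store = {}

def setnx(key, value):
    """Set key only if it does not already exist; return 1 on success, 0 otherwise."""
    if key in store:
        return 0
    store[key] = value
    return 1

print(setnx("lock", "1"))  # first client acquires the lock -> 1
print(setnx("lock", "1"))  # second client fails -> 0
```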

However, this naive approach suffers from several problems:

Deadlock: if the client crashes before releasing the lock, the key remains set forever (`SETNX` alone attaches no TTL).

Premature expiration: a client may hold the lock longer than the TTL, causing the lock to expire while the operation is still running.

Unlocking the wrong lock: a client may delete a lock that it does not own.

To mitigate these issues, a robust lock implementation should:

Set the lock with a unique identifier (e.g., a UUID) in a single atomic command: `SET lock $uuid EX 10 NX`.

When releasing, verify ownership atomically with a Lua script: `if redis.call("GET", KEYS[1]) == ARGV[1] then return redis.call("DEL", KEYS[1]) else return 0 end`

Refresh the lock’s TTL periodically (a “watchdog” thread) to avoid expiration while the client is still working.
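The first two requirements above can be sketched as follows; again a dict with timestamps simulates Redis, and the check-then-delete in `release` models what the Lua script makes atomic on a real server:

```python
import time
import uuid

# In-memory sketch of a safer lock: a unique owner token plus a TTL, and a
# release that checks ownership before deleting. In real Redis the
# check-and-delete must be done atomically via the Lua script; here a single
# Python process makes it trivially atomic.
store = {}  # key -> (token, expires_at)

def acquire(key, ttl):
    now = time.monotonic()
    entry = store.get(key)
    if entry and entry[1] > now:        # lock held and not yet expired
        return None
    token = str(uuid.uuid4())           # unique identifier, as in SET lock $uuid EX ttl NX
    store[key] = (token, now + ttl)
    return token

def release(key, token):
    entry = store.get(key)
    if entry and entry[0] == token:     # only the owner may delete
        del store[key]
        return 1
    return 0                            # someone else's lock: leave it alone

t = acquire("lock", ttl=10)
assert release("lock", "wrong-token") == 0   # cannot delete a lock you don't own
assert release("lock", t) == 1
```

A watchdog would simply re-extend `expires_at` from a background thread while the owner is still working.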

When a Redis cluster with master‑replica replication is used, a master failover can cause the lock to disappear on the new master, breaking safety guarantees. The Redlock algorithm was proposed to address this by acquiring the lock on a majority of independent Redis instances.

Redlock’s workflow consists of five steps:

Record the current timestamp T1.

Attempt to acquire the lock on each of N independent Redis nodes (typically N = 5) with `SET key value EX ttl NX`, using a short network timeout per node.

Record timestamp T2. If the lock was obtained on at least a majority (e.g., 3 of 5) and the total elapsed time (T2 − T1) is less than the TTL, consider the lock acquired; its remaining validity is TTL − (T2 − T1).

Perform the protected operation.

If acquisition fails, release any partial locks on all nodes using the same Lua script.
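The five steps above can be sketched with N simulated nodes (dicts standing in for independent Redis instances; `try_lock` models `SET key value EX ttl NX`, and real code would also need per-node network timeouts):

```python
import time
import uuid

# Sketch of the Redlock quorum check over N simulated nodes. Each dict stands
# in for an independent Redis instance; this illustrates the algorithm's
# control flow, not a production client.
N = 5
nodes = [{} for _ in range(N)]

def try_lock(node, key, token):
    """Model SET key token NX on one node (TTL omitted for brevity)."""
    if key in node:
        return False
    node[key] = token
    return True

def redlock_acquire(key, ttl):
    token = str(uuid.uuid4())
    t1 = time.monotonic()                                  # step 1: record T1
    acquired = [try_lock(n, key, token) for n in nodes]    # step 2: try all nodes
    elapsed = time.monotonic() - t1                        # step 3: T2 - T1
    if sum(acquired) >= N // 2 + 1 and elapsed < ttl:      # majority, within TTL
        return token                                       # step 4: proceed
    for n in nodes:                                        # step 5: roll back
        if n.get(key) == token:                            # release only our own locks
            del n[key]
    return None
```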

Two experts have debated Redlock’s correctness:

Martin Kleppmann’s Critique

Purpose: He distinguishes between efficiency (acceptable occasional lock loss) and correctness (must never allow concurrent updates). He argues Redlock is over‑engineered for efficiency and unsafe for correctness.

NPC (Network delay, Process pause, Clock drift) problems: He shows scenarios where a client pauses (e.g., GC) long enough for all locks to expire, after which another client acquires the lock, leading to two clients believing they hold the lock simultaneously.

Clock assumptions: Redlock assumes loosely synchronized clocks; any large clock jump invalidates safety.

Solution: He proposes a “fencing token” – a monotonically increasing identifier stored with the lock, which the resource (e.g., a database) checks before applying updates, guaranteeing that only the latest token can modify the data.
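The fencing-token idea can be sketched as a resource that remembers the highest token it has seen and rejects writes carrying an older one (the class and method names here are illustrative, not from any particular library):

```python
# Sketch of a fencing-token check on the resource side: the storage layer
# tracks the highest token seen and rejects stale writers whose lock has
# meanwhile expired and been handed to someone else.
class FencedStore:
    def __init__(self):
        self.highest_token = 0
        self.data = None

    def write(self, token, value):
        if token < self.highest_token:   # stale client: a newer holder exists
            return False
        self.highest_token = token
        self.data = value
        return True

store = FencedStore()
assert store.write(33, "from-client-1") is True
assert store.write(34, "from-client-2") is True   # newer lock holder
assert store.write(33, "late-write") is False     # paused client is fenced off
```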

Antirez’s (Redis author) Response

He argues that clocks only need to be roughly synchronized and that proper operational practices (small NTP adjustments, avoiding manual clock changes) keep drift within acceptable bounds.

He emphasizes step 3 of Redlock: if the total acquisition time exceeds the TTL, the client aborts and releases all locks, preventing the GC‑pause scenario Martin describes.

He notes that any lock system (including ZooKeeper or Etcd) suffers from the same “client‑pause‑while‑holding‑lock” issue; the problem is not unique to Redlock.

Regarding fencing tokens, he points out that many resources already provide versioning (e.g., MySQL’s row version) and that adding a separate token would require the resource to support such checks, which is not always feasible.
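The version-check alternative Antirez mentions is essentially optimistic concurrency control, as in a MySQL `UPDATE ... WHERE id = ? AND version = ?` statement; a minimal in-memory sketch (hypothetical names, not a real ORM API):

```python
# Sketch of an optimistic version check: each row carries a version number,
# and an update only applies if the caller read the current version. This is
# the in-memory analog of "UPDATE t SET ... WHERE id = ? AND version = ?".
class VersionedRow:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def update(self, expected_version, new_value):
        if self.version != expected_version:   # someone updated in between
            return False                       # caller must re-read and retry
        self.value = new_value
        self.version += 1
        return True

row = VersionedRow("initial")
v = row.version
assert row.update(v, "writer-1") is True
assert row.update(v, "writer-2") is False   # stale version: retry required
```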

Other distributed lock implementations (ZooKeeper, Etcd) also rely on session heartbeats or lease renewal. If a client pauses and fails to send heartbeats, the lock is automatically released, and another client may acquire it, leading to the same double‑ownership problem.

**Key takeaways**

Distributed locks are never 100% safe; they can fail under network partitions, process pauses, or clock drift.

When correctness is critical, combine a lock with a resource‑level fencing token or version check (optimistic concurrency control).

Redlock can be useful for high‑throughput scenarios where occasional lock loss is acceptable, but it requires careful clock management and majority quorum.

ZooKeeper and Etcd provide “watch” mechanisms for fairness but share the same heartbeat‑loss vulnerability.

**Practical recommendation**

Use a simple Redis lock with a unique identifier and Lua‑based atomic release for most cases.

If the workload demands strong correctness, add a fencing token (or rely on the database’s row version) to guard the critical section.

Ensure your infrastructure maintains reasonably synchronized clocks (e.g., NTP with small steps) and avoid manual clock changes.

Tags: concurrency, Redis, ZooKeeper, Distributed Lock, Etcd, Redlock
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
