Why Redis Distributed Locks Fail and How to Fix Leak and Mis‑release Issues
A Redis lock timeout caused users to be blocked for two hours, revealing problems with lock leakage and mis‑release; the article explains the root causes, proper use of SETNX with EXPIRE, and Lua scripts for atomic operations and safe lock deletion.
Background
In a production environment, a distributed lock implemented with Redis left users unable to place orders for up to two hours. The SETNX command had succeeded, but the lock was not released after a timeout, so later attempts to acquire it kept failing. The incident led to severe customer complaints.
Distributed lock basics
The lock is acquired with SETNX, which sets the key only if it does not already exist. It returns 1 when the key was set and 0 when the key already exists (many client libraries map these to true and false), so only one client can hold the lock at a time. Releasing the lock is simply DEL key.
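The basic scheme above can be sketched in Python as follows. This is a minimal illustration, not the article's own code: `r` is assumed to be a redis-py style client exposing `setnx` and `delete`, and the function names and the `lock:order:` key prefix are hypothetical.

```python
def naive_acquire(r, key: str) -> bool:
    """Try to take the lock with SETNX; True means this caller now holds it."""
    # setnx() is truthy only when the key did not exist and was created.
    return bool(r.setnx(key, "locked"))


def naive_release(r, key: str) -> None:
    """Release the lock by deleting the key, with no ownership check."""
    r.delete(key)


def place_order(r, order_id: str) -> bool:
    """Hypothetical critical section guarded by the naive lock."""
    key = f"lock:order:{order_id}"
    if not naive_acquire(r, key):
        return False  # another client holds the lock
    try:
        ...  # business logic would go here
        return True
    finally:
        # If the process dies before this line runs, the key is never
        # deleted and every later caller stays locked out.
        naive_release(r, key)
```

The `finally` block only helps when the process stays alive; a crash or a killed container skips it, which is the leak discussed in the next section.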
Lock‑leak problem
If a lock is created without an expiration, it may never be released. A timeout can be set with the EXPIRE command, but SETNX and EXPIRE are two separate commands: if the client crashes between them, the key is left without an expiration. The article recommends a Lua script, which Redis executes atomically, to perform both steps:
-- KEYS[1] = lock key, ARGV[1] = lock value, ARGV[2] = timeout in seconds
if (redis.call('SETNX', KEYS[1], ARGV[1]) < 1) then
    return 0;  -- key already exists, lock not acquired
end;
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]));
return 1;
Newer Redis versions (≥2.6.12) support the combined syntax SET key value NX EX 10, which eliminates the need for a custom script.
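From a client library, the combined SET form might look like the sketch below. It assumes `r` is a redis-py style client whose `set` method accepts `nx` and `ex` keyword arguments; the function name and the default timeout are illustrative choices.

```python
def acquire_with_ttl(r, key: str, value: str, ttl_seconds: int = 10) -> bool:
    """Acquire the lock and set its expiration in a single command."""
    # Equivalent to: SET key value NX EX ttl_seconds
    # With nx=True the call returns None when the key already exists.
    return bool(r.set(key, value, nx=True, ex=ttl_seconds))
```

With a real server this would be called as `acquire_with_ttl(redis.Redis(), "lock:order:42", "client-1")`. Choosing the timeout is a trade-off: too short and the lock expires while its holder is still working, too long and a crashed holder blocks others for the full duration.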
Lock‑mis‑release problem
When a client’s lock expires, another client may acquire the same key. If the original client later calls DEL, it unintentionally removes the new client’s lock. To avoid this, the lock value should contain a unique identifier (e.g., UUID, timestamp, or IP+thread). Deletion must verify that the stored value matches the client’s identifier before deleting – a compare‑and‑delete (CAD) operation.
The required CAD logic can also be implemented with Lua:
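-- KEYS[1] = lock key, ARGV[1] = the caller's unique lock value
local v = redis.call('GET', KEYS[1]);
if (v == false) then
    return 0;   -- key does not exist
end;
if (v ~= ARGV[1]) then
    return -1;  -- lock is held by another client
end;
if (redis.call('DEL', KEYS[1]) == 1) then
    return 1;   -- value matched, lock deleted
end;
return -2;      -- value matched but DEL removed nothing
Load the script with redis-cli script load "$(cat cad.lua)" and invoke it via EVALSHA <sha1> 1 KEY VALUE, so that the lock value is passed as an argument (ARGV[1]) rather than as a second key. The return codes are: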
1 – value matches, delete succeeded
0 – key does not exist
-1 – value does not match
-2 – value matches but delete failed (theoretically impossible after the prior check)
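A client-side wrapper for compare-and-delete could be sketched as follows. It assumes `r` is a redis-py style client with `eval(script, numkeys, *keys_and_args)`, and it passes the lock value as ARGV[1] with a single key. The names `CAD_SCRIPT`, `RESULT_MEANING`, `new_token`, and `release_lock` are hypothetical; the return codes follow the list above.

```python
import uuid

# Compare-and-delete: KEYS[1] is the lock key, ARGV[1] the caller's token.
CAD_SCRIPT = """
local v = redis.call('GET', KEYS[1])
if v == false then return 0 end
if v ~= ARGV[1] then return -1 end
if redis.call('DEL', KEYS[1]) == 1 then return 1 end
return -2
"""

# Human-readable meaning of each return code.
RESULT_MEANING = {
    1: "released",
    0: "key does not exist",
    -1: "held by another client",
    -2: "delete failed",
}


def new_token() -> str:
    """Generate a unique lock value for one acquisition attempt."""
    return uuid.uuid4().hex


def release_lock(r, key: str, token: str) -> int:
    """Delete the lock only if it still stores this caller's token."""
    # One key follows numkeys; the token is then available as ARGV[1].
    return int(r.eval(CAD_SCRIPT, 1, key, token))
```

A caller would store `new_token()` as the lock value when acquiring and pass the same token to `release_lock`. A result of -1 means the lock expired and was taken over, so the caller's work may have overlapped with another client's.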
Why the lock value matters
Storing a unique value allows a client, after a timeout, to query the lock with GET. If the key exists, the client must compare the stored value with its own identifier to determine whether it actually holds the lock or if another client has taken it.
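The ownership check described here might be written as below. This is a sketch under the assumption that `r` is a redis-py style client whose `get` returns `None` for a missing key and bytes or str otherwise; the function name and the three status strings are hypothetical.

```python
def lock_status(r, key: str, token: str) -> str:
    """Report whether the lock is free, held by this caller, or held by another."""
    stored = r.get(key)
    if stored is None:
        return "free"
    # redis-py returns bytes unless the client was created with
    # decode_responses=True, so normalize before comparing.
    if isinstance(stored, bytes):
        stored = stored.decode()
    return "mine" if stored == token else "other"
```

The answer is only a snapshot: the key can expire or change hands right after GET returns, so any action that depends on ownership, such as deletion, should still go through the atomic compare-and-delete script.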
Conclusion
Lock acquisition should set both an expiration and a unique value, and release should verify that value before deleting: the expiration prevents leaks, and the value check prevents mis‑release. After a timeout, a client should GET the lock and compare the stored value with its own before deciding whether it still holds the lock, should retry, or can treat the lock as free. Using Lua scripts (or the newer SET ... NX EX syntax) makes these steps atomic and the Redis‑based lock more reliable.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.