Design and Implementation of Distributed Lock Services: Redis, ZooKeeper, and SharkLock
This article explains the principles and requirements of distributed lock services, compares Redis- and ZooKeeper-based approaches, and introduces SharkLock, a design built on SharkStore with Raft-based replication, covering lock acquisition and release, reliability, scaling, and failover.
Distributed Lock Overview
A distributed lock is a mechanism for controlling exclusive access to shared resources in a distributed system. A practical lock service ensures that only one holder owns the lock at any moment, supports re-entrancy, avoids deadlocks, offers blocking semantics, and maintains high performance and availability.
Requirements of a Distributed Lock Service
Only one thread can hold the lock at any moment.
The lock must be re‑entrant.
Deadlocks must not occur.
The lock should support blocking waits, with waiters woken up promptly when the lock becomes available.
The service must guarantee high performance and high availability.
Redis‑Based Lock Service
Lock acquisition process
```
SET resource_name my_random_value NX PX max-lock-time
```
Explanation: the SET command succeeds only when the key does not exist (NX), guaranteeing a single lock holder; the expiration time (PX) prevents deadlocks; the random holder value is recorded so that ownership can be verified during unlock.
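To make the semantics concrete, here is a minimal in-memory sketch of what SET NX PX guarantees (a simulation, not the real Redis client; `try_lock` and the `store` dict are illustrative):

```python
import time
import uuid

# In-memory stand-in for a Redis instance: key -> (holder value, expiry timestamp).
store = {}

def try_lock(resource, ttl_ms):
    """Mimic SET resource value NX PX ttl: succeed only if the key is absent or expired."""
    now = time.monotonic()
    entry = store.get(resource)
    if entry is not None and entry[1] > now:
        return None  # lock is held and not yet expired
    token = uuid.uuid4().hex  # unique holder value, checked again at unlock time
    store[resource] = (token, now + ttl_ms / 1000.0)
    return token

token = try_lock("resource_name", 5000)
print(token is not None)                         # first caller acquires the lock
print(try_lock("resource_name", 5000) is None)   # second caller is refused while the TTL holds
```

The random token per acquisition is what later lets the unlock path verify it still owns the lock.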
Unlock process
```
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
```
The unlock logic is wrapped in a Lua script so that the check-and-delete runs atomically; otherwise the lock could expire and be re-acquired by another client between the GET and the DEL.
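The check-and-delete guard can be sketched as a compare-and-delete in an in-memory simulation (`unlock` and the `store` dict are illustrative, standing in for the Lua script's key and token arguments):

```python
# In-memory stand-in for a Redis instance: key -> holder token.
store = {"resource_name": "my_random_value"}

def unlock(resource, token):
    """Delete the lock only if it is still held by this token.

    In real Redis this check-and-delete must run as a single Lua script:
    a separate GET followed by DEL could delete someone else's lock if
    our lock expired and was re-acquired between the two commands.
    """
    if store.get(resource) == token:
        del store[resource]
        return 1
    return 0

print(unlock("resource_name", "someone_else_token"))  # 0: wrong holder, nothing deleted
print(unlock("resource_name", "my_random_value"))     # 1: deleted by the real holder
```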
Problems with this scheme:
Choosing an appropriate expiration time is difficult; a short TTL may cause premature expiration, while a long TTL can lead to long‑lasting invalid waits if the holder crashes.
Redis’s asynchronous master‑slave replication may lose lock data: if the master crashes before the lock key is replicated, a slave promoted to master does not see the lock and can grant it to another client, causing concurrent access.
ZooKeeper‑Based Lock Service
Lock acquisition process
Create an ephemeral sequential node under /resource_name.
List all child nodes of /resource_name and check whether the created node has the smallest sequence number. If it is the smallest, the lock is acquired; otherwise, watch the predecessor node.
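The "smallest sequence number wins" check in step two can be sketched as follows (a toy model of sequential znodes; the node names and `holds_lock` helper are illustrative, not the ZooKeeper API):

```python
def holds_lock(children, my_node):
    """Decide lock ownership from sequential znode names.

    children are names like 'lock-0000000001'; the node with the smallest
    sequence number holds the lock, and every other client watches only
    its immediate predecessor (avoiding a thundering herd on release).
    """
    ordered = sorted(children)
    idx = ordered.index(my_node)
    if idx == 0:
        return True, None           # smallest sequence: lock acquired
    return False, ordered[idx - 1]  # otherwise: watch the predecessor

children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
print(holds_lock(children, "lock-0000000001"))  # (True, None)
print(holds_lock(children, "lock-0000000003"))  # (False, 'lock-0000000002')
```

Watching only the predecessor, rather than the whole directory, means each lock release wakes exactly one waiter.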
ZooKeeper’s ZAB consensus protocol guarantees the safety of lock data, and the heartbeat mechanism prevents deadlocks by automatically releasing locks when a client disappears.
Unlock process
Delete the temporary node created by the client.
A known issue with this approach: if a client's heartbeat stops (e.g., due to a network failure), the server releases the lock and another client can acquire it; when the original client later recovers, both clients may access the shared resource simultaneously.
SharkLock Design Choices
Lock metadata includes:
lockBy: unique client identifier.
condition: client-provided policy for server-side handling of abnormal situations.
lockTime: timestamp of lock acquisition.
txID: globally incrementing transaction ID.
lease: lease information for automatic expiration.
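Under these definitions, a lock record might look like the following sketch (field types and values are assumptions for illustration, not SharkLock's actual schema):

```python
from dataclasses import dataclass

@dataclass
class LockRecord:
    lock_by: str     # unique client identifier
    condition: str   # client-provided policy for abnormal situations
    lock_time: int   # timestamp of lock acquisition (ms)
    tx_id: int       # globally incrementing transaction ID
    lease_ms: int    # lease duration for automatic expiration

rec = LockRecord(lock_by="client-42", condition="release_on_timeout",
                 lock_time=1_700_000_000_000, tx_id=1, lease_ms=5000)
print(rec.lock_by)  # client-42
```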
Ensuring Lock Reliability
SharkLock stores lock data in SharkStore, a distributed persistent key‑value system that uses multiple replicas for safety and the Raft algorithm for strong consistency across replicas.
SharkStore Core Modules
Master Server: manages metadata, sharding, scaling, and failover scheduling.
Data Server: provides RPC access to stored KV data.
Gateway Server: handles client entry.
Sharding, Scaling, and Failover
SharkStore replicates data across multiple nodes; when a shard reaches a size threshold it splits into two. Failover operates at the range level: leaders send heartbeats, missing heartbeats trigger master‑initiated failover, and new nodes are added via Raft replication.
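The split trigger can be sketched as a simple size check (the 64 MiB threshold and `maybe_split` helper are assumptions for illustration, not SharkStore's actual values):

```python
def maybe_split(range_size_bytes, threshold=64 * 1024 * 1024):
    """Return the two child sizes when a range exceeds the threshold, else None."""
    if range_size_bytes < threshold:
        return None
    half = range_size_bytes // 2
    return (half, range_size_bytes - half)

print(maybe_split(10 * 1024 * 1024))   # None: below threshold, no split
print(maybe_split(128 * 1024 * 1024))  # two roughly equal child ranges
```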
Balance Strategy
The master collects range-level and node-level heartbeat information and rebalances replicas accordingly, adding replicas on lightly loaded nodes and removing them from heavily loaded ones to keep the cluster balanced.
Raft Practices – MultiRaft
Heartbeat Merging: consolidate heartbeats per dataserver, sending only range IDs to reduce overhead.
Snapshot Management: acknowledge snapshots with ACKs, limit the number of concurrent snapshots, and apply rate limiting to minimise impact on normal traffic.
PreVote: adopt the PreVote algorithm to avoid unnecessary leader changes during network partitions.
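The heartbeat-merging idea can be sketched as grouping per-range heartbeats into one message per dataserver (the message shapes and `merge_heartbeats` helper are illustrative assumptions):

```python
from collections import defaultdict

def merge_heartbeats(raft_groups):
    """Instead of one heartbeat per Raft group (range), emit one message
    per dataserver carrying only the IDs of the ranges led there."""
    per_server = defaultdict(list)
    for range_id, dataserver in raft_groups:
        per_server[dataserver].append(range_id)
    return {srv: sorted(ids) for srv, ids in per_server.items()}

# Three ranges spread over two dataservers collapse into two messages.
groups = [(1, "ds-a"), (2, "ds-a"), (3, "ds-b")]
print(merge_heartbeats(groups))  # {'ds-a': [1, 2], 'ds-b': [3]}
```

With thousands of ranges per node, this turns O(ranges) heartbeat messages into O(dataservers).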
Raft Practices – NonVoter (Learner)
New members join as learners (non‑voting) to avoid increasing quorum size prematurely. Once a learner’s log lag falls below a configured threshold, the leader promotes it to a full voting member.
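The promotion rule can be sketched as a simple log-lag check (the threshold value and `should_promote` helper are illustrative assumptions, not SharkStore's actual configuration):

```python
def should_promote(leader_last_index, learner_match_index, lag_threshold=100):
    """Promote a learner to a voting member once its replicated log is
    close enough to the leader's last index; until then it stays
    non-voting so it does not enlarge the quorum."""
    return leader_last_index - learner_match_index <= lag_threshold

print(should_promote(1000, 950))  # True: lag of 50 entries is within the threshold
print(should_promote(1000, 500))  # False: still catching up as a learner
```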
Open‑Source Availability
SharkStore is open‑source; interested readers can explore the repository at https://github.com/tiglabs/sharkstore .
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.