Design and Implementation of Distributed Lock Services: Redis, ZooKeeper, and SharkLock
The article explains the principles, requirements, and common implementations of distributed lock services using Redis and ZooKeeper, analyzes their drawbacks, and introduces SharkLock's design choices—including lock metadata, reliability via SharkStore, deadlock prevention, safety mechanisms, and Raft‑based consistency—to guide developers in building robust distributed locking solutions.
Distributed locks are mechanisms that ensure exclusive access to shared resources in a distributed system, using a shared identifier to guarantee uniqueness, atomic updates, and visibility to clients, while handling various failure scenarios.
Key requirements for a reliable distributed lock service include:
Only one thread can hold the lock at any time.
The lock must be re‑entrant.
No deadlocks should occur.
Support blocking semantics with timely wake‑up.
High performance and high availability.
Two widely used implementations are based on Redis and ZooKeeper.
1. Redis‑based lock service
Lock acquisition uses the SET command with NX and PX options:
SET resource_name my_random_value NX PX max-lock-time

This succeeds only if the key does not exist, ensuring a unique owner, and sets an expiration to avoid deadlocks.
Unlock is performed atomically via a Lua script that checks the owner before deleting:
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

Here KEYS[1] is the resource name and ARGV[1] is the client's random value. Problems with this approach:
Choosing an appropriate expiration is difficult; too short may cause premature lock release, too long may lead to long‑lasting deadlocks.
Redis asynchronous replication can lose lock data: if the master fails before replication, a slave may grant the lock to another client, causing concurrent access.
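The acquire/release pattern above can be sketched as follows. This is a minimal in‑memory simulation of the semantics (SET NX PX plus owner‑checked delete), not a real Redis client; with redis‑py, the acquisition maps to r.set(key, value, nx=True, px=ttl_ms) and the release to evaluating the Lua script.

```python
import time
import uuid

class InMemoryRedisLock:
    """Sketch of the SET NX PX + owner-checked-delete pattern,
    simulated with a dict instead of a real Redis server."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def acquire(self, resource, ttl_ms):
        now = time.monotonic()
        entry = self._store.get(resource)
        # An expired key is treated as absent, mirroring PX expiration.
        if entry is not None and entry[1] > now:
            return None  # someone else holds the lock
        token = uuid.uuid4().hex  # my_random_value: unique per owner
        self._store[resource] = (token, now + ttl_ms / 1000.0)
        return token

    def release(self, resource, token):
        # Mirrors the Lua script: delete only if we still own the lock.
        entry = self._store.get(resource)
        if entry is not None and entry[0] == token:
            del self._store[resource]
            return 1
        return 0
```

The random token is what makes the release safe: a client whose lock expired and was re‑granted to another owner cannot delete the new owner's key.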
2. ZooKeeper‑based lock service
Lock acquisition creates an ephemeral sequential node under /resource_name, then checks whether its sequence number is the smallest; if not, it sets a watch on its immediate predecessor node.
The ZAB consensus protocol guarantees data safety, and the heartbeat mechanism prevents deadlocks by releasing locks when a client disappears.
Unlock simply deletes the client’s temporary node.
The main issue is unsafe release: a long pause (for example, a GC stall) can cause heartbeats to be missed, so ZooKeeper expires the session and frees the lock while the original client is still alive and believes it holds it.
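The decision step of the ZooKeeper recipe can be sketched as a pure function: given our ephemeral sequential node and the current children of the lock path, either we hold the lock or we watch exactly one predecessor (which avoids a thundering herd). Node names are assumed to end in a zero‑padded sequence number.

```python
def zk_lock_decision(my_node, children):
    """Sketch of the ZooKeeper lock recipe's decision step.

    Returns ("acquired", None) if my_node has the lowest sequence
    number, else ("waiting", predecessor) where predecessor is the
    node to watch."""
    # Sort children by the numeric suffix ZooKeeper appends.
    ordered = sorted(children, key=lambda n: int(n.split("-")[-1]))
    idx = ordered.index(my_node)
    if idx == 0:
        return ("acquired", None)
    # Watch only the immediate predecessor, not the whole path.
    return ("waiting", ordered[idx - 1])
```

When the watched predecessor is deleted, the client re‑runs this check rather than assuming it now holds the lock, since the predecessor may have crashed rather than released.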
Design goals derived from the analysis
Ensure lock data safety.
Avoid deadlocks.
Prevent multiple owners for the same lock.
To achieve deadlock avoidance, two mechanisms are commonly used: setting an expiration time, or performing server‑side liveness detection (a session lease).
SharkLock design
Lock metadata includes:
lockBy: unique client identifier.
condition: client‑provided policy for abnormal situations.
lockTime: timestamp of acquisition.
txID: globally incrementing ID.
lease: lease information.
Reliability is provided by SharkStore, a distributed persistent key‑value store that uses multi‑replica Raft replication for consistency.
Deadlock prevention relies on periodic client heartbeats that maintain server‑side sessions; when a session lease expires, the server automatically releases the lock.
For locks without expiration, a negotiation mechanism allows clients to pass condition parameters that define when the server may release the lock (e.g., after missing a number of heartbeats).
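The lease and condition mechanism above can be sketched as follows. This is an illustrative model, not SharkLock's actual implementation: max_missed stands in for a client‑supplied condition such as "release after N missed heartbeats".

```python
import time

class LeaseSession:
    """Sketch of server-side liveness detection: each heartbeat
    renews the session lease; the server releases the lock once
    the client has missed max_missed consecutive lease periods."""

    def __init__(self, lease_seconds, max_missed=3):
        self.lease_seconds = lease_seconds
        self.max_missed = max_missed  # client-provided condition
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        # Called periodically by the lock-holding client.
        self.last_heartbeat = time.monotonic()

    def should_release(self, now=None):
        # Server-side check: how many lease periods have elapsed
        # since the last heartbeat?
        now = time.monotonic() if now is None else now
        missed = (now - self.last_heartbeat) / self.lease_seconds
        return missed >= self.max_missed
```

A healthy client that keeps heartbeating never trips the condition, which is what allows indefinite holding via automatic renewal; only a silent client loses the lock.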
Additional safety measures:
Clients invoke hooks on graceful shutdown to release locks.
Unreleased lock info is persisted to a file and reclaimed on restart.
Automatic lease renewal lets a client hold a lock indefinitely under normal conditions.
A version‑checking API (checkVersion(lock_name, version)) detects if a lock has been superseded, preventing stale owners from performing operations.
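The version check acts like a fencing token. A minimal sketch of the server side, assuming each acquisition bumps a monotonically increasing version (the real checkVersion signature and semantics are SharkLock's; this only illustrates the idea):

```python
class FencedLockServer:
    """Sketch of version checking: every acquisition gets a new,
    strictly increasing version; an operation from a lock holder is
    accepted only if its version is still the current one."""

    def __init__(self):
        self.versions = {}  # lock_name -> current version

    def acquire(self, lock_name):
        # A new acquisition supersedes any previous holder.
        self.versions[lock_name] = self.versions.get(lock_name, 0) + 1
        return self.versions[lock_name]

    def check_version(self, lock_name, version):
        # Stand-in for checkVersion(lock_name, version).
        return self.versions.get(lock_name) == version
```

A client that was paused past its lease and then resumed would fail this check, so its stale writes are rejected even though it believes it still holds the lock.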
SharkStore overview
SharkStore consists of:
Master Server cluster for metadata, sharding, routing, scaling, and failover.
Data Server nodes providing RPC‑based KV access.
Gateway Server for client entry.
Data is replicated across multiple replicas; the system tolerates loss of fewer than half the replicas.
Expansion is triggered when a shard reaches a size threshold, causing it to split into two ranges; the leader monitors write byte counts and initiates split requests to the master.
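The split trigger can be sketched as two small steps: the leader compares accumulated write bytes against a threshold, and the master splits the key range in two. Threshold values and the split‑key choice here are illustrative, not SharkStore's actual parameters.

```python
def should_split(range_write_bytes, split_threshold):
    """Sketch: the range leader tracks bytes written and requests a
    split from the master once the threshold is crossed."""
    return range_write_bytes >= split_threshold

def split_range(start_key, end_key, split_key):
    """Split the half-open range [start_key, end_key) into two
    contiguous ranges at split_key."""
    assert start_key < split_key < end_key
    return [(start_key, split_key), (split_key, end_key)]
```

Because the two child ranges are contiguous and cover exactly the parent's keyspace, routing metadata on the master can be updated atomically.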
Failover operates at the range level: leaders send heartbeats to replicas, and missing heartbeats cause the master to replace failed replicas and replicate data via Raft.
Balancing redistributes replicas based on node‑level and range‑level heartbeats to keep load even.
Raft practice includes:
Heartbeat aggregation to compress range IDs.
Snapshot management with ACK flow control and rate limiting.
Pre‑vote algorithm to avoid unnecessary leader changes during partitions.
Non‑voter (learner) members that receive logs without affecting quorum, later promoted when they catch up.
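The heartbeat aggregation listed above can be illustrated with a simple interval compression: instead of sending one heartbeat per range, a node compresses its sorted range IDs into (start, end) runs. The exact encoding SharkStore uses is not specified here; this is one plausible sketch.

```python
def aggregate_range_ids(range_ids):
    """Sketch of heartbeat aggregation: compress range IDs into
    inclusive (start, end) intervals so one heartbeat message can
    cover many consecutive ranges."""
    intervals = []
    for rid in sorted(range_ids):
        if intervals and rid == intervals[-1][1] + 1:
            # Extend the current run of consecutive IDs.
            intervals[-1] = (intervals[-1][0], rid)
        else:
            intervals.append((rid, rid))
    return intervals
```

On a node hosting thousands of mostly consecutive ranges, this collapses the heartbeat payload to a handful of intervals.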