
Design and Implementation of Distributed Lock Services: Redis, ZooKeeper, and SharkLock

This article explains the principles of distributed locks, compares Redis‑based and ZooKeeper‑based implementations, discusses their limitations, and introduces the SharkLock system built on SharkStore with Raft‑based replication, detailing its reliability, deadlock prevention, and safety mechanisms.

Architects' Tech Alliance

Distributed locks are mechanisms that ensure mutually exclusive access to shared resources in a distributed system, using a shared identifier to guarantee uniqueness, atomic modifications, and visibility to lock clients, while handling various failure scenarios.

The lock service must satisfy several requirements: only one thread holds the lock at a time, the lock is re‑entrant, deadlocks are avoided, blocking locks can be awakened promptly, and the service remains high‑performance and highly available.

1. Redis‑based lock service

Lock acquisition

SET resource_name my_random_value NX PX max-lock-time

The command succeeds only if the key does not already exist (NX), guaranteeing a single owner; the expiration (PX) prevents deadlocks if the holder crashes, and the random owner value is recorded so that only the holder can release the lock safely.

Lock release

if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

A Lua script ensures that checking the owner and deleting the key happen atomically.
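To make the acquire/release semantics concrete, here is a minimal in‑memory sketch of the pattern. The FakeRedis class below is a stand‑in for a real Redis server, written only so the logic is self‑contained and deterministic; against real Redis you would issue SET ... NX PX and run the release check as a Lua script so that the get and del stay atomic.

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for the subset of Redis used by the lock pattern."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set_nx_px(self, key, value, ttl_ms):
        """Emulates SET key value NX PX ttl: succeeds only if key is absent or expired."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # lock already held by someone else
        self._store[key] = (value, now + ttl_ms / 1000.0)
        return True

    def release(self, key, value):
        """Emulates the Lua release script: delete only if we still own the key."""
        entry = self._store.get(key)
        if entry is not None and entry[0] == value:
            del self._store[key]
            return 1
        return 0

r = FakeRedis()
token = uuid.uuid4().hex  # the "my_random_value" owner token
acquired = r.set_nx_px("resource_name", token, 30_000)   # first client wins
blocked = r.set_nx_px("resource_name", "other", 30_000)  # second client is refused
stale = r.release("resource_name", "wrong-token")        # non-owner cannot release
released = r.release("resource_name", token)             # owner releases successfully
```

The owner-token comparison in release is exactly why the random value matters: without it, a client whose lock had already expired could delete a lock now held by someone else.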

The Redis approach has two main problems. First, choosing an appropriate expiration time is difficult: too short, and the lock may be released while the holder is still working; too long, and a crashed holder causes a long‑lasting deadlock. Second, asynchronous master‑slave replication can lose lock data: if the master fails before the lock key replicates, a newly promoted master will grant the same lock to another client, so multiple clients hold it simultaneously.

2. ZooKeeper‑based lock service

Lock acquisition

Create an ephemeral sequential node under /resource_name.

List all children of /resource_name; if the created node has the smallest sequence number, the lock is granted, otherwise watch the predecessor node.
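The decision step of this recipe can be sketched as a pure function: given the children of /resource_name and the client's own node, decide whether the lock is granted or which node to watch. Node names follow the ephemeral‑sequential pattern "lock-<sequence number>"; the helper name lock_decision is illustrative, not part of any ZooKeeper client API.

```python
def lock_decision(my_node, children):
    """Return (granted, node_to_watch) per the ZooKeeper lock recipe."""
    # Order children by their ZooKeeper-assigned sequence suffix.
    ordered = sorted(children, key=lambda n: int(n.rsplit("-", 1)[1]))
    idx = ordered.index(my_node)
    if idx == 0:
        return True, None           # smallest sequence number: lock granted
    return False, ordered[idx - 1]  # otherwise watch the immediate predecessor

# ZooKeeper returns children in no particular order.
children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
granted, watch = lock_decision("lock-0000000002", children)
# granted is False; watch is "lock-0000000001"
```

Watching only the predecessor (rather than all children) avoids the "herd effect": when a lock is released, exactly one waiting client is notified.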

The ZAB consensus protocol guarantees safety of lock data, and the heartbeat mechanism prevents deadlocks: when a client's session expires, its ephemeral nodes are deleted and the lock is released.

Lock release

Delete the client’s ephemeral node.

Issues with ZooKeeper include the following scenario: a client's heartbeat stops (for example, during a long GC pause), the server expires its session and releases the lock, and the client later recovers while mistakenly believing it still holds the lock, leading to concurrent access.

Design goals for an ideal lock

Lock data safety.

No deadlocks.

Only one holder at any time.

To achieve these, mechanisms such as lock expiration or server‑side health checks are employed, but each can compromise safety under certain failure conditions.

SharkLock design

Lock metadata includes lockBy (client ID), condition (client‑specified behavior), lockTime, txID (a globally incrementing ID), and lease (the lease period).
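As a sketch, the metadata record might look like the following. Field names mirror the article; the concrete types and wire format are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class LockMeta:
    """Illustrative shape of a SharkLock lock record."""
    lock_by: str      # client ID of the current holder (lockBy)
    condition: str    # client-specified behavior, e.g. on missed heartbeats
    lock_time: float  # when the lock was taken (lockTime)
    tx_id: int        # globally incrementing ID (txID), usable as a version
    lease: float      # lease period in seconds

meta = LockMeta(lock_by="client-42", condition="release-after-3-missed-heartbeats",
                lock_time=0.0, tx_id=7, lease=10.0)
```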

Reliability is provided by SharkStore, a distributed persistent key‑value store that uses multi‑replication and Raft for strong consistency.

Deadlock prevention

Clients send periodic heartbeats; the server maintains sessions and extends leases.

If a lock holder fails, the server deletes its lock after the lease expires.

For locks without expiration, a negotiation condition allows the server to release the lock after a configurable number of missed heartbeats.

Processes attempt graceful release via shutdown hooks or persisted lock files on restart.
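The heartbeat-and-lease mechanism above can be sketched as a small server-side table: each heartbeat extends the holder's lease, and a periodic sweep deletes locks whose lease has lapsed. Time is passed in explicitly so the logic is deterministic; the class and method names are illustrative, not SharkLock's actual API.

```python
class LeaseTable:
    """Server-side view of lock leases, renewed by client heartbeats."""

    def __init__(self, lease_secs):
        self.lease_secs = lease_secs
        self.expiry = {}  # lock name -> lease deadline

    def heartbeat(self, name, now):
        # Automatic renewal: a healthy holder keeps its lock indefinitely.
        self.expiry[name] = now + self.lease_secs

    def sweep(self, now):
        # Reclaim locks whose holders stopped heartbeating within the lease.
        dead = [n for n, deadline in self.expiry.items() if deadline <= now]
        for n in dead:
            del self.expiry[n]
        return dead

table = LeaseTable(lease_secs=10)
table.heartbeat("resource_name", now=0)
table.heartbeat("resource_name", now=8)  # renewed before expiry; deadline is now 18
expired_early = table.sweep(now=17)      # lease still valid: nothing reclaimed
expired_late = table.sweep(now=20)       # lease missed: lock reclaimed
```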

Safety mechanisms

Enforce that the same client that acquires a lock must release it.

Automatic lease renewal lets a client hold a lock indefinitely under normal operation, reducing the chance of premature expiration.

Version‑checking API checkVersion(lock_name, version) validates that a lock holder’s version is still current before allowing resource access, preventing stale holders from acting after a failover.
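The fencing idea behind checkVersion can be sketched as follows: the lock's version (for example, its txID) is bumped whenever ownership changes, so a holder that silently lost the lock during a failover presents a stale version and is rejected. The class below is a hypothetical stand-in for the server side of checkVersion(lock_name, version).

```python
class VersionedLocks:
    """Illustrative server-side version tracking for fencing stale holders."""

    def __init__(self):
        self.versions = {}  # lock name -> current version

    def grant(self, name):
        # Every new grant bumps the version, invalidating prior holders.
        self.versions[name] = self.versions.get(name, 0) + 1
        return self.versions[name]

    def check_version(self, name, version):
        # A holder may touch the resource only with the current version.
        return self.versions.get(name) == version

locks = VersionedLocks()
v1 = locks.grant("resource_name")  # first holder
v2 = locks.grant("resource_name")  # after failover, lock re-granted elsewhere
stale_ok = locks.check_version("resource_name", v1)  # stale holder rejected
fresh_ok = locks.check_version("resource_name", v2)  # current holder accepted
```

Ideally the protected resource itself performs this check (or compares versions on writes), so even a holder that believes it still owns the lock cannot corrupt shared state.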

SharkStore architecture

It consists of Master Server (metadata, routing, scaling, failover), Data Server (KV storage with RPC), and Gateway Server (client entry point). Data is sharded with multiple replicas; a shard splits when it reaches a size threshold, and failover occurs at the range level using Raft replication.

Additional Raft practices include MultiRaft (heartbeat aggregation, snapshot throttling, PreVote) and NonVoter (learners that receive logs without affecting quorum, later promoted when sufficiently caught up).

Tags: backend, Redis, ZooKeeper, Distributed Lock, Raft, SharkLock, SharkStore
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
