Analysis of Mutual Exclusion and Idempotency Issues in Distributed Systems and Their Solutions
The article examines how distributed systems face mutual‑exclusion and idempotency challenges, explains traditional thread and process synchronization, then details distributed‑lock techniques (e.g., Zookeeper, Redis, Tair, Cerberus) and global‑ID‑based idempotent services, emphasizing the importance of external storage, fault‑tolerance, and proper lock granularity for reliable high‑throughput applications.
With the rapid development of Internet information technology, data volume continuously grows and business logic becomes increasingly complex. High concurrency access and massive data processing scenarios are becoming more common, making low‑cost solutions for high availability, scalability, and extensibility crucial. Traditional centralized systems can no longer meet these demands, and distributed systems are now widely used.
A distributed system consists of independent servers loosely coupled through a network. Each server is an autonomous host, and they communicate via internal networking. Key characteristics of distributed systems include:
Scalability: horizontal scaling can increase performance and throughput.
High reliability: fault tolerance allows the system to continue serving even if one or several nodes fail.
High concurrency: machines process and compute in parallel.
Cost‑effectiveness: multiple small machines replace a single high‑performance machine.
However, the complexity of the environment and network uncertainty introduce problems such as clock inconsistency and Byzantine failures. Traditional issues like node crashes and message loss become more intricate in a distributed context.
Two typical problems that require focused attention are:
Mutual exclusion issues.
Idempotency issues.
Below are two common examples illustrating mutual exclusion:
Example 1: Service records a critical value X = 100. Request A wants to add 200, while request B wants to subtract 100. Without proper handling, both may read X = 100 simultaneously, leading to inconsistent final values.
Example 2: Two requests randomly pick a task from a shared task pool. Without coordination, both may obtain the same task.
These scenarios demonstrate that when multiple requests access and modify the same resource without ordering guarantees, atomicity and sequence cannot be ensured.
In traditional database‑backed architectures, transactions (ACID) provide mutual exclusion. In distributed environments, distributed locks become a common and efficient solution.
Before diving into distributed locks, it is useful to review how mutual exclusion is handled in multithreaded and multiprocess contexts.
Multithreaded Environment Solutions and Principles
Most concurrency patterns serialize access to shared resources using mutexes.
Java provides two primary mutex mechanisms: Lock (e.g., ReentrantLock) and synchronized. Both protect shared variables and operations.
Implementation Principles
ReentrantLock
ReentrantLock relies on CAS (Compare‑And‑Swap) and a CLH queue. It supports fair and non‑fair modes.
final boolean nonfairTryAcquire(int acquires) {
final Thread current = Thread.currentThread();
int c = getState();
if (c == 0) {
if (compareAndSetState(0, acquires)) {
setExclusiveOwnerThread(current);
return true;
}
} else if (current == getExclusiveOwnerThread()) {
int nextc = c + acquires;
if (nextc < 0) // overflow
throw new Error("Maximum lock count exceeded");
setState(nextc);
return true;
}
return false;
}When acquiring the lock, the thread first attempts a CAS. If the state is non‑zero, it checks whether it already owns the lock (re‑entrancy) and increments the hold count.
final boolean acquireQueued(final Node node, int arg) {
boolean failed = true;
try {
boolean interrupted = false;
for (;;) {
final Node p = node.predecessor();
if (p == head && tryAcquire(arg)) {
setHead(node);
p.next = null; // help GC
failed = false;
return interrupted;
}
if (shouldParkAfterFailedAcquire(p, node) &&
parkAndCheckInterrupt())
interrupted = true;
}
} finally {
if (failed)
cancelAcquire(node);
}
}
private final boolean parkAndCheckInterrupt() {
LockSupport.park(this);
return Thread.interrupted();
}If the lock is held, the thread joins the CLH queue and may be parked until it becomes the head.
synchronized
Java implements synchronized via bytecode instructions monitorenter and monitorexit. The monitor is locked when a thread acquires ownership, and unlocked when the entry count returns to zero.
The objectref must be of type reference. Each object is associated with a monitor. A monitor is locked if and only if it has an owner. The thread that executes monitorenter attempts to gain ownership of the monitor associated with objectref, as follows: If the entry count is zero, the thread enters the monitor and sets the count to one. If the thread already owns the monitor, it re‑enters and increments the count. If another thread owns the monitor, the thread blocks until the count becomes zero.
Since JDK 1.6, the JVM has optimized monitor implementations to reduce expensive kernel‑mode mutex transitions.
Multiprocess Environment Solutions
Processes share resources such as printers or shared memory. Critical sections are protected using semaphores (P/V operations). A semaphore’s value indicates whether a process may enter the critical region.
Typical semaphore usage:
P operation: if the semaphore > 0, decrement and proceed; otherwise block.
V operation: increment the semaphore and wake a blocked process if any.
Distributed Environment Solutions – Distributed Locks
To implement a distributed lock, three basic conditions must be satisfied:
A storage space accessible by all nodes (e.g., database, Redis, Zookeeper).
A globally unique identifier for each lock.
At least two states (locked / unlocked).
Simple DB‑based lock example (pseudo‑code):
lock = mysql.get(id);
while(lock.status == 1) {
sleep(100);
}
mysql.update(lock.status = 1);
// critical section
mysql.update(lock.status = 0);Key problems of this naive approach:
Atomicity of status check cannot be guaranteed.
Network failures or host crashes may leave the lock permanently held.
Unlocking may inadvertently release a lock owned by another client.
Advanced requirements include re‑entrancy, herd‑effect mitigation, fairness, and choice between blocking and spinning locks.
Typical Implementations
Zookeeper
Zookeeper provides sequential and ephemeral nodes. By creating an EPHEMERAL_SEQUENTIAL node under a lock path, clients can determine the lock holder by the smallest sequence number. The lock is released by deleting the node, which automatically disappears if the client disconnects, solving the crash‑release problem.
Redis
Redis uses SETNX (or SET with NX) to atomically create a lock key. A timeout (TTL) is required to avoid deadlocks caused by crashes. A common pattern involves checking the lock’s timestamp and using GETSET to acquire an expired lock.
SETNX lock.orderid
if (exists) {
if (timestamp > now) { // not expired
wait and retry;
} else {
GETSET lock.orderid newTimestamp;
// if returned timestamp is still expired, lock acquired
}
}Redisson offers a robust Java client that implements distributed locks with automatic renewal and safe unlock semantics.
Tair
Tair’s expireLock combines lock state and expiration timestamp, avoiding reliance on synchronized clocks.
Cerberus Distributed Lock
Cerberus abstracts multiple engines (Tair, Zookeeper, future Redis) behind a unified API that mirrors JUC’s ReentrantLock methods ( lock(), tryLock(), unlock(), etc.). It supports engine switching for fault tolerance and provides features such as fair vs. non‑fair modes, read‑write locks, and one‑click downgrade.
Idempotency Issues
Idempotency means that multiple invocations of an interface produce the same result as a single invocation. In distributed systems, repeated calls are common due to network glitches, client retries, or message duplication.
To achieve idempotency, the GTIS (Global Transaction Idempotent Service) assigns a globally unique ID to each business operation (e.g., via MD5 of a concatenated key). The ID is stored in an external store (Tair, Redis, DB) using SETNX. If the key already exists, the operation is considered a duplicate and is rejected.
GTIS workflow:
Generate a unique transContents for the operation.
Hash it to obtain a global ID.
Attempt SETNX in Tair; success means the operation may proceed.
After execution, compare the stored value with the actual result; on success, extend the key’s TTL.
If the operation fails or times out, delete the key to allow retries.
GTIS also provides fault‑tolerance strategies and retry mechanisms to mitigate external store failures.
Conclusion
Mutual exclusion and idempotency are pervasive challenges in distributed environments. Distributed locks—drawing on concepts from multithreaded and multiprocess synchronization—provide a practical way to enforce exclusivity. Implementations based on Zookeeper, Redis, and Tair each have trade‑offs; Cerberus combines multiple engines for flexibility and resilience.
Idempotency can be achieved by preventing duplicate operations through a global unique identifier, as demonstrated by GTIS. Both solutions have strong dependencies on external storage systems, so careful design of lock granularity and failure handling is essential for high‑throughput, reliable services.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
