Fundamentals 34 min read

Understanding Mutual Exclusion and Idempotency in Distributed Systems: Locks, Implementations, and GTIS

This article explains the challenges of mutual exclusion and idempotency in distributed environments, compares thread‑level and process‑level solutions, describes the principles and typical implementations of distributed locks (Zookeeper, Redis, Tair, Cerberus), and introduces GTIS as a reliable idempotency framework.

IT Architects Alliance
IT Architects Alliance
IT Architects Alliance
Understanding Mutual Exclusion and Idempotency in Distributed Systems: Locks, Implementations, and GTIS

With the rapid growth of internet data and increasingly complex business logic, systems must achieve high concurrency, scalability, and availability at low cost.

Distributed systems consist of loosely coupled independent servers and exhibit characteristics such as scalability, high reliability, high concurrency, and cost‑effectiveness. However, they also introduce complexities like clock inconsistency and Byzantine failures, making mutual exclusion and idempotency critical problems.

Mutual Exclusion Problem

Two typical examples illustrate the issue:

Example 1: Service records a key X=100. Request A adds 200, request B subtracts 100. Without proper handling, both may read X=100 and write conflicting values.

Example 2: Two requests randomly pick a task from a shared pool; without coordination they may select the same task.

These scenarios demonstrate that concurrent accesses to shared resources must be serialized to guarantee atomicity and ordering.

In traditional JVM‑based architectures, database transactions (ACID) provide this guarantee. In distributed environments, a distributed lock is a common and efficient solution.

Thread‑Level Solutions

Java provides two built‑in mutual‑exclusion mechanisms:

Lock (e.g., ReentrantLock )

synchronized (statement or method)

ReentrantLock uses CAS + CLH queue and supports fair and non‑fair modes. Its core acquisition method can be summarized as:

final boolean nonfairTryAcquire(int acquires) {
    Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    } else if (current == getExclusiveOwnerThread()) {
        int nextc = c + acquires;
        if (nextc < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}

If acquisition fails, the thread joins the CLH queue and may block via LockSupport.park() :

final boolean acquireQueued(final Node node, int arg) {
    boolean failed = true;
    try {
        boolean interrupted = false;
        for (;;) {
            Node p = node.predecessor();
            if (p == head && tryAcquire(arg)) {
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return interrupted;
            }
            if (shouldParkAfterFailedAcquire(p, node) && parkAndCheckInterrupt())
                interrupted = true;
        }
    } finally {
        if (failed)
            cancelAcquire(node);
    }
}

The synchronized construct inserts monitorenter and monitorexit bytecode instructions, which acquire and release a monitor associated with the target object.

Process‑Level Solutions

In multi‑process systems, critical resources are protected by semaphores (named or unnamed). A semaphore’s P (wait) operation decrements the count if positive; otherwise the process blocks. V (signal) increments the count and wakes a blocked process.

Distributed Lock Fundamentals

A distributed lock must satisfy three basic conditions:

There must be a globally accessible storage space for the lock state.

The lock must have a unique identifier.

The lock must have at least two states (locked/unlocked).

Typical external storage includes databases, Redis, Tair, or coordination services such as Zookeeper.

Typical Implementations

Zookeeper uses sequential ephemeral nodes. Clients create a node under /lock/ ; the client with the smallest sequence number holds the lock. Others watch the predecessor node and retry when it disappears.

Redis relies on atomic commands like SETNX and GETSET . A lock key is set with an expiration to avoid deadlocks caused by crashes.

Sample pseudo‑code for a simple DB‑based lock:

lock = mysql.get(id);
while (lock.status == 1) {
    sleep(100);
}
mysql.update(lock.status = 1);
// critical section
mysql.update(lock.status = 0);

These basic approaches suffer from three problems:

Atomicity of lock‑state check cannot be guaranteed.

Network partition or host crash may leave the lock permanently held.

Unlocking may be performed by a different client than the one that acquired the lock.

Advanced requirements include re‑entrancy, herd effect mitigation, fairness, and choice between blocking and spinning locks.

Cerberus Distributed Lock

Cerberus abstracts multiple engines (Zookeeper, Tair, future Redis) behind a unified interface, offering features such as:

One‑interface‑multiple‑engines support.

Easy learning curve; API mirrors java.util.concurrent.locks.Lock (lock, tryLock, unlock, etc.).

One‑click engine downgrade with switchEngine() methods.

Built‑in public cluster and authorization mechanisms.

It has been iterated over eight versions and runs stably in many Meituan‑Dianping projects.

Idempotency Problem and GTIS

Idempotency ensures that repeated calls to an interface produce the same result as a single call. In distributed systems, duplicate requests are common due to network glitches or user retries.

GTIS (Global Transaction Idempotent Service) provides a lightweight solution by generating a global unique ID for each business operation (e.g., MD5 of a concatenated key). The ID is stored in an external engine (typically Tair) using SETNX to guarantee atomicity.

Workflow:

Business generates a unique transContents and passes it to GTIS.

GTIS creates a global ID (MD5) and attempts SETNX with value timestamp+transContents .

If successful, the operation proceeds; otherwise it is rejected as duplicate.

After the operation finishes, GTIS verifies the result and, on success, extends the key’s expiration.

GTIS also handles failure scenarios such as operation timeout, host crash, and ensures that only the owner can release the lock by comparing timestamps.

Summary

Mutual exclusion and idempotency are pervasive challenges in distributed environments. Distributed locks—implemented via Zookeeper, Redis, or multi‑engine solutions like Cerberus—provide a reliable way to serialize access to shared resources. GTIS offers a complementary idempotency mechanism by preventing duplicate operations through globally unique identifiers stored in external engines. Both techniques rely heavily on external storage reliability and must be chosen carefully based on system architecture and performance requirements.

JavaConcurrencyRedisZookeeperdistributed lockidempotencymutual exclusion
IT Architects Alliance
Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.