
How to Ensure Data Consistency in High‑Concurrency Distributed Systems

This article explores the challenges of maintaining data consistency under high concurrency in distributed systems, reviewing common consistency issues, distributed lock implementations, optimistic and pessimistic strategies, CAS and ABA problems, and practical solutions such as Redis locks, Zookeeper, and transaction protocols.


We previously introduced five distributed transaction solutions (see the referenced article) and identified three core CAP requirements—Consistency, Availability, Partition Tolerance—that often conflict in real‑world scenarios.

This article focuses on guaranteeing Data Consistency under high concurrency.

2.1 Typical Payment Scenario

In a classic payment flow, the system first queries the buyer’s account balance, calculates the order amount, and finally deducts the balance. While low concurrency poses no issue, concurrent deductions can lead to inconsistency.
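This lost-update hazard can be replayed deterministically. The sketch below simulates the interleaving sequentially for clarity: both transactions read the balance before either writes back, so the first deduction is silently overwritten.

```go
package main

import "fmt"

// simulateLostUpdate replays the read-compute-write race deterministically:
// both transactions read the balance before either one writes back.
func simulateLostUpdate() int {
	balance := 800

	read1 := balance // transaction 1 reads 800
	read2 := balance // transaction 2 also reads 800

	balance = read1 - 100 // transaction 1 writes 700
	balance = read2 - 200 // transaction 2 writes 600, discarding the 700

	return balance
}

func main() {
	// 300 was deducted in total, yet the balance ends at 600, not 500:
	// transaction 1's deduction is lost.
	fmt.Println(simulateLostUpdate()) // prints 600
}
```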

2.2 Online Order Scenario

When a buyer places an order, two actions are required: inventory deduction and order‑status update. Because inventory and order data often reside in different databases, a distributed transaction is needed to keep them consistent.
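One classic family of solutions pairs each step with a compensating action (the Saga pattern). The sketch below uses hypothetical deductInventory, createOrder, and restoreInventory functions, with the order step hard-coded to fail so the compensation path is visible; real services would sit behind these calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical steps; in practice each hits a different database or service.
func deductInventory(itemID string) error {
	fmt.Println("inventory deducted for", itemID)
	return nil
}

// Compensating action: undoes deductInventory.
func restoreInventory(itemID string) {
	fmt.Println("inventory restored for", itemID)
}

func createOrder(itemID string) error {
	return errors.New("order service unavailable") // simulated failure
}

// placeOrder runs both steps; if the second fails, it compensates the first
// so inventory and order data stay consistent.
func placeOrder(itemID string) error {
	if err := deductInventory(itemID); err != nil {
		return err
	}
	if err := createOrder(itemID); err != nil {
		restoreInventory(itemID) // roll back step 1
		return err
	}
	return nil
}

func main() {
	if err := placeOrder("sku-42"); err != nil {
		fmt.Println("order failed:", err)
	}
}
```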

2.3 Cross‑Bank Transfer Scenario

Transferring funds between banks involves debiting the sender’s account and crediting the receiver’s account on different platforms. A consistency mechanism is required to ensure both steps succeed atomically.

3.1 Distributed Lock

Common implementations include:

Database‑based lock

Cache‑based lock (Redis or similar)

Zookeeper‑based lock

Implementation complexity ranges from high (database-based) to low (Zookeeper-based), while performance and reliability vary accordingly.

For a detailed analysis, see the article “Distributed Lock Solution Analysis”.

3.1.1 Cache‑Based Distributed Lock

Redis provides a simple and high‑performance lock using atomic commands:

<code># Set key if not exists with automatic expiration
SET key value NX PX milliseconds
# Delete a key
DEL key</code>

The NX option sets the key only when it does not already exist (equivalent to SETNX). The PX option defines the lock’s TTL in milliseconds.

Example for a payment operation:

<code># Acquire lock for account 17124, expire after 500 ms
SET pay_id_17124 1 NX PX 500
# Release lock
DEL pay_id_17124</code>

When the lock is acquired, the business logic can proceed; the lock is released after the operation or automatically expires, preventing other processes from entering.

Redis clusters can use the Redlock algorithm for stronger guarantees.

3.1.2 Advantages and Disadvantages of Cache Locks

Advantages: Redis offers superior performance compared to MySQL or Zookeeper and is easy to adopt.

Disadvantages: relying on a TTL can be unreliable, since a lock may expire while the operation holding it is still running; the approach behaves like a pessimistic lock and introduces an extra infrastructure dependency, potentially reducing throughput.

3.2 Optimistic Mode

Optimistic locking assumes conflicts are rare and detects them only at update time, typically via Compare‑and‑Swap (CAS): a value is updated only if its current value still matches the expected one.

3.2.1 CAS Principle

CAS involves three operands: memory address V, expected old value A, and new value B. The update succeeds only when V equals A.

Applying CAS to the payment scenario:

Initial balance: 800

Two concurrent transactions read 800.

Transaction 1 deducts 100 → new balance 700 (succeeds if balance is still 800).

Transaction 2 deducts 200 → would result in 600, but should fail because the balance has changed to 700.

Go example (generated with Baidu Comate AI):

<code>package main

import (
    "fmt"
    "sync/atomic"
)

func main() {
    var value uint32 = 0 // shared variable
    oldValue := uint32(0)

    // CAS: set value to 1 only if it still equals oldValue (0).
    if atomic.CompareAndSwapUint32(&value, oldValue, 1) {
        fmt.Println("Value matched the expected old value; updated to 1.")
    } else {
        fmt.Println("Value did not match the expected old value.")
    }

    // A second CAS with the same stale expectation fails: value is now 1, not 0.
    if atomic.CompareAndSwapUint32(&value, oldValue, 2) {
        fmt.Println("Unexpected: the stale CAS succeeded.")
    } else {
        fmt.Println("Value no longer matches the expected old value; update rejected.")
    }
}</code>

3.3.1 What Is the ABA Problem?

In CAS, a value may change from A to B and back to A, causing a thread to mistakenly believe the value was unchanged.
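The effect is easy to reproduce with Go's atomic primitives: the CAS below succeeds even though the value was modified twice between the read and the swap.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casAfterABA returns the result of a CAS whose target went A -> B -> A
// between the read and the swap.
func casAfterABA() bool {
	var value int32 = 800
	expected := atomic.LoadInt32(&value) // thread T reads A (800)

	// Meanwhile, other threads change the value and change it back.
	atomic.StoreInt32(&value, 600) // A -> B
	atomic.StoreInt32(&value, 800) // B -> A

	// T's CAS succeeds even though the value was modified twice.
	return atomic.CompareAndSwapInt32(&value, expected, 700)
}

func main() {
	fmt.Println(casAfterABA()) // prints true: the intermediate change went unnoticed
}
```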

3.3.2 Mitigation Strategies

Common approaches:

Attach a version number or timestamp to the shared variable and compare it together with the value.

Use language‑provided utilities (e.g., java.util.concurrent.atomic in Java) that embed versioning.

Introduce additional state flags indicating whether the variable has been modified.

Example using a versioned struct in Go:

<code>package main

import "sync/atomic"

type ValueWithVersion struct {
    Value   int32
    Version int32
}

var sharedValue atomic.Value // stores *ValueWithVersion

func init() {
    sharedValue.Store(&ValueWithVersion{Value: 800, Version: 0})
}

// updateValue succeeds only when both the value and the version match the
// expected pair, so an A -> B -> A round trip is exposed by the version bump.
// Note: the load-then-store below is not itself atomic; a real implementation
// would protect it with a mutex or a CAS-capable holder.
func updateValue(expectValue, expectVersion, newValue int32) bool {
    current := sharedValue.Load().(*ValueWithVersion)
    if current.Value == expectValue && current.Version == expectVersion {
        sharedValue.Store(&ValueWithVersion{Value: newValue, Version: expectVersion + 1})
        return true
    }
    return false
}</code>

4 Summary

Ensuring data consistency in high‑concurrency environments is complex and requires a combination of strategies:

Transactions: Use ACID‑compliant database transactions and appropriate locking.

Distributed Locks: Coordinate access to shared resources via Redis, Zookeeper, etc.

Optimistic and Pessimistic Locks: Choose based on conflict likelihood.

Consistency Protocols: Apply algorithms like Raft or Paxos for replica synchronization.

Message Queues: Preserve order and durability of asynchronous processing.

CAP and BASE Trade‑offs: Balance consistency, availability, and partition tolerance.

Cache Consistency: Implement eviction and synchronization mechanisms.

Read‑Write Separation: Distribute reads and writes across replicas.

Verification and Retries: Validate data and retry failed operations.

Monitoring and Alerts: Track latency, error rates, and consistency metrics.

In practice, the optimal solution depends on the specific business context and technology stack.

Tags: data consistency, high concurrency, distributed lock, CAS, optimistic concurrency
Written by

Architecture & Thinking

🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.
