
Why Distributed Data Consistency Is Hard and How to Solve It

This article explains why data consistency is hard to achieve in modern distributed systems. It reviews the ACID properties, the CAP theorem, BASE, and event ordering, and compares practical solutions such as two-phase commit, Paxos, local message tables, and cache consistency strategies.

JD Cloud Developers

Apr 13, 2023


1. Introduction

Ensuring data consistency is a fundamental problem in large-scale distributed software systems. On a single node, a database guarantees it through local transactions with the ACID properties (Atomicity, Consistency, Isolation, Durability); MySQL InnoDB is a typical example.

On typical storage hardware, sequential disk I/O is far faster than random I/O, often by orders of magnitude. This is why databases pair in-memory buffer pools with append-only log files: random page updates are absorbed in memory and persisted first as sequential log writes.

2. Distributed Systems

2.1 CAP Theorem

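The CAP theorem (Consistency, Availability, Partition tolerance) states that a distributed system can satisfy at most two of these three properties at the same time. It leads to three system categories: CA (strong consistency, high availability, limited partition tolerance), CP (strong consistency, partition tolerance, limited availability), and AP (high availability, partition tolerance, limited consistency). Because network partitions cannot be ruled out in a real distributed deployment, the practical choice is usually between CP and AP: when a partition occurs, the system either refuses some requests or serves possibly stale data.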
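The CP versus AP trade-off can be illustrated with a toy two-replica model. This is only a sketch under simplifying assumptions: the `Replica` class, the `mode` labels, and the `partitioned` flag are hypothetical names invented for this example and do not correspond to any real database.

```python
class Unavailable(Exception):
    """Raised by a CP replica that refuses a write during a partition."""


class Replica:
    """Toy replica; `mode` is "CP" or "AP" (labels used in this sketch only)."""

    def __init__(self, mode):
        self.mode = mode
        self.data = {}
        self.peer = None          # the other replica
        self.partitioned = False  # True when the peer cannot be reached

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than diverge from the peer.
                raise Unavailable("cannot reach peer")
            # Availability first: accept locally; the peer is now stale.
            self.data[key] = value
            return
        # Healthy network: replicate synchronously to the peer.
        self.data[key] = value
        self.peer.data[key] = value


def make_pair(mode):
    """Create two replicas of the same mode that replicate to each other."""
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b
```

In this model an AP pair keeps accepting writes during a partition and must reconcile the diverged copies later, while a CP pair stays consistent by rejecting the write.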

2.2 BASE Theory

BASE (Basically Available, Soft state, Eventually consistent) relaxes strict consistency in favor of availability and partition tolerance, guiding the design of large‑scale internet services.

2.3 Event Ordering

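Distributed systems need to determine the order of events across machines whose physical clocks disagree. Lamport logical clocks assign timestamps that respect causality (if event a happened before event b, a's timestamp is smaller) and, with a tie-breaker such as a node ID, yield a total order, but they cannot tell whether two events were concurrent. Vector clocks keep one counter per node, which captures the causal partial order exactly and makes concurrent events detectable. Hybrid logical clocks combine physical and logical time in a single timestamp, while Google's TrueTime exposes physical time as an interval with bounded uncertainty; both reduce the communication needed to order events.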
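A minimal sketch of the two logical-clock schemes follows. The class and function names (`LamportClock`, `vc_merge`, `vc_compare`) are illustrative choices for this example, and vector clocks are represented as plain dictionaries mapping a node name to its counter.

```python
class LamportClock:
    """Scalar logical clock: timestamps respect the happened-before relation."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event or a send."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Merge the sender's timestamp, then advance past it."""
        self.time = max(self.time, msg_time) + 1
        return self.time


def vc_merge(a, b):
    """Element-wise maximum of two vector clocks."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def vc_compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    keys = a.keys() | b.keys()
    le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"
```

For example, `vc_compare({"A": 2}, {"B": 1})` reports the two events as concurrent, a distinction a single Lamport timestamp cannot express.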

3. Common Solutions

3.1 Two‑Phase Commit (2PC) and Three‑Phase Commit (3PC)

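The XA specification defines the interface between a transaction manager and resource managers and is built on two-phase commit (2PC), which many databases and middleware products implement: the coordinator first asks every participant to prepare, then commits only if all of them vote yes. 2PC can block if the coordinator fails after participants have prepared. Three-phase commit (3PC) adds a pre-commit phase and participant timeouts to mitigate coordinator failures and network timeouts, but both protocols add round trips, latency, and complexity.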
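The voting and decision phases of 2PC can be sketched in a few lines. This is an in-memory illustration only, assuming no crashes, timeouts, or durable logs; `Participant` and `two_phase_commit` are hypothetical names and not part of the XA interface.

```python
class Participant:
    """Toy resource manager; `can_commit` simulates its vote in phase 1."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: a real participant would persist its changes and lock
        # resources before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: voting
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: roll back everywhere
        p.rollback()
    return "aborted"
```

A single "no" vote aborts the whole transaction, which is what gives 2PC its atomicity and also why one slow or failed participant delays everyone.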

3.2 Local Message Table

This pattern stores pending messages in a local table within the same transaction as business data, then a background task reliably delivers the messages to a message queue, achieving eventual consistency.
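A minimal sketch of the pattern, using Python's built-in `sqlite3` module as the local database. The table names (`orders`, `outbox`), the payload format, and the `publish` callback standing in for a message-queue client are all assumptions made for this example.

```python
import sqlite3


def init_db(conn):
    """Create the business table and the local message (outbox) table."""
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
    """)


def place_order(conn, order_id, item):
    """Write business data and the pending message in ONE local transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)",
                     (order_id, item))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (f"order_created:{order_id}",))


def relay_once(conn, publish):
    """Background task body: deliver unsent messages, then mark them sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    delivered = 0
    for msg_id, payload in rows:
        publish(payload)  # if this raises, the row stays unsent and is retried
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
        delivered += 1
    return delivered
```

Because a message is marked as sent only after `publish` returns, a crash between the two steps causes a redelivery on the next run. Delivery is therefore at-least-once, and consumers should be idempotent.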

3.3 MQ‑Based Approaches

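Sending an MQ message directly, outside the local transaction, reduces complexity but gives no guarantee that the message and the database change succeed or fail together: the transaction can commit while the send fails, or the reverse. Reliable message services can provide atomicity and durability for the message itself, but not full ACID guarantees.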

3.4 Transactional Messages

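Some MQs (e.g., RocketMQ) support transactional messages with a prepare-execute-confirm flow: the producer first sends a half (prepared) message that consumers cannot see, executes the local transaction, and then confirms or rolls back the message. If the confirmation is lost, the broker checks back with the producer for the transaction status. Message delivery and the local transaction therefore succeed or fail together.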
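The prepare-execute-confirm flow can be modeled with a small in-memory broker. This is a conceptual sketch only: `Broker`, `send_half`, `check_pending`, and `send_in_transaction` are hypothetical names for this example and are not the RocketMQ client API.

```python
class Broker:
    """Toy broker that hides prepared messages until they are confirmed."""

    def __init__(self):
        self.half = {}      # msg_id -> payload, invisible to consumers
        self.visible = []   # payloads that consumers can read
        self._next_id = 0

    def send_half(self, payload):
        """Prepare: store the message without exposing it to consumers."""
        self._next_id += 1
        self.half[self._next_id] = payload
        return self._next_id

    def commit(self, msg_id):
        """Confirm: make the prepared message visible."""
        self.visible.append(self.half.pop(msg_id))

    def rollback(self, msg_id):
        """Discard the prepared message."""
        self.half.pop(msg_id, None)

    def check_pending(self, check_local_state):
        """Check-back: ask the producer about messages left unresolved."""
        for msg_id in list(self.half):
            state = check_local_state(msg_id)  # "commit", "rollback", "unknown"
            if state == "commit":
                self.commit(msg_id)
            elif state == "rollback":
                self.rollback(msg_id)


def send_in_transaction(broker, payload, local_tx):
    """Producer side: prepare, run the local transaction, then confirm."""
    msg_id = broker.send_half(payload)
    try:
        local_tx()
    except Exception:
        broker.rollback(msg_id)
        return False
    broker.commit(msg_id)
    return True
```

The `check_pending` step covers the case where the producer crashes after the local transaction but before confirming, which is the gap a plain "send after commit" approach leaves open.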

4. Concurrency Control

Cache layers improve response time but introduce consistency pitfalls. Two common patterns are:

Invalidate the cache after a database write and reload it on the next cache miss (simple, but every invalidation forces a miss, and misses on hot keys can let a burst of reads fall through to the database).

Read‑write separation using change events or binlog subscription to asynchronously update the cache (CQRS style, higher read performance).
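Both cache patterns can be sketched with plain dictionaries standing in for the database and the cache. The `Store` class, the `db_reads` counter, and the `(op, key, value)` event format are assumptions made for this illustration; a real binlog subscriber would receive events from the database's replication stream.

```python
class Store:
    """Pattern 1: invalidate the cache on write, reload it on a miss."""

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0   # counts reads that fell through to the database

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value   # repopulate the cache after a miss
        return value

    def write_invalidate(self, key, value):
        self.db[key] = value          # update the source of truth first
        self.cache.pop(key, None)     # then drop the stale cache entry


def apply_change_events(cache, events):
    """Pattern 2: update the cache asynchronously from ordered change events."""
    for op, key, value in events:
        if op == "delete":
            cache.pop(key, None)
        else:                          # treat anything else as an upsert
            cache[key] = value
```

In the first pattern the writer pays for consistency with an extra database read after every write. In the second the read path never touches the database, but the cache lags behind by however long the change events take to arrive.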

5. Conclusion

There is no universal solution for distributed data consistency; the choice depends on business requirements for consistency, availability, and performance. Financial systems often need strong consistency via protocols like Paxos, while e‑commerce can adopt flexible, eventually‑consistent designs such as reliable message‑driven transactions.
