Understanding Distributed System Consistency, CAP, ACID, and Transaction Protocols (2PC & 3PC)
This article explains the challenges of consistency in distributed systems, introduces the CAP theorem and ACID properties, surveys common distributed transaction techniques such as the local message table and transactional message middleware (e.g., RocketMQ), and details the two‑phase and three‑phase commit protocols along with their advantages and drawbacks.
In modern internet architectures, distributed systems and micro‑service designs are prevalent, so a single business operation often spans multiple services and database instances. Achieving strong consistency across these independent operations is especially difficult in scenarios that demand it.
Because of horizontal scaling and cost considerations, traditional strong‑consistency solutions (e.g., single‑node database transactions) are often abandoned. Following the CAP theorem's trade‑off between consistency, availability, and partition tolerance, most designs settle for eventual consistency.
Characteristics of Distributed Systems
In a distributed system, it is impossible to simultaneously satisfy all three CAP properties. Most scenarios sacrifice strong consistency for higher availability, relying on eventual consistency.
CAP Understanding
Consistency: All nodes see the same data at the same time.
Availability: Reads and writes always succeed.
Partition Tolerance: The system continues to operate despite network partitions or node failures.
ACID Understanding
Atomicity: All operations in a transaction succeed or none do.
Consistency: A transaction moves the database from one valid state to another, never violating integrity constraints.
Isolation: Concurrent transactions do not interfere with each other.
Durability: Once committed, changes survive failures.
Basic Introduction to Distributed Transactions
The Distributed Transaction Service (DTS) ensures eventual consistency in large‑scale distributed environments. According to CAP, only two of the three properties can be fully achieved, so systems usually trade strong consistency for availability and rely on idempotent operations to guarantee final consistency.
Understanding Data Consistency
Strong consistency guarantees that every read sees the latest write, but it sacrifices availability. Weak consistency does not guarantee immediate visibility of writes. Eventual consistency, a form of weak consistency, ensures that in the absence of new updates the system will converge to the latest value; DNS is a classic example.
Common Distributed Techniques
1. Local Message Table: An approach popularized by eBay that splits a distributed transaction into a series of local transactions. Example: in a cross‑bank transfer, the debit operation and the insertion of a message into a local message table happen in one local transaction; the counterpart bank is then notified via high‑throughput MQ or by periodically polling that table.
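The pattern above can be sketched as follows. This is a minimal illustration using SQLite as the local database and a stub `publish` callback in place of a real MQ client; the table names (`accounts`, `outbox`) and function names are illustrative assumptions, not from the original article.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, sent INTEGER DEFAULT 0);
    INSERT INTO accounts VALUES ('alice', 100);
""")

def debit_and_record(account, amount):
    # The business write and the message insert share ONE local transaction,
    # so either both are durable or neither is.
    with db:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                   (amount, account))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (f"credit counterpart {amount}",))

def relay_outbox(publish):
    # Periodic poller: deliver unsent messages, mark them sent only after
    # publish succeeds. Redelivery is possible, so consumers must be idempotent.
    rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for msg_id, payload in rows:
        publish(payload)
        db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
    db.commit()

sent = []
debit_and_record("alice", 30)
relay_outbox(sent.append)
```

Because the poller may crash between `publish` and marking the row sent, the same message can be delivered more than once, which is why this pattern pairs with consumer-side idempotency.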
2. Message Middleware (non‑transactional): Using MQ to deliver messages after a successful DB operation may still fail, leading to consistency issues.
Possible outcomes:
Both DB update and MQ delivery succeed.
DB update fails, so no MQ message is sent.
DB update succeeds but MQ delivery fails, forcing the DB change to be rolled back or leaving the two systems inconsistent.
Consumer‑side challenges include ensuring that business logic succeeds after message dequeue and avoiding duplicate consumption.
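Duplicate consumption is usually handled by making the consumer idempotent. A minimal sketch, assuming each message carries a unique id (the message shape and the `handle` function here are illustrative, not from the article):

```python
# Record which message ids have already been applied; a redelivered message
# is detected and skipped instead of being applied twice.
processed_ids = set()
balance = {"bob": 0}

def handle(message):
    # message: {"id": unique message id, "amount": credit amount}
    if message["id"] in processed_ids:
        return "duplicate-skipped"
    balance["bob"] += message["amount"]
    processed_ids.add(message["id"])
    return "applied"

msg = {"id": "tx-001", "amount": 30}
first = handle(msg)
second = handle(msg)  # simulated redelivery of the same message
```

In production the processed-id set would live in the same database as the business state, so the dedup check and the business write commit atomically.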
Transactional Message Middleware
Alibaba's RocketMQ provides a transactional message mechanism that guarantees local operations and message sending behave as a single transaction.
Phase 1: The producer sends a Prepared (half) message; RocketMQ stores it without delivering it to consumers and returns its address.
Phase 2: Execute the local transaction.
Phase 3: Confirm the message; if the local transaction succeeded, the message is committed, otherwise it is rolled back.
If the confirmation fails, RocketMQ periodically scans prepared messages and decides to commit or roll back based on the sender’s strategy, ensuring atomicity between message sending and the local transaction.
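The three phases and the check-back scan can be modeled with a toy broker. The class and method names below are illustrative, not RocketMQ's actual API:

```python
class Broker:
    """Toy model of transactional (half) messages with a check-back scan."""

    def __init__(self):
        self.half = {}        # msg_id -> payload, stored but NOT deliverable
        self.committed = []   # messages visible to consumers

    def send_half(self, msg_id, payload):
        self.half[msg_id] = payload          # phase 1: store, don't deliver

    def confirm(self, msg_id, commit):
        payload = self.half.pop(msg_id)      # phase 3: commit or roll back
        if commit:
            self.committed.append(payload)

    def check_back(self, local_tx_status):
        # If a confirmation was lost, periodically scan pending half messages
        # and ask the sender's strategy whether its local transaction committed.
        for msg_id in list(self.half):
            self.confirm(msg_id, local_tx_status(msg_id))

broker = Broker()
broker.send_half("m1", "order-created")
# Suppose the local transaction succeeded but the confirm call was lost;
# the broker's periodic scan later resolves the pending half message.
broker.check_back(lambda msg_id: True)
```

The key property this models is that a half message is never visible to consumers until the sender's local transaction is known to have committed.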
Most open‑source MQs (ActiveMQ, RabbitMQ, Kafka) lack this kind of built‑in distributed‑transaction support; RocketMQ's transactional message feature was not open‑sourced at the time of writing.
Understanding 2PC and 3PC Protocols
To solve distributed consistency, two‑phase commit (2PC) and three‑phase commit (3PC) are widely used.
2PC
2PC introduces a coordinator to collect votes from participants and then decides to commit or abort. It consists of:
Phase 1 – Voting (prepare): Coordinator asks participants to prepare; participants execute locally, log, and wait.
Phase 2 – Commit/Abort: If all votes are positive, coordinator sends commit; otherwise, it sends rollback. Participants act accordingly and release resources.
Drawbacks of 2PC include a single point of failure (the coordinator), blocking (participants hold locks and resources while waiting for the coordinator's decision), and potential data inconsistency if some participants never receive the final commit message.
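The two phases can be sketched as a coordinator collecting votes. This is a simplified illustration with no logging or crash recovery, purely to show the message flow:

```python
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Execute locally, write redo/undo logs, then vote; resources stay
        # locked until the coordinator's decision arrives (2PC is blocking).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: voting
    decision = all(votes)                         # commit only if unanimous
    for p in participants:                        # phase 2: commit/rollback
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant(), Participant()])
bad = two_phase_commit([Participant(), Participant(can_commit=False)])
```

Note how a single negative vote aborts everyone, and how each participant sits in `prepared` with resources held until the coordinator speaks; that waiting is exactly where the blocking and single-point-of-failure problems arise.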
3PC
3PC adds a pre‑commit phase and timeout handling to reduce blocking:
Phase 1 – can_commit: The coordinator asks each participant whether it could commit; participants answer based on a lightweight feasibility check without yet executing the transaction.
Phase 2 – pre_commit: If all agree, coordinator sends a pre‑commit; participants execute but do not commit, then acknowledge.
Phase 3 – do_commit: Based on responses, coordinator sends final commit or abort. Participants act and release resources.
3PC mitigates some 2PC drawbacks by allowing participants to timeout and proceed with commit, but it still cannot fully eliminate inconsistency under network failures.
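The key behavioral difference from 2PC can be shown from the participant's side. In this sketch (class and method names are illustrative), a participant that has reached the pre-committed state commits on its own after a timeout rather than blocking forever; this is safe only because pre-commit implies everyone voted yes, and it is exactly what can go wrong if a partition hides a global abort:

```python
class Participant3PC:
    def __init__(self):
        self.state = "init"

    def can_commit(self):
        self.state = "ready"          # phase 1: lightweight feasibility check
        return True

    def pre_commit(self):
        self.state = "pre-committed"  # phase 2: execute, but don't commit yet

    def do_commit(self, commit=True):
        self.state = "committed" if commit else "aborted"

    def on_timeout(self):
        # Unlike 2PC, a pre-committed participant unblocks itself and commits
        # after a timeout -- justified because reaching pre-commit means every
        # participant voted yes, but wrong if a network failure hid an abort.
        if self.state == "pre-committed":
            self.do_commit(True)

p = Participant3PC()
p.can_commit()
p.pre_commit()
p.on_timeout()   # coordinator's do_commit was lost; participant commits anyway
```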
Original article by Zhang Songran, linkedkeeper.com.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.