Mastering Distributed Transactions: From Two-Phase Commit to Saga and Beyond
This article explains the fundamentals of distributed transactions, compares classic solutions such as XA, Saga, TCC, local message tables, and transactional messages, discusses their trade‑offs, and introduces advanced techniques like sub‑transaction barriers to handle network anomalies in microservice architectures.
Background and Motivation
As business complexity grows, many systems evolve from monoliths to distributed microservice architectures, inevitably facing the challenge of maintaining data consistency across services. The article uses a simple money‑transfer example (debit A, credit B) to illustrate the need for atomic, consistent, isolated, and durable operations in a distributed setting.
Basic Theory
A database transaction guarantees that a group of statements either all succeed or all fail, adhering to the ACID properties:
Atomicity : All operations succeed or none do.
Consistency : Integrity constraints remain intact before and after the transaction.
Isolation : Concurrent transactions do not interfere with each other.
Durability : Once committed, changes survive failures.
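As a minimal sketch of atomicity on a single database, the money-transfer example can be written with Python's built-in sqlite3 module. The table name `account`, the `CHECK` constraint, and the helper names are assumptions made for illustration; they do not come from the article.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside one local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so that a failed debit has something to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint violated; the credit was rolled back too

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()

ok = transfer(conn, "A", "B", 30)       # both updates commit
failed = transfer(conn, "A", "B", 500)  # debit would overdraw A, nothing changes
```

The second call shows the all-or-nothing behavior: the credit to B is applied and then undone when the debit from A violates the constraint.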
Distributed Transaction Fundamentals
In a distributed scenario, the transaction coordinator, resource managers, and the application reside on different nodes. The article notes that distributed transactions often relax isolation and consistency to meet availability and performance requirements, following BASE principles while still preserving core ACID aspects where possible.
Classic Solutions
Two‑Phase Commit (XA)
XA defines a protocol between a global transaction manager (TM) and local resource managers (RM). It consists of:
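Prepare : Each RM executes its part of the work, locks the resources involved, and reports whether it is ready to commit.
Commit/Rollback : If every RM is ready, the TM instructs all of them to commit; otherwise it instructs all of them to roll back.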
Most major databases (MySQL, Oracle, SQL Server, PostgreSQL) support XA. The main drawback is that resources stay locked from the prepare phase until the second phase completes, which lowers concurrency.
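The two phases can be sketched with an in-memory coordinator. `Participant` and `two_phase_commit` are hypothetical names used only for illustration; a real XA resource manager would persist its prepared state and hold database locks, which this sketch does not model.

```python
class Participant:
    """Hypothetical in-memory resource manager, for illustration only."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # A real RM would lock rows and persist the prepared state here.
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True if the global transaction committed, False if it aborted."""
    # Phase 1: ask each participant to prepare; all() stops at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2, commit path: every vote was "yes".
        for p in participants:
            p.commit()
        return True
    # Phase 2, abort path: at least one vote was "no".
    for p in participants:
        p.rollback()
    return False


a, b = Participant("a"), Participant("b")
committed = two_phase_commit([a, b])

c, d = Participant("c"), Participant("d", can_prepare=False)
aborted = two_phase_commit([c, d])
```

In the second run, `c` has already prepared when `d` votes no, so `c` must be rolled back as well.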
Saga
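Saga breaks a long transaction into a series of short local transactions coordinated by a saga orchestrator. If any step fails, compensating actions for the steps that already completed are executed in reverse order. It offers higher concurrency because each step commits immediately, but it requires explicit compensation logic and provides weaker consistency, since intermediate results are visible to other transactions before the saga finishes.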
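A minimal orchestrator sketch, assuming each step is a pair of plain callables (action, compensation). The function `run_saga` and the in-memory `balances` dictionary are hypothetical; a production orchestrator would also persist its progress so that compensation survives a crash.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables. Returns True on success."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the steps that already committed, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


balances = {"A": 100, "B": 0}

def adjust(account, delta):
    """Build a step that changes one balance by delta."""
    def step():
        balances[account] += delta
    return step

def failing_credit():
    raise RuntimeError("account B is frozen")

# Debit A, then credit B: both steps succeed.
ok = run_saga([
    (adjust("A", -30), adjust("A", +30)),
    (adjust("B", +30), adjust("B", -30)),
])

# Debit A succeeds, credit B fails, so the debit is compensated.
failed = run_saga([
    (adjust("A", -30), adjust("A", +30)),
    (failing_credit, adjust("B", -30)),
])
```

Between the debit and its compensation in the second run, A's balance is briefly 40, which is the kind of visible intermediate state a saga permits.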
Try‑Confirm‑Cancel (TCC)
TCC splits work into three phases:
Try : Perform checks and reserve resources.
Confirm : Execute the business operation using reserved resources (must be idempotent).
Cancel : Release reserved resources if the transaction aborts.
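TCC offers stronger consistency than Saga, because resources are reserved before they are changed, and it avoids the long database locks of XA. The cost is higher development effort: every participating service must implement all three operations.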
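The three operations can be sketched for the debit side of a transfer. The `Account` class and its method names are hypothetical. Reservations are tracked per transaction id so that repeated Confirm or Cancel calls have no further effect.

```python
class Account:
    """Hypothetical account that reserves funds per transaction id."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = {}  # tx_id -> amount held by Try

    def available(self):
        return self.balance - sum(self.reserved.values())

    def try_debit(self, tx_id, amount):
        # Try: check funds and reserve them without spending them yet.
        if tx_id in self.reserved:
            return True  # retried Try, reservation already exists
        if self.available() < amount:
            return False
        self.reserved[tx_id] = amount
        return True

    def confirm_debit(self, tx_id):
        # Confirm: spend the reservation; a repeated call finds nothing to do.
        amount = self.reserved.pop(tx_id, None)
        if amount is not None:
            self.balance -= amount

    def cancel_debit(self, tx_id):
        # Cancel: release the reservation; safe to call more than once.
        self.reserved.pop(tx_id, None)


a = Account(100)
tried = a.try_debit("tx1", 30)
a.confirm_debit("tx1")
a.confirm_debit("tx1")              # duplicate Confirm changes nothing

a.try_debit("tx2", 50)
a.cancel_debit("tx2")               # reservation released, balance untouched

rejected = a.try_debit("tx3", 500)  # not enough funds to reserve
```

This sketch does not guard against a Try that arrives after its Confirm or Cancel has already been processed; that case is the subject of the network-anomaly and barrier sections later in the article.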
Local Message Table
Originally proposed by eBay, this pattern stores pending business tasks in a local message table within the same transaction, ensuring atomicity between business updates and message creation. A background worker later processes the messages.
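A sketch of the pattern with sqlite3, assuming a message table named `outbox` and a caller-supplied `publish` function; both names are illustrative and not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        sent INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO account VALUES ('A', 100);
""")

def debit_and_enqueue(conn, name, amount, payload):
    # The business update and the message row commit or roll back together.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

def relay_pending(conn, publish):
    """One pass of the background worker: publish unsent rows, mark them sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for msg_id, payload in rows:
        publish(payload)  # if this raises, the row stays unsent and is retried
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
    return len(rows)

delivered = []
debit_and_enqueue(conn, "A", 30, "credit B 30")
first_pass = relay_pending(conn, delivered.append)
second_pass = relay_pending(conn, delivered.append)
```

If the worker crashes after publishing but before marking the row as sent, the message is published again on the next pass, so the consumer has to be idempotent.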
Transactional Message (RocketMQ)
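RocketMQ's transactional message replaces the local table with a half-message stored on the broker, which consumers cannot see yet. After the half-message is accepted, the producer executes its local transaction and then tells the broker to commit or roll back the message based on the outcome. If the broker never receives that decision, it checks back with the producer to learn the transaction's status.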
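The half-message flow can be sketched with an in-memory stand-in. The `Broker` class and `send_transactional` function are hypothetical and are not the RocketMQ client API; they only mirror the order of the steps.

```python
class Broker:
    """Hypothetical in-memory broker; this is not the RocketMQ client API."""

    def __init__(self):
        self.half = {}     # msg_id -> payload, invisible to consumers
        self.visible = []  # payloads that consumers may read

    def send_half(self, msg_id, payload):
        self.half[msg_id] = payload

    def commit(self, msg_id):
        self.visible.append(self.half.pop(msg_id))

    def rollback(self, msg_id):
        self.half.pop(msg_id, None)

    def check_back(self, lookup_status):
        # For half-messages with no decision yet, ask the producer.
        for msg_id in list(self.half):
            status = lookup_status(msg_id)
            if status == "committed":
                self.commit(msg_id)
            elif status == "rolled_back":
                self.rollback(msg_id)


def send_transactional(broker, msg_id, payload, local_tx):
    broker.send_half(msg_id, payload)  # 1. store the half-message
    try:
        local_tx()                     # 2. producer runs its local transaction
    except Exception:
        broker.rollback(msg_id)        # 3. failure: discard the half-message
        return False
    broker.commit(msg_id)              # 3. success: make the message deliverable
    return True


def failing_tx():
    raise RuntimeError("debit failed")

broker = Broker()
sent = send_transactional(broker, "m1", "credit B 30", lambda: None)
dropped = send_transactional(broker, "m2", "credit B 99", failing_tx)

# A producer that crashed after step 2: the broker resolves it by checking back.
broker.send_half("m3", "credit B 5")
broker.check_back(lambda msg_id: "committed")
```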
Best-Effort Notification
In this approach, also called maximum-effort notification, the sender repeatedly attempts to notify the receiver at increasing intervals (for example, exponential back-off) up to a retry limit. It also provides a query interface so that the receiver can pull the final result if all notifications fail.
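The retry schedule described above can be sketched as follows. The function names, the base delay, and the factor are assumptions; `sleep` is injectable so the example runs without actually waiting.

```python
import time

def backoff_delays(base_seconds, factor, retries):
    """Waits between attempts: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** i for i in range(retries)]

def notify_with_retries(send, delays, sleep=time.sleep):
    """Call send() until it returns True; return the attempt number or None."""
    attempt = 1
    if send():
        return attempt
    for delay in delays:
        sleep(delay)
        attempt += 1
        if send():
            return attempt
    return None  # gave up; the receiver must now query for the result

# Demo with a fake receiver that only acknowledges the third attempt.
outcomes = iter([False, False, True])
waited = []
attempts_used = notify_with_retries(
    lambda: next(outcomes), backoff_delays(1, 2, 4), sleep=waited.append
)

# A receiver that never acknowledges exhausts every delay.
waited_all = []
gave_up = notify_with_retries(lambda: False, [1, 2], sleep=waited_all.append)
```

When the function returns `None`, the sender stops pushing and the receiver is expected to use the query interface instead.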
AT Mode (Seata)
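Seata's AT mode follows a two-phase model similar to XA, but each branch commits locally in the first phase and records an undo log, from which Seata generates the rollback automatically. This reduces developer effort, although the affected rows remain protected by a global lock until the global transaction finishes.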
Network Anomalies in Distributed Transactions
Three failure categories are highlighted:
Empty Rollback : Cancel is called although the corresponding Try never executed, for example because the Try request was lost or timed out.
Idempotency : Retries due to network glitches must not cause duplicate effects.
Hang : A delayed Try arrives after Cancel has already run, reserving resources that no later Confirm or Cancel will ever release.
These issues are illustrated with sequence diagrams.
Sub‑Transaction Barrier Technique
DTM introduces a barrier table sub_trans_barrier keyed by global‑transaction‑id, branch‑id, and branch‑type (try|confirm|cancel). The ThroughBarrierCall function ensures:
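Empty compensation is skipped: a Cancel whose Try never ran does not execute its business logic.
Execution is idempotent: the unique key rejects a repeated insert, so a retried branch runs only once.
Hangs are avoided: once Cancel has been processed, a late Try fails to insert its barrier row and is skipped.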
The barrier works for XA, Saga, TCC, and transactional messages, dramatically simplifying error handling for developers.
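The following is a sketch of one way such a barrier can be built with sqlite3. It is not DTM's code. The column names follow the key described above, while `insert_barrier`, `through_barrier_call`, and the rule that Cancel also claims the Try row are assumptions chosen to reproduce the three guarantees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sub_trans_barrier (
        gid TEXT, branch_id TEXT, branch_type TEXT,
        PRIMARY KEY (gid, branch_id, branch_type)
    )
""")

def insert_barrier(conn, gid, branch_id, branch_type):
    cur = conn.execute(
        "INSERT OR IGNORE INTO sub_trans_barrier VALUES (?, ?, ?)",
        (gid, branch_id, branch_type),
    )
    return cur.rowcount == 1  # True only if this call created the row

def through_barrier_call(conn, gid, branch_id, branch_type, business):
    """Run `business` at most once per key; skip empty rollbacks and late Tries."""
    with conn:  # barrier rows and business writes share one local transaction
        # A Cancel also claims the Try row. If that insert succeeds, Try never
        # ran (empty rollback), and any later Try will find its row taken.
        claimed_try = (
            branch_type == "cancel"
            and insert_barrier(conn, gid, branch_id, "try")
        )
        first_time = insert_barrier(conn, gid, branch_id, branch_type)
        if claimed_try or not first_time:
            return False
        business(conn)
        return True

calls = []
def record(label):
    return lambda conn: calls.append(label)

results = [
    through_barrier_call(conn, "g1", "01", "try", record("g1 try")),        # runs
    through_barrier_call(conn, "g1", "01", "try", record("g1 retry")),      # duplicate
    through_barrier_call(conn, "g1", "01", "cancel", record("g1 cancel")),  # runs
    through_barrier_call(conn, "g2", "01", "cancel", record("g2 cancel")),  # empty rollback
    through_barrier_call(conn, "g2", "01", "try", record("g2 late try")),   # late Try
]
```

Because the barrier insert and the business writes commit in the same local transaction, a failure in the business logic also removes the barrier row, and the branch can be retried.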
Conclusion
The article covered fundamental concepts of distributed transactions, compared major solutions, explained typical failure modes, and presented the sub‑transaction barrier as an elegant way to achieve reliable, idempotent, and hang‑free transaction processing in microservice environments.
IT Architects Alliance
