Mastering Distributed Transactions: From XA to TCC and Beyond
This article explains the fundamentals of database transactions, defines distributed transactions, and provides a comprehensive comparison of classic solutions such as XA, SAGA, TCC, local message tables, transaction messages, maximum‑effort notifications, and AT mode, while also covering exception handling and the sub‑transaction barrier technique.
Fundamental Theory
Database transactions guarantee that a group of operations either all succeed or all fail, satisfying the ACID properties:
Atomicity : All operations complete or none do; on error the system rolls back to the state before the transaction.
Consistency : Integrity constraints (foreign keys, custom rules) remain intact before and after the transaction.
Isolation : Concurrent transactions do not interfere with each other, preventing inconsistent intermediate states.
Durability : Once committed, changes survive system failures.
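Here is a minimal in-memory sketch of atomicity. It is not how a real database engine works. The hypothetical `Ledger.Transfer` stages every write on a copy of the balances and publishes the copy only if all steps succeed, so a failure midway leaves the original state untouched.

```go
package main

import (
	"errors"
	"fmt"
)

// Ledger is a hypothetical in-memory store of account balances.
type Ledger struct {
	balances map[string]int
}

// Transfer debits `from` and credits `to` as one unit: all writes go to a
// staged copy that replaces the live map only if every step succeeds.
func (l *Ledger) Transfer(from, to string, amount int) error {
	staged := make(map[string]int, len(l.balances))
	for k, v := range l.balances {
		staged[k] = v
	}
	if staged[from] < amount {
		return errors.New("insufficient funds") // nothing was published
	}
	staged[from] -= amount
	if _, ok := staged[to]; !ok {
		// The debit above exists only in `staged`, so it is discarded.
		return errors.New("unknown account " + to)
	}
	staged[to] += amount
	l.balances = staged // "commit": publish every change at once
	return nil
}

func main() {
	l := &Ledger{balances: map[string]int{"A": 100, "B": 0}}
	err := l.Transfer("A", "C", 30) // fails after the debit step
	fmt.Println(err, l.balances)    // A still holds 100
	err = l.Transfer("A", "B", 30)
	fmt.Println(err, l.balances)
}
```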
Distributed Transaction
A business operation that spans multiple nodes, such as a cross‑bank transfer where account A is debited on one system and account B is credited on another, cannot be guaranteed by a single local transaction. Distributed transactions use a transaction manager to coordinate multiple resource managers so the operation executes correctly across the network. They often relax strict ACID requirements to improve availability and performance, following the BASE principles (Basically Available, Soft state, Eventually consistent) while still preserving essential atomicity and durability.
Classic Solutions
Two‑Phase Commit (XA)
XA, defined by the X/Open group, introduces a global transaction manager (TM) and local resource managers (RM). The protocol consists of:
Prepare : All RMs lock required resources and report readiness to the TM.
Commit / Rollback : If every RM is ready, the TM issues a commit; otherwise it issues a rollback.
Most mainstream databases (MySQL, Oracle, SQL Server, PostgreSQL) support XA. If any participant fails during the prepare phase, the TM rolls back all participants. XA is simple to understand, but each RM holds its locks from the prepare phase until the commit or rollback completes, which reduces concurrency.
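The two phases can be simulated in a single process. This sketch is not the XA interface or any database's XA statements. `ResourceManager`, `memRM`, and `TwoPhaseCommit` are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceManager is a hypothetical participant interface for this sketch.
type ResourceManager interface {
	Prepare() error // lock resources and vote: nil means "ready"
	Commit()
	Rollback()
}

// memRM records its final state; failPrepare simulates a "not ready" vote.
type memRM struct {
	name        string
	failPrepare bool
	state       string
}

func (r *memRM) Prepare() error {
	if r.failPrepare {
		return errors.New(r.name + " cannot prepare")
	}
	r.state = "prepared" // resources stay locked from here until phase 2
	return nil
}
func (r *memRM) Commit()   { r.state = "committed" }
func (r *memRM) Rollback() { r.state = "rolled back" }

// TwoPhaseCommit plays the TM: phase 1 collects votes, phase 2 applies one
// global decision (commit only if every RM voted ready) to all participants.
func TwoPhaseCommit(rms []ResourceManager) error {
	var vote error
	for _, rm := range rms {
		if err := rm.Prepare(); err != nil {
			vote = err
			break
		}
	}
	for _, rm := range rms {
		if vote == nil {
			rm.Commit()
		} else {
			rm.Rollback()
		}
	}
	return vote
}

func main() {
	a := &memRM{name: "bankA"}
	b := &memRM{name: "bankB", failPrepare: true}
	err := TwoPhaseCommit([]ResourceManager{a, b})
	fmt.Println(err, a.state, b.state)
}
```

A production TM must also persist its commit/rollback decision before phase 2 and keep retrying phase 2 until every RM acknowledges; the sketch omits both.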
SAGA
SAGA splits a long‑running transaction into a series of local short transactions, each with a compensating action. If all steps succeed, the saga completes; if a step fails, previously successful steps are undone in reverse order.
Advantages: higher concurrency, because resources are not locked for the whole duration. Drawbacks: compensating actions must be defined explicitly, and consistency is weaker because intermediate states are visible. For example, account A's debit can be seen by other transactions even though the credit to B later fails and the debit is compensated.
TCC (Try‑Confirm‑Cancel)
TCC defines three phases:
Try : Perform all checks and reserve necessary resources.
Confirm : Execute the actual business logic using the reserved resources; must be idempotent.
Cancel : Release reserved resources if the transaction is aborted; also idempotent.
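Typical usage in a money‑transfer scenario: freeze the amount in the Try phase, deduct it in Confirm, and release it in Cancel. TCC offers high concurrency with no long‑held locks. Its consistency is better than SAGA's, because funds are only frozen, never visibly deducted, until the outcome is known. The cost is that every participant must implement three interfaces.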
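The freeze/deduct/release pattern can be sketched on a single in-memory account. `Account`, `Try`, `Confirm`, and `Cancel` are hypothetical names. Idempotency is tracked per global transaction ID (`gid`). The sketch deliberately ignores the hang case (a Try arriving after Cancel), which the barrier technique later in this article addresses.

```go
package main

import (
	"errors"
	"fmt"
)

// Account tracks total and frozen funds (hypothetical model).
type Account struct {
	Balance int // includes frozen funds
	Frozen  int
	// phases records the last phase applied per global transaction ID, so
	// repeated Confirm/Cancel calls become no-ops (idempotency).
	phases map[string]string
}

func NewAccount(balance int) *Account {
	return &Account{Balance: balance, phases: map[string]string{}}
}

// Try checks available funds and freezes the amount.
func (a *Account) Try(gid string, amount int) error {
	if a.Balance-a.Frozen < amount {
		return errors.New("insufficient available funds")
	}
	a.Frozen += amount
	a.phases[gid] = "tried"
	return nil
}

// Confirm deducts the frozen amount. A repeated call does nothing.
func (a *Account) Confirm(gid string, amount int) {
	if a.phases[gid] != "tried" {
		return
	}
	a.Frozen -= amount
	a.Balance -= amount
	a.phases[gid] = "confirmed"
}

// Cancel releases the frozen amount. A repeated call, or a Cancel with no
// prior Try, does nothing.
func (a *Account) Cancel(gid string, amount int) {
	if a.phases[gid] != "tried" {
		return
	}
	a.Frozen -= amount
	a.phases[gid] = "cancelled"
}

func main() {
	a := NewAccount(100)
	if err := a.Try("gid-1", 30); err == nil {
		a.Confirm("gid-1", 30)
		a.Confirm("gid-1", 30) // retried by the coordinator: no double deduction
	}
	fmt.Println(a.Balance, a.Frozen)
}
```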
Local Message Table
Proposed by eBay architect Dan Pritchett (2008), this pattern stores pending tasks in a local message table and processes them asynchronously. The write of the business data and the insertion of the message occur within the same local transaction, guaranteeing atomicity.
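Pros: a long transaction only needs to be split into independent tasks, which is simple to use. Cons: the producer needs an extra table, every local message table must be polled, and if the consumer's logic cannot succeed through retries, an additional mechanism is needed to roll back the operation.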
Transaction Message (RocketMQ)
RocketMQ’s transaction message abstracts the local message table by storing half‑messages on the broker. Workflow:
Send a half‑message.
Broker records the message and returns the write result.
Execute the local transaction; if it fails, the half‑message remains invisible.
Based on the local transaction outcome, the producer sends a commit (the broker makes the message visible to consumers) or a rollback (the broker discards it).
If the broker receives neither a commit nor a rollback, it periodically calls back to the producer to check the local transaction status and acts on the answer.
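The half-message lifecycle can be simulated without a real broker. This is not the RocketMQ client API. `Broker`, `SendHalf`, `End`, `CheckBack`, and `TxState` are hypothetical names that mirror the steps listed above. The scenario shows the check-back path recovering from a producer that crashed after its local commit.

```go
package main

import "fmt"

// TxState is the producer's answer for a half-message (hypothetical enum).
type TxState int

const (
	Unknown TxState = iota // undecided: keep the half-message and ask again later
	Commit
	Rollback
)

// Broker is an in-memory stand-in, not the RocketMQ client API.
type Broker struct {
	half    map[string]string // id -> body, invisible to consumers
	visible []string          // committed messages consumers can read
}

func NewBroker() *Broker { return &Broker{half: map[string]string{}} }

// SendHalf stores a message that consumers cannot see yet.
func (b *Broker) SendHalf(id, body string) { b.half[id] = body }

// End applies the producer's decision for one half-message.
func (b *Broker) End(id string, s TxState) {
	body, ok := b.half[id]
	if !ok || s == Unknown {
		return
	}
	delete(b.half, id)
	if s == Commit {
		b.visible = append(b.visible, body)
	}
}

// CheckBack asks the producer about every half-message that is still
// undecided, as the broker does when no commit/rollback arrived.
func (b *Broker) CheckBack(check func(id string) TxState) {
	for id := range b.half { // deleting during range is safe in Go
		b.End(id, check(id))
	}
}

func main() {
	b := NewBroker()
	committed := map[string]bool{} // stands in for the producer's local DB
	b.SendHalf("tx-1", "credit B 30")
	committed["tx-1"] = true // local transaction succeeds...
	// ...but the producer crashes before calling b.End("tx-1", Commit).
	b.CheckBack(func(id string) TxState {
		if committed[id] {
			return Commit
		}
		return Rollback
	})
	fmt.Println(b.visible)
}
```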
Maximum‑Effort Notification
This pattern ensures that the initiator makes its best effort to notify the receiver of the processing result. It includes retry mechanisms, duplicate‑notification handling, and a pull‑based query interface for the receiver to verify the final status.
AT Mode (Seata)
Seata’s AT mode resembles XA but automates compensation and rollback, reducing developer effort. It still suffers from long‑duration locks, making it unsuitable for high‑concurrency scenarios.
Exception Handling in Distributed Transactions
Network failures (lost, delayed, or duplicated requests) and business errors can repeat or reorder branch calls, so branch handlers must cope with three situations:
Empty Rollback : If Cancel is called without a preceding Try, the system must recognize it as a no‑op and succeed.
Idempotency : All branch calls must be safe to repeat without side effects.
Hang Prevention : If a delayed Try arrives after Cancel has already run, the Try must not execute. Otherwise the resources it reserves would never be released and would "hang".
Sub‑Transaction Barrier
The open‑source project https://github.com/yedf/dtm introduces a sub‑transaction barrier that records each branch’s state (global‑id‑branch‑phase) in a local table. The barrier logic ensures:
Empty compensation control: Cancel without prior Try is ignored.
Idempotent execution: Unique keys prevent duplicate processing.
Hang prevention: Try after Cancel fails to insert, so the business logic is skipped.
Implementation example (Go):
```go
// dtm's barrier entry point: busiCall runs only when the barrier allows it.
func ThroughBarrierCall(db *sql.DB, transInfo *TransInfo, busiCall BusiFunc)
```
Developers place their business logic inside busiCall. The barrier guarantees that in empty‑rollback, hang, or duplicate scenarios the logic is either not executed or executed exactly once.
The barrier works for TCC, SAGA, XA, and transaction‑message patterns, and can be adapted to other frameworks.
Barrier Mechanism Details
A table sub_trans_barrier is created with a unique key composed of gid (global transaction ID), branch_id, and phase (try|confirm|cancel). The processing steps are:
Start a local DB transaction.
If the request is a Try branch, attempt INSERT IGNORE of gid‑branch‑try. On success, execute the business logic inside the barrier.
If the request is a Confirm branch, attempt INSERT IGNORE of gid‑branch‑confirm. On success, execute the business logic.
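If the request is a Cancel branch, first attempt INSERT IGNORE of gid‑branch‑try, then INSERT IGNORE of gid‑branch‑cancel. If the try insert succeeds, Try never ran (an empty rollback), so the Cancel logic is skipped. The try record it leaves behind also blocks any Try that arrives later. If the cancel insert is ignored, the request is a duplicate and is skipped. Otherwise the Cancel logic executes.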
Commit the local DB transaction if the business logic succeeds or was skipped. Otherwise roll it back, which also removes the barrier records so that a retry can run the logic again.
This algorithm provides:
Empty compensation control – Cancel without a prior Try is filtered out.
Idempotent control – Duplicate inserts are prevented by the unique key.
Hang prevention – Try after Cancel cannot insert, so the business code is not run.
Conclusion
The article covered transaction fundamentals, compared major distributed‑transaction solutions, analyzed common failure modes, and presented the sub‑transaction‑barrier technique as an elegant way to handle network‑induced anomalies while keeping developer code simple. The project https://github.com/yedf/dtm currently provides Go SDK support for TCC, XA, SAGA, transaction messages, and maximum‑effort notifications; SDKs for other languages are planned.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
