How Ant Financial Solved Distributed Transaction Challenges with TCC, FMT, and XA
This article reviews Ant Financial's evolution of distributed transaction solutions—from early SOA consistency issues to the high‑performance TCC model, the developer‑friendly FMT approach, and the real‑time XA integration with OceanBase—highlighting architectural decisions, optimizations, and practical outcomes.
Background
Rapid growth of internet services created massive data volumes and widespread adoption of distributed systems. In large‑scale financial applications, high availability, scalability, and strict consistency are required. During Ant Financial’s micro‑service transformation, two challenges emerged: (1) cross‑service data consistency after service decomposition, and (2) maintaining performance under peak transaction loads (e.g., Double‑11).
Limitations of XA
The traditional XA two‑phase commit protocol guarantees ACID properties, but participants hold database locks from prepare until the final commit, which adds latency and resource consumption. This makes it a poor fit for hotspot data in high‑throughput scenarios.
TCC (Try‑Confirm‑Cancel) Model
The TCC model implements a two‑phase commit at the business layer while following the BASE (Basically Available, Soft state, Eventual consistency) philosophy. It involves three roles:
Main business service: initiates and orchestrates the global transaction.
Business services: expose three idempotent operations: Try (resource reservation), Confirm (final commit), and Cancel (rollback).
Business activity manager: records global and sub‑transaction states and triggers Confirm or Cancel based on the final outcome.
Key properties:
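Resources are reserved in the Try phase, and database locks are held only for the duration of each short local transaction rather than for the whole global transaction. Confirm and Cancel are idempotent, so the activity manager can safely retry them until they succeed without applying their effect twice.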
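The roles and properties above can be illustrated with a minimal in-memory sketch. `AccountService`, `try_debit`, and `run_global_transaction` are hypothetical names chosen for this example, not part of Ant Financial's framework, and a real activity manager would persist its state and retry failed second-phase calls.

```python
from dataclasses import dataclass, field


@dataclass
class AccountService:
    """Hypothetical TCC participant that manages one account balance."""
    balance: int
    reserved: dict = field(default_factory=dict)  # tx_id -> amount held by Try
    state: dict = field(default_factory=dict)     # tx_id -> TRIED / CONFIRMED / CANCELLED

    def try_debit(self, tx_id: str, amount: int) -> bool:
        """Try: reserve funds without spending them."""
        if tx_id in self.state:
            # Duplicate Try, or a late Try arriving after Cancel (rejected).
            return self.state[tx_id] != "CANCELLED"
        if self.balance < amount:
            return False
        self.balance -= amount
        self.reserved[tx_id] = amount
        self.state[tx_id] = "TRIED"
        return True

    def confirm(self, tx_id: str) -> None:
        """Confirm: consume the reservation; a repeated call is a no-op."""
        if self.state.get(tx_id) == "TRIED":
            self.reserved.pop(tx_id)
            self.state[tx_id] = "CONFIRMED"

    def cancel(self, tx_id: str) -> None:
        """Cancel: release the reservation; safe to repeat, even without a prior Try."""
        if self.state.get(tx_id) != "CONFIRMED":
            self.balance += self.reserved.pop(tx_id, 0)
            self.state[tx_id] = "CANCELLED"


def run_global_transaction(tx_id: str, steps) -> bool:
    """Plays the activity-manager role: Try everything, then Confirm all or Cancel all.

    `steps` is a list of (service, amount) pairs.
    """
    tried = []
    for service, amount in steps:
        if service.try_debit(tx_id, amount):
            tried.append(service)
        else:
            for s in tried:
                s.cancel(tx_id)
            return False
    for s in tried:
        s.confirm(tx_id)
    return True
```

Because `confirm` and `cancel` check the recorded state before acting, the coordinator can call them again after a timeout without changing the balance a second time.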
Performance Optimizations for TCC
Same‑database mode: The activity manager’s state is stored locally in the same database as the business data, eliminating remote RPC calls and reducing database round‑trips.
Asynchronous second phase: The Confirm phase is deferred to low‑traffic periods. Assuming Try and Confirm have similar latency, this roughly halves the latency the caller observes and halves the database load during the peak. The deferred Confirm work still has to run later.
FMT (Framework‑Managed Transaction) Model
To lower the integration barrier of TCC, the FMT model removes the need for explicit Try/Confirm/Cancel methods. Developers write standard JDBC code; the framework intercepts SQL, captures pre‑ and post‑state snapshots, and automatically generates the two‑phase actions.
During the first phase, the framework records a logical undo/redo snapshot for each row. In the second phase, if the global transaction commits, the snapshots are discarded; if it rolls back, the framework checks for concurrent modifications (dirty writes) and restores the pre‑state, ensuring idempotent rollback.
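The snapshot-and-restore idea can be sketched as follows, assuming SQLite and a single `account` table. `UndoLog` and its methods are hypothetical names. A real framework would intercept arbitrary SQL and persist the snapshots in the same local transaction as the update, rather than keeping them in memory as this sketch does.

```python
import sqlite3


class UndoLog:
    """Hypothetical framework component that snapshots one row around an UPDATE."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.entries = {}  # tx_id -> (row_id, before image, after image)

    def _read(self, row_id: str) -> int:
        row = self.conn.execute(
            "SELECT balance FROM account WHERE id = ?", (row_id,)
        ).fetchone()
        return row[0]

    def update_balance(self, tx_id: str, row_id: str, new_balance: int) -> None:
        """Phase one: capture the pre-state, apply the update, capture the post-state."""
        before = self._read(row_id)
        with self.conn:
            self.conn.execute(
                "UPDATE account SET balance = ? WHERE id = ?", (new_balance, row_id)
            )
        self.entries[tx_id] = (row_id, before, self._read(row_id))

    def commit(self, tx_id: str) -> None:
        """Phase two, commit: the snapshots are no longer needed."""
        self.entries.pop(tx_id, None)

    def rollback(self, tx_id: str) -> None:
        """Phase two, rollback: restore the pre-state unless the row was changed since."""
        entry = self.entries.pop(tx_id, None)
        if entry is None:
            return  # already finished: a repeated rollback is a no-op
        row_id, before, after = entry
        if self._read(row_id) != after:
            self.entries[tx_id] = entry  # keep the snapshot for later resolution
            raise RuntimeError("dirty write detected on row " + row_id)
        with self.conn:
            self.conn.execute(
                "UPDATE account SET balance = ? WHERE id = ?", (before, row_id)
            )
```

Comparing the current row against the post-state snapshot is what detects a dirty write: if another writer changed the row after phase one, blindly restoring the pre-state would silently discard that change.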
XA Model with Real‑Time Consistency
In 2018 Ant Financial released a third‑generation solution that fully supports the standard XA protocol and integrates with the OceanBase database. The solution adds two major enhancements to mitigate XA’s performance penalties:
Distributed MVCC: A global snapshot mechanism ensures that reads see a consistent view across all participating shards, preventing intermediate states from being observed.
Commit‑delay optimization with OceanBase: During the commit phase, OceanBase attaches minimal coordination metadata and persists logs in a single round‑trip. From the client’s perspective, this reduces the classic XA sequence of three log writes, one transaction write, and two RPCs to essentially one log write, one transaction write, and two RPCs.
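The snapshot-read idea behind distributed MVCC can be illustrated with a generic multi-version store. This is a textbook-style sketch, not OceanBase's implementation: `VersionedShard` is a hypothetical name, and the sketch assumes a reader is only handed a snapshot timestamp whose transactions have fully committed on every shard.

```python
class VersionedShard:
    """Hypothetical shard that keeps every committed version of each key."""

    def __init__(self):
        # key -> list of (commit_ts, value), appended in increasing commit_ts order
        self._versions = {}

    def write(self, key, value, commit_ts):
        """Store a new version tagged with the global commit timestamp."""
        self._versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


# A transfer of 30 from A (shard 1) to B (shard 2) commits at global timestamp 2.
shard1, shard2 = VersionedShard(), VersionedShard()
shard1.write("A", 100, commit_ts=1)
shard2.write("B", 0, commit_ts=1)
shard1.write("A", 70, commit_ts=2)   # shard 1 has applied its half
# A reader holding snapshot 1 still sees the old state on BOTH shards.
old_view = (shard1.read("A", 1), shard2.read("B", 1))
shard2.write("B", 30, commit_ts=2)   # shard 2 applies its half
new_view = (shard1.read("A", 2), shard2.read("B", 2))
```

Because every shard answers reads against the same snapshot timestamp, a reader sees the transfer either on both shards or on neither, never the debit without the matching credit.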
Operational Statistics
By 2017, the TCC implementation (named DTX internally) was adopted by over 100 services, handling payment, transfer, wealth, and insurance scenarios. During Double‑11 2017, the system sustained a peak of 256 k TPS, processing billions of yuan daily while maintaining strict consistency.
Key Takeaways
**TCC** provides high‑performance eventual consistency by locking resources only in the Try phase and allowing concurrent Confirm / Cancel execution.
**Same‑database** and **asynchronous confirm** optimizations reduce RPC overhead and roughly halve caller‑visible latency and peak‑time database load for hotspot workloads.
**FMT** offers a near‑zero‑intrusion path for cloud customers: standard JDBC code is automatically managed as a distributed transaction.
**XA** with distributed MVCC and OceanBase‑specific commit‑delay tuning delivers real‑time consistency while mitigating much of the classic XA performance penalty.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
