Mastering Distributed Transactions: Strategies, Theory, and Real-World Solutions
This article explores the challenges of data consistency in distributed systems, explains fundamental concepts such as CAP and BASE, compares industry solutions like 2PC, Saga, TCC, and local message tables, and shares practical insights from payment service refactoring and Alipay's DTS framework.
Article Outline
1. Reason for sharing this topic
2. How distributed transaction problems are currently solved
3. Industry solutions
4. Advantages and disadvantages of each solution
5. How others handle it
6. Our possible approach
Payment Refactoring
When refactoring payment logic, the original local transaction becomes a cross‑application scenario. For example, a recharge order that previously updated order status and added coins within a single MySQL transaction now requires coordination between an Order Service and an Account Service.
Thus we need to address data consistency in distributed scenarios, which we refer to as distributed transactions.
The same issue appears in other contexts such as gifting and recharge success notifications.
1. Call payment service: deduct the gift sender's coins, then add the corresponding litchi to the streamer
2. After step 1 succeeds, play effects and send the chatroom gift comment
When dealing with paid interfaces, consistency is crucial because money is involved.
How are distributed transactions currently solved?
A solution is already in place; let's examine it using a purchase-to-payment-order example.
If the payment order is created successfully but the Kafka service fails to send the completion message, the order transaction has already been committed. How can we guarantee the message is eventually sent?
Process interpretation:
Green part
Normal flow: 1) Submit a job to JobController for fault recovery; 2) After success, process order logic; 3) Send Kafka message; 4) Delete the job.
Yellow part
Abnormal flow: data may become inconsistent and requires recovery. JobController periodically scans Redis for delayed tasks, retries them, and marks them successful or retries later.
Problems:
1) Redis‑based recovery may lose data. 2) No unified distributed‑transaction standard; could be abstracted as middleware. 3) Lack of execution‑policy management (e.g., max retries). 4) No transaction execution records, requiring log inspection.
Theoretical Foundations
Local vs. Distributed Transactions
Local transactions guarantee consistency within a single data source, while distributed transactions address consistency across multiple data sources.
Strong, Weak, and Eventual Consistency
From a client perspective, strong consistency means all subsequent reads see the latest write; weak consistency tolerates missing updates; eventual consistency guarantees that, after some time, reads will see the update.
From the server side, reducing the window to achieve eventual consistency improves availability and user experience. For distributed data systems:
N – number of data replicas
W – number of nodes that must acknowledge a write
R – number of nodes consulted for a read
If W+R > N, the system provides strong consistency (e.g., primary‑secondary sync replication with N=2, W=2, R=1). If W+R ≤ N, only weak consistency is guaranteed (e.g., async replication with N=2, W=1, R=1).
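The quorum rule above can be checked mechanically. The following is a minimal sketch; the function names `quorums_overlap` and `min_overlap` are made up for illustration. The reasoning it encodes is that when W+R > N, any set of R replicas consulted for a read must share at least one replica with any set of W replicas that acknowledged the write.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Return True when every read quorum intersects every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def min_overlap(n: int, w: int, r: int) -> int:
    """Smallest number of replicas shared by any write set and any read set."""
    return max(0, w + r - n)
```

With the two examples from the text, `quorums_overlap(2, 2, 1)` is true (synchronous replication) and `quorums_overlap(2, 1, 1)` is false (asynchronous replication).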
CAP Theory
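In a distributed environment, it is impossible to guarantee consistency, availability, and partition tolerance simultaneously. Because network partitions cannot be ruled out in practice, a system has to trade consistency against availability when a partition occurs, and many systems settle for eventual consistency.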
BASE Theory
BASE (Basically Available, Soft state, Eventually consistent) is derived from CAP and emphasizes availability over strong consistency, allowing temporary inconsistency that eventually converges.
Flexible Transactions
Based on BASE, flexible transactions aim for eventual consistency by leveraging properties such as visibility, idempotence, and compensating actions.
Visibility (External Queryability)
Each step must expose a query interface and a globally unique identifier (e.g., order number) so that other services can determine the operation’s status.
Idempotent Operations
Idempotence means repeated execution with the same parameters yields the same result, which is essential for safe retries.
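One common way to make a write idempotent is to deduplicate on a request ID. The sketch below assumes a hypothetical in-memory `AccountService`; the class, its method names, and the request-ID scheme are illustrative and not taken from the payment service discussed here. A real service would keep the processed IDs in the same database as the balances.

```python
class AccountService:
    """Hypothetical in-memory account service with an idempotent credit."""

    def __init__(self):
        self.balances = {}      # user -> coin balance
        self.processed = set()  # request IDs that were already applied

    def add_coins(self, request_id, user, amount):
        # A repeated request ID is a retry: skip the update and
        # return the current balance unchanged.
        if request_id not in self.processed:
            self.balances[user] = self.balances.get(user, 0) + amount
            self.processed.add(request_id)
        return self.balances.get(user, 0)
```

Calling `add_coins("r1", "alice", 100)` twice leaves the balance at 100, so a caller that timed out can retry safely.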
f(f(x)) = f(x)
Industry Solutions
Two‑Phase Commit (2PC)
XA defines a Transaction Manager (TM) and Resource Manager (RM) interface. The TM generates a global txId and coordinates multiple local transactions.
2PC splits commit into prepare and commit/rollback phases. The prepare phase can lock resources for a long time, making it unsuitable for high‑concurrency or long‑lived sub‑transactions.
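The two phases can be sketched with an in-memory coordinator. This is a simplified illustration under stated assumptions: `Participant` and `two_phase_commit` are hypothetical names, nothing is persisted, and coordinator failure (the hard part of real 2PC) is not modeled.

```python
class Participant:
    """Hypothetical resource manager that votes in the prepare phase."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, tx_id):
        # A real RM would lock resources and persist the prepared state here,
        # which is why a slow coordinator keeps locks held for a long time.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, tx_id):
        self.state = "committed"

    def rollback(self, tx_id):
        self.state = "rolled_back"


def two_phase_commit(tx_id, participants):
    """Return True if the global transaction commits, False if it rolls back."""
    # Phase 1: collect votes; all() stops at the first "no" vote.
    if all(p.prepare(tx_id) for p in participants):
        # Phase 2: every participant voted yes, so commit everywhere.
        for p in participants:
            p.commit(tx_id)
        return True
    # Phase 2: at least one "no" vote, so tell every participant to roll back.
    for p in participants:
        p.rollback(tx_id)
    return False
```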
Saga
Saga addresses long‑running distributed processes by using compensating actions. If a later step fails, earlier successful steps are undone via compensation, achieving eventual consistency while preserving high availability.
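As a rough illustration, a saga can be modeled as a list of (action, compensation) pairs. The helper `run_saga` below is a hypothetical name, and the sketch assumes compensations themselves never fail; a production implementation would also persist progress so that compensation can resume after a crash.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and return False. Return True when every
    action succeeds.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

For the recharge example, the steps might be "mark order paid" (compensated by "mark order failed") followed by "add coins" (compensated by "deduct coins").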
Compensating Transaction (TCC)
TCC consists of Try, Confirm, and Cancel phases. Try reserves resources, Confirm finalizes, and Cancel rolls back. Example: transferring 100 CNY from account A to B involves Try‑Confirm‑Cancel logic in both services.
[Transfer Service]
Try: verify A, check balance, deduct 100, record event
Confirm: no action
Cancel: credit back 100
[Receive Service]
Try: verify B
Confirm: credit 100, release resources
Cancel: no action
TCC requires significant business logic changes.
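The transfer example can be sketched as two in-memory services plus a coordinating function. All class and function names here are hypothetical, and the sketch deviates slightly from the outline above: Confirm on the transfer side and Cancel on the receive side also discard the bookkeeping record created during Try, so that repeated calls become no-ops.

```python
class TransferService:
    """Debit side (account A); balances are a plain dict for illustration."""

    def __init__(self, balances):
        self.balances = balances
        self.reserved = {}  # tx_id -> (account, amount) deducted during Try

    def try_(self, tx_id, account, amount):
        if self.balances.get(account, 0) < amount:
            return False
        self.balances[account] -= amount
        self.reserved[tx_id] = (account, amount)
        return True

    def confirm(self, tx_id):
        self.reserved.pop(tx_id, None)  # funds were already deducted in Try

    def cancel(self, tx_id):
        entry = self.reserved.pop(tx_id, None)  # pop keeps cancel idempotent
        if entry:
            account, amount = entry
            self.balances[account] += amount


class ReceiveService:
    """Credit side (account B)."""

    def __init__(self, balances):
        self.balances = balances
        self.pending = {}  # tx_id -> (account, amount) awaiting Confirm

    def try_(self, tx_id, account, amount):
        if account not in self.balances:
            return False
        self.pending[tx_id] = (account, amount)
        return True

    def confirm(self, tx_id):
        entry = self.pending.pop(tx_id, None)
        if entry:
            account, amount = entry
            self.balances[account] += amount

    def cancel(self, tx_id):
        self.pending.pop(tx_id, None)


def transfer(tx_id, debit, credit, src, dst, amount):
    """Confirm both sides only if both Try calls succeed; otherwise cancel."""
    if debit.try_(tx_id, src, amount) and credit.try_(tx_id, dst, amount):
        debit.confirm(tx_id)
        credit.confirm(tx_id)
        return True
    debit.cancel(tx_id)
    credit.cancel(tx_id)
    return False
```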
Local Message Table (Asynchronous Assurance)
This widely used pattern stores messages in a local table within the same transaction as business data. A separate process scans the table and retries failed messages, ensuring eventual delivery.
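A minimal sketch of the pattern, using Python's built-in `sqlite3` module with an in-memory database. The table names (`orders`, `outbox`), the payload format, and the `publish` callback are assumptions made for illustration; in practice `publish` would wrap an MQ client.

```python
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT, sent INTEGER DEFAULT 0)"
    )


def pay_order(conn, order_id):
    """Update the order and record the message in one local transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO orders (id, status) VALUES (?, 'PAID')",
            (order_id,),
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            ("order_paid:" + order_id,),
        )


def relay_pending(conn, publish):
    """Publish unsent rows; a row stays pending when publish raises."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for msg_id, payload in rows:
        try:
            publish(payload)
        except Exception:
            continue  # leave the row for the next scan
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
        delivered += 1
    return delivered
```

Because a crash between `publish` and the `UPDATE` causes the same message to be sent again on the next scan, consumers still need to be idempotent.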
Transactional Message
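Transactional messages combine three steps: send a prepare (half) message to the MQ, execute the local transaction, then commit or roll back the message depending on the outcome. Some MQs (e.g., RocketMQ) support this natively, while RabbitMQ and Kafka do not offer this mechanism out of the box; Kafka's own transactions cover atomic writes within Kafka rather than coordination with a local database transaction.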
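The flow can be illustrated with a toy in-memory broker. Everything below is hypothetical and is not the RocketMQ API: `InMemoryBroker`, `send_transactional`, and `resolve_half_messages` only model the idea that a prepared message stays invisible to consumers until the producer commits it, and that the broker can ask the producer about messages left undecided (for example after a producer crash).

```python
class InMemoryBroker:
    """Hypothetical broker that hides half messages until they are committed."""

    def __init__(self):
        self.half = {}     # msg_id -> payload, invisible to consumers
        self.visible = []  # payloads that consumers can read

    def send_half(self, msg_id, payload):
        self.half[msg_id] = payload

    def commit(self, msg_id):
        self.visible.append(self.half.pop(msg_id))

    def rollback(self, msg_id):
        self.half.pop(msg_id, None)


def send_transactional(broker, msg_id, payload, local_tx):
    """Prepare the message, run the local transaction, then commit or roll back."""
    broker.send_half(msg_id, payload)
    try:
        local_tx()
    except Exception:
        broker.rollback(msg_id)
        return False
    broker.commit(msg_id)
    return True


def resolve_half_messages(broker, local_tx_committed):
    """Decide leftover half messages by asking whether the local tx committed."""
    for msg_id in list(broker.half):
        if local_tx_committed(msg_id):
            broker.commit(msg_id)
        else:
            broker.rollback(msg_id)
```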
Maximum‑Effort Notification
This simple scheme repeatedly notifies the receiver until success or a retry limit is reached, optionally providing a query interface for reconciliation.
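A sketch of such a notifier, assuming the receiver acknowledges by making `send` return True. The function name, the exponential backoff schedule, and the injectable `sleep` parameter are illustrative choices rather than part of any particular payment platform's callback contract.

```python
import time


def notify_with_retries(send, payload, max_attempts=5, base_delay=1.0,
                        sleep=time.sleep):
    """Call send(payload) until it returns True or attempts run out.

    Returns the number of attempts used on success, or 0 after giving up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(payload):
                return attempt
        except Exception:
            pass  # treat an error like a missing acknowledgement
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return 0
```

When the function returns 0, the sender stops pushing and the receiver is expected to call the query interface to reconcile.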
Solution Comparison
Alipay Distributed Transaction Service (DTS)
DTS is a framework that guarantees eventual consistency in large‑scale distributed environments. It consists of an xts‑client (embedded JAR) and an xts‑server (independent service) handling transaction data and recovery.
Core Concepts
Roles are divided into initiators and participants. Initiators start the transaction and coordinate participants; participants implement prepare, commit, and rollback interfaces and must ensure idempotence.
Activity and Action Records
Activity stores the global transaction ID and state; Action records each participant’s name and status, similar to the TCC model.
eBay Local Message Table
The approach originates from eBay and has been popularized by companies like Alipay. It splits remote distributed transactions into a series of local transactions using a database table.
Typical cross‑bank transfer example: first deduct funds and insert a message into the local table, then notify the counterpart bank.
Two common notification methods: high‑availability MQ subscription or periodic polling of the message table.
Various Third‑Party Payment Callbacks
Maximum‑effort notification is used by Alipay and WeChat: repeatedly invoke the callback until success or a failure threshold is reached.
Potential approaches:
2PC/3PC requires XA-compatible resources (MySQL supports XA; Redis does not) – rejected due to long locks.
TCC – requires try/confirm/cancel interfaces, increasing complexity.
Maximum‑effort notification – suitable for heterogeneous or platform services.
In summary, eBay’s classic pattern combines local transactions with reliable messaging to achieve eventual consistency, while transaction messages further encapsulate the local‑transaction work.
