How to Guarantee Data Consistency in Distributed Transactions: A Practical Deep‑Dive
This article examines the challenges of maintaining data consistency across micro‑service boundaries, walks through real‑world payment and gifting scenarios, compares classic solutions such as 2PC, Saga, TCC, local‑message tables, transactional messages, and best‑effort notification, and finally recommends a pragmatic approach for building reliable distributed transaction mechanisms.
Why Distributed Transactions Matter
When a payment system is refactored from a monolithic design to separate order and account services, a single MySQL transaction can no longer guarantee that updating the order status and crediting the user’s balance happen atomically. The article starts with a concrete recharge‑order example: originally the order module and account module lived together, allowing a single local transaction; after service split, two independent services must coordinate.
Similar consistency problems appear in other flows, such as gifting (deducting the sender’s coins, then crediting the streamer) and sending a Kafka message after a successful recharge. Because money is involved, any inconsistency is unacceptable.
Current Ways to Solve Distributed Transactions
The author first outlines the problem with a concrete scenario: after a purchase succeeds, the system must send a Kafka message, but if Kafka fails while the order transaction has already committed, the message may be lost. A step‑by‑step flow diagram (shown in the first image) illustrates the normal (green) and failure (yellow) paths, and the recovery logic that periodically scans Redis for delayed jobs.
Key issues identified:
- Reliance on Redis for recovery introduces potential data loss.
- Lack of a unified distributed‑transaction specification.
- No management of retry limits or execution state.
- Debugging requires digging through logs because transaction status isn’t recorded.
Theoretical Foundations
The article explains the difference between local and distributed transactions, then introduces consistency models (strong, weak, eventual) and the CAP theorem, emphasizing that when a network partition occurs, a distributed system must choose between consistency and availability. It also covers the BASE model, which accepts eventual consistency in exchange for high availability, and the concept of flexible transactions that rely on BASE principles.
Two technical prerequisites for any solution are highlighted: the need for visibility (queryable status of each step) and idempotence (re‑executing an operation yields the same result). The article includes a mathematical definition of idempotence (f(f(x)) = f(x)) and practical ways to achieve it, such as caching results or detecting duplicate requests.
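The duplicate‑detection approach to idempotence can be sketched as follows. This is a minimal illustration, not the article's implementation: the `AccountService` class, the `request_id` parameter, and the in‑memory `processed` store are all assumptions made for the example.

```python
class AccountService:
    """Toy account service that credits a balance at most once per request."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        # request_id -> result of the first execution (hypothetical dedup store)
        self.processed: dict[str, int] = {}

    def credit(self, request_id: str, user: str, amount: int) -> int:
        # A repeated request returns the cached result instead of crediting again.
        if request_id in self.processed:
            return self.processed[request_id]
        new_balance = self.balances.get(user, 0) + amount
        self.balances[user] = new_balance
        self.processed[request_id] = new_balance
        return new_balance


service = AccountService()
first = service.credit("req-1", "alice", 100)
retry = service.credit("req-1", "alice", 100)  # same request delivered twice
```

In a real service the dedup record and the balance update would need to be written in the same local transaction; otherwise a crash between the two writes could break the guarantee.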
Industry Solutions
Two‑Phase Commit (2PC)
Based on the XA specification, 2PC coordinates a global transaction ID across resource managers. The prepare phase blocks until all participants acknowledge, which can lead to long lock times and poor performance under high concurrency. The article notes that 2PC sacrifices availability for strong consistency.
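The two phases can be sketched with an in‑memory coordinator. This is a simplified illustration under stated assumptions: `Participant` and `two_phase_commit` are hypothetical names, there is no real XA resource manager, and timeouts and coordinator failure are not modeled.

```python
class Participant:
    """In-memory stand-in for a resource manager (hypothetical)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, tx_id: str) -> bool:
        # Phase 1: vote. A real resource manager would hold locks from here on.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, tx_id: str) -> None:
        self.state = "committed"

    def rollback(self, tx_id: str) -> None:
        self.state = "rolled_back"


def two_phase_commit(tx_id: str, participants: list[Participant]) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare(tx_id) for p in participants]
    if all(votes):
        # Phase 2: unanimous yes, so commit everywhere.
        for p in participants:
            p.commit(tx_id)
        return True
    # Phase 2: at least one no vote, so roll back everywhere.
    for p in participants:
        p.rollback(tx_id)
    return False


order, account = Participant("order"), Participant("account")
committed = two_phase_commit("tx-1", [order, account])

order2, account2 = Participant("order"), Participant("account", can_commit=False)
aborted = two_phase_commit("tx-2", [order2, account2])
```

The sketch makes the blocking cost visible: every participant stays in the prepared state until the slowest vote arrives.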
Saga
Saga addresses long‑running processes by breaking a transaction into a sequence of local transactions, each paired with a compensating action. The author walks through a scenario where Service A commits, Service B fails, and a compensating action undoes Service A’s changes, achieving eventual consistency while keeping the system highly available.
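The Service A / Service B scenario can be sketched with a small orchestrator. The `run_saga` function and the `log` list are hypothetical names for this example; a production Saga would also persist its progress so that compensation survives a crash.

```python
from typing import Callable

# Each step is a pair: (action, compensation).
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Callable[[], None]] = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log: list[str] = []


def fail_service_b() -> None:
    raise RuntimeError("Service B failed")


succeeded = run_saga([
    (lambda: log.append("A committed"), lambda: log.append("A compensated")),
    (fail_service_b, lambda: log.append("B compensated")),
])
```

Only Service A is compensated here, because Service B never completed its local transaction.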
Try‑Confirm‑Cancel (TCC)
TCC requires each business operation to expose Try, Confirm, and Cancel interfaces. The article provides a detailed TCC implementation for a 100‑yuan transfer from Account A to Account B, complete with a code block that lists the exact steps for each service.
[Transfer Service]
Try:
- Verify A’s account status and balance
- Deduct 100 yuan, set status to "transferring"
- Record the intent in a log/message
Confirm:
- No action needed
Cancel:
- Refund 100 yuan to A
- Release reserved resources
[Receive Service]
Try:
- Verify B’s account status
Confirm:
- Credit 100 yuan to B
- Release reserved resources
Cancel:
- No action needed
The author points out that TCC is highly invasive and difficult to adopt.
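The transfer steps listed above can be sketched in code. This is an illustrative reading of the listing rather than the author's implementation: the class and function names are assumptions, the `transferring` counter stands in for the "transferring" status, and clearing it in `confirm` is a choice made for this sketch.

```python
class TransferService:
    """Account A side: money is deducted in Try and refunded in Cancel."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.transferring = 0  # amount deducted but not yet confirmed

    def try_(self, amount: int) -> bool:
        if self.balance < amount:
            return False
        self.balance -= amount
        self.transferring += amount
        return True

    def confirm(self, amount: int) -> None:
        # Money already left in Try; this sketch only clears the marker.
        self.transferring -= amount

    def cancel(self, amount: int) -> None:
        self.balance += amount
        self.transferring -= amount


class ReceiveService:
    """Account B side: only verified in Try, credited in Confirm."""

    def __init__(self, balance: int, active: bool = True) -> None:
        self.balance = balance
        self.active = active

    def try_(self, amount: int) -> bool:
        return self.active

    def confirm(self, amount: int) -> None:
        self.balance += amount

    def cancel(self, amount: int) -> None:
        pass  # nothing was reserved in Try


def transfer(a: TransferService, b: ReceiveService, amount: int) -> bool:
    tried = []
    for service in (a, b):
        if not service.try_(amount):
            for done in tried:  # cancel every service whose Try succeeded
                done.cancel(amount)
            return False
        tried.append(service)
    for service in (a, b):
        service.confirm(amount)
    return True


a, b = TransferService(500), ReceiveService(0)
ok = transfer(a, b, 100)

a2, frozen_b = TransferService(500), ReceiveService(0, active=False)
rejected = transfer(a2, frozen_b, 100)
```

Even in this toy form, each service needs three business‑aware methods, which hints at why the pattern is considered invasive.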
Local Message Table (Asynchronous Assurance)
Originating from eBay, this pattern stores outgoing messages in a local table within the same transaction as the business data. A separate consumer reads the table and retries failed deliveries. The article includes a diagram (second image) and describes the producer‑consumer workflow, emphasizing its simplicity and high performance.
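A minimal sketch of the pattern, using an in‑memory SQLite database in place of MySQL. The table names (`orders`, `outbox`), the function names, and the `send` callback are assumptions for illustration; the source does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, delivered INTEGER DEFAULT 0);
""")


def mark_order_paid(order_id: str) -> None:
    # The business row and the outgoing message commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'paid')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (f"order_paid:{order_id}",))


def relay_pending(send) -> int:
    """Deliver undelivered rows; rows whose send fails stay pending for retry."""
    delivered = 0
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        try:
            send(payload)
        except Exception:
            continue  # leave the row pending; the next scan retries it
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered


def broker_down(payload: str) -> None:
    raise ConnectionError("broker unavailable")


mark_order_paid("o-1")
sent: list[str] = []
first_attempt = relay_pending(broker_down)   # delivery fails, row stays pending
second_attempt = relay_pending(sent.append)  # retry succeeds
third_attempt = relay_pending(sent.append)   # nothing left to deliver
```

Because a message can be sent and then fail to be marked as delivered, the relay gives at‑least‑once delivery, so the consuming side still needs to be idempotent.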
Transactional Messages
Transactional messages combine the two‑phase commit idea with a message queue. The flow (third image) shows a prepare message, local transaction execution, and a final commit or rollback based on the local outcome. The author notes that, of the three queues compared, only RocketMQ supports this style of transactional message; RabbitMQ and Kafka do not offer the same prepare‑then‑commit flow tied to a local transaction.
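The prepare, execute, and commit‑or‑rollback flow can be sketched with a toy in‑memory broker. Nothing here is a real message‑queue API: `ToyBroker`, `send_transactional`, and their methods are hypothetical. The `check_back` method is an addition not described in the flow above; it models the status check a broker can run when the final commit or rollback never arrives.

```python
from typing import Callable


class ToyBroker:
    """In-memory stand-in for a broker that supports half (prepared) messages."""

    def __init__(self) -> None:
        self.half: dict[str, str] = {}  # prepared, invisible to consumers
        self.visible: list[str] = []    # committed, consumable

    def send_half(self, msg_id: str, body: str) -> None:
        self.half[msg_id] = body

    def commit(self, msg_id: str) -> None:
        self.visible.append(self.half.pop(msg_id))

    def rollback(self, msg_id: str) -> None:
        self.half.pop(msg_id, None)

    def check_back(self, local_tx_committed: Callable[[str], bool]) -> None:
        """Resolve half messages by asking the producer about the local outcome."""
        for msg_id in list(self.half):
            if local_tx_committed(msg_id):
                self.commit(msg_id)
            else:
                self.rollback(msg_id)


def send_transactional(broker: ToyBroker, msg_id: str, body: str,
                       local_tx: Callable[[], None]) -> bool:
    broker.send_half(msg_id, body)  # step 1: prepare message
    try:
        local_tx()                  # step 2: local transaction
    except Exception:
        broker.rollback(msg_id)     # step 3b: discard the prepared message
        return False
    broker.commit(msg_id)           # step 3a: make the message visible
    return True


def failing_tx() -> None:
    raise RuntimeError("local transaction failed")


broker = ToyBroker()
ok = send_transactional(broker, "m1", "recharge ok", lambda: None)
bad = send_transactional(broker, "m2", "never delivered", failing_tx)

# Simulate a producer that crashed after its local commit but before step 3.
broker.send_half("m3", "status lost")
broker.check_back(lambda msg_id: msg_id == "m3")
```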
Best‑Effort Notification
This lightweight approach repeatedly sends a notification until it succeeds or a retry limit is reached, then allows the passive side to query for missing events. It is suitable for low‑sensitivity consistency requirements and cross‑enterprise integrations.
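The retry loop can be sketched as below. The function name, the `send` callback, and the exponential backoff schedule are assumptions for illustration; the source only states that notifications repeat until success or until a retry limit is reached.

```python
import time
from typing import Callable


def notify_best_effort(send: Callable[[], bool], max_attempts: int = 5,
                       base_delay: float = 0.0) -> bool:
    """Call send until it reports success or the attempt limit is reached."""
    for attempt in range(max_attempts):
        try:
            if send():
                return True
        except Exception:
            pass  # a raised error counts as a failed attempt
        if attempt < max_attempts - 1:
            # Hypothetical backoff: wait longer before each further retry.
            time.sleep(base_delay * (2 ** attempt))
    return False


attempts = {"n": 0}


def flaky_callback() -> bool:
    # Simulated receiver that acknowledges only the third notification.
    attempts["n"] += 1
    return attempts["n"] >= 3


delivered = notify_best_effort(flaky_callback, max_attempts=5)
gave_up = notify_best_effort(lambda: False, max_attempts=3)
```

When the function returns `False`, the notifying side stops, and the passive side is expected to query for the missing event.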
Solution Comparison
The original table is summarized in prose: 2PC offers strong consistency but medium complexity and low performance; TCC provides weak consistency with high complexity and high maintenance cost; local‑message tables and transactional messages give weak consistency but high performance and low maintenance; best‑effort notification also yields weak consistency with low complexity.
Real‑World Implementations
Alipay Distributed Transaction Service (DTS)
DTS splits the system into xts‑client (a JAR embedded in the application) and xts‑server (a standalone service). It defines a transaction ID (TX_ID) and state (STATE) in an Activity record, and each participant registers Prepare, Commit, and Rollback interfaces, effectively implementing a TCC‑style workflow.
eBay Local Message Table
eBay’s pattern stores messages in a relational table alongside business data, then uses either a high‑throughput MQ or periodic polling to deliver them. The article cites similar adoptions by Qunar and Mogujie.
Third‑Party Payment Callbacks
Payment platforms like Alipay and WeChat use best‑effort notification: they keep retrying the callback until it succeeds or the retry count expires.
Recommended Approach for Our Scenario
The author eliminates 2PC/3PC because they require XA‑compatible resources and lock transaction resources, hurting performance. TCC is also rejected due to its high code‑level intrusion. The remaining viable path is to adopt a transactional message strategy, which encapsulates the local transaction within a message‑driven workflow, achieving the desired eventual consistency while keeping complexity low.
In summary, by analyzing the trade‑offs of each technique, the article concludes that a message‑centric design—either a local‑message table with reliable retry or a true transactional message (if the MQ supports it)—best fits the payment‑recharge use case.
