How to Guarantee Data Consistency in Distributed Transactions: A Practical Deep‑Dive

This article examines the challenges of maintaining data consistency across micro‑service boundaries, walks through real‑world payment and gifting scenarios, compares classic solutions such as 2PC, saga, TCC, local‑message tables and transactional messages, and finally recommends a pragmatic approach for building reliable distributed transaction mechanisms.

Architect

Why Distributed Transactions Matter

When a payment system is refactored from a monolithic design to separate order and account services, a single MySQL transaction can no longer guarantee that updating the order status and crediting the user’s balance happen atomically. The article starts with a concrete recharge‑order example: originally the order module and account module lived in one service and shared one database, so a single local transaction was enough; after the split, two independent services must coordinate.
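To make the starting point concrete, here is a minimal sketch of the pre‑split situation, using Python's built‑in sqlite3 in place of MySQL. The table names, column names, and the `complete_recharge` function are hypothetical and chosen only for illustration.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one pending order (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER, status TEXT);
        CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER);
        INSERT INTO orders VALUES (1, 7, 100, 'pending');
        INSERT INTO accounts VALUES (7, 0);
        """
    )
    return conn


def complete_recharge(conn: sqlite3.Connection, order_id: int) -> None:
    """Mark the order paid and credit the balance in ONE local transaction."""
    with conn:  # commits if the block succeeds, rolls back if any statement raises
        row = conn.execute(
            "SELECT user_id, amount FROM orders WHERE id = ? AND status = 'pending'",
            (order_id,),
        ).fetchone()
        if row is None:
            raise ValueError("order missing or already processed")
        user_id, amount = row
        conn.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
            (amount, user_id),
        )
```

Once the two tables live in different services with different databases, no single `with conn:` block can cover both updates, which is the gap the rest of the article tries to close.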

Similar consistency problems appear in other flows, such as gifting (deducting the sender’s coins, then crediting the streamer) and sending a Kafka message after a successful recharge. Because money is involved, any inconsistency is unacceptable.

Current Ways to Solve Distributed Transactions

The author first outlines the problem with a concrete scenario: after a purchase succeeds, the system must send a Kafka message, but if Kafka fails while the order transaction has already committed, the message may be lost. A step‑by‑step flow diagram (shown in the first image) illustrates the normal (green) and failure (yellow) paths, and the recovery logic that periodically scans Redis for delayed jobs.

Key issues identified:

Reliance on Redis for recovery introduces potential data loss.

Lack of a unified distributed‑transaction specification.

No management of retry limits or execution state.

Debugging requires digging through logs because transaction status isn’t recorded.

Theoretical Foundations

The article explains the difference between local and distributed transactions, then introduces consistency models (strong, weak, eventual) and the CAP theorem, emphasizing that when a network partition occurs, a distributed system must choose between consistency and availability. It also covers the BASE model (basically available, soft state, eventually consistent), which accepts eventual consistency in exchange for high availability, and the concept of flexible transactions that rely on BASE principles.

Two technical prerequisites for any solution are highlighted: the need for visibility (queryable status of each step) and idempotence (re‑executing an operation yields the same result). The article includes a mathematical definition of idempotence (f(f(x)) = f(x)) and practical ways to achieve it, such as caching results or detecting duplicate requests.
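The duplicate‑detection idea can be sketched as follows. This is an illustrative in‑memory version only; the class name, the `request_id` parameter, and the handler signature are assumptions, and a real system would persist the processed IDs durably.

```python
from typing import Callable, Dict


class IdempotentProcessor:
    """Wraps a handler so that replaying the same request_id has no extra effect."""

    def __init__(self, handler: Callable[[int], int]) -> None:
        self._handler = handler
        self._results: Dict[str, int] = {}  # request_id -> cached result

    def process(self, request_id: str, amount: int) -> int:
        # Duplicate request: return the cached result instead of re-executing.
        if request_id in self._results:
            return self._results[request_id]
        result = self._handler(amount)
        self._results[request_id] = result
        return result


# Usage: crediting a balance is not idempotent by itself, but becomes so per request_id.
balance = {"value": 0}


def credit(amount: int) -> int:
    balance["value"] += amount
    return balance["value"]


processor = IdempotentProcessor(credit)
first = processor.process("req-1", 100)
retry = processor.process("req-1", 100)  # retried delivery of the same request
```

Here the retry returns the cached result and the balance is credited only once, which is what makes blind retries safe for the schemes discussed later.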

Industry Solutions

Two‑Phase Commit (2PC)

Based on the XA specification, 2PC has a transaction manager coordinate the resource managers under a global transaction ID. After voting in the prepare phase, each participant holds its locks until the coordinator has collected every vote and issued the final decision, which can lead to long lock times and poor performance under high concurrency. The article notes that 2PC sacrifices availability for strong consistency.
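The two phases can be illustrated with a toy coordinator. This sketch is not an XA implementation: the `Participant` class and its methods are hypothetical, everything runs in one process, and coordinator crashes and timeouts (the hard part of 2PC) are ignored.

```python
from typing import List


class Participant:
    """A stand-in for a resource manager that can vote and then commit or roll back."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: a real participant would lock resources here, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes; otherwise roll back everywhere."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return True
    for p in participants:  # phase 2: global rollback
        p.rollback()
    return False
```

Note that every participant stays in the "prepared" state until the slowest vote arrives, which is where the long lock times mentioned above come from.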

Saga

Saga addresses long‑running processes by breaking a transaction into a sequence of local transactions, each paired with a compensating action that undoes it. The author walks through a scenario where Service A commits, Service B fails, and a compensating action rolls back Service A’s changes, achieving eventual consistency while keeping the system highly available.
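The Service A / Service B scenario can be sketched with a small in‑process orchestrator. The `run_saga` function and the step functions are hypothetical names; a real saga would persist its progress so that compensation survives a crash.

```python
from typing import Callable, Dict, List, Tuple

# Each step is (action, compensation). Both are plain callables in this sketch.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Usage: Service A succeeds, Service B fails, so A's change is compensated.
ledger: Dict[str, int] = {"A": 100, "B": 0}


def debit_a() -> None:
    ledger["A"] -= 30


def refund_a() -> None:
    ledger["A"] += 30


def credit_b_fails() -> None:
    raise RuntimeError("service B unavailable")


def no_compensation() -> None:
    pass


saga_ok = run_saga([(debit_a, refund_a), (credit_b_fails, no_compensation)])
```

Between `debit_a` and `refund_a` other readers can observe the intermediate balance, which is the price a saga pays for not holding locks.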

Try‑Confirm‑Cancel (TCC)

TCC requires each business operation to expose Try, Confirm, and Cancel interfaces. The article provides a detailed TCC implementation for a 100‑yuan transfer from Account A to Account B, complete with a code block that lists the exact steps for each service.

[Transfer Service]
Try:
    - Verify A’s account status and balance
    - Deduct 100 yuan, set status to "transferring"
    - Record the intent in a log/message
Confirm:
    - No action needed
Cancel:
    - Refund 100 yuan to A
    - Release reserved resources

[Receive Service]
Try:
    - Verify B’s account status
Confirm:
    - Credit 100 yuan to B
    - Release reserved resources
Cancel:
    - No action needed

The author points out that TCC is highly invasive and difficult to adopt.
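The Try/Confirm/Cancel listing above might translate into code roughly as follows. This is a single‑process sketch under stated assumptions: the class and method names are hypothetical, the "transferring" marker is modeled as a plain string, and Confirm on the transfer side only clears that marker since the funds were already deducted in Try.

```python
class TransferService:
    """Account A's side: deducts in Try, refunds in Cancel."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.status = "idle"

    def try_(self, amount: int) -> bool:
        if self.balance < amount:
            return False
        self.balance -= amount
        self.status = "transferring"
        return True

    def confirm(self, amount: int) -> None:
        self.status = "idle"  # assumption: only clear the marker, money already moved

    def cancel(self, amount: int) -> None:
        # Refund only if Try actually deducted, so Cancel is safe after a failed Try.
        if self.status == "transferring":
            self.balance += amount
            self.status = "idle"


class ReceiveService:
    """Account B's side: only checks in Try, credits in Confirm."""

    def __init__(self, balance: int, active: bool = True) -> None:
        self.balance = balance
        self.active = active

    def try_(self, amount: int) -> bool:
        return self.active

    def confirm(self, amount: int) -> None:
        self.balance += amount

    def cancel(self, amount: int) -> None:
        pass  # nothing was reserved in Try


def transfer(a: TransferService, b: ReceiveService, amount: int) -> bool:
    """Coordinator: Confirm both sides only if both Try calls succeed."""
    if a.try_(amount) and b.try_(amount):
        a.confirm(amount)
        b.confirm(amount)
        return True
    a.cancel(amount)
    b.cancel(amount)
    return False
```

Even in this toy form, each service has to implement three business‑aware methods per operation, which hints at why the pattern is considered invasive.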

Local Message Table (Asynchronous Assurance)

Originating from eBay, this pattern stores outgoing messages in a local table within the same transaction as the business data. A separate consumer reads the table and retries failed deliveries. The article includes a diagram (second image) and describes the producer‑consumer workflow, emphasizing its simplicity and high performance.
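The producer and relay halves of this pattern can be sketched with sqlite3 standing in for the business database. The `outbox` table, its columns, and the function names are hypothetical; `send` is any callable that raises on delivery failure (for example, a wrapper around a message‑queue client).

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT, status TEXT, retries INTEGER
        );
        """
    )


def place_order(conn: sqlite3.Connection, order_id: int, payload: str) -> None:
    """Write the business row and the outgoing message in the same local transaction."""
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'paid')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload, status, retries) VALUES (?, 'pending', 0)",
            (payload,),
        )


def relay_pending(
    conn: sqlite3.Connection, send: Callable[[str], None], max_retries: int = 5
) -> int:
    """Try to deliver pending messages; return how many were delivered in this pass."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'pending' AND retries < ? ORDER BY id",
        (max_retries,),
    ).fetchall()
    delivered = 0
    for msg_id, payload in rows:
        try:
            send(payload)
        except Exception:
            with conn:  # record the failed attempt so retries stay bounded and visible
                conn.execute("UPDATE outbox SET retries = retries + 1 WHERE id = ?", (msg_id,))
            continue
        with conn:
            conn.execute("UPDATE outbox SET status = 'sent' WHERE id = ?", (msg_id,))
        delivered += 1
    return delivered
```

If the process crashes after `send` succeeds but before the row is marked as sent, the message is delivered again on the next pass, so the consumer must be idempotent.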

Transactional Messages

Transactional messages combine the two‑phase commit idea with a message queue. The flow (third image) shows a prepare message, local transaction execution, and a final commit or rollback based on the local outcome. The author notes that, of the brokers compared, only RocketMQ supports this style of transactional message; RabbitMQ and Kafka do not (Kafka’s own transactions make writes within Kafka atomic but do not coordinate with a local database transaction).
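The prepare, execute, then commit‑or‑rollback flow can be illustrated with a toy in‑memory broker. `ToyBroker` and `send_in_transaction` are hypothetical names and do not mirror any real broker's client API; a production broker also needs a way to resolve prepared messages whose final outcome never arrives, which this sketch omits.

```python
from typing import Callable, Dict, List


class ToyBroker:
    """Holds prepared messages invisibly until they are committed or rolled back."""

    def __init__(self) -> None:
        self.prepared: Dict[int, str] = {}  # not visible to consumers
        self.delivered: List[str] = []      # visible to consumers
        self._next_id = 0

    def send_prepared(self, body: str) -> int:
        self._next_id += 1
        self.prepared[self._next_id] = body
        return self._next_id

    def commit(self, msg_id: int) -> None:
        self.delivered.append(self.prepared.pop(msg_id))

    def rollback(self, msg_id: int) -> None:
        self.prepared.pop(msg_id, None)


def send_in_transaction(broker: ToyBroker, body: str, local_tx: Callable[[], None]) -> bool:
    """Prepare the message, run the local transaction, then commit or roll back the message."""
    msg_id = broker.send_prepared(body)
    try:
        local_tx()
    except Exception:
        broker.rollback(msg_id)
        return False
    broker.commit(msg_id)
    return True
```

The message becomes visible to consumers only after the local transaction succeeds, so a failed order never produces a downstream event.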

Best‑Effort Notification

This lightweight approach repeatedly sends a notification until it succeeds or a retry limit is reached, then allows the passive side to query for missing events. It is suitable for low‑sensitivity consistency requirements and cross‑enterprise integrations.
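A minimal sketch of this approach follows. The `Notifier` class, its method names, and the convention that `send` returns `True` on acknowledgement are assumptions; a real notifier would also wait with increasing delays between attempts, which is left out here to keep the example short.

```python
from typing import Callable, Dict, Optional


class Notifier:
    """Retries a notification up to a limit and keeps events queryable afterwards."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.events: Dict[str, str] = {}  # event_id -> payload, kept for later queries

    def notify(self, event_id: str, payload: str, send: Callable[[str], bool]) -> bool:
        """Return True once the receiver acknowledges, False after giving up."""
        self.events[event_id] = payload
        for _ in range(self.max_attempts):
            try:
                if send(payload):
                    return True
            except Exception:
                pass  # treat an exception like a missing acknowledgement
        return False

    def query(self, event_id: str) -> Optional[str]:
        """Lets the receiving side pull an event it never got pushed."""
        return self.events.get(event_id)
```

The `query` method is what keeps the scheme from silently losing events: once retries are exhausted, responsibility shifts to the receiver to reconcile.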

Solution Comparison

The original table is summarized in prose: 2PC offers strong consistency but medium complexity and low performance; TCC provides weak consistency with high complexity and high maintenance cost; local‑message tables and transactional messages give weak consistency but high performance and low maintenance; best‑effort notification also yields weak consistency with low complexity.

Real‑World Implementations

Alipay Distributed Transaction Service (DTS)

DTS splits the system into xts‑client (a JAR embedded in the application) and xts‑server (a standalone service). It defines a transaction ID (TX_ID) and state (STATE) in an Activity record, and each participant registers Prepare, Commit, and Rollback interfaces, effectively implementing a TCC‑style workflow.

eBay Local Message Table

eBay’s pattern stores messages in a relational table alongside business data, then uses either a high‑throughput MQ or periodic polling to deliver them. The article cites similar adoptions by Qunar and Mogujie.

Third‑Party Payment Callbacks

Payment platforms like Alipay and WeChat use best‑effort notification: they keep retrying the callback until it succeeds or the retry count expires.

Recommended Approach for Our Scenario

The author eliminates 2PC/3PC because they require XA‑compatible resources and lock transaction resources, hurting performance. TCC is also rejected due to its high code‑level intrusion. The remaining viable path is to adopt a transactional message strategy, which encapsulates the local transaction within a message‑driven workflow, achieving the desired eventual consistency while keeping complexity low.

In summary, by analyzing the trade‑offs of each technique, the article concludes that a message‑centric design best fits the payment‑recharge use case: either a local‑message table with reliable retry, or a true transactional message if the MQ supports it.

Payment refactor illustration
Distributed transaction flow diagram
CAP theorem illustration
2PC prepare/commit phases
TCC transfer example
Local message table workflow
Transactional message flow
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
