
Mastering Distributed Transactions: From Two-Phase Commit to Saga and Beyond

This article explains the fundamentals of distributed transactions, compares classic solutions such as XA, Saga, TCC, local message tables, and transactional messages, discusses their trade‑offs, and introduces advanced techniques like sub‑transaction barriers to handle network anomalies in microservice architectures.

IT Architects Alliance

Background and Motivation

As business complexity grows, many systems evolve from monoliths to distributed microservice architectures, inevitably facing the challenge of maintaining data consistency across services. The article uses a simple money‑transfer example (debit A, credit B) to illustrate the need for atomic, consistent, isolated, and durable operations in a distributed setting.

Basic Theory

A database transaction guarantees that a group of statements either all succeed or all fail, adhering to the ACID properties:

Atomicity : All operations succeed or none do.

Consistency : Integrity constraints remain intact before and after the transaction.

Isolation : Concurrent transactions do not interfere with each other.

Durability : Once committed, changes survive failures.
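As a point of reference before moving to the distributed case, the money-transfer example can be sketched inside a single local transaction. This is a minimal illustration using Python's built-in sqlite3 module; the table name, column names, and the `transfer` helper are assumptions made for the example, not part of the original article.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src atomically; return False if the transfer is rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit above was undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

ok = transfer(conn, "A", "B", 30)         # both updates are committed together
rejected = transfer(conn, "A", "B", 500)  # debit violates the constraint, credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second call shows atomicity: B's credit has already been applied when A's debit fails, yet neither change survives the rollback. Once A and B live in different databases, no single local transaction can give this guarantee.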

Distributed Transaction Fundamentals

In a distributed scenario, the transaction coordinator, resource managers, and the application reside on different nodes. The article notes that distributed transactions often relax isolation and consistency to meet availability and performance requirements, following BASE principles while still preserving core ACID aspects where possible.

Classic Solutions

Two‑Phase Commit (XA)

XA defines a protocol between a global transaction manager (TM) and local resource managers (RM). It consists of:

Prepare : Each RM performs its local work, keeps the affected resources locked, and reports to the TM whether it is ready to commit.

Commit/Rollback : TM instructs RMs to finalize or abort.

Most major databases (MySQL, Oracle, SQL Server, PostgreSQL) support XA. The main drawback is that locks are held from the prepare phase until the final commit or rollback, which lowers concurrency.
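The two phases can be sketched with an in-memory simulation. This is a toy model, not the XA interface of any database: `ResourceManager`, its `can_prepare` flag, and `two_phase_commit` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ResourceManager:
    """Toy RM: votes during prepare, then applies the TM's global decision."""
    name: str
    can_prepare: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # A real RM would do the work, persist it, and keep its locks held here.
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"

def two_phase_commit(rms: List[ResourceManager]) -> bool:
    """Act as the TM: return True if the global transaction committed."""
    # Phase 1: collect votes; all() stops at the first RM that votes no.
    all_prepared = all(rm.prepare() for rm in rms)
    # Phase 2: the same decision is sent to every RM.
    for rm in rms:
        if all_prepared:
            rm.commit()
        else:
            rm.rollback()
    return all_prepared
```

Calling `two_phase_commit([ResourceManager("A"), ResourceManager("B", can_prepare=False)])` returns False and leaves both RMs rolled back. The sketch leaves out what makes real 2PC hard: durable logging of votes and decisions, and recovery when the TM or an RM crashes between the phases.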

Saga

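Saga breaks a long transaction into a series of short local transactions coordinated by a saga orchestrator. Each step commits immediately; if a later step fails, the compensating actions for the steps already completed are executed in reverse order. It offers higher concurrency than XA but requires explicit compensation logic and provides weaker consistency, since intermediate results are visible before the saga finishes.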
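A minimal orchestrator for this flow might look like the following. It is a sketch under simplifying assumptions: steps are plain in-process functions, and `run_saga`, `Step`, and the transfer functions are hypothetical names; a real orchestrator would also persist its progress and retry compensations.

```python
from typing import Callable, Dict, List, Tuple

# Each step pairs a forward action with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
    return True

saga_balances: Dict[str, int] = {"A": 100, "B": 0}

def debit_a() -> None:
    saga_balances["A"] -= 30

def refund_a() -> None:
    saga_balances["A"] += 30

def credit_b_fails() -> None:
    raise RuntimeError("service B is unavailable")  # simulated failure

def undo_credit_b() -> None:
    saga_balances["B"] -= 30

saga_ok = run_saga([(debit_a, refund_a), (credit_b_fails, undo_credit_b)])
```

Here the debit of A commits first, the credit to B fails, and only `refund_a` runs as compensation because the failed step never completed. Between the debit and the refund, other readers could observe A's reduced balance, which is the weaker consistency described above.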

Try‑Confirm‑Cancel (TCC)

TCC splits work into three phases:

Try : Perform checks and reserve resources.

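Confirm : Execute the business operation using reserved resources (must be idempotent).

Cancel : Release reserved resources if the transaction aborts (must also be idempotent).

TCC avoids the long-held database locks of XA and offers stronger consistency than Saga, because resources are reserved at the business level before they are used. The cost is higher development effort: every participant must implement all three operations.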
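The three operations can be sketched for the debit side of the transfer. This is a toy in-memory participant; `Account`, its `frozen` map, and the method names are assumptions for illustration, and `gid` stands for a global transaction id.

```python
from typing import Dict

class Account:
    """Toy TCC participant for the debit side of a transfer."""

    def __init__(self, balance: int) -> None:
        self.balance = balance              # funds available for new reservations
        self.frozen: Dict[str, int] = {}    # gid -> amount reserved by Try

    def try_debit(self, gid: str, amount: int) -> bool:
        if gid in self.frozen:              # retried Try: already reserved
            return True
        if self.balance < amount:           # check fails, nothing is reserved
            return False
        self.balance -= amount
        self.frozen[gid] = amount
        return True

    def confirm_debit(self, gid: str) -> None:
        # Consume the reservation; a repeated Confirm finds nothing and is a no-op.
        self.frozen.pop(gid, None)

    def cancel_debit(self, gid: str) -> None:
        # Return the reservation; a repeated or empty Cancel adds 0.
        self.balance += self.frozen.pop(gid, 0)

acct = Account(100)
acct.try_debit("g1", 30)
acct.confirm_debit("g1")   # the 30 leaves the account for good
acct.try_debit("g2", 50)
acct.cancel_debit("g2")    # the 50 is returned
acct.cancel_debit("g2")    # retried Cancel has no further effect
```

The sketch tolerates retries and a Cancel without a Try, but it does not stop a Try that arrives after its Cancel: that late Try would freeze funds that nothing releases.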

Local Message Table

Originally proposed by eBay, this pattern writes a message describing the pending task into a local message table in the same database transaction as the business update, so the update and the message are committed or rolled back together. A background worker later reads the table and delivers the messages.
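The pattern can be sketched with sqlite3 standing in for the service's own database. The schema and the `debit_and_enqueue` and `relay_pending` helpers are hypothetical, and `publish` stands for whatever delivers the message to the downstream service.

```python
import json
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE message (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT    NOT NULL,
    sent    INTEGER NOT NULL DEFAULT 0
);
INSERT INTO account VALUES ('A', 100);
""")

def debit_and_enqueue(conn: sqlite3.Connection, name: str, amount: int) -> None:
    """Business update and message row share one local transaction."""
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, name))
        payload = json.dumps({"op": "credit", "to": "B", "amount": amount})
        conn.execute("INSERT INTO message (payload) VALUES (?)", (payload,))

def relay_pending(conn: sqlite3.Connection, publish: Callable[[str], None]) -> int:
    """Background worker pass: deliver unsent messages, then mark them sent."""
    rows = conn.execute("SELECT id, payload FROM message WHERE sent = 0 ORDER BY id").fetchall()
    for msg_id, payload in rows:
        publish(payload)  # if this raises, the row stays unsent and is retried on the next pass
        with conn:
            conn.execute("UPDATE message SET sent = 1 WHERE id = ?", (msg_id,))
    return len(rows)
```

Because a crash between `publish` and the `sent = 1` update causes the message to be delivered again, the consumer in this design has to handle duplicates idempotently.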

Transactional Message (RocketMQ)

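RocketMQ's transactional message replaces the local table with a half-message stored on the broker, which consumers cannot see yet. The producer first sends the half-message, then runs its local transaction, and finally tells the broker to commit or roll back the message based on the outcome. If the broker never receives that decision, it calls back to the producer to check the transaction's status.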
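The half-message flow can be modelled in memory as follows. This is not the RocketMQ client API: `ToyBroker`, `send_in_transaction`, and every method name here are hypothetical and only mirror the sequence of steps.

```python
from typing import Callable, Dict, List, Optional

class ToyBroker:
    """In-memory stand-in for a broker that supports half-messages."""

    def __init__(self) -> None:
        self.half: Dict[str, str] = {}   # stored but invisible to consumers
        self.visible: List[str] = []     # deliverable to consumers

    def send_half(self, msg_id: str, body: str) -> None:
        self.half[msg_id] = body

    def end_transaction(self, msg_id: str, commit: bool) -> None:
        body = self.half.pop(msg_id, None)
        if body is not None and commit:
            self.visible.append(body)    # commit: release to consumers
        # rollback: the half-message is simply discarded

    def check_back(self, query_state: Callable[[str], Optional[bool]]) -> None:
        """Resolve half-messages whose outcome was never reported."""
        for msg_id in list(self.half):
            state = query_state(msg_id)  # True=commit, False=rollback, None=unknown
            if state is not None:
                self.end_transaction(msg_id, state)

def send_in_transaction(broker: ToyBroker, msg_id: str, body: str,
                        local_tx: Callable[[], None]) -> bool:
    broker.send_half(msg_id, body)       # step 1: half-message
    try:
        local_tx()                       # step 2: producer's local transaction
    except Exception:
        broker.end_transaction(msg_id, commit=False)
        return False
    broker.end_transaction(msg_id, commit=True)  # step 3: report the outcome
    return True
```

If the producer crashes after step 2 but before step 3, the message stays in `half` until `check_back` asks the producer what happened, which is the role of the status check described above.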

Best‑Effort Notification

This approach repeatedly attempts to notify the receiver at increasing intervals (for example, exponential back‑off), and provides a query interface so the receiver can pull the final result if the notifications fail.
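The retry side of this pattern can be sketched as below. The function names, the doubling factor, and the convention that `send` returns True on acknowledgement are assumptions for the example; the `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, List, Optional, Sequence

def backoff_schedule(base_seconds: float, retries: int, factor: float = 2.0) -> List[float]:
    """Waits before each retry: base, base*factor, base*factor^2, ..."""
    return [base_seconds * factor ** i for i in range(retries)]

def notify_with_retries(send: Callable[[], bool],
                        delays: Sequence[float],
                        sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Try once immediately, then once after each delay.

    Returns the attempt number that was acknowledged, or None if every
    attempt failed and the sender gives up.
    """
    waits = [0.0, *delays]
    for attempt, wait in enumerate(waits, start=1):
        if wait:
            sleep(wait)
        if send():
            return attempt
    return None
```

When the function returns None, the sender stops pushing; at that point the receiver is expected to use the query interface to fetch the final result itself.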

AT Mode (Seata)

Seata's AT mode follows a two‑phase flow similar to XA, but the framework generates the rollback logic automatically, which reduces developer effort. Its drawback is that locks are still held for a long time.

Network Anomalies in Distributed Transactions

Three failure categories are highlighted:

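Empty Rollback : Cancel is called for a branch whose Try never executed, for example because the Try request was lost or timed out.

Idempotency : Requests retried after a network glitch must not cause duplicate effects.

Hang : A delayed Try arrives after Cancel has already run, so it reserves resources that no later Confirm or Cancel will release.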

The original article illustrates these issues with sequence diagrams.

Sub‑Transaction Barrier Technique

DTM introduces a barrier table sub_trans_barrier keyed by global‑transaction‑id, branch‑id, and branch‑type (try|confirm|cancel). The ThroughBarrierCall function ensures:

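Empty rollback is detected: a Cancel that arrives without a preceding Try skips its business logic.

Idempotent execution is enforced by the unique key, so a repeated call for the same branch is ignored.

Hangs are avoided because a Try that arrives after Cancel has run cannot insert its row and is skipped.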

The barrier works for XA, Saga, TCC, and transactional messages, dramatically simplifying error handling for developers.
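The barrier idea can be sketched with sqlite3. This is an illustrative reimplementation for the TCC case, not DTM's code: the Python function `through_barrier_call`, its signature, and the column names are assumptions. The sketch relies on one trick: a Cancel first tries to insert the Try row for its branch, and if that insert succeeds, Try never ran.

```python
import sqlite3
from typing import Callable

def open_barrier_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sub_trans_barrier ("
        "gid TEXT, branch_id TEXT, branch_type TEXT, "
        "PRIMARY KEY (gid, branch_id, branch_type))"
    )
    return conn

def _claim(conn: sqlite3.Connection, gid: str, branch_id: str, branch_type: str) -> bool:
    """Insert the barrier row; True only if it did not exist before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO sub_trans_barrier VALUES (?, ?, ?)",
        (gid, branch_id, branch_type),
    )
    return cur.rowcount == 1

def through_barrier_call(conn: sqlite3.Connection, gid: str, branch_id: str,
                         branch_type: str,
                         business: Callable[[sqlite3.Connection], None]) -> bool:
    """Run business(conn) at most once per branch; return True if it ran."""
    with conn:  # barrier rows and business writes commit or roll back together
        try_missing = False
        if branch_type == "cancel":
            # Claiming the Try row succeeds only if Try never ran (empty rollback).
            # The row then stays in place and blocks a late Try (hang).
            try_missing = _claim(conn, gid, branch_id, "try")
        first_time = _claim(conn, gid, branch_id, branch_type)
        if try_missing or not first_time:
            return False  # empty rollback, duplicate call, or late Try: skip
        business(conn)
        return True
```

Running the barrier rows and the business logic in the same local transaction matters: if `business` raises, the rows are rolled back too, so a retry can still get through. The sketch assumes the barrier table lives in the same database as the business data.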

Conclusion

The article covered fundamental concepts of distributed transactions, compared major solutions, explained typical failure modes, and presented the sub‑transaction barrier as an elegant way to achieve reliable, idempotent, and hang‑free transaction processing in microservice environments.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
