Mastering Distributed Transactions: From ACID Basics to 2PC, TCC, and Saga Patterns

This article explains the fundamentals of database transactions, the ACID properties, and why distributed transactions are needed, then walks through the implementation details of redo/undo logs, local transactions, CAP and BASE theory, and evaluates five major distributed‑transaction solutions—2PC, TCC, local‑message tables, maximum‑effort notification, and Saga—with concrete examples, pros, and cons.

Architect

Database Transaction Fundamentals

A transaction is an atomic unit of work: either all its operations succeed or none do. The ACID properties are enforced by a combination of logging and concurrency control:

Atomicity : undo log records inverse operations; on error or ROLLBACK the system reverts to the pre‑transaction state.

Durability : redo log persists page‑level changes; after a crash the system replays the redo log to recover committed data.

Isolation : achieved with locks and Multi‑Version Concurrency Control (MVCC).

Consistency : follows from atomicity, isolation, and durability together with the constraints defined in the schema and the application.

Local Transaction Context

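In a single‑server, single‑relational‑database deployment the database itself acts as the transaction manager: the application issues commit or rollback over one connection (e.g., through JDBC), and everything is resolved locally.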
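As a minimal sketch of a local transaction, the following uses Python's built-in sqlite3 module in place of a production database. The `account` table, the account names, and the `transfer` helper are assumptions made for illustration and do not come from the original article.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside one local transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # the debit is rolled back
```

The second call debits the source row before the check fails, and the rollback removes that partial change, which is the all-or-nothing behavior described above.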

InnoDB Log Architecture

InnoDB maintains two complementary logs:

Redo log (physical): records page modifications to enable recovery of committed data.

Undo log (logical): records what is needed to reverse each change (e.g., for a DELETE, enough information to re‑insert the row), which supports rollback and lets MVCC readers see older row versions.
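The division of labor between the two logs can be illustrated with a toy in-memory store. This is only a sketch under simplifying assumptions: `TinyStore` is a hypothetical class, its redo log holds key/value pairs rather than physical page changes, and it says nothing about how InnoDB actually lays out or flushes its logs.

```python
_MISSING = object()  # marks "key did not exist before this write"

class TinyStore:
    """Toy key-value store with an undo log and a redo log."""

    def __init__(self, redo_log=None):
        self.data = {}
        # In this toy, the redo log is the only thing that survives a crash.
        self.redo_log = list(redo_log or [])
        self._undo = []     # (key, old value) for the open transaction
        self._pending = []  # (key, new value) not yet committed
        # Crash recovery: replay every committed change.
        for key, value in self.redo_log:
            self.data[key] = value

    def write(self, key, value):
        self._undo.append((key, self.data.get(key, _MISSING)))
        self._pending.append((key, value))
        self.data[key] = value

    def rollback(self):
        # Apply the undo entries newest-first to restore the old state.
        while self._undo:
            key, old = self._undo.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self._pending.clear()

    def commit(self):
        # Committed changes become durable by entering the redo log.
        self.redo_log.extend(self._pending)
        self._pending.clear()
        self._undo.clear()

store = TinyStore()
store.write("a", 1)
store.commit()
store.write("a", 2)
store.write("b", 3)
store.rollback()                       # undo log restores a=1 and removes b

recovered = TinyStore(store.redo_log)  # simulate a restart after a crash
```

After the rollback the store holds only the committed value, and a fresh instance built from the redo log arrives at the same state.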

Why Distributed Transactions?

When a system is split into micro‑services, each service owns its own database. A single local transaction cannot guarantee cross‑service consistency. Two illustrative scenarios:

Micro‑service order flow : a user purchases a gift; the coin service, order service, and gift service each write to separate databases. All three updates must succeed atomically.

Sharding across data centers : transferring $10 from an account in Beijing to one in Shenzhen requires coordinated updates on two shards.

CAP and BASE Trade‑offs

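The CAP theorem states that a distributed system can guarantee at most two of Consistency (C), Availability (A), and Partition tolerance (P) at the same time. Because network partitions cannot be ruled out in practice, the real decision is usually between consistency and availability while a partition lasts. The three classic choices are: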

CA : give up partition tolerance; typical of single‑node databases.

AP : give up consistency; many modern distributed stores.

CP : give up availability; strongly consistent stores that block during partitions.

BASE (Basically Available, Soft state, Eventually consistent) relaxes consistency to achieve higher availability and partition tolerance.

Distributed Transaction Patterns

The following five patterns are examined, each with concrete steps, advantages, and drawbacks.

1. Two‑Phase Commit (2PC)

2PC separates commit into a prepare phase and a commit phase:

Transaction manager sends PREPARE to each resource manager.

If all reply success, the manager sends COMMIT; otherwise it sends ROLLBACK.
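The two steps above can be sketched as a small in-process coordinator. The `Participant` class and the `two_phase_commit` function are hypothetical names used for illustration; a real deployment would talk to resource managers over the network and persist the decision, which this sketch omits.

```python
class Participant:
    """Hypothetical resource manager that votes in the prepare phase."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Run prepare on everyone, then send one decision to all of them."""
    votes = [p.prepare() for p in participants]  # phase 1
    if all(votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return "committed"
    for p in participants:                       # phase 2: rollback
        p.rollback()
    return "aborted"

happy = [Participant("coins"), Participant("orders")]
happy_result = two_phase_commit(happy)

mixed = [Participant("coins"), Participant("gifts", can_commit=False)]
mixed_result = two_phase_commit(mixed)
```

Note that the coordinator sends the same decision to every participant; a single "no" vote in phase one is enough to roll everyone back.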

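Advantages : conceptually simple, atomic across all participants, and widely supported by relational databases (for example through the XA interface).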

Drawbacks :

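Single‑point failure – if the manager crashes after participants have voted, they keep their locks until it recovers and announces the decision.

Performance bottleneck – each participant holds its locks from the prepare phase until the final decision arrives, so the slowest participant sets the pace.

Potential inconsistency – the manager always sends the same decision to everyone, but if it or the network fails partway through the commit phase, some participants may have applied COMMIT while others are still waiting in the prepared state.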

2. TCC (Try‑Confirm‑Cancel)

TCC decomposes each business operation into three idempotent methods:

Try : perform all consistency checks and reserve required resources.

Confirm : finalize the operation without further checks (assumes Try succeeded).

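Cancel : release the reserved resources and undo the effects of Try when any Try step fails. Confirm is expected to succeed once Try has, so a failed Confirm is normally retried rather than cancelled.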

Example – Gift Purchase :

User A has 100 coins and 5 roses, and buys 10 roses for 10 coins. The three services proceed as follows:

Try : create a pending order, freeze 10 coins, and reserve 10 roses.

Confirm : if every Try step succeeds, mark the order paid, deduct the 10 frozen coins, and raise the user's rose count to 15.

Cancel : if any Try step fails, revert the order status and release the reservations, leaving the user with 100 coins and 5 roses.
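A simplified sketch of the gift purchase follows. It models only two reservable resources, the buyer's coins and the seller's rose stock, and leaves out the order service. `ReservableResource`, `purchase`, and the transaction ids are hypothetical names chosen for illustration, not part of any TCC framework.

```python
class ReservableResource:
    """Hypothetical resource (coins or stock) with Try/Confirm/Cancel."""

    def __init__(self, available):
        self.available = available
        self.reserved = {}  # txn_id -> amount currently frozen

    def try_reserve(self, txn_id, amount):
        if txn_id in self.reserved:
            return True     # a retried Try for an in-flight txn is a no-op
        if amount > self.available:
            return False
        self.available -= amount
        self.reserved[txn_id] = amount
        return True

    def confirm(self, txn_id):
        # Consume the reservation; a repeated Confirm finds nothing to do.
        self.reserved.pop(txn_id, None)

    def cancel(self, txn_id):
        # Give the reservation back; safe even if Try never ran.
        self.available += self.reserved.pop(txn_id, 0)

def purchase(txn_id, coins, stock, price, quantity):
    """Try on every resource, then Confirm all or Cancel all."""
    steps = [(coins, price), (stock, quantity)]
    if all(res.try_reserve(txn_id, amount) for res, amount in steps):
        for res, _ in steps:
            res.confirm(txn_id)
        return True
    for res, _ in steps:
        res.cancel(txn_id)
    return False

coins = ReservableResource(100)  # user A's coins
stock = ReservableResource(50)   # roses the gift service can sell (assumed)

first = purchase("txn-1", coins, stock, price=10, quantity=10)
second = purchase("txn-2", coins, stock, price=10, quantity=100)  # stock too low
```

In the second purchase the coin freeze succeeds but the rose reservation does not, so Cancel returns the frozen coins and the balances are unchanged.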

Pros : fine‑grained control, reduced lock contention, better performance.

Cons :

High application intrusion – business code must implement three methods.

Complex compensation logic for network or system failures; open‑source frameworks such as ByteTCC, TCC‑transaction, or Himly are often used to manage it.

3. Local Message Table (eBay Pattern)

This pattern treats a distributed transaction as a series of local transactions plus an asynchronous message.

The producer writes business data and a message record within the same local transaction.

After the commit, a relay (for example, a background poller) reads unsent records from the message table and publishes them to a message queue (MQ).

The consumer reads the MQ, processes its own business logic, and acknowledges.

If processing fails, the consumer retries, so its handler must be idempotent; if the business logic itself fails, a compensation message is sent back to the producer.
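The producer side of this pattern can be sketched with sqlite3. The `orders` and `outbox` tables, the `place_order` and `relay_pending` helpers, and the use of a plain Python list as the queue are all assumptions for illustration; a real system would publish to an actual broker.

```python
import json
import sqlite3

def place_order(conn, order_id, amount):
    """Write business data and its message in one local transaction."""
    with conn:  # both inserts commit together or not at all
        conn.execute(
            "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
        )
        payload = json.dumps({"order_id": order_id, "amount": amount})
        conn.execute("INSERT INTO outbox (payload, sent) VALUES (?, 0)", (payload,))

def relay_pending(conn, publish):
    """Publish unsent messages; return how many were delivered."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for msg_id, payload in rows:
        try:
            publish(payload)
        except Exception:
            break  # leave the record unsent so the next poll retries it
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
        delivered += 1
    return delivered

def broken_publish(payload):
    """Stand-in for an unreachable broker."""
    raise ConnectionError("MQ unavailable")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, "
    "sent INTEGER NOT NULL DEFAULT 0)"
)

queue = []  # stands in for the MQ
place_order(conn, "o-1", 10)
while_down = relay_pending(conn, broken_publish)  # nothing delivered
after_recovery = relay_pending(conn, queue.append)
```

Because a message is marked as sent only after publishing succeeds, a crash between the two steps can deliver it twice, which is why the consumer side needs to be idempotent.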

Advantages : achieves eventual consistency without a global lock.

Drawbacks : couples the message table to business logic, increasing system complexity.

4. Maximum‑Effort Notification (also known as Best‑Effort Notification)

This approach focuses on reliably delivering the result of a transaction to a downstream system using MQ acknowledgment semantics.

Initiator publishes a notification to MQ.

Receiver listens on MQ, processes the message, and sends an ACK.

If no ACK is received, MQ retries with increasing intervals (1 min, 5 min, 10 min, …).

Receiver may also query a reconciliation API to ensure consistency.
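The retry schedule in the steps above can be sketched as a small helper. `notify_with_retries` is a hypothetical function, the intervals are given in seconds to match the 1, 5, and 10 minute example, and the `sleep` parameter is injected so the demo does not actually wait.

```python
import time

def notify_with_retries(send, intervals=(60, 300, 600), sleep=time.sleep):
    """Send once, then retry after each interval until an ACK arrives.

    `send` returns True when the receiver acknowledged the message.
    Returns False when every attempt went unacknowledged.
    """
    if send():
        return True
    for delay in intervals:
        sleep(delay)
        if send():
            return True
    return False

# Demo: the receiver acknowledges only the third attempt.
waits = []
attempts = iter([False, False, True])
delivered = notify_with_retries(lambda: next(attempts), sleep=waits.append)

# Demo: the receiver never acknowledges.
gave_up = not notify_with_retries(lambda: False, sleep=lambda seconds: None)
```

When the helper gives up, nothing is rolled back; consistency then depends on the receiver calling the reconciliation API, as described in the last step.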

Illustrative Scenario – Enterprise Banking Transfer :

The transfer system completes the transfer and notifies the bank via MQ. If the bank does not receive the notification, it actively queries the transfer system for the result.

5. Saga

Originally proposed by Hector Garcia‑Molina and Kenneth Salem, Saga breaks a long transaction into a sequence of short local transactions T1 … Tn, pairing each Ti with a compensating action Ci that semantically undoes it. An orchestrator drives the workflow:

Normal path : execute T1 → T2 → … → Tn.

Failure path : if Ti fails, run compensations in reverse order C(i‑1) → … → C1.
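Both paths can be sketched as a minimal in-process orchestrator. `run_saga` and the `step` helper are hypothetical names; the sketch assumes compensations themselves never fail and does not persist progress, both of which a production orchestrator has to handle.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that
    already completed, newest first, and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True

log = []

def step(name, fail=False):
    """Build a callable that records its name or raises on demand."""
    def run():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(name)
    return run

steps = [
    (step("T1"), step("C1")),
    (step("T2"), step("C2")),
    (step("T3", fail=True), step("C3")),
]
succeeded = run_saga(steps)
```

Here T3 fails, so only C2 and C1 run; C3 is skipped because T3 never completed and has nothing to compensate.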

Two recovery strategies are discussed:

Backward recovery : compensate all completed steps when a failure occurs.

Forward recovery : retry the failed step while assuming earlier steps remain valid.

Example – Order Workflow :

T1: create order; T2: deduct user balance; T3: add roses to user; T4: decrement inventory. Each Ti has a compensation Ci that reverses it. If T4 fails, the saga runs C3, C2, and C1 in that order. C3 has to take the roses back, but the user may already have consumed them between T3 and C3, which exposes the saga's lack of isolation.

Mitigations include application‑level locks, session isolation, pre‑freezing funds, and real‑time state checks.

Conclusion

Each pattern trades off latency, consistency guarantees, operational complexity, and failure‑handling capabilities:

2PC offers simplicity but suffers from blocking and single‑point failure.

TCC provides high performance at the cost of intrusive business code and complex compensation.

Local Message Table achieves eventual consistency with minimal coordination but introduces coupling between messaging and domain data.

Maximum‑Effort Notification focuses on reliable delivery using MQ ACKs, suitable when downstream systems can tolerate retries.

Saga enables loosely coupled micro‑services with compensating actions, yet requires careful design to avoid isolation anomalies.

Understanding InnoDB’s undo/redo logs, the CAP/BASE spectrum, and the detailed workflow of each pattern equips architects to select the most appropriate distributed‑transaction strategy for their micro‑service ecosystem.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
