What Is a Distributed Transaction? Theory, Challenges, and Common Solutions
The article explains the concept of distributed transactions, why they are needed in micro‑service architectures, presents the CAP theorem and BASE theory, and reviews major solutions: two‑phase commit, three‑phase commit, TCC, local message tables, transactional messages, maximum‑effort notification, Sagas, and the Seata framework.
Distributed transactions refer to a transaction whose participants, transaction manager, and resource servers are located on different nodes of a distributed system, requiring all sub‑operations to either succeed together or fail together.
They become necessary when a business operation spans multiple services, such as a cross‑bank transfer where money must be deducted from one account and added to another atomically; failure in any step would lead to inconsistent state.
In monolithic applications, a simple @Transactional annotation guarantees atomicity, but in a distributed micro‑service architecture the operation is split across many services, demanding a mechanism to keep their states consistent.
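The guarantee a monolith gets for free can be sketched with a toy in-memory ledger: every balance change in a transfer is applied, or none is. This is a minimal illustration of the all-or-nothing semantics a single database transaction (e.g. Spring's @Transactional) provides; the class and method names are illustrative, not part of any real framework.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory ledger illustrating the all-or-nothing semantics a single
// database transaction gives a monolith. Names are illustrative only.
class Ledger {
    private final Map<String, Integer> balances = new HashMap<>();

    Ledger(Map<String, Integer> initial) { balances.putAll(initial); }

    int balanceOf(String account) { return balances.get(account); }

    // Transfer atomically: mutate a copy, then swap it in only on success.
    boolean transfer(String from, String to, int amount) {
        Map<String, Integer> snapshot = new HashMap<>(balances);
        try {
            int src = snapshot.get(from);
            if (src < amount) throw new IllegalStateException("insufficient funds");
            snapshot.put(from, src - amount);
            snapshot.put(to, snapshot.get(to) + amount);
            balances.clear();
            balances.putAll(snapshot); // "commit"
            return true;
        } catch (RuntimeException e) {
            return false;              // "rollback": balances untouched
        }
    }
}
```

Once the debit and credit live in two different services with two different databases, no such single commit point exists, which is exactly the gap the solutions below try to fill.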
Distributed Theory
CAP Theorem
Consistency (C): all replicas hold the same value at the same time.
Availability (A): the system continues to respond to reads/writes despite some node failures.
Partition tolerance (P): the system continues to operate despite network partitions.
The CAP theorem states that a distributed system can guarantee at most two of these three properties at once. Because network partitions cannot be ruled out in practice, partition tolerance is effectively mandatory, so designers really trade off consistency against availability.
BASE Theory
To achieve high availability, the BASE model relaxes CAP's strong-consistency requirement and rests on three ideas:
Basically Available: the system keeps responding, possibly with degraded functionality or longer latency.
Soft state: intermediate states in which replicas temporarily disagree are permitted.
Eventual consistency: once updates stop, all replicas converge to the same value within a bounded time.
It acknowledges that strong consistency is often unattainable, so systems aim for eventual consistency based on business needs.
Distributed Transaction Solutions
Two‑Phase Commit (2PC)
2PC involves a global coordinator and multiple participants. In the first (prepare) phase, the coordinator writes a log and asks participants to prepare; participants write their own logs and reply. In the second (commit) phase, if all participants are ready, the coordinator logs a commit and sends commit commands; otherwise it logs an abort and sends abort commands. Problems include single‑point‑of‑failure of the coordinator, possible data inconsistency, long response time, and uncertainty when failures occur after a commit is sent.
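The two phases can be sketched with in-memory participants. The interface and class names below are illustrative, not a real 2PC implementation, and the sketch deliberately omits the logging and failure handling the paragraph above describes:

```java
import java.util.List;

// Minimal sketch of two-phase commit over in-memory participants.
interface Participant {
    boolean prepare();  // phase 1: vote yes/no after writing a local log
    void commit();      // phase 2a
    void abort();       // phase 2b
}

// A participant that records its state, for demonstration.
class MemoryParticipant implements Participant {
    final boolean canPrepare;
    String state = "init";
    MemoryParticipant(boolean canPrepare) { this.canPrepare = canPrepare; }
    public boolean prepare() { state = canPrepare ? "prepared" : "failed"; return canPrepare; }
    public void commit() { state = "committed"; }
    public void abort()  { state = "aborted"; }
}

class Coordinator {
    // Returns true if the global transaction committed.
    static boolean run(List<Participant> participants) {
        // Phase 1: ask every participant to prepare; any "no" vote aborts all.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allPrepared = false; break; }
        }
        // Phase 2: broadcast the decision to everyone.
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.abort();
        }
        return allPrepared;
    }
}
```

The sketch makes the blocking problem visible: between `prepare()` and the phase-2 broadcast, every participant holds its locks and waits on a single coordinator.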
Three‑Phase Commit (3PC)
3PC splits the protocol into CanCommit, PreCommit, and DoCommit phases and adds timeouts on both coordinator and participants, so participants no longer block indefinitely when the coordinator fails. It still pays extra round trips, however, and a participant that commits on timeout during a network partition can leave the system inconsistent.
TCC (Try-Confirm-Cancel)
TCC splits each operation into Try, Confirm, and Cancel phases. The Try phase reserves resources, Confirm finalizes the operation, and Cancel rolls back if needed. While it eliminates the coordinator’s single point of failure and reduces blocking, it increases business‑code complexity.
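A debit under TCC can be sketched as a reserve/finalize/release triple on an account. This is illustrative only; real TCC frameworks must also handle idempotency and the "empty rollback" case where Cancel arrives before Try:

```java
// Minimal TCC sketch: Try reserves resources, Confirm makes the reservation
// permanent, Cancel releases it. Names and fields are illustrative.
class TccAccount {
    int available;   // spendable balance
    int frozen;      // reserved by Try, not yet confirmed

    TccAccount(int available) { this.available = available; }

    // Try: reserve the amount without spending it.
    boolean tryDebit(int amount) {
        if (available < amount) return false;
        available -= amount;
        frozen += amount;
        return true;
    }

    // Confirm: the reservation becomes a real debit.
    void confirmDebit(int amount) { frozen -= amount; }

    // Cancel: release the reservation back into the balance.
    void cancelDebit(int amount) { frozen -= amount; available += amount; }
}
```

The business-code cost mentioned above is visible here: one logical operation now needs three correctly paired methods instead of one.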
Local Message Table
Producers write the business data and a message record (with its send status) into a local message table within the same database transaction; a relay task then scans the table, publishes unsent messages to a queue, and retries on failure until delivery is acknowledged, after which the consumer processes them. This decouples the business transaction from message delivery at the cost of an extra table and a scan job.
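The pattern can be sketched in memory: the business row and the outbox row are written together, and a relay loop keeps retrying delivery. All names are illustrative; in a real system the two writes share one database transaction and the relay is a scheduled job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the local message table (outbox) pattern.
class OutboxMessage {
    final String payload;
    String status = "PENDING";   // PENDING -> SENT
    OutboxMessage(String payload) { this.payload = payload; }
}

class LocalMessageTable {
    final List<String> businessRows = new ArrayList<>();
    final List<OutboxMessage> outbox = new ArrayList<>();

    // The business write and the message write succeed or fail together.
    void writeBusinessAndMessage(String row, String payload) {
        businessRows.add(row);
        outbox.add(new OutboxMessage(payload));
    }

    // Relay pass: try to deliver each pending message; failures stay
    // PENDING and are picked up again on the next pass.
    void relay(Consumer<String> send) {
        for (OutboxMessage m : outbox) {
            if (!m.status.equals("PENDING")) continue;
            try {
                send.accept(m.payload);
                m.status = "SENT";
            } catch (RuntimeException e) {
                // leave PENDING for retry
            }
        }
    }
}
```

Because the relay retries, the consumer may see the same message more than once, so consumer-side processing must be idempotent.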
Message Transaction
Two‑phase (transactional) messaging uses the message broker itself: the producer first sends a half (prepare) message that is invisible to consumers, then executes its local transaction, and finally issues a commit or rollback to the broker, achieving eventual consistency without a local message table. Apache RocketMQ, originally developed at Alibaba, provides this capability, including a back‑check by which the broker asks the producer for the transaction's outcome if neither commit nor rollback arrives.
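The broker-side state machine can be sketched with a toy in-memory broker: half messages are invisible until committed, and rollback discards them. This is a simplified model of the idea, not RocketMQ's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy broker illustrating two-phase (transactional) messaging:
// a half message stays invisible to consumers until the producer commits.
class TransactionalBroker {
    private final Map<Long, String> halfMessages = new HashMap<>();
    private final List<String> deliverable = new ArrayList<>();
    private long nextId = 0;

    // Phase 1: store the half message; consumers cannot see it yet.
    long prepare(String payload) {
        long id = nextId++;
        halfMessages.put(id, payload);
        return id;
    }

    // Phase 2a: the producer's local transaction succeeded; deliver.
    void commit(long id) {
        String payload = halfMessages.remove(id);
        if (payload != null) deliverable.add(payload);
    }

    // Phase 2b: the local transaction failed; discard the half message.
    void rollback(long id) { halfMessages.remove(id); }

    List<String> consume() { return new ArrayList<>(deliverable); }
}
```

A real broker adds the back-check step: if a producer crashes between `prepare` and `commit`, the broker later asks it whether the local transaction actually committed.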
Maximum Effort Notification
A simple scheme where system A sends a message to a queue after its local transaction; a dedicated service consumes the message and calls system B, retrying a fixed number of times before giving up.
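The retry budget can be sketched as a bounded loop; the names are illustrative, and a real implementation would space attempts out over time and hand unresolved cases to a reconciliation job:

```java
import java.util.function.Supplier;

// Sketch of maximum-effort notification: call the downstream system up to
// maxAttempts times, then give up. Illustrative names only.
class BestEffortNotifier {
    // Returns true if any attempt succeeded within the budget.
    static boolean notify(Supplier<Boolean> call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (call.get()) return true;
            } catch (RuntimeException e) {
                // treat an exception like a failed attempt and retry
            }
        }
        return false; // gave up; reconciliation must catch this later
    }
}
```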
Saga Model
Sagas break a long‑running transaction into a series of local short transactions coordinated by a saga orchestrator; if any step fails, compensating actions are executed in reverse order.
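The orchestrated variant can be sketched as a loop that runs steps forward and, on the first failure, compensates the completed ones in reverse. The step model below is illustrative, not a real saga framework:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal orchestrated saga: each step either succeeds or triggers
// compensation of everything done so far, in reverse order.
class SagaStep {
    final String name;
    final boolean succeeds;
    SagaStep(String name, boolean succeeds) { this.name = name; this.succeeds = succeeds; }
}

class SagaOrchestrator {
    final List<String> log = new ArrayList<>();

    // Returns true if the whole saga committed.
    boolean run(List<SagaStep> steps) {
        List<SagaStep> done = new ArrayList<>();
        for (SagaStep step : steps) {
            if (step.succeeds) {
                log.add("do:" + step.name);
                done.add(step);
            } else {
                log.add("fail:" + step.name);
                // compensate completed steps in reverse order
                for (int i = done.size() - 1; i >= 0; i--) {
                    log.add("undo:" + done.get(i).name);
                }
                return false;
            }
        }
        return true;
    }
}
```

Unlike 2PC, no step holds locks while waiting on others, but intermediate states are visible to other transactions until compensation runs, so compensating actions must be written for every step.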
Seata Framework
Seata implements distributed transactions with three roles: Transaction Coordinator (TC), Transaction Manager (TM), and Resource Manager (RM). It maintains an UNDO_LOG table for each RM to enable rollback, handling global locks, commit, and rollback commands across micro‑services.
In summary, the article introduces fundamental theories behind distributed transactions and reviews several practical solutions, highlighting that the choice of approach depends on specific business requirements and that distributed transactions inevitably increase system complexity, code volume, and performance overhead.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn