Master‑Slave, Master‑Master, Paxos: Choosing the Right Distributed Transaction Strategy
This article compares common distributed transaction approaches (Master‑Slave, Master‑Master, two‑phase and three‑phase commit, and Paxos), explains their mechanisms and trade‑offs, and surveys real‑world implementations such as Alibaba’s GTS and the LCN framework with its LCN, TCC, and TXC modes, to help architects select a suitable solution for highly available, consistent data.
In the previous article "Billion‑level Traffic Architecture: Distributed Transaction Ideas and Methods", the evolution from local to distributed transactions was outlined. This piece focuses on the most popular distributed transaction solutions from an application perspective and examines several commercial implementations.
Master‑Slave scheme
Master‑Master scheme
Two‑phase and three‑phase commit
Paxos scheme
Master‑Slave Scheme
In the classic master‑slave (primary‑secondary) model, the slave is a backup of the master. All reads and writes go to the master, and writes are then replicated to the slave either synchronously or asynchronously. Replication is typically asynchronous, with the slave periodically pulling updates from the master, so the two copies are only eventually consistent.
The main risk is that if the master fails before a replication cycle completes, data written in that interval can be lost. To avoid loss, the slave must operate in read‑only mode until the master recovers.
If data loss is tolerable, the slave can take over the master role immediately. For compute‑only nodes, which hold no data that could be lost or diverge, this failover alone removes the single point of failure. The scheme can also be made strongly consistent: write to the master first, then to the slave, and return success only after both writes succeed. If the slave write fails, either mark the slave unavailable and continue, or roll back the entire transaction.
Note: Generally, writes are not performed on the slave first because a failure on the master would require rolling back the slave, which can be complex.
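The strongly consistent variant described above can be sketched as follows. This is a minimal in‑memory illustration, not a real replication implementation: the Node class, the replicated_write function, and the on_slave_failure parameter are all hypothetical names introduced here.

```python
class Node:
    """In-memory stand-in for a database node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True

    def write(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def replicated_write(master, slave, key, value, on_slave_failure="rollback"):
    """Write to the master first, then the slave; report success only if both succeed."""
    had_key = key in master.data
    previous = master.data.get(key)
    master.write(key, value)              # step 1: master
    try:
        slave.write(key, value)           # step 2: slave
    except ConnectionError:
        if on_slave_failure == "degrade":
            slave.available = False       # mark the slave unavailable and carry on
            return True
        # Otherwise undo the master write so both copies stay identical.
        if had_key:
            master.data[key] = previous
        else:
            del master.data[key]
        return False
    return True
```

Writing to the master first means a failure on the second write only requires undoing the master, which is the ordering the note above recommends.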
Master‑Master Scheme
In a multi‑master configuration, multiple masters provide read‑write services. Synchronization between masters is usually asynchronous, yielding eventual consistency. If one master fails, others continue serving reads and writes. However, concurrent updates to the same data on different masters can cause conflicts, which must be resolved at the application level (e.g., Dynamo’s version‑based conflict resolution).
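One common way to detect such conflicts is to attach a version vector to each copy of a record. The sketch below assumes that "version‑based" resolution means comparing per‑master counters; the compare function and the master names m1 and m2 are illustrative only, and the code detects a conflict without deciding how to merge it.

```python
def compare(a, b):
    """Compare two version vectors given as dicts of master name -> update counter."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"   # concurrent updates: the application must merge
    if a_ahead:
        return "a_newer"    # a has seen everything b has, and more
    if b_ahead:
        return "b_newer"
    return "equal"


# Both masters updated the same record independently of each other.
print(compare({"m1": 2, "m2": 1}, {"m1": 1, "m2": 2}))
```

When neither vector dominates the other, neither write can be discarded safely, which is why the final merge falls to the application.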
Two‑Phase and Three‑Phase Commit
The two‑phase (and three‑phase) commit protocols are the core business‑level distributed transaction mechanisms. Detailed explanations are provided in the previous article; readers are encouraged to review that content for a deeper understanding.
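As a quick reminder of the basic shape of two‑phase commit, here is a minimal single‑process sketch. The Participant class and two_phase_commit function are hypothetical names, and timeouts, logging, and coordinator recovery are deliberately left out.

```python
class Participant:
    """Hypothetical resource taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: everyone commits
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: everyone rolls back
        p.rollback()
    return "aborted"
```

A single "no" vote aborts the whole transaction, so every participant ends in the same state.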
Paxos Scheme
Before looking at Paxos, it helps to recall the classic Two‑Generals problem, which shows why reaching agreement over an unreliable network is hard.
Two‑Generals Problem
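Two generals, camped on opposite sides of a valley, must agree on an attack time but can only communicate via messengers who may be captured. However many confirmations are exchanged, the sender of the last message can never be sure it arrived, so neither general can be certain the other will attack. No protocol removes this uncertainty over a lossy channel; practical consensus algorithms such as Paxos instead aim for agreement among a majority of nodes and cope with lost messages by retrying.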
Paxos Algorithm
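Paxos solves the problem of getting a set of nodes to agree on a single value despite node failures and lost or delayed messages. It involves four roles: Client (issues the request), Proposer (proposes a value on the client's behalf), Acceptor (votes on proposals), and Learner (learns the chosen value). The algorithm proceeds in two phases: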
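Prepare phase: A proposer sends a Prepare request carrying a unique proposal number, higher than any it has used before, to the acceptors. Each acceptor that has not already promised a higher number replies with a promise not to accept lower‑numbered proposals, and includes the highest‑numbered proposal it has already accepted, if any.
Accept phase: If the proposer receives promises from a majority of acceptors, it sends an Accept request with the same proposal number. The value it sends is the value of the highest‑numbered proposal reported in the promises, or its own value if no acceptor reported one. An acceptor accepts the request unless it has already promised a higher number, and the value is chosen once a majority of acceptors have accepted it.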
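This two‑stage process ensures that at most one value is chosen and that it is backed by a majority of acceptors, which provides strong consistency. Paxos is often described as the definitive consensus algorithm, with other protocols regarded as incomplete variants of it; 2PC and 3PC, by comparison, can block or diverge under coordinator failures and network partitions that Paxos tolerates.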
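The two phases can be illustrated with a minimal single‑decree sketch that runs in one process. It assumes reliable local calls in place of network messages and omits the Learner role; the Acceptor class and propose function are hypothetical names, not taken from any library.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def prepare(self, n):
        # Promise only if n is higher than anything promised before.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        # Accept unless a higher-numbered Prepare has been promised since.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) < majority:
        return None
    prior = [acc for acc in granted if acc is not None]
    if prior:
        # Must adopt the value of the highest-numbered accepted proposal.
        value = max(prior, key=lambda p: p[0])[1]
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "A"))
print(propose(acceptors, 2, "B"))
```

The second call proposes "B" but returns "A": once a value has been chosen, later proposers are forced to re‑propose it, which is what keeps the decision stable.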
Commercial Products
GTS
GTS (Global Transaction Service) is Alibaba’s middleware that provides a universal solution for microservice‑level distributed transactions with strong consistency guarantees.
Key advantages:
Decouples transaction logic from business code, allowing developers to focus on core services.
Minimal code intrusion: only an @TxcTransaction annotation is needed.
Performance is reported to be 8 to 10 times that of traditional XA solutions.
GTS Architecture
GTS consists of three components: GTS Client (initiates and ends transactions), GTS Resource Manager (handles branch transaction operations), and GTS Server (coordinates the overall transaction lifecycle).
LCN
TX‑LCN is a transaction coordination framework that does not manipulate data directly but coordinates existing local transactions to achieve global consistency. It comprises TxClient and TxManager.
Core steps:
Create transaction group: TxManager generates a GroupId before business code execution.
Join transaction group: Participants report their transaction information to TxManager after execution.
Notify transaction group: After the initiator finishes, TxManager decides to commit or roll back based on the overall status.
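The three steps above can be sketched with a toy coordinator. This is an assumption‑laden illustration rather than the TX‑LCN API: the TxManager class here and its create_group, join_group, and notify_group methods are hypothetical names that mirror the steps, and everything runs in one process.

```python
import uuid


class TxManager:
    """Toy coordinator that tracks transaction groups in memory."""

    def __init__(self):
        self.groups = {}

    def create_group(self):
        # Step 1: generate a GroupId before the business code runs.
        group_id = uuid.uuid4().hex
        self.groups[group_id] = []
        return group_id

    def join_group(self, group_id, participant, succeeded):
        # Step 2: a participant reports the outcome of its local work.
        self.groups[group_id].append((participant, succeeded))

    def notify_group(self, group_id, initiator_ok):
        # Step 3: decide once the initiator has finished, then tell everyone.
        members = self.groups.pop(group_id)
        commit = initiator_ok and all(ok for _, ok in members)
        decision = "commit" if commit else "rollback"
        return {name: decision for name, _ in members}


tm = TxManager()
gid = tm.create_group()
tm.join_group(gid, "order-service", True)
tm.join_group(gid, "stock-service", False)
print(tm.notify_group(gid, initiator_ok=True))
```

Because one participant reported a failure, every member of the group is told to roll back.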
LCN Transaction Modes
LCN supports three modes: LCN, TCC, and TXC.
LCN Mode
Implements transaction control by proxying the database connection; TxManager coordinates commits and rollbacks across services.
Low code intrusion.
Applicable only when a connection object can be controlled.
Provides strong data consistency.
Increases connection hold time.
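The connection‑proxy idea can be sketched with Python's built‑in sqlite3 module. This is only an analogy for the mechanism: ProxiedConnection and its finish method are hypothetical names, and in a real deployment the decision would arrive from TxManager over the network.

```python
import sqlite3


class ProxiedConnection:
    """Wraps a DB-API connection and defers the real commit."""

    def __init__(self, conn):
        self._conn = conn

    def execute(self, sql, params=()):
        return self._conn.execute(sql, params)

    def commit(self):
        # Swallowed: business code believes it committed, but the local
        # transaction stays open until the coordinator decides.
        pass

    def finish(self, decision):
        # Called once the coordinator's decision is known.
        if decision == "commit":
            self._conn.commit()
        else:
            self._conn.rollback()


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
raw.commit()

conn = ProxiedConnection(raw)
conn.execute("INSERT INTO orders (item) VALUES (?)", ("book",))
conn.commit()             # no effect yet
conn.finish("rollback")   # coordinator decided to roll back
print(raw.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

The connection cannot be released between commit() and finish(), which is the longer connection hold time listed above.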
TCC Mode
Implements Try‑Confirm‑Cancel semantics: Try executes business logic, Confirm finalizes it, and Cancel rolls it back. It requires developers to implement three methods per business operation.
High code intrusion.
Supports both local‑transaction‑aware and unaware services.
Consistency is fully managed by developers.
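A minimal sketch of the three methods for a debit operation is shown below. It assumes that Try reserves the amount by freezing it, which is a common way to implement TCC but is not spelled out in the text above; AccountTcc and run_tcc are hypothetical names.

```python
class AccountTcc:
    """Hypothetical account exposing Try / Confirm / Cancel for a debit."""

    def __init__(self, balance):
        self.balance = balance
        self.frozen = {}   # tx_id -> amount reserved by Try

    def try_debit(self, tx_id, amount):
        if self.balance < amount:
            return False
        self.balance -= amount
        self.frozen[tx_id] = amount
        return True

    def confirm(self, tx_id):
        # Finalize: drop the reservation. Safe to call more than once.
        self.frozen.pop(tx_id, None)

    def cancel(self, tx_id):
        # Compensate: return the reserved amount. Safe to call more than once.
        self.balance += self.frozen.pop(tx_id, 0)


def run_tcc(tx_id, steps):
    """steps is a list of (try_fn, confirm_fn, cancel_fn) tuples."""
    done = []
    for try_fn, confirm_fn, cancel_fn in steps:
        if try_fn(tx_id):
            done.append((confirm_fn, cancel_fn))
        else:
            for _, cancel in reversed(done):   # undo the Trys that succeeded
                cancel(tx_id)
            return False
    for confirm, _ in done:
        confirm(tx_id)
    return True
```

Every line of reservation and compensation logic here is business code, which is why this mode is described as highly intrusive and leaves consistency in the developer's hands.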
TXC Mode
Before executing a SQL statement, TXC (Taobao Transaction Constructor) queries the data the statement will affect, saves that snapshot, and acquires a Redis‑based distributed lock. On rollback, it uses the saved snapshot to revert the changes.
Low code intrusion.
Works only with SQL‑based services.
Higher resource consumption due to pre‑execution queries.
Does not occupy database connections.
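The snapshot‑and‑revert idea can be sketched with sqlite3. The helper names update_with_undo and rollback_from_undo are hypothetical, the undo log is a plain Python list, and the distributed lock is omitted entirely, so this shows only the before‑image mechanism. Table and column names are interpolated into the SQL purely for illustration and must never come from untrusted input.

```python
import sqlite3


def update_with_undo(conn, table, column, new_value, row_id, undo_log):
    # Pre-execution query: capture the before-image of the affected row.
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()
    undo_log.append((table, column, row[0], row_id))
    conn.execute(
        f"UPDATE {table} SET {column} = ? WHERE id = ?", (new_value, row_id)
    )
    conn.commit()   # local commit happens right away


def rollback_from_undo(conn, undo_log):
    # Replay the saved before-images in reverse order.
    while undo_log:
        table, column, old_value, row_id = undo_log.pop()
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE id = ?", (old_value, row_id)
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock (id, qty) VALUES (1, 10)")
conn.commit()

log = []
update_with_undo(conn, "stock", "qty", 7, 1, log)
rollback_from_undo(conn, log)
print(conn.execute("SELECT qty FROM stock WHERE id = 1").fetchone()[0])
```

The extra SELECT before each write is the additional resource cost noted above, while the immediate local commit is what frees the database connection.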
TXC Sub‑Modes
Standard (AT) Mode: Automatic transaction handling based on TDDL data sources; the framework splits SQL into branches and manages them as a distributed transaction.
Custom (MT) Mode: Allows developers to intervene in the two‑phase commit process, providing flexibility for special scenarios.
Retry (RT) Mode: Not a traditional transaction; it continuously retries failed SQL statements asynchronously until success or a timeout, relieving developers from manual retry logic.
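The retry behaviour of RT mode can be approximated with a small helper. This sketch blocks the caller, whereas the description above says the retries happen asynchronously; retry_until and its parameters are hypothetical names.

```python
import time


def retry_until(op, timeout_s, delay_s=0.01):
    """Call op() until it succeeds or timeout_s elapses; return (result, attempts)."""
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return op(), attempts
        except Exception as exc:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"gave up after {attempts} attempts") from exc
            time.sleep(delay_s)   # brief pause before the next attempt
```

A caller would wrap the failing statement in a function and pass it as op. Since the operation may run several times, it should be idempotent.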
ITFLY8 Architecture Home
