Understanding Distributed Transactions: 2PC, 3PC, Calvin, Spanner, and More
Distributed transactions require careful coordination to maintain consistency across multiple nodes, and this article explores core concepts such as atomic operations, two‑phase and three‑phase commit protocols, deterministic Calvin ordering, Google Spanner’s TrueTime and Paxos‑based approach, as well as advanced techniques like Percolator, RAMP, and coordination avoidance.
Atomicity of Multiple Operations
In distributed systems we must ensure a certain degree of consistency. While single‑object consistency models cover individual operations, many applications need to execute several operations atomically within a transaction.
Atomic operations can be viewed as state transitions: before a transaction the database is in state A, after it finishes the state becomes B. Transactions provide flexibility because they can be reordered or retried.
Transaction Processing and Serializability
Transaction processing focuses on defining a legal execution history that models possible interleavings. A history is represented as a dependency graph; if it is equivalent to some serial history, it is called serializable.
Single‑Partition vs Multi‑Partition Transactions
Single‑partition transactions can use pessimistic (lock‑based) or optimistic concurrency control, but these do not solve multi‑partition coordination, which requires distributed commit and rollback protocols.
For example, transferring money between two accounts must atomically debit one account and credit the other. Even this high‑level operation consists of many low‑level steps such as reading the old balance, computing the new value, and writing it back.
Atomic Commitment Algorithms
To make multiple operations appear atomic, especially when some are remote, atomic‑commit algorithms are used. They require that all participants agree; a single negative vote aborts the transaction. Byzantine faults break these algorithms.
The algorithm must decide when data is ready to be committed, how to execute the commit quickly, and how to roll back if the decision is to abort.
When can data be considered ready for commit?
How to execute the commit so that results become visible as soon as possible?
If the algorithm decides not to commit, how to roll back the changes?
Two‑Phase Commit (2PC)
2PC involves a coordinator (leader) and participants (cohorts). In the prepare phase the coordinator sends a proposal and collects votes; participants vote to commit or abort. In the commit/abort phase the coordinator sends the final decision. If any participant aborts, the whole transaction aborts.
Coordinator failures during the prepare phase block progress; participants wait indefinitely, making 2PC a blocking protocol. Various systems (MySQL, Kafka) use 2PC.
Three‑Phase Commit (3PC)
3PC adds a prepare phase before the commit phase to avoid blocking when the coordinator fails. It assumes a synchronous network and no communication failures. The three steps are propose, prepare, and commit. 3PC can still suffer from split‑brain in network partitions.
Calvin Distributed Transactions
Calvin achieves deterministic transaction ordering using a sequencer that batches transactions into epochs. The sequencer assigns a global order, the scheduler dispatches transactions to workers, and workers execute them locally without further coordination.
Each transaction has a read set and a write set. Workers perform four steps: analyze read/write sets, fetch local data, receive remote data, and finally persist results.
Spanner Distributed Transactions
Spanner uses TrueTime, a high‑precision clock API exposing uncertainty bounds, to assign timestamps and achieve external consistency. It supports read‑write, read‑only, and snapshot reads. Transactions acquire locks via a lock table and use Paxos groups for replication.
Two‑phase commit is used across shards, but Paxos groups replace single nodes, improving availability.
Database Partitioning and Consistent Hashing
Modern databases partition data into shards. Simple modulo hashing causes massive data movement when the cluster size changes. Consistent hashing maps keys onto a ring, reducing the amount of data that needs to be moved when nodes join or leave.
Percolator and Snapshot Isolation
Percolator implements transactions on top of Bigtable using a two‑phase commit with a timestamp oracle. It provides snapshot isolation, where reads see a consistent snapshot and write‑write conflicts are resolved by first‑committer‑wins.
Coordination Avoidance and RAMP
Coordination avoidance (Invariant Confluence) allows operations that preserve constraints to execute without coordination. RAMP (Read‑Atomic Multi‑Partition) transactions use multi‑version concurrency control and metadata to achieve read‑atomic isolation without blocking, employing a two‑phase commit for writes.
Summary
This chapter examined various methods for implementing distributed transactions, including atomic‑commit protocols (2PC, 3PC), deterministic ordering (Calvin), TrueTime‑based ordering (Spanner), partitioning strategies, Percolator’s snapshot isolation, and coordination‑avoidance techniques such as RAMP.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
