Understanding Relational and Distributed Transactions
This article explains the fundamentals of relational database transactions, the ACID properties, and various distributed transaction protocols such as 2PC, 3PC, TCC, message‑based approaches, and 1PC, while discussing their advantages, drawbacks, and practical considerations.
Relational Database Transactions
For programmers, a transaction is a group of operations that must execute as a single logical unit so that the system stays in a predictable, consistent state: either all of them succeed, or none do, with any partial work rolled back.
A (Atomicity): all operations in the transaction complete fully or not at all.
C (Consistency): integrity constraints hold both before and after the transaction.
I (Isolation): concurrent transactions do not interfere; intermediate states are invisible to others.
D (Durability): once committed, changes are persisted and survive crashes and restarts.
Example of a simple money transfer:
UserA.account -= 100;
UserB.account += 100;
Atomicity guarantees that both debit and credit happen together; consistency (as defined by ACID) refers to preserving database integrity constraints such as primary‑key, foreign‑key, NOT NULL, UNIQUE, and CHECK rules.
Isolation is implemented via locking or multiversion concurrency control (MVCC). The four isolation levels defined by the SQL standard are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE; each stronger level rules out more anomalies (dirty reads, non-repeatable reads, and phantom reads). Defaults vary by engine: InnoDB defaults to REPEATABLE READ, while MongoDB historically offered only read-uncommitted semantics for multi-document operations (multi-document transactions were added in version 4.0).
Durability relies on redo/undo logs and checkpoints to survive power loss or crashes.
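The transfer above can be sketched as a minimal in-memory transaction. The class and method names are illustrative, and a real database uses undo logging rather than an explicit snapshot, but the commit-or-rollback shape of atomicity is the same:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of atomicity: either both the debit and the credit are
// applied, or neither is. Names are illustrative, not from any framework.
public class AtomicTransfer {
    private final Map<String, Integer> accounts = new HashMap<>();

    public AtomicTransfer() {
        accounts.put("A", 500);
        accounts.put("B", 500);
    }

    public int balance(String user) { return accounts.get(user); }

    // Returns true on commit, false when the transaction rolled back.
    public boolean transfer(String from, String to, int amount) {
        // Snapshot the pre-transaction state (a stand-in for an undo log).
        int fromBefore = accounts.get(from);
        int toBefore = accounts.get(to);
        try {
            accounts.put(from, fromBefore - amount);
            if (fromBefore - amount < 0) {
                throw new IllegalStateException("insufficient funds");
            }
            accounts.put(to, toBefore + amount);
            return true;  // commit: both writes are kept
        } catch (RuntimeException e) {
            // Rollback: restore the snapshot so no partial write survives.
            accounts.put(from, fromBefore);
            accounts.put(to, toBefore);
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicTransfer db = new AtomicTransfer();
        db.transfer("A", "B", 100);   // succeeds: A=400, B=600
        db.transfer("A", "B", 1000);  // fails mid-way, rolled back
        System.out.println(db.balance("A") + " " + db.balance("B"));
    }
}
```

The failed second transfer leaves both balances untouched, which is exactly the guarantee the article's debit/credit example depends on.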
Distributed Transactions
When data volume exceeds a single relational database’s capacity, sharding, NoSQL, or micro‑service architectures introduce multiple independent (often heterogeneous) databases, making traditional lock‑based transaction mechanisms infeasible.
The core challenge is the CAP trade‑off: in the presence of network partitions, a system must choose between strong consistency and high availability.
2PC (Two‑Phase Commit)
2PC is a classic, strongly consistent, centralized protocol with a coordinator and multiple participants. In the first phase the coordinator asks participants to prepare; in the second phase it decides to commit or abort based on unanimous votes.
Diagram omitted for brevity.
Advantages: strong consistency and broad support in relational databases through the XA standard. Drawbacks: the coordinator is a single point of failure, every transaction pays two network round trips, and participants block while holding locks if the coordinator or a peer fails between phases.
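The two phases can be sketched as a toy coordinator loop. All names here are illustrative, and a real implementation would also write prepare/commit records to a durable log:

```java
import java.util.List;

// Toy two-phase commit: the coordinator polls every participant in the
// prepare phase and commits only on unanimous "yes" votes; any "no" vote
// aborts all participants.
public class TwoPhaseCommit {
    interface Participant {
        boolean prepare();  // phase 1: vote yes/no after doing local work
        void commit();      // phase 2, on unanimous yes
        void abort();       // phase 2, otherwise
    }

    static class Resource implements Participant {
        final boolean canPrepare;
        String state = "INITIAL";
        Resource(boolean canPrepare) { this.canPrepare = canPrepare; }
        public boolean prepare() {
            state = canPrepare ? "PREPARED" : "ABORTED";
            return canPrepare;
        }
        public void commit() { state = "COMMITTED"; }
        public void abort()  { state = "ABORTED"; }
    }

    // Returns true if the global transaction committed.
    static boolean run(List<Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {          // phase 1
            if (!p.prepare()) { allYes = false; break; }
        }
        for (Participant p : participants) {          // phase 2
            if (allYes) p.commit(); else p.abort();
        }
        return allYes;
    }

    public static void main(String[] args) {
        Resource a = new Resource(true), b = new Resource(false);
        System.out.println(run(List.of(a, b)) + " " + a.state);
    }
}
```

The blocking problem is visible in this sketch: between `prepare()` and the phase-2 decision, a prepared participant can do nothing but wait for the coordinator.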
3PC (Three‑Phase Commit)
3PC adds an extra “pre‑commit” phase and timeout mechanisms to avoid the blocking problem of 2PC, at the cost of additional messages and latency.
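The distinctive mechanism, timeout-driven recovery, can be sketched as a participant state machine (states and names are illustrative):

```java
// Toy illustration of why 3PC reduces blocking: a participant that has
// already received "pre-commit" knows every peer voted yes, so on losing
// the coordinator it may commit on its own after a timeout instead of
// blocking forever; before pre-commit it aborts instead.
public class ThreePhaseParticipant {
    enum State { INITIAL, CAN_COMMIT, PRE_COMMIT, COMMITTED, ABORTED }
    State state = State.INITIAL;

    void onCanCommit() { state = State.CAN_COMMIT; }  // phase 1
    void onPreCommit() { state = State.PRE_COMMIT; }  // phase 2
    void onDoCommit()  { state = State.COMMITTED; }   // phase 3

    // Invoked when the coordinator stops responding.
    void onCoordinatorTimeout() {
        state = (state == State.PRE_COMMIT) ? State.COMMITTED : State.ABORTED;
    }

    public static void main(String[] args) {
        ThreePhaseParticipant p = new ThreePhaseParticipant();
        p.onCanCommit();
        p.onPreCommit();
        p.onCoordinatorTimeout();  // no doCommit ever arrives
        System.out.println(p.state);
    }
}
```

This default decision is what 2PC lacks; the price, as the text notes, is an extra round of messages on every transaction.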
TCC (Try‑Confirm‑Cancel)
TCC splits a business operation into three stages: Try reserves resources and performs validity checks, Confirm executes the operation using only the reserved resources, and Cancel releases the reservation if any Try fails. Confirm and Cancel must both be idempotent, because they are retried until they succeed. TCC offers stronger consistency than purely asynchronous approaches and scales well, but it requires substantial development effort, since every sub-service must implement all three operations.
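A minimal sketch of the three operations for a hypothetical inventory service follows. All names are illustrative; a real implementation would persist reservation records and deduplicate retries:

```java
// TCC sketch for an inventory service: Try reserves stock, Confirm makes
// the deduction final, Cancel returns the reservation to the free pool.
public class InventoryTcc {
    private int available;  // stock visible to new reservations
    private int reserved;   // stock held by in-flight transactions

    InventoryTcc(int stock) { this.available = stock; }

    // Try: check the business rule and set the resource aside.
    boolean tryReserve(int n) {
        if (available < n) return false;
        available -= n;
        reserved += n;
        return true;
    }

    // Confirm: consume only what Try already reserved.
    void confirm(int n) { reserved -= n; }

    // Cancel: compensate by releasing the reservation.
    void cancel(int n)  { reserved -= n; available += n; }

    int available() { return available; }

    public static void main(String[] args) {
        InventoryTcc inv = new InventoryTcc(10);
        if (inv.tryReserve(3)) inv.confirm(3);  // happy path
        System.out.println(inv.available());
    }
}
```

Because Confirm touches only reserved stock, a concurrent transaction can never see (or oversell) resources that another transaction has set aside, which is where TCC's consistency guarantee comes from.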
Message‑Based Distributed Transactions
These approaches split a global transaction into a local “primary” transaction and one or more “secondary” transactions, using asynchronous messages to achieve eventual consistency.
Local Message Table
The primary transaction writes both the business update and a message record within the same local transaction, ensuring atomicity.
BEGIN TRANSACTION;
UPDATE User SET account = account - 100 WHERE userId = 'A';
INSERT INTO message(userId, amount, status) VALUES ('A', 100, 1);
COMMIT;
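A relay process then polls the message table and publishes pending rows, marking each row sent only after the broker acknowledges it. A minimal sketch, with toy stand-ins for the table and the MQ client:

```java
import java.util.List;

// Relay that makes the local message table eventually consistent: poll
// pending rows, publish, and mark sent only on broker acknowledgement.
// Row and Broker are illustrative stand-ins, not a real table or client.
public class MessageRelay {
    static class Row {
        final String payload;
        int status;  // 1 = pending, 2 = sent (as in the INSERT above)
        Row(String payload) { this.payload = payload; this.status = 1; }
    }

    interface Broker { boolean publish(String payload); }  // true = acked

    // One polling pass; unacked rows stay pending and are retried on the
    // next pass, which is why the downstream consumer must be idempotent.
    static int relayOnce(List<Row> table, Broker broker) {
        int sent = 0;
        for (Row r : table) {
            if (r.status == 1 && broker.publish(r.payload)) {
                r.status = 2;
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Row> table = new java.util.ArrayList<>();
        table.add(new Row("transfer:A->B:100"));
        System.out.println(relayOnce(table, payload -> true));  // delivers 1
        System.out.println(relayOnce(table, payload -> true));  // nothing left
    }
}
```

A crash between publish and the status update causes a duplicate delivery on the next pass, never a lost message; that at-least-once behavior is the trade-off this pattern accepts.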
Transactional Message
Supported by message queues that can participate in a two‑phase commit, guaranteeing atomicity between the local transaction and the message send.
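The flow can be sketched with a toy broker. RocketMQ is one real broker offering transactional ("half") messages; the Broker type below is a stand-in for illustration, not its API:

```java
// Transactional-message sketch: the message is first staged invisibly
// (a "half message"), the local transaction runs, and only then is the
// message committed (made visible to consumers) or rolled back.
public class TransactionalSend {
    static class Broker {
        String staged;     // half message, invisible to consumers
        String delivered;  // committed, visible message
        void stage(String m) { staged = m; }
        void commitStaged()  { delivered = staged; staged = null; }
        void rollbackStaged(){ staged = null; }
    }

    interface LocalTx { boolean run(); }

    // Returns true when both the local transaction and the send committed.
    static boolean send(Broker broker, String msg, LocalTx tx) {
        broker.stage(msg);               // phase 1: half message
        boolean ok;
        try { ok = tx.run(); } catch (RuntimeException e) { ok = false; }
        if (ok) broker.commitStaged();   // phase 2: deliver ...
        else broker.rollbackStaged();    // ... or discard
        return ok;
    }

    public static void main(String[] args) {
        Broker b = new Broker();
        boolean ok = send(b, "credit B 100", () -> true);
        System.out.println(ok + " " + b.delivered);
    }
}
```

Real brokers add a check-back: if the producer crashes after staging, the broker asks it whether the local transaction committed, resolving half messages that would otherwise dangle.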
1PC (One‑Phase Commit)
1PC is used in primary‑secondary replication sets where a single commit is performed without a rollback step; if a subset of replicas fails, temporary inconsistency may occur but is resolved by replaying logs or retries.
Examples include MongoDB’s primary‑secondary replication with write concern w:majority, and distributed file systems like GFS that rely on checksum‑based recovery.
Reflection and Summary
In many real‑world scenarios, guaranteeing only atomicity (and thus application‑level consistency) is sufficient; isolation and durability are handled by the local transaction layer. Idempotence is crucial for retry logic in distributed environments. High availability often outweighs strong consistency, leading to designs that favor eventual consistency and compensation mechanisms rather than heavyweight distributed transaction protocols.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.