Mastering Distributed Transactions: From ACID to TCC and Sharding Strategies
This article provides a comprehensive technical guide to distributed transactions, covering ACID fundamentals, consistency models, sharding techniques, CAP and BASE theories, and detailed implementations of two‑phase, three‑phase, and TCC protocols, while highlighting their advantages, limitations, and practical considerations.
What Is a Distributed Transaction?
In everyday life a transaction either completes fully or not at all; similarly, a distributed transaction must guarantee atomic execution across multiple databases despite network unreliability, latency, or failures such as power loss.
ACID Properties
Atomicity : A transaction cannot be split; all its steps succeed together or none at all.
Consistency : The total amount of money before and after a transfer remains unchanged, ensuring no intermediate state violates business rules.
Isolation : Concurrent transactions do not interfere; for example, two independent transfers involving different accounts run without affecting each other.
Durability : Once committed, changes are persisted to disk and survive crashes.
Consistency Models
Beyond strict ACID consistency, systems adopt:
Strong consistency : Every read sees the latest write.
Weak consistency : Reads may return stale data; updates are not immediately propagated.
Eventual consistency : Data converges to a consistent state over time.
Choosing the appropriate model depends on business needs—for payments strong consistency is required, while product catalog updates may tolerate weaker guarantees.
Sharding (分库分表)
Sharding splits data across multiple databases to alleviate single‑node bottlenecks. Two main approaches:
Vertical sharding : Separate tables with low coupling into different databases (e.g., user profile vs. order history).
Horizontal sharding : Partition rows by a key range or hash (e.g., userId 1‑10000 in DB1, 10001‑20000 in DB2).
Vertical sharding improves modularity and reduces I/O contention but may require cross‑service joins, increasing complexity. Horizontal sharding keeps table size manageable and enables linear scaling, yet hot‑spot rows can become performance bottlenecks.
Transaction Challenges After Sharding
When a transaction spans multiple shards, it becomes a distributed transaction. Coordination across nodes introduces latency, potential partial failures, and the risk of deadlocks, making strong consistency harder to achieve.
CAP and BASE Theories
The CAP theorem states that in the presence of a network partition, a system must choose between consistency and availability. Modern large‑scale systems often favor availability and settle for BASE (Basically Available, Soft state, Eventually consistent) rather than strict ACID.
Two‑Phase Commit (2PC)
2PC ensures atomicity across nodes via a coordinator and participants:
Coordinator sends a prepare (vote) request to all participants.
Each participant executes the transaction locally, logs it, and replies with success or failure.
In the second phase:
If all participants voted success, the coordinator sends a commit command.
If any participant failed or timed out, the coordinator sends a rollback command.
Images illustrate the state transitions and message flow.
Drawbacks of 2PC include a single point of failure at the coordinator, blocking behavior while waiting for votes, and possible data inconsistency if commit messages are lost.
Timeout and Mutual‑Inquiry Mechanisms
To mitigate blocking, coordinators can abort after a timeout, and participants can query each other (mutual‑inquiry) to infer the global decision when the coordinator is unreachable.
Three‑Phase Commit (3PC)
3PC adds a pre‑commit (can‑commit) phase and timeout handling to reduce blocking:
Pre‑inquiry : Coordinator asks participants if they can commit.
Pre‑commit : If all agree, participants lock resources but do not commit.
Commit : Coordinator issues the final commit.
If any participant rejects or a timeout occurs, the coordinator aborts, avoiding the indefinite wait seen in 2PC. However, 3PC incurs higher latency and is less widely adopted.
TCC (Try‑Confirm‑Cancel) Pattern
TCC splits a transaction into three explicit steps:
Try : Reserve resources and perform business checks without committing.
Confirm : Finalize the reservation, making changes permanent.
Cancel : Release reserved resources if the transaction cannot be completed.
The workflow mirrors 2PC but operates at the service layer, with each step being a local transaction that can be committed independently.
Images show the TCC flow and its similarity to 2PC.
Conclusion
Distributed transaction management balances consistency, availability, and performance. Selecting the right consistency model, sharding strategy, and commit protocol (2PC, 3PC, or TCC) depends on workload characteristics, latency tolerance, and failure scenarios.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
