Mastering Distributed Systems: CAP, BASE, Locks, Transactions, Paxos & Raft Explained
This article provides a comprehensive overview of core distributed‑system concepts—including the CAP theorem, BASE model, common distributed‑lock implementations, multiple distributed‑transaction patterns such as 2PC, 3PC, TCC, local‑message tables, MQ transactions and Seata, as well as consistency algorithms like Paxos and Raft, idempotency techniques, and rate‑limiting algorithms—explaining their motivations, trade‑offs, and practical usage.
CAP Principle
The CAP theorem states that a distributed system can guarantee at most two of the following three properties:
Consistency : all replicas see the same data at the same logical time (strict consistency).
Availability : every request receives a non‑error response, but the response may not reflect the latest write.
Partition tolerance : the system continues to operate despite network partitions.
Because network partitions are inevitable, a system must always tolerate them, which forces a trade‑off between consistency and availability.
Why CAP cannot be achieved simultaneously
When a partition occurs, the system must either sacrifice consistency (allow divergent data) to remain available, or sacrifice availability (reject requests) to keep data consistent. The two goals are mutually exclusive under partition tolerance.
CAP‑derived models and typical applications
CA (Consistency + Availability, without Partition tolerance) : theoretical; in practice partitions cannot be ignored.
CP (Consistency + Partition tolerance, without Availability) : used by many distributed databases and transactional systems that require strong consistency.
AP (Availability + Partition tolerance, without Consistency) : adopted by many NoSQL stores, web caches, DNS, and registration services (e.g., ZooKeeper is CP, Eureka is AP, Nacos supports both).
BASE Theory
BASE (Basically Available, Soft state, Eventual consistency) relaxes the consistency requirement of CAP to achieve higher availability. It accepts temporary inconsistency (soft state) and guarantees that replicas will eventually converge.
Basically Available : the system remains operational under failures, though latency or functionality may degrade.
Soft state : data may be in an intermediate state; the system tolerates temporary inconsistency.
Eventual consistency : after a bounded period (determined by network latency, load, replication strategy) all replicas converge to the same state.
Distributed Locks
In a distributed environment a lock must be coordinated across nodes.
Common implementations
MySQL lock : create a lock table with a unique constraint; insert a row to acquire the lock and delete it to release. Simple but incurs DB I/O and does not scale under high concurrency.
ZooKeeper lock : create an ordered sequential node under a lock directory; the client that holds the smallest sequence number owns the lock. Other clients watch the predecessor node and acquire the lock when it disappears.
Redis lock : use the single‑threaded nature of Redis. The basic command is SETNX resourceName value. To avoid dead locks, add an expiration atomically (Redis 2.8+ supports SET resourceName value EX 5 NX). Libraries such as Redisson provide a higher‑level API and implement the RedLock algorithm.
Distributed Transactions
A distributed transaction extends ACID guarantees across multiple databases or services.
Key requirements
Record every action performed on each participating node.
All actions must either commit together or roll back together.
Typical patterns
Two‑Phase Commit (2PC) : a coordinator asks participants to prepare, then either commits or rolls back. Guarantees strong consistency but suffers from a single point of failure and blocking.
Three‑Phase Commit (3PC) : adds a “CanCommit” phase to avoid coordinator blocking; still vulnerable to network delays.
TCC (Try‑Confirm‑Cancel) : business‑level two‑phase commit where each operation provides explicit try, confirm, and cancel methods. Reduces DB‑level lock time but increases development complexity.
Local message table : split a global transaction into a local DB transaction that writes a message record; a background job polls the table and publishes to a message queue.
MQ transaction : send a “prepare” message, execute the local transaction, then commit or roll back the message based on the outcome. RocketMQ’s half‑message feature is an example.
Seata : an open‑source framework that adds a Transaction Coordinator (TC), Transaction Manager (TM), and Resource Manager (RM) to orchestrate global transactions with minimal intrusion.
Seata workflow
Service A’s TM requests a global transaction from TC; TC returns a unique XID.
Service A’s RM registers a branch transaction with TC under the XID.
Service A executes its branch.
During a remote call the XID propagates to Service B.
Service B’s RM registers its branch with TC.
Service B executes its branch.
After business logic finishes, TM decides to commit or roll back based on errors.
TC coordinates all branches to either commit or roll back.
Consistency Algorithms
Paxos
Paxos is a message‑based consensus algorithm that ensures distributed consistency. Roles include Proposer, Acceptor, and Learner (a node may play multiple roles).
Prepare phase
Proposer sends a prepare request with proposal number Mn to a majority of acceptors.
Each acceptor replies with the highest numbered proposal it has accepted (if any) and promises not to accept proposals numbered ≤ Mn.
Accept phase
If the proposer receives a majority of promises, it sends an accept request [Mn, Vn] where Vn is the value of the highest prior proposal (or any value if none).
Acceptors that have not promised a higher number accept the proposal.
When a majority of acceptors have accepted, the value is learned by all learners.
Multi‑Paxos
Multi‑Paxos elects a single leader to act as the sole proposer for subsequent rounds, eliminating conflicts among multiple proposers.
Raft
Raft is an easier‑to‑understand consensus algorithm with three roles: Leader, Follower, Candidate.
Leader election
Followers start election timers; if no heartbeat is received, a follower becomes a candidate, increments its term, votes for itself, and sends RequestVote RPCs.
If a candidate receives votes from a majority, it becomes the leader and starts sending periodic heartbeats.
If a candidate discovers a higher‑term leader or times out, it starts a new election.
Log replication
The leader receives client commands, appends them to its log, and replicates the entries to followers. Once a majority acknowledges, the entry is committed and applied to the state machine.
Idempotency
Idempotency means that repeating the same request yields the same result, preventing duplicate data caused by retries, rapid clicks, or duplicate messages.
Common techniques
Check‑then‑insert using a unique request identifier.
Unique database constraints and handling duplicate‑key exceptions.
Pessimistic locking (e.g., SELECT ... FOR UPDATE) to serialize updates.
Optimistic locking with a version or timestamp column.
Dedicated de‑duplication tables for one‑time identifiers.
State‑machine enforcement (e.g., order status flow).
Distributed locks (Redis + Redisson) for critical sections.
Token‑based approaches where a one‑time token authorizes the operation.
Rate Limiting
Common algorithms to control request rates:
Counter : fixed‑window counting; simple but suffers from burst spikes.
Leaky bucket : processes requests at a constant outflow rate, buffering excess up to a capacity.
Token bucket : generates tokens at a steady rate; each request consumes a token, allowing bursts up to the bucket size. Implementations include Guava’s RateLimiter for single‑instance services.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
