Distributed Systems Essentials: Models, Replicas, Consistency & Protocols
This comprehensive guide explores the core concepts of distributed systems, covering node models, replica strategies, consistency levels, data distribution techniques, lease-based caching, quorum mechanisms, logging, two‑phase commit, MVCC, Paxos consensus, and the CAP theorem, providing practical insights for building robust scalable architectures.
Distributed Systems Essentials
1. Concepts
1.1 Model
A node in practice is often a process on an operating system; the model treats a node as an indivisible whole, and a process can be split into multiple nodes if needed.
1.2 Replica
Replica (copy) provides redundancy for data or services. Data replicas store the same data on different nodes, enabling recovery when a node fails. Service replicas provide identical services without relying on local storage.
Replica consistency ensures that reads from different replicas return the same data under defined constraints.
Strong consistency : every read sees the most recent successful update.
Monotonic consistency : a client never reads older data after seeing newer data.
Session consistency : within a session, reads are monotonic.
Eventual consistency : updates eventually propagate to all replicas.
Weak consistency : no guarantee on read freshness.
1.3 Metrics
Key metrics include performance (throughput, latency, concurrency), availability (ratio of uptime to downtime), scalability (linear performance growth with added nodes), and consistency (strength of replica guarantees).
2. Distributed System Principles
2.1 Data Distribution
Data can be distributed by hash, range, or chunk size. Hash distribution suffers from poor scalability and data skew; range distribution partitions data by key ranges; chunk‑based distribution splits data into equal‑sized blocks, simplifying rebalancing.
Consistent hashing improves scalability by using a virtual ring and virtual nodes, allowing smooth node addition/removal with minimal data movement.
2.2 Basic Replica Protocols
Replica control protocols are either centralized (a single coordinator manages updates and consistency) or decentralized (all nodes negotiate equally). Centralized protocols simplify concurrency control but create a single point of failure; decentralized protocols avoid this but are more complex.
Primary‑Secondary Protocol
One replica is designated primary; all writes go through it, which orders updates and propagates them to secondaries. Reads can be served by any replica for eventual consistency, but only the primary guarantees strong consistency.
Primary election and failover rely on metadata servers to track the current primary; detection latency (≈10 s) determines the downtime during failover.
2.3 Lease Mechanism
A lease grants a client exclusive read‑only access to data for a fixed period. While the lease is valid, the server promises not to modify the data, allowing safe caching. After lease expiry, the client must discard cached data. Lease‑based caches improve performance while providing strong fault tolerance, even under network partitions or server crashes.
Lease validity requires loosely synchronized clocks; typical lease durations are on the order of 10 seconds.
2.4 Quorum Mechanism
Quorum writes require a write to succeed on at least W out of N replicas; reads query at least R replicas, with W + R > N ensuring that a read overlaps a successful write.
Special cases include Write‑All‑Read‑One (WARO) where W = N and R = 1. Quorum balances availability and consistency, and can be combined with primary‑secondary selection to read the latest committed version efficiently.
2.5 Logging Techniques
Redo logs record the result of each update before applying it to memory, enabling crash recovery by replaying the log. Checkpoints periodically dump the entire in‑memory state to disk, reducing replay time.
The 0/1 directory technique maintains two metadata directories; updates are written to the inactive directory and then atomically switched, providing atomic batch updates.
2.6 Two‑Phase Commit (2PC)
2PC is a centralized strong‑consistency protocol with a coordinator and participants. Phase 1 (prepare) gathers votes; if all vote commit, Phase 2 sends a global‑commit; otherwise a global‑abort is issued. Recovery relies on persistent logs to resume or finalize the transaction after crashes.
2PC suffers from poor fault tolerance and performance due to blocking behavior and multiple message rounds.
2.7 Multi‑Version Concurrency Control (MVCC)
MVCC creates a new data version for each transaction, allowing reads of consistent snapshots while writes proceed concurrently. Conflicts are detected when committing; non‑conflicting transactions succeed.
2.8 Paxos Consensus
Paxos achieves strong consistency in a decentralized setting. Roles include proposers (suggest values), acceptors (vote, requiring a majority), and learners (observe the chosen value). Each round has a monotonically increasing number; a value once chosen cannot be changed.
The protocol consists of a prepare phase (proposers request promises) and an accept phase (proposers propose values). Learners read a quorum of acceptors to learn the committed value.
2.9 CAP Theorem
CAP states that a distributed system cannot simultaneously provide strong consistency, high availability, and partition tolerance. Real‑world systems trade off these properties: lease mechanisms favor consistency and partition tolerance, quorum offers a balanced trade‑off, two‑phase commit emphasizes consistency at the cost of availability, and Paxos provides strong consistency with reasonable availability and partition tolerance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
