Fundamentals of Distributed Systems: Concepts, Replication, Consistency, and Core Protocols
This article surveys the fundamentals of distributed systems: system models, replicas and consistency, performance and availability metrics, data distribution strategies, replica control protocols, and core techniques such as leases, quorums, redo logs, two-phase commit, MVCC, and Paxos, closing with the CAP theorem and practical engineering considerations.
1 Concepts
In engineering practice, a node is usually a process running on an operating system; the model treats a node as an indivisible whole, even when the process consists of several relatively independent modules.
1.1 Model
Nodes, replicas, and metrics for measuring distributed systems are introduced.
1.2 Replicas
A replica provides redundancy for data or services. Data replicas store the same data on different nodes to survive node failures, while service replicas provide identical services without relying on local storage.
Replica Consistency
Consistency levels include strong, monotonic, session, eventual, and weak consistency, each with different guarantees for read/write ordering and availability.
1.3 Metrics for Distributed Systems
Performance: throughput (e.g., QPS), latency, and the number of concurrent requests a system can serve; the three often trade off against one another.
Availability: the fraction of time the system serves requests correctly, measured either as uptime over total time or as the ratio of successful operations to total operations.
Scalability: the ability to increase performance roughly linearly by adding machines.
Consistency: how closely the replicas behave as a single copy; stronger consistency makes the system easier for applications to reason about, but typically costs availability and performance.
2 Distributed System Principles
2.1 Data Distribution Methods
Three common approaches are hash‑based distribution, range‑based distribution, and chunk‑based distribution, each with trade‑offs in scalability, data skew, and metadata management.
Consistent Hashing
Uses a ring of hash values and virtual nodes to achieve better load balancing and easier scaling.
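As a concrete illustration, the sketch below builds a hash ring with virtual nodes: each physical node is hashed many times onto the ring, and a key is owned by the first virtual node clockwise from the key's hash. The class and method names are illustrative, not from any particular library.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring via MD5 (any stable hash works)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Consistent hashing with virtual nodes for smoother load balancing."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring, which
        # spreads its load and evens out the key distribution.
        self._ring = []  # sorted list of (hash, physical node)
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))
        self._keys = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the node owning `key`: first vnode clockwise from hash(key)."""
        idx = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

Adding or removing a node only remaps the keys adjacent to its virtual nodes, which is the scaling benefit the text describes.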
Replica and Data Distribution
Decoupling replicas from physical machines by using data segments improves recovery speed, fault tolerance, and scaling.
Locality‑Aware Computation
Placing computation on the same machine as its data reduces network overhead.
2.2 Basic Replica Protocols
Replica control protocols are classified as centralized (e.g., primary‑secondary) or decentralized. Centralized protocols rely on a coordinator node for update ordering and concurrency control, while decentralized protocols achieve consensus through peer negotiation.
Primary‑Secondary Protocol
Describes the update flow, read paths for different consistency levels, primary election, and handling of unavailable replicas.
2.3 Lease Mechanism
Leases grant time‑bounded rights to cache data; the server guarantees no updates during the lease period, enabling fault‑tolerant caching and simplifying recovery.
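A minimal single-server sketch of the lease idea follows, with hypothetical names: reads return the value together with a time-bounded lease, and a write waits out all outstanding leases before modifying the value, so clients may safely cache until their lease expires. Real implementations also stop issuing new leases once a write is pending and must account for clock drift between machines; both are omitted here for brevity.

```python
import time

class LeaseServer:
    """Toy lease issuer: the server promises not to modify the value
    before every outstanding lease has expired."""

    def __init__(self, value, lease_seconds=10.0):
        self._value = value
        self._lease_seconds = lease_seconds
        self._latest_expiry = 0.0  # expiry of the longest-lived lease granted

    def read(self):
        """Grant a lease: the client may cache the value until `expiry`."""
        expiry = time.monotonic() + self._lease_seconds
        self._latest_expiry = max(self._latest_expiry, expiry)
        return self._value, expiry

    def write(self, new_value):
        """Wait for all outstanding leases to expire, then update.
        (A real server would also refuse new leases while waiting.)"""
        remaining = self._latest_expiry - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        self._value = new_value
```

Note the fault-tolerance property from the text: the server never needs to contact clients to invalidate caches; even if a client crashes, its lease simply times out.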
2.4 Quorum Mechanism
Defines write (W) and read (R) thresholds such that W+R > N, ensuring that reads intersect with successful writes. Discusses WARO, quorum‑based primary selection, and limitations for determining the latest committed version.
2.5 Log Techniques
Redo logs and checkpointing are used for crash recovery; 0/1 directory structures provide atomic batch updates.
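The redo-log half of this can be sketched in a few lines: every update is appended durably to a log before the in-memory state changes, and recovery replays the log. Checkpointing (truncating the log after snapshotting `data`) is omitted; file layout and names are illustrative.

```python
import json
import os

class RedoLogStore:
    """Key-value store with a redo log: commit = append a durable log
    record, then apply in memory; recovery replays the log from the start."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def put(self, key, value):
        # Write-ahead: the redo record reaches disk before the update
        # becomes visible, so a crash can never lose an acknowledged write.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def _recover(self):
        """Rebuild in-memory state by replaying every redo record in order."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                rec = json.loads(line)
                self.data[rec["key"]] = rec["value"]
```

A checkpoint would periodically dump `data` to a snapshot file so that recovery replays only the log suffix written after the snapshot.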
2.6 Two‑Phase Commit
Explains the coordinator‑participant workflow, failure handling, and why the protocol suffers from poor fault tolerance and performance.
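The coordinator-participant workflow can be condensed into a sketch like the one below (in-process objects stand in for networked nodes; vote logic is simplified):

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys the decision."""

    def __init__(self):
        self.state = "init"

    def prepare(self, op):
        """Phase 1: lock in the operation and vote yes. A real participant
        may vote no on conflicts or local failures."""
        self.pending = op
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(op, participants):
    """Coordinator: commit only if every participant votes yes in phase 1."""
    votes = [p.prepare(op) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

The protocol's weakness is visible in the structure: if the coordinator crashes between the two phases, every participant is stuck in the "prepared" state, holding locks, until the coordinator recovers; hence the poor fault tolerance noted above.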
2.7 MVCC
Multi-Version Concurrency Control keeps multiple versions of each data item; every transaction reads from a consistent snapshot, so readers and writers proceed concurrently without blocking each other while consistency is preserved.
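A minimal snapshot-read sketch makes the mechanism concrete (single-writer, no conflict detection; names are illustrative):

```python
class MVCCStore:
    """Each key keeps an ascending list of (commit_version, value);
    a transaction reads the newest version <= its snapshot."""

    def __init__(self):
        self.versions = {}  # key -> [(version, value), ...] ascending
        self.current = 0    # latest committed version

    def begin(self):
        """Start a transaction by taking a snapshot version."""
        return self.current

    def read(self, key, snapshot):
        """Return the newest value committed at or before `snapshot`."""
        for version, value in reversed(self.versions.get(key, [])):
            if version <= snapshot:
                return value
        return None

    def write(self, key, value):
        """Commit a write by appending a new version, never overwriting."""
        self.current += 1
        self.versions.setdefault(key, []).append((self.current, value))
        return self.current
```

Because old versions are never overwritten, a long-running reader keeps seeing its snapshot even as later transactions commit new values.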
2.8 Paxos Protocol
Describes proposer, acceptor, and learner roles, the prepare and accept phases, and how a majority of acceptors (⌊N/2⌋ + 1 of N) forms a quorum to achieve strong consistency.
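The two phases can be sketched for the single-decree case (deciding one value) as follows. This is a sketch under simplifying assumptions: acceptors are in-process objects, messages never fail mid-round, and durable storage of acceptor state is omitted.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) of last accepted proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n,
        and report any value already accepted."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Single-decree proposer: a value is chosen once a majority accepts."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: gather promises from a majority.
    responses = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in responses if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the highest-numbered one -- this is what makes Paxos safe.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior, key=lambda pair: pair[0])[1]
    # Phase 2: ask the acceptors to accept.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

The key invariant is visible in `propose`: once a value is chosen by a majority, every later proposer discovers it during phase 1 and re-proposes it, so the decision can never change.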
2.9 CAP Theorem
States that a distributed system cannot simultaneously guarantee all three of strong consistency, high availability, and partition tolerance: when a network partition occurs, the system must sacrifice either consistency or availability. The protocols above (lease, quorum, two-phase commit, Paxos) each make a different trade-off among these properties.