Fundamentals of Distributed Systems: Concepts, Replication, Consistency, and Core Protocols
This article surveys the fundamentals of distributed systems: system models, replicas and consistency, performance and availability metrics, data distribution strategies, replica control protocols, and core techniques such as leases, quorums, redo logs, two-phase commit, MVCC, and Paxos, closing with the CAP theorem and practical engineering considerations.
1 Concepts
In engineering practice, a node is usually a process running on an operating system; the model treats a node as an indivisible whole, even when the process consists of several relatively independent modules.
1.1 Model
Nodes, replicas, and metrics for measuring distributed systems are introduced.
1.2 Replicas
A replica provides redundancy for data or services. Data replicas store the same data on different nodes to survive node failures, while service replicas provide identical services without relying on local storage.
Replica Consistency
Consistency levels include strong, monotonic, session, eventual, and weak consistency, each with different guarantees for read/write ordering and availability.
1.3 Metrics for Distributed Systems
Performance: throughput (e.g., QPS), latency, and the number of concurrent requests a system can serve; the three often trade off against one another.
Availability: the fraction of time the system serves requests correctly, measured either as uptime over total time or as the ratio of successful operations to total operations.
Scalability: the ability to increase performance roughly linearly by adding machines.
Consistency: how closely the replicas behave as a single copy; stronger consistency makes the system easier for applications to reason about, but typically costs availability and performance.
2 Distributed System Principles
2.1 Data Distribution Methods
Three common approaches are hash‑based distribution, range‑based distribution, and chunk‑based distribution, each with trade‑offs in scalability, data skew, and metadata management.
Consistent Hashing
Uses a ring of hash values and virtual nodes to achieve better load balancing and easier scaling.
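As a concrete illustration, the sketch below builds a hash ring with virtual nodes: each physical node is hashed many times onto the ring, and a key is owned by the first virtual node clockwise from the key's hash. The class and method names are illustrative, not from any particular library.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring via MD5 (any stable hash works)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Consistent hashing with virtual nodes for smoother load balancing."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring, which
        # spreads its load and evens out the key distribution.
        self._ring = []  # sorted list of (hash, physical node)
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))
        self._keys = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the node owning `key`: first vnode clockwise from hash(key)."""
        idx = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

Adding or removing a node only remaps the keys adjacent to its virtual nodes, which is the scaling benefit the text describes.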
Replica and Data Distribution
Decoupling replicas from physical machines by using data segments improves recovery speed, fault tolerance, and scaling.
Locality‑Aware Computation
Placing computation on the same machine as its data reduces network overhead.
2.2 Basic Replica Protocols
Replica control protocols are classified as centralized (e.g., primary‑secondary) or decentralized. Centralized protocols rely on a coordinator node for update ordering and concurrency control, while decentralized protocols achieve consensus through peer negotiation.
Primary‑Secondary Protocol
Describes the update flow, read paths for different consistency levels, primary election, and handling of unavailable replicas.
2.3 Lease Mechanism
Leases grant time‑bounded rights to cache data; the server guarantees no updates during the lease period, enabling fault‑tolerant caching and simplifying recovery.
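A minimal single-server sketch of the lease idea follows, with hypothetical names: reads return the value together with a time-bounded lease, and a write waits out all outstanding leases before modifying the value, so clients may safely cache until their lease expires. Real implementations also stop issuing new leases once a write is pending and must account for clock drift between machines; both are omitted here for brevity.

```python
import time

class LeaseServer:
    """Toy lease issuer: the server promises not to modify the value
    before every outstanding lease has expired."""

    def __init__(self, value, lease_seconds=10.0):
        self._value = value
        self._lease_seconds = lease_seconds
        self._latest_expiry = 0.0  # expiry of the longest-lived lease granted

    def read(self):
        """Grant a lease: the client may cache the value until `expiry`."""
        expiry = time.monotonic() + self._lease_seconds
        self._latest_expiry = max(self._latest_expiry, expiry)
        return self._value, expiry

    def write(self, new_value):
        """Wait for all outstanding leases to expire, then update.
        (A real server would also refuse new leases while waiting.)"""
        remaining = self._latest_expiry - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        self._value = new_value
```

Note the fault-tolerance property from the text: the server never needs to contact clients to invalidate caches; even if a client crashes, its lease simply times out.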
2.4 Quorum Mechanism
Defines write (W) and read (R) thresholds such that W+R > N, ensuring that reads intersect with successful writes. Discusses WARO, quorum‑based primary selection, and limitations for determining the latest committed version.
2.5 Log Techniques
Redo logs and checkpointing are used for crash recovery; 0/1 directory structures provide atomic batch updates.
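The redo-log half of this can be sketched in a few lines: every update is appended durably to a log before the in-memory state changes, and recovery replays the log. Checkpointing (truncating the log after snapshotting `data`) is omitted; file layout and names are illustrative.

```python
import json
import os

class RedoLogStore:
    """Key-value store with a redo log: commit = append a durable log
    record, then apply in memory; recovery replays the log from the start."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def put(self, key, value):
        # Write-ahead: the redo record reaches disk before the update
        # becomes visible, so a crash can never lose an acknowledged write.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def _recover(self):
        """Rebuild in-memory state by replaying every redo record in order."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                rec = json.loads(line)
                self.data[rec["key"]] = rec["value"]
```

A checkpoint would periodically dump `data` to a snapshot file so that recovery replays only the log suffix written after the snapshot.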
2.6 Two‑Phase Commit
Explains the coordinator‑participant workflow, failure handling, and why the protocol suffers from poor fault tolerance and performance.
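The coordinator-participant workflow can be condensed into a sketch like the one below (in-process objects stand in for networked nodes; vote logic is simplified):

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys the decision."""

    def __init__(self):
        self.state = "init"

    def prepare(self, op):
        """Phase 1: lock in the operation and vote yes. A real participant
        may vote no on conflicts or local failures."""
        self.pending = op
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(op, participants):
    """Coordinator: commit only if every participant votes yes in phase 1."""
    votes = [p.prepare(op) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

The protocol's weakness is visible in the structure: if the coordinator crashes between the two phases, every participant is stuck in the "prepared" state, holding locks, until the coordinator recovers; hence the poor fault tolerance noted above.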
2.7 MVCC
Multi-Version Concurrency Control keeps multiple versions of each data item; every transaction reads from a consistent snapshot, so readers and writers proceed concurrently without blocking each other while consistency is preserved.
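A minimal snapshot-read sketch makes the mechanism concrete (single-writer, no conflict detection; names are illustrative):

```python
class MVCCStore:
    """Each key keeps an ascending list of (commit_version, value);
    a transaction reads the newest version <= its snapshot."""

    def __init__(self):
        self.versions = {}  # key -> [(version, value), ...] ascending
        self.current = 0    # latest committed version

    def begin(self):
        """Start a transaction by taking a snapshot version."""
        return self.current

    def read(self, key, snapshot):
        """Return the newest value committed at or before `snapshot`."""
        for version, value in reversed(self.versions.get(key, [])):
            if version <= snapshot:
                return value
        return None

    def write(self, key, value):
        """Commit a write by appending a new version, never overwriting."""
        self.current += 1
        self.versions.setdefault(key, []).append((self.current, value))
        return self.current
```

Because old versions are never overwritten, a long-running reader keeps seeing its snapshot even as later transactions commit new values.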
2.8 Paxos Protocol
Describes proposer, acceptor, and learner roles, the prepare and accept phases, and how a majority of acceptors (⌊N/2⌋ + 1 of N) forms a quorum to achieve strong consistency.
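The two phases can be sketched for the single-decree case (deciding one value) as follows. This is a sketch under simplifying assumptions: acceptors are in-process objects, messages never fail mid-round, and durable storage of acceptor state is omitted.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) of last accepted proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n,
        and report any value already accepted."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Single-decree proposer: a value is chosen once a majority accepts."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: gather promises from a majority.
    responses = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in responses if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the highest-numbered one -- this is what makes Paxos safe.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior, key=lambda pair: pair[0])[1]
    # Phase 2: ask the acceptors to accept.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

The key invariant is visible in `propose`: once a value is chosen by a majority, every later proposer discovers it during phase 1 and re-proposes it, so the decision can never change.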
2.9 CAP Theorem
States that a distributed system cannot simultaneously guarantee all three of strong consistency, high availability, and partition tolerance: when a network partition occurs, the system must sacrifice either consistency or availability. The protocols above (lease, quorum, two-phase commit, Paxos) each make a different trade-off among these properties.