
Fundamentals of Distributed Systems: Models, Replicas, Consistency, and Protocols

This article introduces core concepts of distributed systems, covering node models, replica types, consistency levels, data distribution strategies, lease and quorum mechanisms, logging techniques, two‑phase commit, MVCC, Paxos, and the CAP theorem, providing a comprehensive overview for engineers and researchers.


1. Conceptual Model

In engineering practice a node usually corresponds to an OS process. The model treats a node as an indivisible unit: if a process consists of independent parts, each part can instead be modeled as a separate node.

1.1 Abnormalities

Machine crashes: common, with roughly a 0.1% daily probability per machine; recovery typically takes on the order of 24 hours and may require a manual restart.

Network anomalies: message loss, partitions, reordering, and data corruption; even TCP cannot mask all of these, so the network must be treated as unreliable.

Distributed three‑state: an RPC can end in success, failure, or timeout; after a timeout the caller cannot know whether the operation actually executed.

Storage data loss: a stateful node can lose its stored state and must recover it, for example from replicas.

Exception‑handling principle: enumerate every exception you can anticipate at design time, and expect that unforeseen exceptions will still occur at runtime; handle every possible exception unless measurements show one is safe to omit.
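The timeout ("unknown") outcome of the three‑state model is the hardest case: the caller cannot tell whether the operation happened. A common remedy, sketched minimally below, is to make operations idempotent (here via a hypothetical request ID the server uses to deduplicate) so that retrying after a timeout is safe:

```python
class UnknownResult(Exception):
    """Raised when an RPC times out: the call may or may not have executed."""

def call_with_retry(op, request_id, max_attempts=3):
    """Retry an operation whose result is unknown; this is safe only
    because the server deduplicates by request_id (idempotency)."""
    for _ in range(max_attempts):
        try:
            return op(request_id)
        except UnknownResult:
            continue  # unknown outcome: retry the idempotent call
    raise UnknownResult("gave up after retries")

# Server side: remember completed requests so a retry returns the
# original result instead of executing twice.
_applied = {}
def server_op(request_id):
    if request_id in _applied:
        return _applied[request_id]      # duplicate retry: cached result
    result = f"done-{request_id}"
    _applied[request_id] = result
    return result
```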

1.2 Replicas

A replica provides redundancy for data or services. Data replicas store identical data on different nodes, enabling recovery after loss. Service replicas provide the same service without relying on local storage.

Replica Consistency

Strong consistency: every read sees the latest successful update.

Monotonic consistency: a client never reads older data after seeing newer data.

Session consistency: within a session, reads are monotonic; version numbers can enforce this.

Eventual consistency: all replicas converge eventually, but intermediate reads may be stale.

Weak consistency: updates may be visible at different times on different replicas.
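The version‑number technique mentioned for session consistency can be sketched as follows: the client remembers the highest version it has seen and refuses any replica answer older than that, which yields monotonic reads within the session (the replica interface here is an assumption for illustration):

```python
class SessionClient:
    """Enforces monotonic reads within a session: never accept data
    older than what this session has already observed. Replicas are
    modeled as callables returning (version, value)."""
    def __init__(self):
        self.min_version = 0   # highest version seen in this session

    def read(self, replicas):
        # Query replicas until one is at least as new as our watermark.
        for replica in replicas:
            version, value = replica()
            if version >= self.min_version:
                self.min_version = version
                return value
        raise RuntimeError("no replica is fresh enough for this session")
```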

2. Distributed System Principles

2.1 Data Distribution

Data can be partitioned across machines using hash‑based, range‑based, or size‑based schemes. Hash distribution suffers from poor scalability and data skew; range and size distribution improve load balancing. Consistent hashing adds virtual nodes to reduce data movement during scaling.
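Consistent hashing with virtual nodes can be sketched in a few lines: each physical node is hashed to many points on a ring, and a key is owned by the first point clockwise from its hash, so adding a node only moves the keys between neighbouring points (class and parameter names here are illustrative):

```python
import bisect
import hashlib

def _h(key):
    """Deterministic hash onto the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: each physical node owns
    many points on the ring, smoothing load and limiting data movement
    when the cluster is resized."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def locate(self, key):
        # Owner is the first virtual node clockwise from the key's hash.
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]
```

Going from three to four nodes moves roughly a quarter of the keys, rather than nearly all of them as with plain modulo hashing.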

2.2 Basic Replica Protocols

Two categories exist: centralized (e.g., primary‑secondary) and decentralized. Centralized protocols use a single coordinator for update ordering and concurrency control, but availability depends on the coordinator. Primary‑secondary workflows include update coordination, replication, and failover handling.

2.3 Lease Mechanism

Leases grant time‑bounded rights: the server promises not to modify the leased data while the lease is valid, and clients may cache the data until it expires. Expiration lets the server update safely even when a client is unreachable, giving strong fault tolerance; correctness requires only that clock drift between nodes is bounded, not that clocks are perfectly synchronized.
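A minimal lease sketch (the class and TTL padding strategy are assumptions for illustration): reads extend a lease, and a write simply waits until every outstanding lease has lapsed before modifying the data:

```python
import time

class LeaseServer:
    """Grants time-bounded read leases on a key; a write waits until
    every outstanding lease has expired. A real server would pad the
    wait by the maximum assumed clock drift."""
    def __init__(self, ttl=0.05):
        self.ttl = ttl
        self.expiry = {}          # key -> latest lease expiry time
        self.data = {}

    def read(self, key):
        # Grant/extend the lease and return the value with its expiry.
        self.expiry[key] = time.monotonic() + self.ttl
        return self.data.get(key), self.expiry[key]

    def write(self, key, value):
        # Defer the modification until outstanding leases expire.
        wait = self.expiry.get(key, 0) - time.monotonic()
        if wait > 0:
            time.sleep(wait)       # simplest strategy: wait out the lease
        self.data[key] = value
```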

2.4 Quorum Mechanism

Writes succeed after being persisted on W of the N replicas; reads query R replicas with W + R > N (equivalently R > N − W). Any read set therefore intersects any successful write set, so the newest version can always be found, giving a configurable consistency–availability trade‑off.
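The overlap argument can be made concrete with a small sketch (in‑memory replicas and version counters are stand‑ins for real storage): any R‑replica read intersects any W‑replica write, and the highest version number identifies the latest value:

```python
class QuorumStore:
    """N replicas; a write persists on W of them, a read queries R with
    W + R > N, so every read set intersects every successful write set."""
    def __init__(self, n=5, w=3, r=3):
        assert w + r > n, "quorum condition W + R > N violated"
        self.n, self.w, self.r = n, w, r
        self.replicas = [dict() for _ in range(n)]
        self.version = 0

    def write(self, key, value, reachable):
        if len(reachable) < self.w:
            raise RuntimeError("not enough replicas for a write quorum")
        self.version += 1
        for i in reachable[:self.w]:
            self.replicas[i][key] = (self.version, value)

    def read(self, key, reachable):
        answers = [self.replicas[i].get(key, (0, None))
                   for i in reachable[:self.r]]
        return max(answers)[1]     # highest version wins
```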

2.5 Logging Techniques

Redo logs record the post‑update state before it is applied, so a crashed node can rebuild its in‑memory state by replaying the log; checkpoints periodically dump that state to reduce replay time. The 0/1 directory technique achieves atomic batch updates by maintaining two directory versions and toggling a master record between them.
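A minimal redo‑log sketch (the in‑memory list stands in for an append‑only log file): every update is logged before it is applied, and recovery replays the log tail on top of the last checkpoint:

```python
import json

class RedoLogKV:
    """Redo logging: append the post-update record before applying it,
    so state can be rebuilt after a crash; a checkpoint dumps the whole
    state and makes older log entries redundant."""
    def __init__(self):
        self.log = []            # stand-in for an append-only log file
        self.checkpoint = {}     # last dumped snapshot
        self.state = {}

    def put(self, key, value):
        self.log.append(json.dumps({"key": key, "value": value}))  # log first
        self.state[key] = value                                    # then apply

    def take_checkpoint(self):
        self.checkpoint = dict(self.state)
        self.log.clear()         # entries before the checkpoint are redundant

    def recover(self):
        # Crash recovery: start from the checkpoint, replay the log tail.
        state = dict(self.checkpoint)
        for line in self.log:
            rec = json.loads(line)
            state[rec["key"]] = rec["value"]
        return state
```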

2.6 Two‑Phase Commit

A coordinator asks participants to prepare, then decides to commit or abort based on votes. The protocol guarantees atomicity but suffers from poor fault tolerance and performance.
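The two phases can be sketched as follows (participant interface and failure modeling are simplified assumptions): phase one collects prepare votes, phase two broadcasts the decision, and any negative vote aborts the whole transaction:

```python
class Participant:
    """Votes in phase 1 and applies the coordinator's decision in phase 2."""
    def __init__(self, vote=True):
        self.vote, self.state = vote, "init"
    def prepare(self):
        return self.vote
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator sketch: commit only if every participant votes yes;
    a participant crash during prepare (modeled as an exception) counts
    as a no vote."""
    try:
        votes = [p.prepare() for p in participants]   # phase 1: prepare
    except Exception:
        votes = [False]
    decision = "commit" if all(votes) else "abort"
    for p in participants:                            # phase 2: decide
        p.commit() if decision == "commit" else p.abort()
    return decision
```

The sketch also shows the protocol's weakness: the coordinator blocks on every participant in both phases, so one slow or crashed node stalls everyone.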

2.7 MVCC

Multi‑Version Concurrency Control keeps multiple data versions, allowing reads of consistent snapshots while writes create new versions.
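A minimal MVCC sketch (a global version counter stands in for real transaction timestamps): a reader pins the highest version visible when its snapshot is taken and is unaffected by later writes:

```python
import itertools

class MVCCStore:
    """Each write creates a new version; a snapshot reader sees only
    versions that existed when the snapshot was taken."""
    def __init__(self):
        self.versions = {}                 # key -> [(version, value), ...]
        self.counter = itertools.count(1)

    def write(self, key, value):
        self.versions.setdefault(key, []).append((next(self.counter), value))

    def snapshot(self):
        # Pin the highest version visible right now.
        high = max((v for vs in self.versions.values() for v, _ in vs),
                   default=0)
        def read(key):
            visible = [val for v, val in self.versions.get(key, [])
                       if v <= high]
            return visible[-1] if visible else None
        return read
```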

2.8 Paxos

Paxos achieves strong consistency with a quorum of acceptors. Proposers issue numbered proposals; acceptors promise not to accept lower numbers and eventually agree on a single value.
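The acceptor's two promises can be sketched for single-decree Paxos (the proposer loop is simplified and assumes reliable local calls): never honor a proposal numbered below the highest prepare seen, and report any already-accepted value so later proposers converge on it:

```python
class Acceptor:
    """Single-decree Paxos acceptor: tracks the highest prepare promised
    and the last (number, value) accepted."""
    def __init__(self):
        self.promised = -1
        self.accepted = None      # (number, value) or None

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True, self.accepted   # promise + prior accepted value
        return False, None

    def accept(self, n, value):
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Proposer sketch: on a majority of promises, adopt the
    highest-numbered value already accepted (if any), then seek a
    majority of accepts."""
    promises = [a.prepare(n) for a in acceptors]
    granted = [prior for ok, prior in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None
    prior = max((p for p in granted if p), default=None)
    chosen = prior[1] if prior else value
    acks = sum(a.accept(n, chosen) for a in acceptors)
    return chosen if acks > len(acceptors) // 2 else None
```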

2.9 CAP Theorem

CAP states that no system can simultaneously guarantee strong consistency, high availability, and partition tolerance; when a partition occurs, a system must sacrifice either consistency or availability. Different protocols (lease, quorum, two‑phase commit, Paxos) make different trade‑offs among these properties.

Tags: distributed systems, CAP theorem, protocols, replication, consistency, quorum, lease
Written by Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies