Fundamentals 54 min read

Fundamentals of Distributed Systems: Concepts, Replication, Consistency, and Protocols

This article provides a comprehensive overview of distributed system fundamentals, covering system models, node concepts, failure types, replication strategies, consistency levels, data distribution methods, replica control protocols such as primary‑secondary, two‑phase commit, lease‑based caching, quorum, MVCC, Paxos, and the CAP theorem.

Top Architect

Sep 26, 2021

Concepts

In a distributed system a node is typically a process on an operating system; the model treats each node as an indivisible whole. Common failures include machine crashes, network partitions, message loss, out‑of‑order delivery, and data loss, all of which must be anticipated in design.

Replication

Replication (or copy) provides redundancy for data or services. Data replicas store identical data on different nodes, while service replicas provide the same functionality without relying on local storage. Consistency models range from strong consistency (all reads see the latest write) to monotonic, session, eventual, and weak consistency, each offering different trade‑offs between correctness and performance.

Distributed System Principles

Key to scaling is how data is partitioned across nodes. Common strategies include hash‑based distribution, range‑based distribution, and chunk‑based (data‑volume) distribution. Consistent hashing improves scalability by minimizing data movement during node addition or removal, often using virtual nodes to balance load.

Replica and Data Distribution

Replica placement influences scalability and recovery speed. Storing replicas per machine is simple but limits scalability; segment‑based replication (splitting data into uniform segments) decouples replicas from physical machines, enabling faster recovery and balanced load during failures or scaling.

Replica Control Protocols

Two main families exist: centralized protocols, which rely on a single coordinator (e.g., primary‑secondary, two‑phase commit), and decentralized protocols, where all nodes are peers (e.g., Paxos). Centralized protocols are simpler but suffer from single‑point‑of‑failure and higher latency; decentralized protocols provide higher availability at the cost of complexity.

Primary‑Secondary Protocol

The primary node coordinates writes, enforces ordering, and propagates updates to secondaries. Failover requires electing a new primary, often using metadata servers to track the current primary.

Two‑Phase Commit

Involves a coordinator asking participants to prepare, collecting votes, and then committing or aborting globally. It guarantees atomicity but suffers from poor availability and performance, especially under failures.

Lease‑Based Cache

Clients receive a lease with an expiration time when caching metadata. The server guarantees no updates to the data during the lease, allowing safe caching. Lease expiration handles node or network failures gracefully.

Quorum Mechanism

Updates succeed after being written to W out of N replicas; reads succeed after reading from R replicas, with W+R > N ensuring that at least one replica overlaps, providing consistency guarantees while tolerating failures.

MVCC (Multi‑Version Concurrency Control)

Each transaction creates a new data version. Reads can select appropriate versions, enabling high concurrency without locking.

Paxos Protocol

Uses proposers, acceptors, and learners. A value is chosen when a majority of acceptors (N/2+1) accept it. Subsequent rounds can only choose the same value, providing strong consistency with fault tolerance.

CAP Theorem

States that a distributed system cannot simultaneously provide strong consistency, high availability, and partition tolerance. Different protocols make trade‑offs: lease mechanisms favor consistency and partition tolerance, quorum offers a balanced trade‑off, while two‑phase commit emphasizes consistency at the expense of availability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems protocols replication consistency quorum Lease

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.