Fundamentals 55 min read

Mastering Distributed System Fundamentals: Models, Replication, Consistency, and Protocols

This article provides a comprehensive overview of distributed system fundamentals, covering node modeling, replica concepts, consistency levels, data distribution strategies, centralized and decentralized replica protocols, lease mechanisms, quorum, two‑phase commit, MVCC, Paxos, and the CAP theorem, while analyzing their trade‑offs in availability, consistency, and partition tolerance.

Architects' Tech Alliance
Architects' Tech Alliance
Architects' Tech Alliance
Mastering Distributed System Fundamentals: Models, Replication, Consistency, and Protocols

Concepts

Model

In engineering projects a node is typically a process running on an operating system. The model treats a node as an indivisible whole; if a process consists of several relatively independent components it can be split into multiple logical nodes.

Replica

A replica (copy) provides redundancy for data or services. Data replicas store identical copies of the same data on different nodes so that a failed node can recover the data from another replica. Service replicas provide the same service without relying on local storage; they obtain required data from other nodes. Replica control protocols are the theoretical core of distributed systems.

Consistency Levels

Strong consistency : every read sees the most recent successful write.

Monotonic consistency : once a client reads a value, it never reads an older value later.

Session consistency : within a single client session, reads are monotonic, but there is no guarantee across sessions.

Eventual consistency : all replicas converge to the same state eventually, without a bounded time guarantee.

Weak consistency : reads may return stale values; the system provides no ordering guarantees.

Distributed System Principles

Data Distribution Methods

Three common ways to partition data across a cluster are:

Hash‑based distribution: a hash function maps a key to a bucket; the bucket determines the target node. Simple but suffers from poor scalability when the cluster grows and from data skew.

Range‑based distribution: keys are divided into contiguous intervals; each interval is assigned to a node or a group of nodes. Allows easy range queries and balanced load through dynamic splitting of hot intervals.

Chunk‑based (size‑based) distribution: the data set is treated as a sequential file and split into fixed‑size chunks that are placed on different nodes. Independent of key characteristics, thus avoiding data skew.

Consistent hashing improves hash‑based distribution by placing nodes on a logical ring and assigning each key to the next clockwise node. Virtual nodes (multiple logical nodes per physical machine) further smooth load balancing and simplify node addition or removal.

Hash‑based data distribution diagram
Hash‑based data distribution diagram
Range‑based data distribution diagram
Range‑based data distribution diagram
Consistent hashing with virtual nodes
Consistent hashing with virtual nodes

Basic Replica Protocols

Replica control protocols can be classified as centralized or decentralized.

Centralized Replica Control (Primary‑Secondary)

The primary‑secondary (leader‑follower) protocol uses a single coordinator (the primary) to order updates and propagate them to secondaries.

Clients send update requests to the primary.

The primary performs concurrency control and determines the order of updates.

The primary propagates the update to all secondary nodes.

Each secondary acknowledges the update.

The primary replies to the client with success or failure based on the acknowledgments.

Reading from the primary yields strong consistency; reading from any replica provides eventual consistency. If the primary fails, a new primary must be elected, which may cause a service pause of several seconds.

Lease Mechanism

A lease grants a node exclusive rights to cache data for a bounded time interval. The server issues a lease together with the data; while the lease is valid the server will not modify the data. After lease expiry the client must discard the cached copy. This design provides high fault tolerance because lease validity does not depend on continuous communication.

Quorum Mechanism

Writes must be acknowledged by at least W replicas; reads must contact at least R replicas, with the constraint W + R > N (where N is the total number of replicas). This guarantees that a read overlaps with a successful write, allowing the client to see the latest committed value.

Two‑Phase Commit (2PC)

2PC is a classic centralized protocol that provides strong consistency but suffers from poor availability and performance. The coordinator logs begin_commit, sends prepare to participants, collects votes, and then broadcasts either global_commit or global_abort. Failure handling requires log‑based recovery on both coordinator and participants.

Multi‑Version Concurrency Control (MVCC)

Each transaction creates a new version of the data. Reads can select a suitable version, allowing concurrent reads and writes without locking. MVCC is widely used in databases and can be applied to distributed storage.

Paxos Consensus

Paxos achieves strong consistency in a decentralized setting. A proposer suggests a value; a majority of acceptors must promise to accept it; the value is then chosen and learned by learners. The protocol proceeds in numbered rounds to avoid deadlock.

CAP Theorem

The CAP theorem states that a distributed system cannot simultaneously provide strong consistency (C), high availability (A), and partition tolerance (P). Different protocols make different trade‑offs:

Lease sacrifices some availability for consistency and partition tolerance.

Quorum balances all three properties.

2PC offers full consistency but poor availability and partition tolerance.

Paxos provides strong consistency with reasonable availability and partition tolerance.

Understanding these building blocks helps architects design systems that meet specific requirements for latency, durability, and scalability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed Systemsfault toleranceReplicationConsistencyconsensus protocolsdata distribution
Architects' Tech Alliance
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.