Fundamentals of Distributed Systems: Models, Replication, Consistency, and Protocols
This article introduces the core concepts of distributed systems: node and replica models, consistency levels, data-distribution strategies, lease and quorum mechanisms, replica control protocols such as primary‑secondary, two‑phase commit, MVCC, and Paxos, and the CAP theorem, giving architects a comprehensive overview.
1. Concepts
In a distributed system, a node is typically an OS process. The model treats each node as an indivisible whole; conversely, when finer granularity is useful, a single process may be modeled as several logical nodes.
Typical exceptions include machine crashes, network failures (message loss and reordering, which even TCP cannot fully mask end to end), the three‑state result of an RPC (success, failure, or timeout, where a timeout means the true outcome is unknown), and storage data loss. The golden rule of exception handling is that any failure scenario considered during design will eventually occur in production, and many unforeseen failures will appear as well.
A replica (copy) provides redundancy for data or services. Data replicas persist the same data on different nodes to survive loss; service replicas provide identical functionality without relying on local storage.
Replica Consistency
Consistency models range from strong consistency (every read sees the latest write) to monotonic, session, eventual, and weak consistency, each offering different trade‑offs between freshness and availability.
Metrics for Distributed Systems
Key indicators are performance (throughput, latency, concurrency), availability (uptime vs. downtime), scalability (linear performance growth with added nodes), and consistency (how uniformly replicas reflect updates).
2. Distributed System Principles
Data Distribution Methods
Data can be partitioned by hash (simple modulo mapping, suffers from poor scalability and data skew), by range (splitting the key space into intervals, often managed by metadata servers), by size/chunk (fixed‑size blocks independent of key values), or by consistent hashing (nodes placed on a ring, optionally using virtual nodes to improve balance and scalability).
Choosing a distribution strategy often involves combining methods to mitigate their individual drawbacks, such as adding chunk‑based rebalancing to a hash‑based layout to address data skew.
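The consistent‑hashing scheme above can be sketched as follows. This is a minimal illustrative implementation, not code from the article; the class and method names are assumptions, and a production ring would use a stronger hash function than `String.hashCode`.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring with virtual nodes (illustrative sketch).
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int vnodesPerNode;

    public ConsistentHashRing(int vnodesPerNode) {
        this.vnodesPerNode = vnodesPerNode;
    }

    // Deterministic non-negative hash; a real ring would use a stronger function.
    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff;
    }

    // Each physical node occupies several positions on the ring via virtual nodes,
    // which smooths out load imbalance.
    public void addNode(String node) {
        for (int i = 0; i < vnodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < vnodesPerNode; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // A key maps to the first virtual node clockwise from its hash position.
    public String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }
}
```

Note the key property: removing a node only remaps the keys that were assigned to it; keys owned by other nodes keep their placement, which is exactly the scalability advantage over simple modulo hashing.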
Replica Control Protocols
Centralized protocols use a single coordinator (e.g., primary‑secondary) to order updates, perform concurrency control, and enforce consistency. The primary handles all writes, propagates updates to secondaries, and may employ a relay chain to avoid bandwidth bottlenecks.
Decentralized protocols treat all nodes as peers, achieving consensus without a single point of failure but at the cost of higher complexity and lower performance.
Primary‑Secondary Protocol
The primary receives external updates, orders them, and forwards them to secondaries; secondaries may be marked unavailable if they fail to sync.
Lease Mechanism
Leases grant a time‑bounded guarantee that the issuer will not modify a piece of data while the lease is valid; clients cache data for the lease period, improving read performance while preserving correctness even under network partitions or node crashes, since the issuer can simply wait for outstanding leases to expire before modifying the data.
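The lease bookkeeping on the issuer side can be sketched like this. The names and structure are assumptions for illustration; real systems also account for clock drift between issuer and holder.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative lease table kept by the data issuer (sketch, not from the article).
public class LeaseTable {
    // Latest lease expiry granted per data item, in millis since epoch.
    private final Map<String, Long> expiryByItem = new HashMap<>();

    // Grant a lease: the issuer promises not to modify `item` before the expiry.
    public long grant(String item, long nowMillis, long durationMillis) {
        long expiry = nowMillis + durationMillis;
        // Keep the furthest-out expiry if several clients hold leases.
        expiryByItem.merge(item, expiry, Math::max);
        return expiry;
    }

    // The issuer may modify the item only after all outstanding leases expire;
    // if a client crashes or is partitioned away, this simply means waiting.
    public boolean mayModify(String item, long nowMillis) {
        Long expiry = expiryByItem.get(item);
        return expiry == null || nowMillis >= expiry;
    }
}
```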
Quorum Mechanism
Writes must succeed on at least W of N replicas; reads query at least R replicas. With W + R > N, every read quorum intersects every write quorum, so at least one queried replica holds the latest committed write (version numbers are needed to identify which one). Variants range from write‑all‑read‑one (WARO, where W = N and R = 1) to more balanced configurations.
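The two quorum conditions can be captured in a few lines. This is a sketch of the arithmetic only, with hypothetical names; a real quorum system also handles repair of stale replicas.

```java
// Quorum arithmetic sketch (illustrative helper, names are assumptions).
public class Quorum {
    // W + R > N guarantees every read quorum overlaps every write quorum.
    public static boolean intersects(int n, int w, int r) {
        return w + r > n;
    }

    // Given version numbers returned by an R-sized read quorum, the highest
    // version is the latest committed write (assuming W + R > N held for writes).
    public static int latestVersion(int[] versionsFromReadQuorum) {
        int max = Integer.MIN_VALUE;
        for (int v : versionsFromReadQuorum) {
            max = Math.max(max, v);
        }
        return max;
    }
}
```

For example, N = 3, W = 2, R = 2 satisfies the overlap condition, while W = 1, R = 2 does not, so a read could miss the only replica that saw the write.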
Log Techniques
Redo logs record the result of each update for crash recovery; checkpoints periodically dump in‑memory state to reduce replay time. The 0/1 directory approach achieves atomic batch updates by toggling a master record between two directory versions.
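Redo‑log recovery as described above can be sketched in memory: restore the last checkpoint, then replay only the log entries recorded after it. This is an illustrative model with assumed names, not a durable implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Redo-log recovery sketch: checkpoint plus replay of later entries.
public class RedoRecovery {
    // A redo-log entry records the *result* of an update, keyed by sequence number.
    public record LogEntry(long seq, String key, String value) {}

    public static Map<String, String> recover(Map<String, String> checkpoint,
                                              long checkpointSeq,
                                              List<LogEntry> log) {
        // Start from the checkpointed in-memory state...
        Map<String, String> state = new HashMap<>(checkpoint);
        // ...and redo only the updates logged after the checkpoint,
        // which is what keeps replay time bounded.
        for (LogEntry e : log) {
            if (e.seq() > checkpointSeq) {
                state.put(e.key(), e.value());
            }
        }
        return state;
    }
}
```

Because each entry stores the final result of its update, replaying an entry twice is harmless, which is why recovery can safely restart from the checkpoint.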
Two‑Phase Commit (2PC)
A classic centralized protocol with a coordinator and participants. Phase 1 (prepare) gathers votes; Phase 2 (commit/abort) finalizes the transaction. 2PC offers strong consistency but suffers from poor availability and performance: a coordinator crash at the wrong moment can leave participants blocked, holding locks until it recovers.
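The coordinator's decision rule is simple to state in code: commit only if every participant voted yes, and treat anything else (including a timeout) as a no. A minimal sketch, with assumed names:

```java
import java.util.List;

// Two-phase commit decision logic (sketch of the coordinator's rule).
public class TwoPhaseCommit {
    public enum Vote { YES, NO }          // a prepare timeout is treated as NO
    public enum Decision { COMMIT, ABORT }

    // Phase 2 decision: unanimous YES commits; any other outcome aborts.
    public static Decision decide(List<Vote> votes) {
        for (Vote v : votes) {
            if (v != Vote.YES) {
                return Decision.ABORT;
            }
        }
        return Decision.COMMIT;
    }
}
```

The unanimity requirement is the source of 2PC's availability problem: one slow or failed participant forces an abort, and a failed coordinator leaves everyone waiting.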
Multi‑Version Concurrency Control (MVCC)
Each transaction creates a new data version rather than overwriting in place; reads select the appropriate version for their snapshot, enabling high concurrency between readers and writers while preserving a history of changes.
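The core of MVCC fits in a small sketch: keep versions sorted by commit timestamp, and let a read at snapshot time t return the newest version committed at or before t. Names and structure are illustrative assumptions, not a real engine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// MVCC sketch: per-key version chains ordered by commit timestamp.
public class MvccStore {
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    // A committing transaction appends a new version instead of overwriting.
    public void write(String key, long commitTs, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(commitTs, value);
    }

    // A read at snapshot ts sees the newest version with commitTs <= ts,
    // so readers never block writers and vice versa.
    public String readAt(String key, long snapshotTs) {
        TreeMap<Long, String> chain = versions.get(key);
        if (chain == null) {
            return null;
        }
        Map.Entry<Long, String> e = chain.floorEntry(snapshotTs);
        return e == null ? null : e.getValue();
    }
}
```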
Paxos Consensus
Nodes act as proposers, acceptors, and learners. A value is chosen when a majority of acceptors accept it. Paxos provides strong consistency with good availability and partition tolerance, essentially a quorum‑based protocol.
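The acceptor's side of single‑decree Paxos can be sketched as two rules: never go back on a promise, and never accept a proposal numbered below the highest promise. This is a simplified illustration (one acceptor, in memory); a value is chosen only once a majority of acceptors have accepted it.

```java
// Single-decree Paxos acceptor sketch (illustrative, names are assumptions).
public class PaxosAcceptor {
    private long promised = -1;    // highest proposal number promised so far
    private String acceptedValue;  // value accepted, if any

    // Phase 1b: promise to ignore proposals numbered at or below `promised`.
    public boolean promise(long n) {
        if (n <= promised) {
            return false; // stale proposal number; reject
        }
        promised = n;
        return true;
    }

    // Phase 2b: accept unless a higher-numbered promise was made in the meantime.
    public boolean accept(long n, String value) {
        if (n < promised) {
            return false;
        }
        promised = n;
        acceptedValue = value;
        return true;
    }

    public String acceptedValue() {
        return acceptedValue;
    }
}
```

The quorum flavor of the protocol shows up in how these rules compose: because any two majorities of acceptors intersect, a later proposer contacting a majority in phase 1 is guaranteed to learn of any value already chosen.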
CAP Theorem
The CAP theorem states that a distributed system can simultaneously provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. Different protocols make different trade‑offs: lease favors C + P, quorum configurations balance all three, 2PC prioritizes C at the cost of A and P, while Paxos achieves C with good A and P.
Overall, the article offers a comprehensive overview of distributed‑system fundamentals, guiding architects in selecting appropriate models, replication strategies, consistency guarantees, and consensus protocols.
Java Architect Essentials
Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.