Fundamentals 53 min read

Fundamentals of Distributed Systems: Models, Replication, Consistency, and Core Protocols

This comprehensive article explains the core concepts of distributed systems—including node modeling, failure types, replica strategies, consistency levels, performance metrics, data distribution techniques, lease mechanisms, quorum, logging, two‑phase commit, MVCC, Paxos, and the CAP theorem—providing a solid foundation for designing robust, scalable architectures.

Top Architect
Top Architect
Top Architect
Fundamentals of Distributed Systems: Models, Replication, Consistency, and Core Protocols

The article begins by defining a node as an indivisible process in a distributed system and enumerates common failures such as machine crashes, network partitions, RPC three‑state outcomes, and data loss, emphasizing the need to anticipate all possible exceptions during design.

It then introduces replicas (data and service copies) and details replica consistency models: strong, monotonic, session, eventual, and weak consistency, explaining their trade‑offs for availability and correctness.

Key performance metrics—throughput, latency, concurrency, availability, scalability, and consistency—are outlined to guide system evaluation.

Various data distribution methods are described, including hash‑based partitioning, range‑based sharding, chunk‑based placement, and consistent hashing with virtual nodes, highlighting their impact on load balancing, data skew, and metadata management.

The article discusses replica control protocols, contrasting centralized approaches (primary‑secondary, leader‑based coordination) with decentralized consensus, and details the workflow of primary‑secondary updates, leader election, and failure handling.

A lease mechanism is presented as a lightweight way to guarantee cache validity and fault tolerance, illustrating lease‑based cache design, lease issuance, renewal, and expiration handling.

Quorum techniques are explained, showing how write‑and‑read quorum parameters (W, R) ensure that reads intersect with successful writes, and how different quorum settings affect consistency and availability.

Logging strategies—including redo logs, checkpoints, and 0/1 directory structures—are covered for crash recovery, with step‑by‑step recovery procedures.

The two‑phase commit protocol is detailed, describing coordinator and participant states, message flows, and recovery after crashes, while noting its performance and fault‑tolerance limitations.

Multi‑Version Concurrency Control (MVCC) is introduced as a versioned data model that enables concurrent reads and writes without locking.

Paxos consensus is explained, defining proposers, acceptors, and learners, and outlining the prepare and accept phases that achieve agreement despite failures.

Finally, the CAP theorem is presented, clarifying why no system can simultaneously achieve strong consistency, high availability, and partition tolerance, and positioning lease, quorum, two‑phase commit, and Paxos within the CAP trade‑off space.

Distributed SystemsCAP theoremReplicationconsistencyconsensusPaxos
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.