Why Distributed Systems Matter: Core Concepts, Design Trade‑offs & CAP
This article explores the fundamentals of distributed systems, explaining what they are, why they’re used, design considerations such as replication and partitioning, the implications of the CAP theorem, common distribution strategies, typical architectural patterns, and the advantages and challenges of building and operating such systems.
1. What Is a Distributed System?
A distributed system is a collection of computers that work together so that, to the end user, it appears as a single computer.
These computers share state, run concurrently, and can fail independently: in a well-designed system, the failure of an individual machine does not bring down the system as a whole.
For example, a traditional database runs on one machine; a distributed database spreads data across three machines so that a record inserted on machine 1 can be read from machines 2 or 3.
2. Why Use Distributed Systems?
Managing distributed systems is complex and fraught with pitfalls, yet they provide horizontal scalability.
Vertical scaling (upgrading hardware) has limits; once you are already running on the best available hardware, performance cannot increase further.
Horizontal scaling adds more machines, so capacity is not capped by what a single machine can deliver, and it offers better cost control.
Distributed systems also improve fault tolerance and reduce latency by placing nodes closer to users.
3. Distributed System Design Evolution
When traffic exceeds a single server’s capacity, we must scale the application.
One common approach is master‑slave replication: all writes go to the master, while replica databases (two, in this example) copy its changes asynchronously and serve read traffic.
This improves read performance but introduces a consistency window where replicas may temporarily diverge.
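The consistency window can be illustrated with a minimal in-memory sketch. The class names (`Primary` for the master, `Replica`) and the explicit `replicate()` step are assumptions made for illustration; a real database ships its replication log over the network in the background.

```python
from collections import deque


class Replica:
    """Serves reads only; its data may lag behind the primary."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Accepts writes and queues them for asynchronous shipping to replicas."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # changes not yet applied on the replicas

    def write(self, key, value):
        self.data[key] = value
        # The write is acknowledged here, before any replica has applied it.
        self.pending.append((key, value))

    def replicate(self):
        """Apply every queued change to every replica (models the catch-up)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replicas = [Replica(), Replica()]
primary = Primary(replicas)

primary.write("user:1", "alice")
stale = replicas[0].read("user:1")  # None: replication has not run yet
primary.replicate()
fresh = replicas[0].read("user:1")  # "alice": the replica has caught up
```

Between `write` and `replicate`, a client reading from a replica sees the old state. That gap is the consistency window.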
To scale writes, partitioning (sharding) distributes data based on a key, such as username ranges, directing writes to specific nodes.
However, uneven key distribution can create hotspots (e.g., many usernames starting with “c”), requiring further splitting and increasing complexity.
While partitioning can scale write throughput roughly in proportion to the number of shards, it makes queries that do not filter on the shard key (e.g., cross‑shard joins) inefficient, because they must touch multiple shards.
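As a rough sketch of range-based sharding on usernames, the snippet below routes each name by its first letter and counts the resulting load per shard. The shard names, letter ranges, and sample usernames are hypothetical, chosen only to make the hotspot visible.

```python
from collections import Counter

# Hypothetical shard boundaries: each shard owns usernames whose first
# letter falls in an inclusive range.
SHARD_RANGES = [
    ("a", "h", "shard-0"),
    ("i", "q", "shard-1"),
    ("r", "z", "shard-2"),
]


def shard_for(username):
    """Return the shard that owns this username, based on its first letter."""
    first = username[0].lower()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers username {username!r}")


usernames = ["carol", "chen", "chris", "cora", "ivan", "zoe"]
load = Counter(shard_for(name) for name in usernames)
# Four of the six names start with "c", so shard-0 takes most of the writes.
```

Fixing the skew means splitting the "a" to "h" range further, which is the added complexity described above.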
4. What Is the CAP Theorem?
The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance.
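Consistency: Every read sees the most recent write, so all nodes appear to hold the same data at the same time.
Availability: Every request received by a non‑failed node gets a response.
Partition tolerance: The system continues operating even when the network drops or delays messages between nodes.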
In practice, designers must choose between strong consistency and high availability when a partition occurs.
Many applications favor availability, because coordinating replicas over the network makes strong consistency costly; they adopt weaker consistency models (BASE: Basically Available, Soft state, Eventual consistency).
Examples: Cassandra emphasizes availability; HBase, Redis, and ZooKeeper are commonly classified as prioritizing consistency.
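The choice during a partition can be sketched with a toy two-replica store. Everything here (`TwoNodeStore`, the `mode` flag, the `Unavailable` exception, the `partitioned` switch) is a hypothetical model and not how any of the systems named above is implemented. In this simplified model, the consistency-first mode stays consistent by rejecting all writes while the peer is unreachable.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas with a toggleable network partition between them."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node(), Node()]
        self.partitioned = False

    def write(self, node_index, value):
        if self.partitioned:
            if self.mode == "CP":
                # The peer is unreachable, so refuse rather than diverge.
                raise Unavailable("peer unreachable; write rejected")
            # AP: accept locally; the peer stays stale until the partition heals.
            self.nodes[node_index].value = value
            return
        for node in self.nodes:  # healthy network: update both replicas
            node.value = value

    def read(self, node_index):
        return self.nodes[node_index].value


ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")  # accepted; node 1 still holds "v1"

cp = TwoNodeStore("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
    cp_rejected = False
except Unavailable:
    cp_rejected = True  # write refused; both nodes still hold "v1"
```

The availability-first store keeps answering but its replicas disagree, while the consistency-first store keeps its replicas identical at the cost of rejecting the write.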
5. How Is Data Distributed?
Common distribution methods include:
Hashing: Map a value (IP, URL, ID) to a node via a hash function, typically the hash modulo the node count; distribution can be uneven, and changing the number of nodes remaps most keys.
Range partitioning: Assign contiguous key ranges to nodes; ranges can be split or moved to rebalance.
Chunking by size: Split data into fixed‑size blocks and distribute them.
Replication: Store copies of data on multiple nodes for fault tolerance.
Consistent hashing: Place nodes on a hash ring so that adding or removing a node only moves keys between that node and its neighbors on the ring.
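To make the difference between plain hashing and consistent hashing concrete, the sketch below counts how many keys change owner when a fourth node joins. The `HashRing` class, the use of virtual nodes (`vnodes`), the MD5-based `stable_hash`, and the node and key names are all assumptions for illustration, not the scheme of any particular system listed here.

```python
import bisect
import hashlib


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_node(key, nodes):
    """Plain hashing: hash the key, then take it modulo the node count."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"key-{i}" for i in range(2000)]
old_nodes = ["node-a", "node-b", "node-c"]
new_nodes = old_nodes + ["node-d"]

# Plain hashing: count keys whose owner changes when the node count changes.
moved_modulo = sum(
    modulo_node(k, old_nodes) != modulo_node(k, new_nodes) for k in keys
)

# Consistent hashing: collect keys whose owner changes when node-d joins.
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
moved_ring = [k for k in keys if old_ring.node_for(k) != new_ring.node_for(k)]
```

With the ring, every key that moves lands on the new node and the rest stay where they were. With modulo hashing, going from three to four nodes changes the owner of most keys.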
Examples of real systems:
GFS, HDFS – size‑based distribution.
MapReduce – schedules work close to the data, based on where GFS places it.
BigTable, HBase – range‑based.
PNUTS – hash or range.
Dynamo, Cassandra – consistent hashing.
Mola, Armor, BigPipe – hash‑based.
Doris – hybrid hash and size.
6. Typical Distributed Architecture Types
Client‑Server: One server provides shared resources to multiple clients (e.g., print services, a central Git server).
Three‑tier: Separate presentation, logic, and data layers.
Multi‑tier: Extension of three‑tier with finer business‑level layering.
Peer‑to‑Peer: All nodes act as both client and server (e.g., BitTorrent, blockchain).
Database‑centric: Nodes coordinate via a shared database without direct inter‑node communication.
7. Advantages and Disadvantages of Distributed Systems
Advantages
Nodes can easily share data.
Scalable by adding more nodes.
Failure of a single node does not bring down the whole system.
Hardware resources are pooled across nodes.
Disadvantages
Security is harder to ensure across many nodes and connections.
Message loss can occur during transmission.
Distributed databases are complex to manage.
Network overload is possible when many nodes send data simultaneously.
Conclusion
Distributed systems are built from clusters of nodes; a cluster can be part of a larger distributed system, and multiple clusters can interconnect, forming a hierarchy of distributed architectures.
Huawei Cloud Developer Alliance
