CAP Theory, Shared‑Nothing, Load Balancing & High Availability Explained
This article explores core distributed system design principles, detailing the CAP theorem and its implications, the BASE extension, shared‑nothing architecture, various load‑balancing algorithms and deployment modes, as well as high‑availability strategies such as active‑standby, active‑active, and clustering to eliminate single points of failure.
1. System Architecture Design Theories and Principles
This section introduces architecture design theories and principles commonly applied in medium to large Internet systems.
CAP Theory
What is CAP? The CAP theorem, proposed by Eric Brewer, states that a distributed system can guarantee at most two of the following three properties at the same time: Consistency, Availability, and Partition Tolerance.
Consistency means that after a successful update, all nodes see the same data at the same time. This differs from the consistency of traditional RDBMS transactions, which follow ACID (Atomicity, Consistency, Isolation, Durability). ACID emphasizes strong consistency, which comes at a high cost in performance.
Availability ensures that reads and writes always succeed within a bounded time, providing continuous service to users.
Partition Tolerance means the system continues to operate despite network partitions or node failures.
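Proof sketch of CAP: suppose two node groups, G1 and G2, are cut off from each other by a network partition. A write accepted by G1 cannot be replicated to G2, so a later read at G2 must either return stale data (giving up consistency) or be refused until the partition heals (giving up availability). Consistency and availability therefore cannot both be guaranteed under a partition.
Because partitions cannot be ruled out in a horizontally scaled system, CAP implies that designers must choose between consistency and availability for the periods when a partition occurs.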
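The G1/G2 argument can be made concrete with a toy model. The following is a minimal sketch rather than a real replication protocol: `Cluster`, `Replica`, `Unavailable`, and the `mode` flag are hypothetical names introduced only for illustration. In "AP" mode the cluster keeps accepting writes during a partition and G2 serves stale data; in "CP" mode it refuses the write so that both groups stay consistent.

```python
class Unavailable(Exception):
    """Raised when the cluster refuses a request rather than diverge."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replica groups, G1 and G2, that replicate every write when connected."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.g1 = Replica("G1")
        self.g2 = Replica("G2")
        self.partitioned = False

    def write_g1(self, value):
        if self.partitioned and self.mode == "CP":
            # CP choice: the write cannot reach G2, so refuse it (lose availability).
            raise Unavailable("cannot replicate to G2 during partition")
        self.g1.value = value
        if not self.partitioned:
            self.g2.value = value  # normal replication path

    def read_g2(self):
        # AP choice: always answer, even if the answer may be stale.
        return self.g2.value
```

With `mode="AP"`, a write of `"v2"` made during a partition leaves `read_g2()` returning the older `"v1"`; with `mode="CP"`, the same write raises `Unavailable` and both groups keep `"v1"`.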
BASE Extension – Basically Available, Soft state, Eventually consistent. It relaxes CAP’s strict consistency to achieve higher availability, allowing temporary inconsistency that eventually converges.
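One way to picture "temporary inconsistency that eventually converges" is a last-write-wins register. BASE does not prescribe this particular technique; it is swapped in here as a simple convergence strategy. The sketch assumes every write carries a logical timestamp that is unique across writers, and `LWWRegister` is a hypothetical name.

```python
class LWWRegister:
    """Last-write-wins register: a replica keeps the value with the newest timestamp."""

    def __init__(self):
        self.timestamp = 0
        self.value = None

    def write(self, value, timestamp):
        # Accept the write only if it is newer than what this replica holds.
        if timestamp > self.timestamp:
            self.timestamp = timestamp
            self.value = value

    def merge(self, other):
        # Anti-entropy step: adopt the other replica's state if it is newer.
        self.write(other.value, other.timestamp)


# Two replicas accept different writes while out of contact (soft state)...
replica_a, replica_b = LWWRegister(), LWWRegister()
replica_a.write("x", timestamp=1)
replica_b.write("y", timestamp=2)
diverged = replica_a.value != replica_b.value

# ...and converge once they exchange state in both directions.
replica_a.merge(replica_b)
replica_b.merge(replica_a)
```

After the exchange both replicas hold the value with the highest timestamp, regardless of the order in which the merges run.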
Shared‑Nothing Architecture
Shared‑Nothing Architecture (SNA) is a distributed computing model where each node is independent, with no shared memory or disk, eliminating single points of contention and enabling strong scalability.
Key practice: avoid using session state in distributed systems, as synchronizing sessions across nodes degrades performance.
Parallel systems are commonly classified as shared‑memory, shared‑disk, or shared‑nothing. Of the three, shared‑nothing offers clear advantages when many users access the system in parallel.
Sharding strategies include functional sharding, key‑value sharding, and lookup‑table based sharding, each with trade‑offs.
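The three strategies can be sketched as routing functions. This is an illustrative sketch under stated assumptions: the shard names, the use of CRC32 as the hash, and the `LookupTableSharding` class are all hypothetical, and "key-value sharding" is interpreted here as hashing a key to pick a shard.

```python
import zlib

# Functional sharding: each functional area gets its own database.
FUNCTIONAL_SHARDS = {"users": "db-users", "orders": "db-orders"}


def functional_shard(entity_type):
    return FUNCTIONAL_SHARDS[entity_type]


# Key-based sharding: hash the key and take it modulo the shard count.
# Cheap and needs no metadata, but changing shard_count moves most keys.
def key_shard(key, shard_count):
    return zlib.crc32(key.encode("utf-8")) % shard_count


# Lookup-table sharding: an explicit key -> shard directory.
# Flexible (keys can be moved one by one), but the table itself
# must be stored, kept available, and consulted on every request.
class LookupTableSharding:
    def __init__(self, default_shard):
        self.table = {}
        self.default_shard = default_shard

    def assign(self, key, shard):
        self.table[key] = shard

    def shard_for(self, key):
        return self.table.get(key, self.default_shard)
```

The trade-off shows up in what each function needs: functional sharding needs only the entity type, key-based sharding needs a stable shard count, and lookup-table sharding needs an extra piece of state that can itself become a bottleneck.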
Current SNA deployments include technologies such as MapReduce, BigTable, Cassandra, and MongoDB.
ED‑SOA Architecture
Event‑Driven Service‑Oriented Architecture (ED‑SOA) combines SOA’s modular component exposure with event‑driven mechanisms to achieve loose coupling and improved scalability.
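The loose coupling can be illustrated with a tiny in-process event bus. This is only a sketch: a production ED‑SOA system would typically place a message broker between services, and `EventBus`, the `order.created` event name, and the two handlers are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish/subscribe bus."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who, if anyone, is listening.
        for handler in list(self._handlers.get(event_type, ())):
            handler(payload)


bus = EventBus()
billed, notified = [], []

# Two independent "services" react to the same event.
bus.subscribe("order.created", lambda order: billed.append(order["id"]))
bus.subscribe("order.created", lambda order: notified.append(order["id"]))

# The order service only emits the event; it calls neither service directly.
bus.publish("order.created", {"id": 1001})
```

Adding a third consumer requires only another `subscribe` call, with no change to the publisher.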
2. Load Balancing
Load balancing distributes incoming requests across multiple backend services to increase throughput, availability, and flexibility.
Typical scenarios include web clusters and MapReduce workloads.
Common algorithms:
Round Robin / Weighted Round Robin – simple, stateless scheduling.
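Hash – hash a request attribute (e.g., source IP) and take the result modulo the number of servers, so the same client is always routed to the same server.
Consistent Hash – places servers and keys on a hash ring so that adding or removing a server remaps only a small share of keys; commonly used for distributed caches such as memcached or Redis.
Least Connection / Request – selects the server with the fewest active connections or in‑flight requests.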
Most Idle – based on CPU, memory, bandwidth metrics.
Fastest Response – chooses server with lowest recent response time.
Least Traffic – selects server handling the smallest amount of traffic.
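Several of the algorithms above fit in a few lines each. The sketch below is illustrative only: the class and function names are hypothetical, MD5 is used merely as a convenient deterministic hash, and the weighted variant is the naive "repeat the server w times" form rather than a smooth weighted scheduler.

```python
import bisect
import hashlib
import itertools


def stable_hash(text):
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin(RoundRobin):
    """Naive weighting: a server with weight w appears w times per cycle."""

    def __init__(self, weights):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        super().__init__(expanded)


def modulo_hash_pick(servers, client_ip):
    return servers[stable_hash(client_ip) % len(servers)]


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (ring point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several virtual points to even out the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def pick(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The practical difference between the two hash variants appears when the pool changes: with `modulo_hash_pick`, changing `len(servers)` reassigns most clients, whereas after `ring.add("n4")` a key either keeps its previous node or moves to `n4`.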
Load balancing modes include external (RR‑DNS), application‑layer (forward/reverse proxy), and network‑layer (IP translation, LVS, hardware appliances).
3. High‑Availability System Design
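System availability is measured as MTTF/(MTTF+MTTR) × 100%, where MTTF is the mean time to failure (how long the system runs between failures) and MTTR is the mean time to repair (how long it takes to restore service after a failure).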
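A short worked example helps relate the formula to downtime. The sketch below takes MTTF as mean time to failure and MTTR as mean time to repair, both in hours; the figures of 999 hours and 1 hour are made-up inputs, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail):
    """Expected downtime per year implied by an availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Assumed inputs: the system fails on average every 999 hours
# and takes 1 hour to repair.
a = availability(mttf_hours=999, mttr_hours=1)
yearly_downtime = downtime_hours_per_year(a)
```

Under these assumed inputs the availability is 999/1000, i.e. 99.9%, which corresponds to roughly 8.76 hours of downtime per year. The formula also shows two ways to improve availability: fail less often (raise MTTF) or recover faster (lower MTTR).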
Common HA patterns:
Active‑Standby – primary handles traffic; standby takes over on failure.
Active‑Active – multiple nodes serve traffic simultaneously and monitor each other.
Cluster – many identical nodes provide transparent service, coordinated by a control node or software such as Zookeeper.
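The Active‑Standby pattern is often driven by heartbeats: if the active node stays silent for longer than a timeout, the standby is promoted. The following is a minimal sketch under that assumption; `ActiveStandbyPair` and its parameters are hypothetical, and time is passed in explicitly so the behaviour is easy to follow. Real deployments also need fencing or a similar safeguard so that two nodes never act as active at once.

```python
class ActiveStandbyPair:
    def __init__(self, primary, standby, timeout_seconds):
        self.active = primary
        self.standby = standby
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def check(self, now):
        """Promote the standby if the active node has been silent too long."""
        if now - self.last_heartbeat > self.timeout_seconds:
            self.active, self.standby = self.standby, self.active
            # Give the newly promoted node a fresh timeout window.
            self.last_heartbeat = now
        return self.active


pair = ActiveStandbyPair("node-a", "node-b", timeout_seconds=3.0)
pair.heartbeat(now=10.0)
still_primary = pair.check(now=12.0)   # 2 s of silence: within the timeout
after_failover = pair.check(now=14.0)  # 4 s of silence: standby takes over
```

The timeout is the main tuning knob: a short value shortens MTTR but risks failing over on a brief network hiccup.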
Key design principle: eliminate single points of failure. This involves addressing weak links in networking, storage, and other components, and applying techniques like caching, queuing, sharding, load balancing, and geographic disaster recovery.
Even with robust design, external factors such as physical damage, mis‑configurations, or DDoS attacks can still cause outages.
ITFLY8 Architecture Home
