Designing a Complete Distributed Server Cluster Architecture for Large Websites
This article outlines a comprehensive distributed server cluster architecture for large‑scale websites, covering the evolution from simple three‑tier setups to high‑availability load‑balancing with HAProxy/Keepalived, Redis caching, NoSQL storage, and future distributed MySQL considerations.
Research on a Complete Distributed Server Cluster Architecture
0x01. Evolution of Large‑Scale Websites
In simple terms, distribution shortens the execution time of individual tasks to improve efficiency, while clustering increases the number of tasks processed per unit time.
Clusters are mainly divided into High‑Availability Clusters, Load‑Balancing Clusters (which can be implemented with Nginx), and High‑Performance Computing Clusters.
Distribution means placing different services in different locations; clustering means gathering several servers together to run the same service. Each node in a distributed system can form a cluster, but a cluster is not necessarily distributed.
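The difference can be made concrete with a toy calculation. The sketch below is only an illustration: the numbers (10 subtasks per request, 1 second per subtask, 10 nodes) are assumptions, not measurements, and it ignores network and coordination overhead.

```python
import math

# Assumed toy numbers: one request is made of independent subtasks.
SUBTASKS_PER_REQUEST = 10
SECONDS_PER_SUBTASK = 1.0


def single_server():
    """One machine runs every subtask of a request in sequence."""
    latency = SUBTASKS_PER_REQUEST * SECONDS_PER_SUBTASK
    return latency, 1 / latency  # (seconds per request, requests per second)


def distributed(nodes):
    """Subtasks of ONE request are spread over different nodes and run in parallel."""
    rounds = math.ceil(SUBTASKS_PER_REQUEST / nodes)
    latency = rounds * SECONDS_PER_SUBTASK
    return latency, 1 / latency


def clustered(nodes):
    """Every node runs a WHOLE request; nodes serve different requests at once."""
    latency = SUBTASKS_PER_REQUEST * SECONDS_PER_SUBTASK
    return latency, nodes / latency


print(single_server())   # (10.0, 0.1)
print(distributed(10))   # (1.0, 1.0)
print(clustered(10))     # (10.0, 1.0)
```

In this model both approaches raise throughput tenfold, but only distribution shortens the time a single request takes, which matches the distinction drawn above.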
Reference: a blog on the evolution of large‑site architectures (http://www.cnblogs.com/leefreeman/p/3993449.html).
Typical large‑site architectures evolve from a simple three‑tier model (application, database, file service) to distributed services and clustered deployments.
Application, Database, File Service Architecture
As traffic grows, these services are split out and each one is deployed as a cluster.
Distributed Server Cluster
0x02. Load‑Balancing Solutions
Previously we discussed Nginx reverse‑proxy load balancing; here we present a HAProxy + Keepalived dual‑node high‑availability solution.
HAProxy is a free, high‑speed, reliable load‑balancer and proxy for TCP and HTTP, especially suited for high‑traffic sites requiring persistent connections or layer‑7 processing.
In this architecture, the failure of any single component (an HAProxy instance, a Keepalived instance, or an upstream HTTP server) does not interrupt service; the remaining nodes keep handling traffic.
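The failover idea behind the dual‑node setup can be sketched as a priority election: among the load‑balancer nodes that are still healthy, the one with the highest priority holds the virtual IP. The code below is a simplified model rather than Keepalived itself; the `Node` class, the node names, and the priority values are hypothetical, and `alive` stands for "this node and its HAProxy process pass their health check".

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int       # higher value wins the election (assumed values)
    alive: bool = True  # assumed to summarise node + HAProxy health


def elect_master(nodes):
    """Return the healthy node with the highest priority, or None if all are down."""
    candidates = [n for n in nodes if n.alive]
    return max(candidates, key=lambda n: n.priority, default=None)


lb1 = Node("lb1", priority=100)
lb2 = Node("lb2", priority=90)

print(elect_master([lb1, lb2]).name)  # lb1
lb1.alive = False                     # primary load balancer fails
print(elect_master([lb1, lb2]).name)  # lb2
```

Clients only ever talk to the virtual IP, so in this model the switch from `lb1` to `lb2` needs no change on their side.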
Key advantages of HAProxy:
Supports virtual hosts and works at both layer 4 (TCP) and layer 7 (HTTP), including multi‑subnet setups.
Compensates for some Nginx shortcomings such as session persistence and cookie handling.
Provides URL‑based health checks for backend servers.
Generally offers higher load‑balancing performance than Nginx under heavy concurrency.
Can balance MySQL read traffic and monitor MySQL nodes.
HAProxy + Keepalived Load‑Balancing Scheme
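How health checks interact with load balancing can be illustrated with a small round‑robin picker that skips backends failing their check. This is a minimal sketch of the general technique, not HAProxy's implementation: the class name, the backend names, and the `is_healthy` callback are assumptions, and a real check would issue an HTTP request to a URL on each backend instead of reading a dictionary.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate through backends, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable: backend -> bool
        self._ring = cycle(self.backends)

    def pick(self):
        # Try each backend at most once per call so a fully failed pool
        # raises an error instead of looping forever.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backend available")


# Hypothetical health state; web2 is currently failing its check.
status = {"web1": True, "web2": False, "web3": True}
balancer = RoundRobinBalancer(status, lambda name: status[name])

print([balancer.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Once `status["web2"]` is set back to `True`, the backend rejoins the rotation on the next pass without any other change.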
0x03. Redis Caching Solution
Caching can be divided into server‑side caching and application‑level caching.
Application‑level caching is already handled in the Jue backend framework.
Server‑side caching reduces file I/O between the web server and PHP, easing load on both the load‑balancer and application servers.
While Memcached is a classic server cache, Redis has become the lightweight default due to its rich data structures (lists, sets, sorted sets, etc.) and support for pipelined commands.
Redis Cache Scheme
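A common way to put such a cache in front of slower storage is the cache‑aside pattern: look in the cache first, and on a miss load the value from the origin and store it with an expiry time. The sketch below shows the pattern only. `InMemoryCache` is a hypothetical in‑process stand‑in so the example runs without a server; its `get` and `setex` methods are modelled on the Redis commands of the same names, and `cached_fetch`, `load_page`, and the 60‑second TTL are assumptions.

```python
import time


class InMemoryCache:
    """Stand-in for a Redis client; only get and setex are modelled."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # entry has expired, treat as a miss
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cached_fetch(cache, key, loader, ttl_seconds=60):
    """Cache-aside read: return the cached value, or load and cache it on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.setex(key, ttl_seconds, value)
    return value


def load_page(key):
    # Placeholder for the expensive work (rendering, database queries, file I/O).
    return "rendered:" + key


cache = InMemoryCache()
print(cached_fetch(cache, "home", load_page))  # rendered:home
```

Only the first call for a given key reaches `load_page`; later calls within the TTL are served from the cache, which is where the reduction in backend load comes from.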
Reference: "High‑Availability Open‑Source Redis Cache Cluster Solution".
0x04. Sphinx Search Engine Solution
(First phase not implemented; will be considered later.)
Sphinx, developed in Russia, claims to handle tens of millions of records with an indexing throughput of up to 10 MB/s.
It works as a MySQL‑compatible full‑text engine, building indexes using B‑tree and hash structures.
Configuration revolves around a sphinx.conf file that defines indexed fields and query behavior.
0x05. NoSQL Quick‑Storage Solution
NoSQL is used here for storing numerous small pieces of data (e.g., per‑page CSS values) to reduce MySQL SELECT load and improve speed.
Among many options, MongoDB is chosen for its simplicity.
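One possible shape for this data is a single document per page that holds all of that page's CSS values, keyed by the page identifier, so a page load needs one lookup instead of several SELECTs. The sketch below is an assumption about how such storage could look, not the author's schema: `PageStyleStore`, the `_id` and `styles` field names, and the sample values are hypothetical, and a plain dictionary stands in for a MongoDB collection so the example runs on its own.

```python
class PageStyleStore:
    """In-memory stand-in for a document collection keyed by page id."""

    def __init__(self):
        self._docs = {}

    def save(self, page_id, styles):
        # Upsert: create the document on first write, then merge new values in.
        doc = self._docs.setdefault(page_id, {"_id": page_id, "styles": {}})
        doc["styles"].update(styles)

    def load(self, page_id):
        # Return a copy so callers cannot modify the stored document by accident.
        doc = self._docs.get(page_id)
        return dict(doc["styles"]) if doc else {}


store = PageStyleStore()
store.save("page-42", {"background": "#ffffff", "font-size": "14px"})
store.save("page-42", {"font-size": "16px"})

print(store.load("page-42"))  # {'background': '#ffffff', 'font-size': '16px'}
```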
0x06. Distributed MySQL Solution
Distributed MySQL has not been attempted yet; the initial phase will not include it due to unclear load requirements.
Reference: "Five Open‑Source Compatible Solutions Beyond Standard MySQL".
0x07. Overall Distributed Cluster Scheme
Summarizing the above, the following model represents a preliminary distributed architecture that will be continuously refined.
A Website Architecture Diagram
Source: http://homeway.me/2014/12/10/think-about-distributed-clusters/