Build a Scalable, High‑Availability Web Architecture with HAProxy, MySQL Replication & CDN
Learn how to construct a robust web infrastructure by deploying HAProxy as a reverse proxy, making it highly available with Keepalived, balancing load through DNS, separating dynamic and static servers, configuring MySQL master‑slave replication, adding caching layers, and using a CDN together with monitoring, automation, and backup tools.
1.1 HTTP Reverse Proxy Server
In front of a web site we need a reverse proxy server to accept user requests for both dynamic and static content; common solutions are HAProxy and Nginx, with HAProxy used here.
1.2 High‑Availability HTTP Proxy
A single reverse proxy is a single point of failure, so the front‑end HTTP reverse proxy should be configured for high availability, e.g., HAProxy combined with Keepalived, which moves a virtual IP address to the standby node when the active node fails.
1.3 HTTP Proxy Load Balancing
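With two HAProxy nodes in an active/standby pair, only one node serves users while the other sits idle, wasting resources. To use both, publish two A records in DNS, one per node address (typically each node is active for one virtual IP and standby for the other), so that requests are distributed across both nodes.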
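The effect of two A records can be illustrated with a toy resolver. This is a minimal sketch, not a real DNS server: the two addresses are hypothetical values from a documentation range, and `RoundRobinResolver` and `simulate` are names invented for this example.

```python
from collections import Counter, deque

# Hypothetical virtual IPs, one per HAProxy node (documentation address range).
A_RECORDS = ["203.0.113.10", "203.0.113.11"]


class RoundRobinResolver:
    """Toy resolver that rotates the A-record order on every query."""

    def __init__(self, records):
        self._records = deque(records)

    def resolve(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return answer


def simulate(queries):
    """Count which address each client would try first."""
    resolver = RoundRobinResolver(A_RECORDS)
    return Counter(resolver.resolve()[0] for _ in range(queries))


if __name__ == "__main__":
    print(simulate(1000))
```

In this idealized model the first-choice address alternates evenly between the two nodes. Real resolvers and clients cache answers, so the actual split is only approximate.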
1.4 Dynamic Content Server
When deploying dynamic content (e.g., a LAMP stack), high availability and load balancing are also required; dynamic pages such as index.jsp or index.php should be served by the dynamic server.
Note: Accessing a site’s homepage triggers multiple resource requests (static assets like images, CSS, etc.) after the initial HTML is retrieved.
1.5 Database Node Server
Dynamic pages that query data require a database node; common databases include MySQL, MariaDB, and Oracle. Where data is stored depends on its type:
Relational data stored in MySQL‑type databases
File data stored in the file system
Key‑value data stored in cache servers or NoSQL databases
1.6 MySQL Master‑Slave Architecture
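When query volume exceeds a single MySQL server’s capacity, a master‑slave setup with one master and multiple slaves improves read performance: the slaves replicate the master’s data and share the read queries, while all writes still go to the master.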
1.7 Cache Server
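Because MySQL’s built‑in query cache is local to each server and cannot be shared across multiple nodes, an external cache (e.g., Memcached) is added; if only one MySQL read node exists, MySQL’s local cache may suffice. Note that the query cache was removed in MySQL 8.0, so on current versions an external cache is the usual choice.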
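When adding a cache to a MySQL master‑slave architecture, a “bypass” mode (also known as cache‑aside or look‑aside) is used: the application stores MySQL query results in the cache and looks them up there before querying the database.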
1.8 Cache Working Modes
Two cache modes exist:
Proxy mode (e.g., HAProxy, Nginx)
Bypass mode (e.g., Memcached)
1.8.1 Proxy Mode
When a cache server receives a request and lacks a local record, it forwards the request to the backend, caches the response, and returns it to the client.
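Proxy mode can be sketched as a read‑through cache that sits in front of the backend. This is a simplified illustration with no expiry or size limit; `ReadThroughCache` and `render_page` are hypothetical names, and the backend is a plain function standing in for a web server.

```python
class ReadThroughCache:
    """Cache that fetches from the backend itself when it has no local record."""

    def __init__(self, fetch_from_backend):
        self._fetch = fetch_from_backend
        self._store = {}
        self.backend_calls = 0  # counter kept only to make the behavior visible

    def get(self, key):
        if key in self._store:
            return self._store[key]      # hit: the backend is not contacted
        self.backend_calls += 1
        value = self._fetch(key)         # miss: the cache forwards the request
        self._store[key] = value         # ...stores the response...
        return value                     # ...and returns it to the client


def render_page(path):
    """Hypothetical backend that produces a response for a path."""
    return f"<html>content of {path}</html>"


page_cache = ReadThroughCache(render_page)
first = page_cache.get("/index.html")   # miss, goes to the backend
second = page_cache.get("/index.html")  # hit, served from the cache
```

The client only ever talks to the cache; whether a request reached the backend is invisible to it.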
1.8.2 Bypass Mode
The client first queries the cache server; if a record exists, the cache server responds immediately. If not, the cache server does not contact the backend: the client queries the backend server itself and then decides whether to store the result in the cache.
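A minimal sketch of bypass mode, assuming in‑memory stand‑ins for the cache server and the database. `CacheServer`, `Database`, and `get_user` are hypothetical names for this example and not part of any client library.

```python
class CacheServer:
    """Stand-in for a Memcached-like store; it never contacts the database."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class Database:
    """Stand-in for a MySQL read node that counts the queries it receives."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0

    def query(self, user_id):
        self.queries += 1
        return self._rows.get(user_id)


def get_user(user_id, cache, db):
    """Client-side lookup: cache first, then the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = db.query(user_id)
    if row is not None:      # the client decides what is worth caching
        cache.set(key, row)
    return row
```

All of the cache logic lives in the client, which is why the cache server can stay a simple key‑value store.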
1.9 MySQL Read‑Write Splitting
With a master‑slave setup, read and write requests must be directed to appropriate nodes; two approaches are:
Configure the front‑end application to send writes to the master and reads to slaves (requires code changes for scaling).
Deploy a read‑write splitting server (e.g., Amoeba) that proxies requests and forwards them to the correct node.
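The routing decision behind both approaches can be sketched as follows. This is a deliberately naive illustration: the node names are hypothetical, `ReadWriteRouter` is not the API of Amoeba or any other proxy, and statements are classified by their first keyword only.

```python
import itertools


class ReadWriteRouter:
    """Send read statements to slaves in rotation and everything else to the master."""

    READ_PREFIXES = ("select", "show")

    def __init__(self, master, slaves):
        self._master = master
        self._has_slaves = bool(slaves)
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        statement = sql.lstrip().lower()
        if self._has_slaves and statement.startswith(self.READ_PREFIXES):
            return next(self._slaves)
        return self._master  # writes, and reads when no slave is configured


router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT * FROM orders"))          # db-slave-1
print(router.route("INSERT INTO orders VALUES (1)")) # db-master
```

A production router also has to handle cases this sketch ignores, such as reads inside transactions, `SELECT ... FOR UPDATE`, and reads that must see a write that has not yet replicated.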
1.10 Caching with Multiple Slave Nodes
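When multiple read nodes exist, load balancing among them can reduce cache‑hit rates, because the same query may land on a different node each time and miss that node’s cache; two strategies are:
Simple modulo hashing of queries to consistently route the same query to the same node (when a node goes down or is added, the node count changes and most queries are remapped, invalidating most cached results).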
Consistent hashing with virtual nodes, which limits impact of node failures and improves balance.
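The difference between the two strategies can be sketched by removing one node and counting how many keys change owner. This is an illustrative implementation under stated assumptions: MD5 is used only as a stable hash, 100 virtual nodes per server is an arbitrary choice, and the node names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(text):
    """Deterministic integer hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def modulo_node(key, nodes):
    """Simple modulo hashing: the owner depends on the number of nodes."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent hash ring where each node owns several virtual points."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First virtual point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose owning node differs between two lookups."""
    return sum(1 for key in keys if before(key) != after(key)) / len(keys)


keys = [f"SELECT * FROM users WHERE id = {i}" for i in range(10000)]
four = ["cache-a", "cache-b", "cache-c", "cache-d"]
three = four[:3]  # cache-d has failed

modulo_moved = moved_fraction(
    lambda k: modulo_node(k, four), lambda k: modulo_node(k, three), keys
)
ring4, ring3 = HashRing(four), HashRing(three)
ring_moved = moved_fraction(ring4.node_for, ring3.node_for, keys)
print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%}")
```

With modulo hashing roughly three quarters of the keys change owner when going from four nodes to three, while on the ring only the keys that belonged to the failed node move, which is roughly one quarter here.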
1.11 Master Node Single‑Point Bottleneck
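Because all writes go to one master, the master becomes both a bottleneck and a single point of failure; a high‑availability solution for the master (e.g., dual masters managed by MySQL‑MMM, MySQL behind a proxy, or DRBD‑based storage replication) removes the single point of failure.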
1.12 File Upload Storage
Unstructured data such as user‑uploaded files should be stored in a file system (e.g., NAS) and accessed via HTTP PUT requests and file‑service APIs.
1.13 User Read Requests
Separate dynamic and static servers using HAProxy; static content can be served by Nginx with high availability via Nginx + Keepalived.
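The separation rule itself is usually a match on the request path. The sketch below shows the decision in isolation, assuming a hypothetical extension list and backend names; in HAProxy the same idea is typically expressed with an ACL on the path rather than in application code.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as static content.
STATIC_EXTENSIONS = {".html", ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico"}


def choose_backend(url):
    """Return the backend pool name for a request URL, based on its file extension."""
    path = urlsplit(url).path                      # drop the query string
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "static_servers"                    # e.g., the Nginx pool
    return "dynamic_servers"                       # e.g., the PHP or JSP pool


for url in ("/index.php", "/img/logo.PNG?v=3", "/css/site.css", "/"):
    print(url, "->", choose_backend(url))
```

Anything that does not match a known static extension falls through to the dynamic pool, which is the safer default when the list is incomplete.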
1.14 Static Content Caching
Static files benefit from caching layers like Varnish or Squid placed between HAProxy and Nginx: on a hit, Varnish answers HAProxy directly; on a miss, it queries Nginx, which may have its own local cache.
1.15 CDN Technology
For large‑scale sites, a CDN distributes cache servers globally and uses global server load balancing (GSLB) to route users to the nearest cache, reducing origin load and latency; using multiple CDN providers is recommended for redundancy.
1.16 Monitoring, Automation, and Backup
Deploy monitoring agents (e.g., Nagios, Zabbix) to track performance and service health; use automation tools (e.g., Ansible, Puppet, Cobbler) for mass deployment and updates; implement backup solutions (e.g., Bacula) to protect data against loss.
1.17 Summary
This article outlines the step‑by‑step expansion of a basic web site into a full‑featured, highly available architecture, covering reverse proxy, load balancing, dynamic/static separation, database replication, caching strategies, CDN integration, monitoring, automation, and backup.
MaGe Linux Operations
