Scaling Web Systems to 100M Visits: Load Balancing, Caching, and DB Tactics
This article explores how a web system can evolve from handling 100,000 daily visits to over 100 million. It covers multi‑layered load‑balancing strategies, MySQL optimization through indexing and connection pooling, Redis and cache clusters, and geographic distribution with disaster‑recovery techniques that keep the system fast and reliable.
Web Load Balancing
Load balancing distributes work across a server cluster and is essential for protecting backend web servers as traffic grows.
1. HTTP Redirection
The server answers with a 302 response whose Location header carries a new URL, and the browser then requests that address. This method is simple, but every request costs an extra round trip, so it adds latency and performs poorly under massive traffic.
2. Reverse Proxy (Layer‑7)
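Software such as Nginx receives HTTP requests and forwards them to backend servers, acting as a bridge between browsers and the backends. It allows custom routing rules and weighted distribution, but session state stored on one backend is lost when later requests land on another (the session affinity problem), and the proxy itself is a single point of failure unless it is deployed redundantly.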
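The weighted distribution mentioned above can be sketched as follows. This is a minimal illustration of smooth weighted round‑robin, one common way to spread requests by weight; it is not taken from the original article or from any proxy's source code, and the backend names and weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str         # hypothetical backend label
    weight: int       # static weight: a higher value receives more requests
    current: int = 0  # running score used by the algorithm


def pick(backends: List[Backend]) -> str:
    """Choose the next backend using smooth weighted round-robin."""
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current += b.weight            # every backend earns its weight
    chosen = max(backends, key=lambda b: b.current)
    chosen.current -= total              # the winner pays back the total
    return chosen.name


backends = [Backend("web-a", 5), Backend("web-b", 1), Backend("web-c", 1)]
sequence = [pick(backends) for _ in range(7)]
print(sequence)
# -> ['web-a', 'web-a', 'web-b', 'web-a', 'web-c', 'web-a', 'web-a']
```

Over seven requests the heaviest backend receives five, and the lighter ones are interleaved rather than bunched at the end, which avoids sending bursts to a single server.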
3. IP Load Balancing (Layer‑4)
Operating at the network and transport layers, IP load balancers such as LVS (in its LVS‑NAT, LVS‑DR, and LVS‑TUN modes) modify packet headers instead of parsing HTTP, which gives them high performance. Configuration is more complex than with Nginx.
4. DNS Load Balancing
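Multiple IPs are assigned to a single domain name, and DNS resolution spreads clients across them. It is simple and fast, but it cannot apply custom routing rules, and because resolvers cache records until their TTL expires, changes such as removing a failed server take effect slowly.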
5. DNS/GSLB (Global Server Load Balancing)
CDN‑style GSLB returns IPs based on geographic proximity, reducing routing hops. It provides high performance but requires complex configuration and higher maintenance costs.
Web System Cache Mechanism
Beyond external load balancing, internal caching is crucial because roughly 80% of requests target 20% of hot data.
MySQL Internal Caching
1. Proper Indexing – Accelerates data retrieval for large tables but consumes disk space and adds overhead to write operations.
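2. Connection Thread Pool – Set thread_cache_size so MySQL reuses server threads instead of creating a new one for each connection. Long‑lived connections (pconnect) avoid reconnect cost, but each web worker keeps its connection open, which can exhaust max_connections; a connection‑pool service (e.g., one built on Swoole) shares a bounded set of connections instead.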
3. InnoDB Buffer Pool – innodb_buffer_pool_size should be ~80% of physical memory on a dedicated MySQL server to cache indexes and data, improving hit rates.
4. Sharding / Partitioning – When tables exceed millions of rows, split databases or tables to maintain performance, despite added complexity.
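As an illustration of item 4, the sketch below routes a row to a database and table by hashing its key. The shard counts, the `user_db_N` / `user_NN` naming scheme, and the choice of CRC32 as the hash are all assumptions made for this example, not details from the original article.

```python
import zlib
from typing import Tuple

DB_COUNT = 2        # assumed number of databases
TABLES_PER_DB = 4   # assumed number of tables inside each database


def shard_for(user_id: int) -> Tuple[str, str]:
    """Map a user id to a (database, table) pair with a stable hash."""
    # zlib.crc32 is deterministic across processes, unlike Python's hash().
    slot = zlib.crc32(str(user_id).encode()) % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"user_db_{db_index}", f"user_{table_index:02d}"


db, table = shard_for(42)
print(db, table)  # the same id always maps to the same pair
```

Plain modulo hashing keeps routing simple, but changing the shard count remaps most keys, so the counts are usually chosen with future growth in mind.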
Multi‑Server MySQL Deployments
1. Master‑Slave Replication – The slave replicates the master's changes and can take over on failure, but it sits idle most of the time.
2. Read‑Write Splitting – Direct writes to the master and reads to the slave, improving read scalability.
3. Master‑Master (Active‑Active) – Each node acts as both master and slave of the other, eliminating the single point of failure in a two‑node setup.
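Read‑write splitting (item 2) is usually implemented in the application or in a proxy. The following is a minimal sketch assuming one master and a list of replicas identified by hypothetical names; a real router would return connections rather than strings and would also need a policy for reads that must see recent writes.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send plain SELECTs to replicas in rotation, everything else to the master."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Locking reads must run on the master, so they are not treated as reads.
        is_plain_read = verb == "SELECT" and "FOR UPDATE" not in sql.upper()
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 7"))      # replica-1
print(router.route("UPDATE orders SET paid = 1 WHERE id = 7"))  # master
```

Because replicas apply changes asynchronously, a read issued right after a write may return stale data; routing such reads to the master is a common workaround.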
Data Synchronization Between MySQL Nodes
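High traffic can cause replication lag, leaving slaves behind the master. Solutions include MySQL’s built‑in multi‑threaded replication (in MySQL 5.6 limited to database‑level parallelism, so a single busy database gains little; MySQL 5.7 and later can also apply transactions within one database in parallel) and custom binlog parsers that synchronize at the table level.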
Caching Between Web Servers and Databases
1. Page Staticization – Generate static HTML once and serve it directly, reducing dynamic processing.
2. Single‑Node Memory Cache – Deploy Redis or Memcached on a dedicated server; Redis offers richer features.
3. Memory Cache Cluster – Use Redis Cluster to avoid single‑point failures and scale cache capacity.
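4. Reducing Database Writes – Batch write operations through a queue, relax innodb_flush_log_at_trx_commit (values 0 or 2 flush the redo log about once per second instead of at every commit, at the risk of losing up to about a second of transactions in a crash), or upgrade storage (RAID, SSD).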
5. NoSQL Storage – Offload hot read/write data to key‑value stores like Redis, which can also persist to disk.
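A memory cache in front of the database (items 2 and 3) is commonly used in a cache‑aside fashion: read the cache first, fall back to the database on a miss, then populate the cache. The sketch below illustrates that flow with a small in‑process `TTLCache` class standing in for a Redis or Memcached client; the class, the `get_with_cache` helper, and the fake database are hypothetical names invented for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Redis/Memcached client (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def get_with_cache(key: str, cache: TTLCache, load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = load_from_db(key)
    if value is not None:
        cache.set(key, value)
    return value


FAKE_DB = {"user:1": {"name": "alice"}}  # hypothetical data
db_calls = []


def load_from_db(key: str) -> Any:
    db_calls.append(key)  # record each trip to the "database"
    return FAKE_DB.get(key)


cache = TTLCache(ttl_seconds=60)
first = get_with_cache("user:1", cache, load_from_db)
second = get_with_cache("user:1", cache, load_from_db)
print(len(db_calls))  # 1: the second read is served from the cache
```

The expiry time bounds how stale a cached value can become; writers that need fresher reads typically also delete or overwrite the cached key when they update the database.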
Empty‑Node Query Filtering
Lookups for non‑existent records miss the cache every time and fall through to the database, wasting resources. An in‑memory mapping table of the keys that do exist can filter such queries early.
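The mapping table described above can be sketched as a set of known ids that is consulted before the cache or database is touched. The class and function names here are hypothetical, and a plain set is assumed for clarity; when the id space is too large to hold in memory, a Bloom filter is a common space‑saving substitute, at the cost of occasional false positives.

```python
from typing import Callable, Iterable, Optional, Set


class ExistenceMap:
    """In-memory record of which ids exist, kept in sync with inserts and deletes."""

    def __init__(self, known_ids: Iterable[int]):
        self._known: Set[int] = set(known_ids)

    def add(self, record_id: int) -> None:
        self._known.add(record_id)

    def remove(self, record_id: int) -> None:
        self._known.discard(record_id)

    def exists(self, record_id: int) -> bool:
        return record_id in self._known


def lookup(record_id: int, existence: ExistenceMap,
           fetch: Callable[[int], dict]) -> Optional[dict]:
    """Reject ids that cannot exist before spending a cache or database query."""
    if not existence.exists(record_id):
        return None
    return fetch(record_id)


fetch_calls = []


def fetch(record_id: int) -> dict:
    fetch_calls.append(record_id)  # stands in for the cache + database path
    return {"id": record_id}


existence = ExistenceMap([1, 2, 3])
hit = lookup(2, existence, fetch)
miss = lookup(999, existence, fetch)
print(hit, miss, fetch_calls)  # {'id': 2} None [2]
```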
Geographic Distribution (Distributed Deployment)
Core‑centralized, node‑dispersed architecture places critical services in a central data center while deploying stateless services across regional nodes (e.g., Shanghai core, Beijing/Shanghai/Shenzhen/Wuhan edge nodes).
Node Disaster Recovery and Overload Protection
When a node fails, traffic is rerouted to nearby nodes. Overload protection can reject new connections (e.g., login queue) or divert traffic to less‑loaded nodes.
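The rerouting and overload rules above can be combined into one admission step, sketched below. It assumes each node reports a health flag, a current connection count, and a capacity, and that the caller passes nodes ordered nearest first; these fields, the capacities, and the node list are hypothetical values for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool
    active: int     # connections currently being served
    capacity: int   # assumed limit before the node refuses new work


def admit(nodes_nearest_first: List[Node]) -> Optional[str]:
    """Return the node that accepts a new connection, or None to queue or reject it."""
    for node in nodes_nearest_first:
        if node.healthy and node.active < node.capacity:
            node.active += 1
            return node.name
    return None  # every node is down or full: protect them by refusing


nodes = [
    Node("beijing", healthy=False, active=0, capacity=2),  # failed node
    Node("wuhan", healthy=True, active=2, capacity=2),     # already full
    Node("shanghai", healthy=True, active=0, capacity=1),
]
first = admit(nodes)
second = admit(nodes)
print(first, second)  # shanghai None
```

Returning `None` instead of overcommitting a node corresponds to the login queue case: the caller can hold the request and retry rather than push a loaded node into failure.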
Conclusion
Web systems evolve from a single server to massive clusters through layered solutions: load balancing, caching, database optimization, and geographic distribution. Each stage solves existing bottlenecks while introducing new challenges, making continuous optimization essential.
ITFLY8 Architecture Home
