How to Build a Scalable Million‑DAU Backend Architecture with Hybrid Cloud
This article outlines a reference architecture for serving millions of daily active users, covering DNS routing, layer-4 and layer-7 load balancing, monolithic versus microservice deployment, caching, database replication and sharding, hybrid-cloud capacity, full-link elastic scaling, and a three-tier degradation mechanism that keeps the system resilient under sudden traffic spikes.
1 General Million‑DAU User System Architecture Design
Recent incidents such as the Xi'an "One Code" health-code outage highlight the need for systems that can absorb traffic spikes many times normal load and scale out automatically.
1.1 DNS
Based on the client's source IP, DNS resolves user requests to the nearest regional data center, with resolver caching keeping routing consistent across requests.
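The idea can be sketched as a lookup from IP prefix to region. This is a minimal illustration with hypothetical prefixes and data-center names; real GeoDNS resolvers use full IP-geolocation databases rather than static prefixes.

```python
# Minimal sketch of DNS-style geographic routing: map the client's IP
# prefix to a regional data center (prefixes and names are illustrative).
REGION_BY_PREFIX = {
    "10.1.": "dc-north",
    "10.2.": "dc-east",
    "10.3.": "dc-south",
}

def resolve_region(client_ip: str, default: str = "dc-north") -> str:
    """Return the data center for a client IP; fall back to a default."""
    for prefix, region in REGION_BY_PREFIX.items():
        if client_ip.startswith(prefix):
            return region
    return default
```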
1.2 Layer-4 Load Balancing
After DNS, traffic reaches a layer‑4 load balancer (typically LVS) that forwards requests to appropriate layer‑7 gateway clusters.
1.3 Layer-7 Load Balancing (Gateway)
Layer‑7 gateways (e.g., Nginx) handle domain‑level routing, application‑level forwarding, authentication, logging, and monitoring.
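The routing part of a layer-7 gateway boils down to matching the Host header and URL path against a rule table. The sketch below uses illustrative host names and service names; in practice this logic lives in Nginx `server`/`location` configuration rather than application code.

```python
# Sketch of layer-7 gateway routing: choose an upstream service cluster
# from the Host header and URL path (all names are illustrative).
ROUTES = [
    ("api.example.com", "/user/", "user-service"),
    ("api.example.com", "/order/", "order-service"),
    ("static.example.com", "/", "cdn-origin"),
]

def route(host: str, path: str) -> str:
    """Return the upstream for a request; unmatched traffic gets a default."""
    for rule_host, prefix, upstream in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return upstream
    return "default-backend"
```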
1.4 Server Side
There are two deployment models: a monolithic application for simple systems and small teams, and microservice decomposition for larger codebases and larger teams.
1.5 Caching
Cache frequently accessed data using memcached or Redis to reduce database latency.
1.6 Database
Implement master‑slave replication for read‑heavy workloads and sharding (by time or user ID) to handle massive data volumes.
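The two sharding schemes mentioned above can be sketched as simple routing functions. The shard count and naming are illustrative; note that plain modulo sharding makes resharding expensive, which is why consistent hashing is often preferred when the shard count must grow.

```python
# Sharding sketch: route data to shards by user ID or by time period.
NUM_SHARDS = 4  # illustrative; real deployments size this to data volume

def shard_for_user(user_id: int) -> str:
    """Hash-style sharding: spread users evenly across shards."""
    return f"user_db_{user_id % NUM_SHARDS}"

def table_for_month(year: int, month: int) -> str:
    """Time-based sharding for append-heavy data such as orders or logs."""
    return f"orders_{year:04d}{month:02d}"
```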
2 Hybrid Cloud Architecture
Combine private and public cloud resources: private data centers handle normal traffic, while the public cloud absorbs spikes that exceed private bandwidth limits. This requires network interconnectivity between the two environments and coordinated deployment of every layer in both.
3 Full‑Link Elastic Scaling
When traffic exceeds private-cloud capacity, shift part of the load to the public cloud, scaling the layer-4 and layer-7 components, application servers, caches, and databases together.
3.1 Layer-4 and Layer-7 Scaling
Public cloud SLB can handle millions of concurrent connections; Nginx instances can be elastically added based on QPS.
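QPS-driven gateway scaling reduces to a capacity calculation: instances needed = target QPS times a headroom factor, divided by the QPS one instance can sustain. The sketch below uses illustrative numbers; real per-instance capacity should come from load testing.

```python
import math

# Capacity-planning sketch: gateway instances required for a target QPS,
# given measured per-instance capacity and a safety-headroom factor.
def gateway_instances(target_qps: float, qps_per_instance: float,
                      headroom: float = 1.5, min_instances: int = 2) -> int:
    needed = math.ceil(target_qps * headroom / qps_per_instance)
    return max(needed, min_instances)  # keep a floor for redundancy
```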
3.2 Server Scaling
Stateless services can be auto‑scaled based on weighted QPS and latency metrics; tools like CudgX use log data and machine learning for precise scaling.
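A stateless-service autoscaler combines load and latency signals into a scale-out/scale-in decision. The thresholds and the simple rule below are illustrative assumptions, not CudgX's actual model, which the source describes as using log data and machine learning.

```python
# Autoscaling-decision sketch: scale out when per-instance QPS or p99
# latency exceeds a threshold, scale in when both are well below it.
# Threshold values are illustrative, not a real production policy.
def scaling_decision(qps_per_instance: float, p99_latency_ms: float,
                     qps_limit: float = 5000.0,
                     latency_limit_ms: float = 200.0) -> str:
    if qps_per_instance > qps_limit or p99_latency_ms > latency_limit_ms:
        return "scale_out"
    if qps_per_instance < 0.5 * qps_limit and p99_latency_ms < 0.5 * latency_limit_ms:
        return "scale_in"
    return "hold"  # hysteresis band: avoid flapping between decisions
```

The middle "hold" band is the important design choice: without it, a service hovering near the threshold would scale out and in repeatedly.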
3.3 Cache and Database Scaling
Cache and database layers must also support capacity expansion, with sufficient redundancy to handle rapid traffic growth.
4 Three‑Tier Degradation Mechanism
To protect a system handling tens of millions of DAU, a three‑level degradation strategy is employed, gradually sacrificing functionality to release resources while minimizing user impact.
Level 1: invisible to users; releases less than 30% of redundant capacity.
Level 2: visible to users; releases up to 50% of redundant capacity.
Level 3: severe degradation; releases 50-100% of redundant capacity, used only as a last resort.
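Choosing a degradation level can be sketched as a threshold ladder over system load. The redundancy fractions per level follow the article; the load thresholds below are illustrative assumptions.

```python
# Three-tier degradation sketch: map current system load (0.0-1.0) to a
# degradation level. Thresholds are illustrative; the redundancy each
# level releases (<30%, up to 50%, 50-100%) follows the article.
DEGRADATION_LEVELS = [
    (0.90, 3),  # severe, user-facing degradation; last resort
    (0.75, 2),  # visible degradation
    (0.60, 1),  # invisible degradation
]

def degradation_level(load: float) -> int:
    """Return 0 (normal operation) or degradation level 1-3."""
    for threshold, level in DEGRADATION_LEVELS:
        if load > threshold:
            return level
    return 0
```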
Additional supporting mechanisms such as decision‑support systems and on‑call alerting are required to maintain reliability at this scale.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITFLY8 Architecture Home