Designing a Scalable Architecture for Million‑Level DAU User Systems
The article outlines a comprehensive backend architecture for handling million‑level daily active users, covering DNS routing, L4/L7 load balancing, monolithic versus microservice deployment, caching, database sharding, hybrid‑cloud deployment, elastic scaling, and multi‑level degradation strategies to ensure high availability under sudden traffic spikes.
Recent incidents such as the Xi'an One‑Code‑Pass outage highlighted the importance of designing systems with sufficient scalability and automatic scaling capabilities to handle traffic spikes that exceed normal daily loads.
The article first presents a generic architecture for an internet application serving a million‑level DAU, describing the typical request flow through several layers.
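1. DNS – DNS resolves the service's domain name to an IP address in the appropriate regional IDC, typically chosen from the user's source IP. Client‑side caching of the result keeps a user's requests going to the same IDC.
2. L4 Load Balancing – The first hop after DNS is a layer‑4 load balancer (commonly LVS) that forwards traffic to the appropriate layer‑7 gateway cluster. Because a layer‑4 balancer sees only IP addresses and ports, not HTTP host names, the mapping is by the virtual IP and port that each domain resolves to.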
3. L7 Load Balancing (Gateway) – Implemented with Nginx, the gateway handles application‑level routing, authentication, logging, and monitoring, and distributes traffic among numerous services.
4. Application Layer – Two deployment models are discussed: a monolithic application suitable for small teams and simple business logic, and a micro‑service architecture that splits domains into independent services to improve development efficiency for larger teams.
5. Caching – Frequently accessed data is cached using systems such as Memcached or Redis, providing sub‑millisecond latency and supporting hundreds of thousands of QPS per node.
6. Database – To achieve high availability, the article recommends master‑slave replication for read‑write separation, and horizontal sharding (both time‑based and user‑ID‑based) to keep individual tables under ten million rows.
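The two sharding schemes in the database layer can be sketched as simple routing functions. This is a minimal illustration, not the article's implementation: the table names `user_profile_NN` and `event_log_YYYYMM`, the shard count of 16, and the monthly granularity are all assumptions chosen for the example.

```python
from datetime import date

USER_SHARDS = 16  # hypothetical shard count, not taken from the article


def user_shard(user_id: int, shards: int = USER_SHARDS) -> str:
    """Route a user-keyed row to a table by user ID modulo the shard count."""
    return f"user_profile_{user_id % shards:02d}"


def time_shard(day: date) -> str:
    """Route a time-ordered row (for example, an event log) to a monthly table."""
    return f"event_log_{day.year}{day.month:02d}"


def shards_needed(expected_rows: int, max_rows_per_table: int = 10_000_000) -> int:
    """Smallest table count that keeps every table at or below the row ceiling."""
    return -(-expected_rows // max_rows_per_table)  # ceiling division
```

For instance, `shards_needed(250_000_000)` suggests 25 tables under the ten‑million‑row guideline. Modulo routing is easy to reason about but makes later resharding expensive, which is one reason database expansion is slow compared with stateless services.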
While this design comfortably supports million‑level DAU, handling tens of millions of DAU or more requires automatic scaling and rapid degradation mechanisms across all layers.
2. Hybrid Cloud Architecture – Combines private‑cloud IDC capacity with public‑cloud resources to overcome the fixed bandwidth limitation of private data centers. Traffic is primarily served by the private cloud, with overflow redirected to the public cloud via a hybrid deployment.
The hybrid solution relies on a platform such as BridgX to abstract differences between private and public clouds and to orchestrate resources through Kubernetes.
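The "private cloud first, overflow to public cloud" rule can be expressed as a small calculation. The sketch below is only an illustration of that rule: the function names and the QPS‑based capacity figure are assumptions, and it does not reflect how BridgX itself decides placement.

```python
def split_traffic(total_qps: float, private_capacity_qps: float) -> dict:
    """Serve from the private cloud first; send only the overflow to the public cloud."""
    if total_qps < 0 or private_capacity_qps < 0:
        raise ValueError("QPS values must be non-negative")
    private = min(total_qps, private_capacity_qps)
    return {"private": private, "public": total_qps - private}


def public_weight(total_qps: float, private_capacity_qps: float) -> float:
    """Fraction of requests a gateway would route to the public cloud."""
    if total_qps == 0:
        return 0.0
    return split_traffic(total_qps, private_capacity_qps)["public"] / total_qps
```

With a private capacity of 100,000 QPS and 150,000 QPS of incoming traffic, a third of requests would be routed to the public cloud; below capacity, the public weight stays at zero.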
3. Full‑Link Elastic Scaling – When traffic exceeds on‑premise capacity, portions of the load are shifted to the public cloud, requiring elastic scaling of L4/L7, services, cache, and databases.
Elastic scaling strategies include provisioning additional SLB instances for L4, dynamically scaling Nginx instances for L7, and using tools like CudgX to measure service pressure and trigger automatic scaling based on weighted QPS and latency.
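A scaling decision driven by weighted QPS and latency might look like the sketch below. It is not CudgX's API or algorithm: the `EndpointLoad` type, the per‑request weights, the per‑replica capacity, and the "grow by about 20% on a latency breach" rule are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointLoad:
    qps: float
    weight: float  # assumed relative cost of one request to this endpoint


def weighted_qps(endpoints: list) -> float:
    """Sum of per-endpoint QPS scaled by how expensive each request is."""
    return sum(e.qps * e.weight for e in endpoints)


def desired_replicas(endpoints: list, per_replica_capacity: float,
                     p99_ms: float, latency_slo_ms: float,
                     current: int, max_replicas: int = 200) -> int:
    """Pick a replica count from weighted load, with a latency safety net."""
    by_load = math.ceil(weighted_qps(endpoints) / per_replica_capacity)
    if p99_ms > latency_slo_ms:
        # Latency breach: grow by roughly 20% (at least one replica),
        # even if the load-based estimate says the service is large enough.
        by_latency = current + max(1, current // 5)
    else:
        by_latency = 0
    return min(max(by_load, by_latency, 1), max_replicas)
```

Weighting matters because a cheap health check and an expensive search request should not count equally toward service pressure; the latency term covers cases where the weights underestimate real cost.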
Cache and database scaling also need to consider warm‑up time; cache expansion may take days, while database scaling (e.g., MySQL sharding) can take hours due to data synchronization.
4. Three‑Level Degradation Mechanism – To protect the system under extreme load, a tiered degradation approach is proposed: Level 1 (invisible to users, <30% capacity release), Level 2 (user‑visible, <50% release), and Level 3 (significant user impact, 50‑100% release). Additional supporting mechanisms such as decision‑support systems and on‑call alerting are also required.
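The tier thresholds can be turned into a simple selection rule. The sketch below assumes that "capacity release" means the fraction of current load that must be shed to fit within available capacity; that interpretation, along with the names `required_release` and `pick_level`, is an assumption rather than something the article specifies.

```python
from enum import IntEnum


class DegradeLevel(IntEnum):
    NONE = 0
    LEVEL_1 = 1  # invisible to users, releases under 30%
    LEVEL_2 = 2  # user-visible, releases under 50%
    LEVEL_3 = 3  # significant user impact, releases 50-100%


def required_release(load: float, capacity: float) -> float:
    """Fraction of current load that must be shed to fit within capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if load <= capacity:
        return 0.0
    return (load - capacity) / load


def pick_level(release_fraction: float) -> DegradeLevel:
    """Choose the mildest tier whose release range covers the shortfall."""
    if release_fraction <= 0:
        return DegradeLevel.NONE
    if release_fraction < 0.30:
        return DegradeLevel.LEVEL_1
    if release_fraction < 0.50:
        return DegradeLevel.LEVEL_2
    return DegradeLevel.LEVEL_3
```

For example, load at 125% of capacity needs a 20% release and stays at Level 1, while load at 250% of capacity needs a 60% release and calls for Level 3. In practice the final decision would likely pass through the decision‑support and on‑call processes mentioned above rather than being fully automatic.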
IT Architects Alliance
