How to Build a Scalable Million‑DAU Backend Architecture with Hybrid Cloud
This article outlines a reference architecture for serving millions of daily active users, covering DNS routing, layer-4 and layer-7 load balancing, monolithic versus microservice deployment, caching, database replication and sharding, hybrid-cloud capacity, full-link elastic scaling, and a three-tier degradation mechanism that keeps the system resilient under sudden traffic spikes.
1 General Million‑DAU User System Architecture Design
Recent incidents such as the Xi'an "One Code" health-code outage highlight the need for systems that can absorb traffic spikes many times normal load and scale out automatically.
1.1 DNS
Based on the client's source IP, DNS resolves user requests to the nearest regional data center, with resolver caching keeping routing consistent across requests.
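The idea can be sketched as a lookup from IP prefix to region. This is a minimal illustration with hypothetical prefixes and data-center names; real GeoDNS resolvers use full IP-geolocation databases rather than static prefixes.

```python
# Minimal sketch of DNS-style geographic routing: map the client's IP
# prefix to a regional data center (prefixes and names are illustrative).
REGION_BY_PREFIX = {
    "10.1.": "dc-north",
    "10.2.": "dc-east",
    "10.3.": "dc-south",
}

def resolve_region(client_ip: str, default: str = "dc-north") -> str:
    """Return the data center for a client IP; fall back to a default."""
    for prefix, region in REGION_BY_PREFIX.items():
        if client_ip.startswith(prefix):
            return region
    return default
```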
1.2 Layer-4 Load Balancing
After DNS, traffic reaches a layer‑4 load balancer (typically LVS) that forwards requests to appropriate layer‑7 gateway clusters.
1.3 Layer-7 Load Balancing (Gateway)
Layer‑7 gateways (e.g., Nginx) handle domain‑level routing, application‑level forwarding, authentication, logging, and monitoring.
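The routing part of a layer-7 gateway boils down to matching the Host header and URL path against a rule table. The sketch below uses illustrative host names and service names; in practice this logic lives in Nginx `server`/`location` configuration rather than application code.

```python
# Sketch of layer-7 gateway routing: choose an upstream service cluster
# from the Host header and URL path (all names are illustrative).
ROUTES = [
    ("api.example.com", "/user/", "user-service"),
    ("api.example.com", "/order/", "order-service"),
    ("static.example.com", "/", "cdn-origin"),
]

def route(host: str, path: str) -> str:
    """Return the upstream for a request; unmatched traffic gets a default."""
    for rule_host, prefix, upstream in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return upstream
    return "default-backend"
```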
1.4 Server Side
There are two deployment models: a monolithic application for simple systems and small teams, and microservice decomposition for larger codebases and larger teams.
1.5 Caching
Cache frequently accessed data using memcached or Redis to reduce database latency.
1.6 Database
Implement master‑slave replication for read‑heavy workloads and sharding (by time or user ID) to handle massive data volumes.
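The two sharding schemes mentioned above can be sketched as simple routing functions. The shard count and naming are illustrative; note that plain modulo sharding makes resharding expensive, which is why consistent hashing is often preferred when the shard count must grow.

```python
# Sharding sketch: route data to shards by user ID or by time period.
NUM_SHARDS = 4  # illustrative; real deployments size this to data volume

def shard_for_user(user_id: int) -> str:
    """Hash-style sharding: spread users evenly across shards."""
    return f"user_db_{user_id % NUM_SHARDS}"

def table_for_month(year: int, month: int) -> str:
    """Time-based sharding for append-heavy data such as orders or logs."""
    return f"orders_{year:04d}{month:02d}"
```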
2 Hybrid Cloud Architecture
Combine private and public cloud resources: private data centers handle normal traffic, while the public cloud absorbs spikes that exceed private bandwidth limits. This requires network interconnectivity between the two environments and coordinated deployment of every layer in both.
3 Full‑Link Elastic Scaling
When traffic exceeds private-cloud capacity, shift part of the load to the public cloud, scaling the layer-4 and layer-7 components, application servers, caches, and databases together.
3.1 Layer-4 and Layer-7 Scaling
Public cloud SLB can handle millions of concurrent connections; Nginx instances can be elastically added based on QPS.
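QPS-driven gateway scaling reduces to a capacity calculation: instances needed = target QPS times a headroom factor, divided by the QPS one instance can sustain. The sketch below uses illustrative numbers; real per-instance capacity should come from load testing.

```python
import math

# Capacity-planning sketch: gateway instances required for a target QPS,
# given measured per-instance capacity and a safety-headroom factor.
def gateway_instances(target_qps: float, qps_per_instance: float,
                      headroom: float = 1.5, min_instances: int = 2) -> int:
    needed = math.ceil(target_qps * headroom / qps_per_instance)
    return max(needed, min_instances)  # keep a floor for redundancy
```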
3.2 Server Scaling
Stateless services can be auto‑scaled based on weighted QPS and latency metrics; tools like CudgX use log data and machine learning for precise scaling.
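A stateless-service autoscaler combines load and latency signals into a scale-out/scale-in decision. The thresholds and the simple rule below are illustrative assumptions, not CudgX's actual model, which the source describes as using log data and machine learning.

```python
# Autoscaling-decision sketch: scale out when per-instance QPS or p99
# latency exceeds a threshold, scale in when both are well below it.
# Threshold values are illustrative, not a real production policy.
def scaling_decision(qps_per_instance: float, p99_latency_ms: float,
                     qps_limit: float = 5000.0,
                     latency_limit_ms: float = 200.0) -> str:
    if qps_per_instance > qps_limit or p99_latency_ms > latency_limit_ms:
        return "scale_out"
    if qps_per_instance < 0.5 * qps_limit and p99_latency_ms < 0.5 * latency_limit_ms:
        return "scale_in"
    return "hold"  # hysteresis band: avoid flapping between decisions
```

The middle "hold" band is the important design choice: without it, a service hovering near the threshold would scale out and in repeatedly.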
3.3 Cache and Database Scaling
Cache and database layers must also support capacity expansion, with sufficient redundancy to handle rapid traffic growth.
4 Three‑Tier Degradation Mechanism
To protect a system handling tens of millions of DAU, a three‑level degradation strategy is employed, gradually sacrificing functionality to release resources while minimizing user impact.
Level 1: invisible to users; releases less than 30% of redundant capacity.
Level 2: visible to users; releases up to 50% of redundant capacity.
Level 3: severe degradation; releases 50-100% of redundant capacity, used only as a last resort.
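Choosing a degradation level can be sketched as a threshold ladder over system load. The redundancy fractions per level follow the article; the load thresholds below are illustrative assumptions.

```python
# Three-tier degradation sketch: map current system load (0.0-1.0) to a
# degradation level. Thresholds are illustrative; the redundancy each
# level releases (<30%, up to 50%, 50-100%) follows the article.
DEGRADATION_LEVELS = [
    (0.90, 3),  # severe, user-facing degradation; last resort
    (0.75, 2),  # visible degradation
    (0.60, 1),  # invisible degradation
]

def degradation_level(load: float) -> int:
    """Return 0 (normal operation) or degradation level 1-3."""
    for threshold, level in DEGRADATION_LEVELS:
        if load > threshold:
            return level
    return 0
```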
Additional supporting mechanisms such as decision‑support systems and on‑call alerting are required to maintain reliability at this scale.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITFLY8 Architecture Home