Rapid, Low‑Cost Business Surge Handling for Tencent Cloud CDN: Architecture, Challenges, Solutions, and Results
This article analyzes how Tencent Cloud CDN addresses the massive, unpredictable traffic spikes of popular mobile games like "Honor of Kings" by building a Tb‑level burst‑pool using Docker virtualization, enabling automated 10‑minute scaling, reducing costs, and maintaining high service quality.
Abstract
"Honor of Kings" is China’s most popular mobile game with hundreds of millions of users and tens of millions of daily active users. The article discusses how to guarantee rapid, low‑cost handling of massive traffic bursts, presents the challenges, the proposed solutions, and summarizes the results.
Background
With hundreds of millions of users and frequent updates, the game regularly generates traffic spikes that require CDN support. Similar burst scenarios appear in news videos, live streams, popular TV series, and other games, often reaching Tb‑level bandwidth. Tencent has built its own CDN since 2007, scaling it from tens of Gb to tens of Tb, and opened the service to external customers in 2014, accumulating extensive experience in low‑cost burst handling.
1. Challenges and Issues
1.1 Business Characteristics and Challenges
Traffic bursts are large (most exceed 1 Tb, some reach 10 Tb), diverse (video on demand, live sports, game downloads, static web promotions), and unpredictable.
These characteristics demand more resources, varied resource types, and fast scaling capabilities.
1.2 Existing Problems
Simply provisioning large amounts of resources for bursts is costly and wasteful. Direct resource reuse faces two problems:
Only partial resources can be reused because different services have distinct resource needs (e.g., video buffers vs. download buffers), leading to long preparation times.
Cost is not reduced: some bursts (e.g., game downloads) peak at specific times, so serving them only from a platform's native resources raises the peak bandwidth used for billing settlement.
2. Solution
2.1 Burst‑Pool System Architecture
The burst‑pool is a pool of Docker containers (referred to as VMs in this article) running on top of existing physical machines. It shares CPU, memory, and disk resources across all platforms while leaving the services already running on those physical machines unchanged.
Key components:
Burst‑pool: Docker VMs with resource limits to protect host machines.
Automated deployment and monitoring: Predicts demand and expands capacity within 10 minutes; distributes hot files for video/download services to reduce origin bandwidth.
Scheduling system: Direct‑traffic ("直通车") scheduling offers faster, minute‑level activation compared to domain‑level scheduling.
Monitoring agents report load every minute; when current bandwidth exceeds 50 % of the predicted value, the system automatically expands the burst‑pool. Operators can also pre‑specify bandwidth for planned events.
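The expansion rule described above can be sketched roughly as follows. This is a minimal illustration, not Tencent's implementation: only the 50% trigger and the operator pre‑reservation come from the article, while PoolState, containers_to_add, the per‑container capacity, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass

# From the article: expand once current bandwidth exceeds 50% of the prediction.
TRIGGER_RATIO = 0.5


@dataclass
class PoolState:
    predicted_gbps: float      # forecast peak bandwidth (hypothetical field)
    current_gbps: float        # latest one-minute bandwidth reported by agents
    containers: int            # burst-pool containers currently serving
    gbps_per_container: float  # assumed capacity of one container


def containers_to_add(state: PoolState, reserved_gbps: float = 0.0) -> int:
    """Return how many containers to start in this monitoring cycle."""
    capacity_gbps = state.containers * state.gbps_per_container
    triggered = state.current_gbps > TRIGGER_RATIO * state.predicted_gbps
    # Do nothing unless the trigger fired or an operator reserved more than we have.
    if not triggered and reserved_gbps <= capacity_gbps:
        return 0
    # Size the pool for the larger of the forecast and the operator reservation.
    target_gbps = max(state.predicted_gbps, reserved_gbps)
    shortfall = target_gbps - capacity_gbps
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / state.gbps_per_container)


if __name__ == "__main__":
    state = PoolState(predicted_gbps=1000, current_gbps=600,
                      containers=10, gbps_per_container=10)
    print(containers_to_add(state))
```

Sizing the pool for the full predicted peak as soon as the trigger fires is one possible design choice; a real system could equally expand in smaller steps on each one‑minute report.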
2.2 Technical Optimizations
To ensure virtualization does not affect existing services, several isolation and control mechanisms are applied:
Quota system: Limits CPU, I/O, and bandwidth per VM; combined with monitoring data to keep host load within defined thresholds.
302 redirects: When a VM exceeds its quota, it returns a 302 redirect to the direct‑traffic scheduler, allowing precise load control.
Network card flow control: In extreme overload, the virtual NIC drops packets to protect the host.
Disk size limitation: Uses loop devices to mount directories, indirectly limiting disk usage for containers that run on ext3/ext4.
CPU binding: Collects per‑CPU load every minute, averages over 15 minutes, and binds VMs to less‑loaded cores via cpuset.cpus to minimize impact on the host.
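Two of the mechanisms above, CPU binding and quota‑driven 302 redirects, can be illustrated with a short sketch. It assumes per‑core load samples are already collected once per minute. The function names, the scheduler URL, and the use of bandwidth as the quota metric are hypothetical; only the 15‑minute average, the cpuset.cpus target, and the redirect to the scheduler come from the article.

```python
from statistics import mean


def pick_cpus(samples: dict[int, list[float]], count: int) -> str:
    """Return a cpuset.cpus-style list naming the `count` least-loaded cores.

    `samples` maps a core id to its per-minute load readings (percent busy);
    only the 15 most recent readings of each core are averaged.
    """
    averages = {cpu: mean(loads[-15:]) for cpu, loads in samples.items() if loads}
    # Sort by average load, breaking ties by core id for a stable result.
    chosen = sorted(averages, key=lambda cpu: (averages[cpu], cpu))[:count]
    return ",".join(str(cpu) for cpu in sorted(chosen))


def admit_or_redirect(used_mbps: float, quota_mbps: float,
                      scheduler_url: str, path: str):
    """Return (status, location): serve locally, or 302 to the scheduler."""
    if used_mbps >= quota_mbps:
        # Over quota: hand the request back to the scheduler (hypothetical URL).
        return 302, scheduler_url.rstrip("/") + path
    return 200, None


if __name__ == "__main__":
    loads = {0: [90, 80], 1: [10, 20], 2: [50, 50], 3: [5, 5]}
    print(pick_cpus(loads, 2))
    print(admit_or_redirect(120, 100, "http://scheduler.example/", "/game.apk"))
```

The string returned by pick_cpus uses the comma‑separated list format that the cgroup cpuset.cpus file accepts; how often containers are rebound, and how the redirect target is chosen, are left open here.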
Effects
After launching the burst‑pool, Tencent Cloud CDN efficiently supported large‑scale events such as "Honor of Kings" downloads, NBA live streams, and KPL/LPL game broadcasts, saving approximately 20 million CNY in costs and significantly improving burst capacity.
Conclusion
Tencent Cloud CDN leverages Docker virtualization to build a Tb‑level burst‑pool that supports diverse services (live, on‑demand, static) and automatically scales within 10 minutes. The approach offers fast deployment, low cost, and high resource utilization, while requiring careful real‑time monitoring and scheduling to avoid cross‑service interference. Future work includes container‑level kernel tuning and support for clients that cannot handle 302 redirects.