
How Tencent Keeps Massive Online Games Running Smoothly Amid DDoS Attacks

Tencent's game‑operation team shares a comprehensive strategy that combines data‑center resilience, multi‑layer DDoS defense, dynamic traffic routing, predictive hardware maintenance, and cloud‑scale scaling to ensure continuous, profitable service for millions of online players.

Efficient Ops

On July 26, 2016, Tencent Zhiying and Tencent Cloud co‑hosted the 4th Game Operations Technology Forum in Shanghai, where Wu Jianjian presented "Network Operation for Massive Game Services".

1. Stability and Profitability

Two core principles guide Tencent's operations: “keep the service online for one more minute” and “increase profit each minute”. These drive data mining, marketing, and overall game operation, which span data centers, networks, servers, applications, carriers, terminals, and users. Building and operating the system requires understanding the architecture and mechanisms of each layer, along with details such as data‑center disaster recovery, fiber, routing adjustments, isolation, egress scheduling, and platform‑level protection.

2. Ongoing Thinking in Game Operation

The team continuously examines current pain points and contradictions, anticipates future challenges, and considers both macro‑level pressures and micro‑level fine‑grained actions. A thinking model is presented to guide strategic adjustments.

3. DDoS Attack Experience

DDoS attacks are an endemic problem, more about ecosystem competition than pure technology.

Three aspects of DDoS to watch:

Attacks are difficult to trace, so their volume and scale keep growing.

Protection requirements are high, making it hard for small companies to keep up.

Carriers are improving traffic governance, but progress is slow.

Our defense is divided into three layers:

Peak traffic (exceeding bandwidth) – cooperate with carriers to block the flood.

Massive traffic within Tencent's cleaning center – use gateway‑level security to protect the IDC.

Clustered cleaning on top of traffic cleaning.

We fire a DDoS "trigger" that signals carriers to divert the attack traffic to black‑hole routes on their provincial backbones, preventing the flood from reaching Tencent's IDC.

Our cooperation with carriers:

Trigger creation: Tencent generates a DDoS trigger and establishes protocol links with carriers.

When a large‑scale attack is detected, the trigger activates devices that push routing updates to carriers.

Carriers reflect the traffic back to their provincial backbone.

Through a route reflector, the traffic is diverted to a provincial black‑hole, keeping it away from Tencent's IDC.

This approach requires minimal cleaning; the attack is filtered at the carrier level, so it never reaches Tencent's internal network.
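The trigger decision described above can be sketched roughly as follows. This is a minimal illustration, not Tencent's implementation: the talk does not name the signalling protocol, so the sketch assumes BGP with the well‑known BLACKHOLE community from RFC 7999 (65535:666). The names `BlackholeRequest` and `build_trigger`, and the capacity figures in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional

# Assumption: black-holing is signalled with the RFC 7999 BLACKHOLE community.
BLACKHOLE_COMMUNITY = "65535:666"


@dataclass(frozen=True)
class BlackholeRequest:
    prefix: str      # host route covering only the attacked IP
    community: str   # tag asking the carrier to discard matching traffic


def build_trigger(victim_ip: str,
                  attack_gbps: float,
                  egress_capacity_gbps: float) -> Optional[BlackholeRequest]:
    """Return a black-hole request only when the flood exceeds egress bandwidth."""
    if attack_gbps <= egress_capacity_gbps:
        return None  # small enough to be handled by in-house cleaning
    addr = ip_address(victim_ip)
    length = 32 if addr.version == 4 else 128  # host route for IPv4 or IPv6
    return BlackholeRequest(prefix=f"{addr}/{length}",
                            community=BLACKHOLE_COMMUNITY)
```

For instance, `build_trigger("203.0.113.7", 400, 100)` yields a request for `203.0.113.7/32`, while a 50 Gbps attack against the same 100 Gbps egress yields `None` and is left to the inner defense layers.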

Key cooperation notes: use network‑equipment interfaces (not application APIs) when integrating with carriers, control the frequency and volume of IP reports, and maintain close coordination to avoid accidental network outages.
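Controlling the frequency and volume of IP reports could look something like the sliding‑window limiter below. The talk gives no concrete limits, so `max_reports` and `window_s` are placeholders, and the class name `ReportLimiter` is hypothetical.

```python
from collections import deque


class ReportLimiter:
    """Allow at most max_reports black-hole reports per sliding window."""

    def __init__(self, max_reports: int, window_s: float):
        self.max_reports = max_reports
        self.window_s = window_s
        self._sent = deque()  # timestamps of reports already accepted

    def allow(self, now: float) -> bool:
        # Forget reports that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) >= self.max_reports:
            return False  # hold this report back instead of flooding the carrier
        self._sent.append(now)
        return True
```

Passing the timestamp in explicitly keeps the limiter easy to test; a caller would typically supply `time.monotonic()`.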

4. Global "Sword" (Cleaning Centers)

Tencent has deployed cleaning centers ("swords") at the exits of all 25 carrier networks nationwide. Any DDoS attack on a data center is redirected to a black‑hole route at the carrier edge, which shields the data center from peak attack traffic.

Monitoring whether an IP remains under attack after black‑hole activation is challenging; currently it relies on periodic release and re‑checking. Future plans involve leveraging carrier‑wide big data to detect attack start and end times, integrating with Tencent Cloud's auto‑scaling to ensure bandwidth supply, and using BlueKing automation for rapid response.
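The current "periodic release and re‑check" approach can be pictured as the loop below. This is a simulation sketch under stated assumptions: `is_under_attack` stands in for whatever probe is actually used, and the interval and round limit are invented for illustration.

```python
from typing import Callable, Optional


def release_and_recheck(is_under_attack: Callable[[], bool],
                        interval_s: int,
                        max_rounds: int) -> Optional[int]:
    """Return seconds spent black-holed before a successful release.

    Returns None if the attack is still running after max_rounds checks.
    """
    elapsed = 0
    for _ in range(max_rounds):
        elapsed += interval_s        # keep the IP black-holed for one interval
        if not is_under_attack():    # briefly release the route and probe
            return elapsed           # attack is over, so leave the route released
        # Attack continues: the black-hole is re-applied and we wait again.
    return None
```

The sketch makes the weakness visible: the end of an attack is noticed up to one full interval late, which is presumably why the team wants carrier‑side data to report attack start and end times directly.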

5. Flexible Egress Traffic Switching

Tencent's TIX (public exchange platform) flattens IDC resource provisioning, allowing multiple carriers to connect. Advantages include price competition, reduced single‑carrier dependence, flexible traffic scheduling, centralized DDoS defense, and egress sharing across campuses.

When monitoring detects that traffic from Guangzhou to Carrier A is being routed through Shenzhen while Shanghai offers better network quality, routing can be adjusted to shift outbound traffic to Shanghai and update inbound routing accordingly, using route‑level policies to balance load.

Global route collection and centralized computation (via large data clusters) determine optimal routes, which are then distributed to local controllers using I2RS devices, enabling minute‑level adjustments across the network.
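A toy version of the centralized computation might look like the following. It is only a sketch: the real system's inputs and objective are not described in the talk, so latency in milliseconds is assumed as the quality metric, and the `min_gain_ms` hysteresis (to avoid flapping between near‑equal exits) is an added design choice, not something the source states.

```python
from typing import Dict


def pick_egress(quality: Dict[str, Dict[str, float]],
                current: Dict[str, str],
                min_gain_ms: float) -> Dict[str, str]:
    """Choose an egress site per carrier.

    quality[carrier][site] is the measured latency in ms (lower is better).
    A carrier is moved only if the best site beats its current site by at
    least min_gain_ms.
    """
    plan = {}
    for carrier, sites in quality.items():
        best = min(sites, key=sites.get)
        now = current.get(carrier)
        if now in sites and sites[now] - sites[best] < min_gain_ms:
            best = now  # gain too small to justify a routing change
        plan[carrier] = best
    return plan
```

With `quality = {"CarrierA": {"Shenzhen": 48.0, "Shanghai": 31.0}}` and Carrier A currently leaving via Shenzhen, a 5 ms threshold moves it to Shanghai, mirroring the Guangzhou example above; a carrier whose alternatives differ by only 1 ms stays where it is.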

Since 2015, Tencent has avoided 137 large‑scale network adjustments, saving over 8,000 minutes of potential disruption.

6. Detail + Implementation = Future

Future work includes more granular parameter inputs, cost‑sensitive routing, and business‑specific constraints to achieve optimal strategies through centralized computation and route‑level control.

Proxy games (titles Tencent operates on behalf of other developers) present additional challenges because of their opaque protocols and unpredictable traffic spikes, and they require careful monitoring and isolation.

7. Burst! Burst!

Micro‑bursts in network traffic can cause subtle performance degradation without obvious packet loss. For example, a matchmaking server may see a sudden 140 Mbps spike that saturates its 1 Gbps NIC within 0.01 s.
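The numbers in this example are easier to reason about with a little arithmetic. One way to reconcile the two figures, offered as an assumption rather than the speaker's explanation, is that the 140 Mbps reading is a one‑second average while the link runs at full line rate during the burst itself. The function names below are hypothetical.

```python
LINK_BPS = 1_000_000_000   # 1 Gbps NIC, from the text
BURST_MS = 10              # 0.01 s burst, from the text
AVG_BPS = 140_000_000      # 140 Mbps, assumed to be a one-second average


def burst_bytes(link_bps: int, burst_ms: int) -> int:
    """Bytes carried when the link runs at line rate for burst_ms."""
    return link_bps * burst_ms // 1000 // 8


def saturated_ms_per_second(avg_bps: int, link_bps: int) -> int:
    """Longest time per second the link could be full for a given average."""
    return avg_bps * 1000 // link_bps
```

Under these assumptions a 10 ms burst at line rate carries 1,250,000 bytes, and a 140 Mbps one‑second average is compatible with the NIC being completely full for up to 140 ms of that second, which a per‑second counter would never show.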

By monitoring buffer utilization (triggering alerts when >80%), Tencent can proactively detect and mitigate such bursts.
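A buffer‑utilization check of this kind can be sketched as below. Only the 80 % threshold comes from the talk; the sample values, the buffer capacity, and the function name `buffer_alerts` are made up for illustration.

```python
from typing import List, Sequence


def buffer_alerts(samples: Sequence[int],
                  capacity_bytes: int,
                  threshold: float = 0.80) -> List[int]:
    """Return the indices of samples whose buffer utilization exceeds threshold."""
    return [i for i, used in enumerate(samples)
            if used / capacity_bytes > threshold]
```

For samples `[100, 500, 850, 800, 990]` against a 1000‑byte buffer, the check flags indices 2 and 4, even though the average utilization across the five samples is only about 65 %. That gap between averages and individual samples is what makes micro‑bursts hard to see in coarse monitoring.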

8. Prediction

Predictive maintenance for disks uses big‑data analysis of SMART metrics, failure fingerprints, and historical event sequences to forecast failures with up to 80 % accuracy. Models achieve 81 % precision for failures within 20 days, balancing detection window and false‑positive rates.

Validation with manufacturers confirms >90 % accuracy, and metrics such as false positives (FP), true positives (TP), and false negatives (FN) are continuously refined.
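For readers less familiar with these metrics, the standard definitions of precision and recall in terms of TP, FP, and FN are shown below. The counts in the usage note are invented solely to reproduce an 81 % precision figure; they are not Tencent's data, and the talk does not say exactly how its "accuracy" numbers were computed.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted failures that really failed: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real failures that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0
```

With 81 correctly flagged disks and 19 false alarms, `precision(81, 19)` is 0.81. If 27 failing disks were missed, `recall(81, 27)` would be 0.75. Widening the prediction window usually trades one of these against the other, which matches the balance between detection window and false‑positive rate mentioned above.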

Online repair techniques (e.g., sector rewriting via BlueKing) reduce manual replacement time, though they have limited retry counts.

9. Joint Expectations

The 2016 GOPS Global Operations Conference in Shanghai will feature Tencent's frontline operations experts sharing these practices and future directions.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
