
How Tencent’s ZhiYun Platform Powers Massive Social Event Ops at Scale

This article explains how Tencent's SNG operations team used the ZhiYun intelligent operations platform (standardized processes, large‑scale IaaS provisioning, CMDB management, automated workflows, and capacity monitoring) to flawlessly support the high‑traffic "Military‑Uniform Photo" campaign across thousands of servers.

MaGe Linux Operations

Preface

Recently, the People’s Daily and Tencent Cloud jointly ran the "Military‑Uniform Photo" campaign, which went viral across users' social feeds. Behind the campaign, the SNG operations team handled a heavy operational load: 4,000 devices, peak bandwidth of 24 Gbps, and five automatic scaling events.

ZhiYun Platform

1. Standardized Operations

The ZhiYun Intelligent Support Platform manages more than 100,000 servers and thousands of services around the clock with a very small operations staff. Its standardized service framework (unified packaging, centralized configuration, unified routing, and shared components) lets multiple teams collaborate efficiently, deliver quickly, and respond to sudden business demands.

2. Powerful IaaS Supply Foundation

Built on Tencent Cloud’s vast resource pool, ZhiYun can provision IaaS resources within seconds. Combined with automated scaling, this meets the need to deploy services rapidly and at large scale.
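As a rough illustration of the arithmetic behind a scale‑out decision, the sketch below sizes a module from its expected peak load. The helpers are hypothetical, not a ZhiYun API. They assume throughput grows linearly with CPU, that `qps_per_server` is a server's throughput at full CPU (for example, measured in a stress test), and they reuse the 75% high‑load CPU threshold from the capacity section as the utilization target.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, target_cpu_pct: float = 75.0) -> int:
    """Servers required so that each stays at or below the target CPU at peak.

    Assumes load scales linearly with CPU and that qps_per_server is the
    throughput of one server at 100% CPU (hypothetical inputs).
    """
    effective_qps = qps_per_server * target_cpu_pct / 100.0  # usable throughput per server
    return math.ceil(peak_qps / effective_qps)


def scale_out_plan(current_servers: int, peak_qps: float, qps_per_server: float) -> int:
    """How many servers to add before the peak; never negative."""
    return max(0, servers_needed(peak_qps, qps_per_server) - current_servers)


# Hypothetical numbers: 30,000 req/s expected, 100 req/s per server at full CPU.
print(servers_needed(30_000, 100))        # 400
print(scale_out_plan(300, 30_000, 100))   # 100
```

Planning against a utilization target rather than 100% leaves headroom for the load spikes that social campaigns produce.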

3. CMDB Application Configuration

ZhiYun’s CMDB treats each module (a cluster providing a single function) as a management node, recording hardware, software, operational settings, packages, configuration files, scripts, workflows, and test cases required for automation.

The diagram illustrates the CMDB configuration for the "Everyday Photo" service.
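The CMDB record described above can be pictured as a simple data structure. The Python sketch below is hypothetical: the class name, field names, and the `is_automatable` check are illustrative assumptions, not ZhiYun's actual schema. It captures the idea that a module can be deployed unattended only once everything automation needs has been recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRecord:
    """Hypothetical CMDB entry for one module (a cluster providing a single function)."""
    name: str
    hosts: list[str] = field(default_factory=list)          # hardware: member servers
    packages: list[str] = field(default_factory=list)       # software packages to install
    config_files: list[str] = field(default_factory=list)   # configuration files to push
    scripts: dict[str, str] = field(default_factory=dict)   # e.g. start / stop / check scripts
    workflows: list[str] = field(default_factory=list)      # automation workflows bound to the module
    test_cases: list[str] = field(default_factory=list)     # checks run after every change

    def is_automatable(self) -> bool:
        """True only if the records that an unattended deployment needs are all present."""
        return bool(self.packages and self.config_files and self.workflows and self.test_cases)


# Illustrative module; every value here is made up.
photo = ModuleRecord(
    name="photo-render",
    hosts=["10.0.0.1", "10.0.0.2"],
    packages=["photo-render-1.0.tar.gz"],
    config_files=["render.conf"],
    scripts={"start": "start.sh", "stop": "stop.sh"},
    workflows=["scale-out"],
    test_cases=["health-check"],
)
print(photo.is_automatable())  # True
```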

4. Automated Process Introduction

ZhiYun promotes a pipeline: Standardization → Configuration → Automation, turning routine operations into repeatable tools without relying on fragile documentation or individual expertise.

Following continuous‑delivery principles, the team packaged operational steps into custom (DIY) orchestrated workflows. For the "Military‑Uniform" event, any operator could trigger the "Everyday Photo" scaling function, and ZhiYun would automatically run the full deployment and launch process.
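A minimal sketch of such an orchestrated workflow, assuming each step is a function that acts on one host and reports success. The step names are hypothetical stand‑ins for real ZhiYun actions; the point is the ordering and the stop‑on‑failure behavior.

```python
from typing import Callable

# A step is a (name, action) pair; the action returns True on success.
Step = tuple[str, Callable[[str], bool]]


def run_workflow(host: str, steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps against a host in order and stop at the first failure."""
    log: list[str] = []
    for name, action in steps:
        ok = action(host)
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log   # later steps (including routing) never run
    return True, log


# Hypothetical scale-out workflow; each lambda stands in for a real action.
SCALE_OUT: list[Step] = [
    ("install_packages", lambda h: True),
    ("push_config", lambda h: True),
    ("start_service", lambda h: True),
    ("run_test_cases", lambda h: True),
    ("add_to_route", lambda h: True),
]

ok, log = run_workflow("10.0.0.1", SCALE_OUT)
print(ok, log[-1])  # True add_to_route: ok
```

Placing the routing step last means a host that fails any earlier check is never given live traffic.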

5. Key Technical Points

1. ZhiYun Routing: L5

Name Service

Services are called through a name service: callers need only the service ID, so changes to server IP addresses are transparent to them.
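The idea can be shown with a toy in‑memory name service. This is a sketch only: the class, its methods, and the service ID format are hypothetical, since the article does not describe L5's real interface.

```python
import random


class NameService:
    """Toy name service: callers resolve a service ID instead of hard-coding IPs."""

    def __init__(self) -> None:
        self._table: dict[str, list[str]] = {}   # service ID -> live addresses

    def register(self, service_id: str, addr: str) -> None:
        self._table.setdefault(service_id, []).append(addr)

    def deregister(self, service_id: str, addr: str) -> None:
        addrs = self._table.get(service_id, [])
        if addr in addrs:
            addrs.remove(addr)

    def resolve(self, service_id: str) -> str:
        """Return one current address for the service."""
        addrs = self._table.get(service_id)
        if not addrs:
            raise LookupError(f"no servers registered for {service_id}")
        return random.choice(addrs)


ns = NameService()
ns.register("photo.render", "10.0.0.1:8000")
ns.register("photo.render", "10.0.0.2:8000")
ns.deregister("photo.render", "10.0.0.1:8000")   # e.g. machine retired
print(ns.resolve("photo.render"))                  # 10.0.0.2:8000
```

The caller's code never changes when machines are added or retired; only the name service table does.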

Load Balancing

Weighted routing based on server capacity ensures balanced load across heterogeneous nodes.
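To make capacity‑based weighting concrete, the sketch below uses smooth weighted round‑robin, the scheme nginx uses for weighted upstreams. It is a swapped‑in technique for illustration; the article does not say which algorithm L5 uses. Weights are assumed to be positive integers proportional to each server's capacity.

```python
class SmoothWeightedRR:
    """Smooth weighted round-robin: picks servers in proportion to their weights."""

    def __init__(self, weights: dict[str, int]) -> None:
        self.weights = dict(weights)                  # server -> capacity weight (> 0)
        self.current = {s: 0 for s in weights}        # running score per server

    def pick(self) -> str:
        total = sum(self.weights.values())
        for server, w in self.weights.items():
            self.current[server] += w                 # every server gains its weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total                   # the chosen server pays the total
        return best


# A high-capacity server weighted 5, two smaller ones weighted 1 each.
rr = SmoothWeightedRR({"big": 5, "small-a": 1, "small-b": 1})
print([rr.pick() for _ in range(7)])
# ['big', 'big', 'small-a', 'big', 'small-b', 'big', 'big']
```

Compared with plain weighted round‑robin, the "smooth" variant interleaves the smaller servers instead of sending the big server a burst of consecutive requests.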

Request Scheduling

L5 detects failures, removes faulty machines from the pool, and restores them once they recover. It also supports cross‑datacenter failover.
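A minimal sketch of this remove‑and‑reintegrate cycle, under stated assumptions: a host is removed after `max_failures` consecutive failed calls, becomes eligible for a retry after `retry_after` seconds, and cross‑datacenter failover is modeled as falling back to a remote pool when no local host is available. The thresholds and function names are hypothetical, not L5's actual parameters.

```python
import time
from typing import Callable


class HealthTracker:
    """Removes a host after consecutive failures; lets it back in after a cooldown."""

    def __init__(self, max_failures: int = 3, retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.retry_after = retry_after
        self.clock = clock
        self.failures: dict[str, int] = {}       # host -> consecutive failures
        self.removed_at: dict[str, float] = {}   # host -> time it was last removed

    def report(self, host: str, success: bool) -> None:
        if success:                              # any success fully restores the host
            self.failures[host] = 0
            self.removed_at.pop(host, None)
            return
        self.failures[host] = self.failures.get(host, 0) + 1
        if self.failures[host] >= self.max_failures:
            self.removed_at[host] = self.clock()  # (re)start the cooldown on each failure

    def available(self, host: str) -> bool:
        removed = self.removed_at.get(host)
        return removed is None or self.clock() - removed >= self.retry_after


def choose_pool(local: list[str], remote: list[str], tracker: HealthTracker) -> list[str]:
    """Prefer available local hosts; fail over to the remote datacenter otherwise."""
    healthy = [h for h in local if tracker.available(h)]
    return healthy or [h for h in remote if tracker.available(h)]
```

Restarting the cooldown on every failure means a host that fails its retry is removed again instead of staying in rotation.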

2. High‑Concurrency Transmission

Fast, high‑concurrency delivery relies on two techniques:

Asynchronous, message‑queue‑driven execution engine: command channels and scaling workflows use asynchronous messaging for high concurrency and horizontal scalability.

Distributed multi‑level file distribution system: files are stored redundantly across a distributed file system with regional caches, enabling near‑site transfer and high reliability.
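The two techniques can be combined in one small sketch. An in‑process `asyncio.Queue` stands in for the message queue: the producer enqueues one task per host without waiting on any of them, and a pool of workers consumes tasks concurrently. Each host pulls the file from its region's cache, which fetches from the origin only on a miss. All names are hypothetical; a real system would use a message broker and a distributed file system, and this sketch does not model redundancy.

```python
import asyncio


class TieredFileStore:
    """Origin store plus per-region caches (hypothetical model)."""

    def __init__(self, origin: dict[str, bytes]) -> None:
        self.origin = origin                              # stand-in for the central file system
        self.regional: dict[str, dict[str, bytes]] = {}   # region -> cached files
        self.origin_reads = 0                             # trips back to the origin

    def fetch(self, region: str, path: str) -> bytes:
        cache = self.regional.setdefault(region, {})
        if path not in cache:                             # miss: pull once from the origin
            cache[path] = self.origin[path]               # KeyError if the file does not exist
            self.origin_reads += 1
        return cache[path]                                # near-site copy for later requests


async def worker(queue: asyncio.Queue, store: TieredFileStore,
                 results: list[tuple[str, bool]]) -> None:
    """Consume distribution tasks until cancelled."""
    while True:
        host, region, path = await queue.get()
        try:
            store.fetch(region, path)
            await asyncio.sleep(0)                        # stand-in for the network transfer
            results.append((host, True))
        except KeyError:
            results.append((host, False))                 # missing file: record, keep worker alive
        finally:
            queue.task_done()


async def distribute(tasks: list[tuple[str, str, str]], store: TieredFileStore,
                     concurrency: int = 8) -> list[tuple[str, bool]]:
    queue: asyncio.Queue = asyncio.Queue()
    for task in tasks:
        queue.put_nowait(task)                            # enqueue without waiting on any host
    results: list[tuple[str, bool]] = []
    workers = [asyncio.create_task(worker(queue, store, results)) for _ in range(concurrency)]
    await queue.join()                                    # every task has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Scaling out means adding consumers: raising `concurrency` here, or adding machines that read from a shared queue in a real deployment.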

3. Activity Platform: Auto‑Scaling Down

Given the bursty nature of social campaigns, ZhiYun provides automatic scaling‑down based on time or low‑load triggers, allowing the "Military‑Uniform" event to shrink resources without manual intervention.
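A minimal sketch of the two scale‑down triggers, assuming a policy where either trigger is sufficient: the campaign's end time has passed, or CPU has stayed below the 30% low‑load threshold for a full window of recent samples. The function name, window size, and sampling interval are hypothetical.

```python
from datetime import datetime


def should_scale_down(now: datetime, campaign_end: datetime, cpu_samples: list[float],
                      low_cpu_pct: float = 30.0, window: int = 6) -> bool:
    """Time trigger OR sustained-low-load trigger (hypothetical policy)."""
    if now >= campaign_end:                    # time trigger: the campaign window is over
        return True
    recent = cpu_samples[-window:]             # most recent samples, e.g. one every 5 minutes
    # Load trigger: require a full window, with every sample under the low-load threshold.
    return len(recent) == window and max(recent) < low_cpu_pct


end = datetime(2024, 1, 2)                     # illustrative end time
print(should_scale_down(datetime(2024, 1, 1), end, [12.0] * 6))  # True (sustained low load)
```

Requiring a full window of low samples keeps a brief lull from triggering a scale‑down in the middle of a traffic wave.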

Capacity Monitoring Methods

1. Routine Low/High Load Management

The goal is to reduce fire‑fighting by planning capacity work in advance. The thresholds are:
Low load: CPU < 30%, traffic < 100 Mb/s, access < 200 req/s/GB.
High load: CPU > 75%, traffic > 300 Mb/s, access > 600 req/s/GB.
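The thresholds translate directly into a classifier. The article does not say how the three metrics combine, so this sketch assumes a module is "high" if any metric exceeds its high threshold and "low" only if all metrics are below their low thresholds. The access metric keeps the article's req/s/GB unit as given.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_pct: float          # CPU utilization, percent
    traffic_mbps: float     # network traffic, Mb/s
    access_per_gb: float    # access rate, req/s/GB (unit as quoted in the thresholds)


LOW = Sample(cpu_pct=30, traffic_mbps=100, access_per_gb=200)
HIGH = Sample(cpu_pct=75, traffic_mbps=300, access_per_gb=600)


def classify(s: Sample) -> str:
    """'high' if ANY metric is over its high threshold, 'low' if ALL are under (assumed rule)."""
    if (s.cpu_pct > HIGH.cpu_pct or s.traffic_mbps > HIGH.traffic_mbps
            or s.access_per_gb > HIGH.access_per_gb):
        return "high"
    if (s.cpu_pct < LOW.cpu_pct and s.traffic_mbps < LOW.traffic_mbps
            and s.access_per_gb < LOW.access_per_gb):
        return "low"
    return "normal"


print(classify(Sample(20, 50, 100)))   # low
print(classify(Sample(80, 50, 100)))   # high
```

The asymmetry is deliberate: any single hot metric justifies adding capacity, but capacity is reclaimed only when every metric shows headroom.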

2. Handling Abnormal Capacity

Capacity objects include single machines, modules, and SETs. For single machines, CPU affinity and multi‑queue NICs spread load evenly across CPU cores. For modules, ZhiYun L5 routing adjusts request weights, and consistency management ensures uniform deployments. SET capacity is modeled via stress testing.

Real‑time module monitoring aggregates hardware metrics to feed automated decisions.
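A minimal sketch of rolling per‑machine metrics up to the module level, assuming each host reports CPU and traffic. The function and metric names are hypothetical. The average shows the module's overall headroom, while the maximum exposes a single hot machine that an average would hide.

```python
from statistics import mean


def aggregate_module(metrics: dict[str, dict[str, float]]) -> dict:
    """Combine per-host samples into module-level figures for automated decisions."""
    if not metrics:
        raise ValueError("module has no reporting hosts")
    cpus = [m["cpu_pct"] for m in metrics.values()]
    return {
        "hosts": len(metrics),
        "cpu_avg": mean(cpus),      # overall module headroom
        "cpu_max": max(cpus),       # hottest single machine
        "traffic_mbps_total": sum(m["traffic_mbps"] for m in metrics.values()),
    }


module = {
    "10.0.0.1": {"cpu_pct": 20.0, "traffic_mbps": 80.0},
    "10.0.0.2": {"cpu_pct": 60.0, "traffic_mbps": 120.0},
}
print(aggregate_module(module))
# {'hosts': 2, 'cpu_avg': 40.0, 'cpu_max': 60.0, 'traffic_mbps_total': 200.0}
```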

Conclusion

The SNG social platform operations team, supported by Tencent Cloud and the ZhiYun platform, delivered robust operational support for the People’s Daily "Military‑Uniform Photo" campaign, showcasing the strength of modern cloud‑native operations.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.