
How Tencent’s ZhiYun Platform Powered the “Military Photo” Campaign with 4,000 Servers

This article describes how Tencent's SNG operations team used the ZhiYun intelligent operations platform to support the high-traffic “Military Photo” H5 campaign, scaling out across 4,000 servers and handling a bandwidth peak of 24 Gb/s. It covers standardized processes, large-scale IaaS provisioning, CMDB management, automated workflows, and real-time capacity monitoring.

Efficient Ops

Introduction

In the recent "Military Photo" campaign, run jointly by People's Daily and Tencent Cloud, the SNG social platform operations team scaled out across 4,000 servers, handled a peak bandwidth of 24 Gb/s, and completed five automatic scaling operations.

ZhiYun Intelligent Operations Platform

1. Standardized Operations

ZhiYun manages over 100,000 servers and thousands of services 24/7 with a very small operations staff. Standardized service packages, centralized configuration, unified routing, and component frameworks enable rapid, error‑free scaling of thousands of servers during events.

2. Powerful IaaS Supply

Built on Tencent Cloud's massive resource pool, ZhiYun can provision IaaS resources within seconds and scale automatically, meeting service demands on the order of tens of thousands almost immediately.

3. CMDB Application Configuration

The CMDB treats each module (a cluster providing a single function) as a management node, recording hardware, software, operational settings, packages, configuration files, scripts, workflows, and test cases.
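As a rough illustration of the module-as-node idea, the sketch below shows what one CMDB entry might hold. The class name, field names, sample values, and the rule for what counts as "ready for automation" are all assumptions made for this example, not ZhiYun's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRecord:
    """Hypothetical CMDB node: one module is one cluster serving one function."""
    name: str
    hardware: dict = field(default_factory=dict)       # e.g. CPU cores, memory
    packages: list = field(default_factory=list)       # standardized service packages
    config_files: list = field(default_factory=list)
    scripts: list = field(default_factory=list)
    workflows: list = field(default_factory=list)
    test_cases: list = field(default_factory=list)

    def missing_for_automation(self):
        """Return the sections that are still empty (assumed here to block automation)."""
        required = ("packages", "config_files", "workflows", "test_cases")
        return [key for key in required if not getattr(self, key)]


# Hypothetical module with only part of its configuration recorded.
record = ModuleRecord(
    name="example-module",
    packages=["example-pkg-1.0"],
    config_files=["example.conf"],
)
print(record.missing_for_automation())  # ['workflows', 'test_cases']
```

Keeping everything a module needs in one record is what lets a later step, such as scaling, run without consulting separate documentation.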

Illustration of the CMDB configuration for the "Daily P图" service.

4. Automated Process Overview

ZhiYun promotes a workflow: Standardization → Configuration → Automation, turning routine operations into repeatable tools without relying on fragile documentation or individual expertise.

During the "Military Photo" event, an operator only needed to trigger the "Daily P图" scaling function; ZhiYun then carried out the full deployment and launch process automatically.

5. Key Technical Points

1. ZhiYun Routing: L5

Name Service: Abstracts the IP and port into a name service ID, so callers never need to know the actual server address.

Load Balancing: Assigns weights based on server capacity and balances traffic automatically.

Request Scheduling: Detects faulty machines, removes them from service, and brings them back after recovery; it can also shift traffic across data centers during large-scale failures.
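The three L5 functions described here can be sketched in a few lines. This is a toy model under stated assumptions, not L5's real interface: the `NameService` class, its method names, the service ID, and the addresses are all hypothetical.

```python
import random


class NameService:
    """Toy name service: callers resolve a service ID instead of an IP and port."""

    def __init__(self):
        # service_id -> {(ip, port): {"weight": int, "healthy": bool}}
        self._backends = {}

    def register(self, service_id, ip, port, weight):
        self._backends.setdefault(service_id, {})[(ip, port)] = {
            "weight": weight, "healthy": True}

    def mark(self, service_id, ip, port, healthy):
        """Take a faulty backend out of rotation, or restore it after recovery."""
        self._backends[service_id][(ip, port)]["healthy"] = healthy

    def resolve(self, service_id, rng=random):
        """Pick a healthy backend with probability proportional to its weight."""
        candidates = [(addr, meta["weight"])
                      for addr, meta in self._backends[service_id].items()
                      if meta["healthy"]]
        if not candidates:
            raise LookupError(f"no healthy backend for {service_id}")
        addrs, weights = zip(*candidates)
        return rng.choices(addrs, weights=weights, k=1)[0]


ns = NameService()
ns.register("photo-svc", "10.0.0.1", 8080, weight=3)  # larger machine, more traffic
ns.register("photo-svc", "10.0.0.2", 8080, weight=1)
ns.mark("photo-svc", "10.0.0.1", 8080, healthy=False)  # simulated fault
print(ns.resolve("photo-svc"))  # only 10.0.0.2 is eligible now
```

Because callers only ever hold the service ID, adding or removing backends during a scaling operation requires no change on the caller side.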

2. High‑Concurrency Transfer

Two core techniques enable fast file distribution:

Asynchronous Message-Queue Execution Engine: Commands and scaling tasks are processed through an asynchronous, message-driven architecture that supports high concurrency and horizontal scaling.

Distributed Multi-Level File Distribution System: Files are stored with triple redundancy in a distributed file system and cached per region so that transfers come from a nearby site, which keeps distribution both reliable and fast.
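The queue-driven execution model can be illustrated with Python's standard `asyncio` library. This is a minimal single-process sketch, assuming hypothetical host names and a placeholder in place of real remote execution; a production engine would use a distributed message queue rather than an in-memory one.

```python
import asyncio


async def worker(name, queue, results):
    """Consume tasks until cancelled; each task is a (host, command) pair."""
    while True:
        host, command = await queue.get()
        await asyncio.sleep(0)  # stand-in for the real remote execution
        results.append((name, host, command))
        queue.task_done()


async def run_tasks(tasks, worker_count=4):
    queue = asyncio.Queue()
    results = []
    for task in tasks:
        queue.put_nowait(task)
    workers = [asyncio.create_task(worker(f"w{i}", queue, results))
               for i in range(worker_count)]
    await queue.join()  # wait until every queued task has been processed
    for w in workers:
        w.cancel()      # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


# Hypothetical batch: the same command fanned out to ten hosts.
tasks = [(f"host-{i}", "deploy") for i in range(10)]
results = asyncio.run(run_tasks(tasks))
print(len(results))  # 10
```

Producers and workers are decoupled by the queue, so throughput can be raised by adding workers without changing the code that submits tasks.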

3. Activity Platform: Automatic Scaling Down

The platform supports timed and low‑load scaling policies, automatically shrinking resources after the event without manual intervention.
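A scale-down decision combining the two policies might look like the sketch below. The function name, the three-sample window, and the dates are assumptions for illustration; the 30% CPU threshold is borrowed from the low-load definition in this article.

```python
from datetime import datetime


def should_scale_down(now, event_end, cpu_samples, low_cpu=30.0, min_samples=3):
    """Return the reason to shrink ("timed" or "low_load"), or None to keep capacity."""
    if now >= event_end:
        return "timed"  # timed policy: the event window has closed
    recent = cpu_samples[-min_samples:]
    # Low-load policy: require several consecutive low readings to avoid flapping.
    if len(recent) == min_samples and all(s < low_cpu for s in recent):
        return "low_load"
    return None


event_end = datetime(2024, 1, 2)  # hypothetical end of the campaign
print(should_scale_down(datetime(2024, 1, 1), event_end, [50, 20, 10, 5]))  # low_load
print(should_scale_down(datetime(2024, 1, 3), event_end, [90, 90, 90]))     # timed
```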

Capacity Monitoring Methods

1. Routine Load Management

Capacity management is treated as a planned activity rather than fire‑fighting. Metrics are collected per module:

Low Load: CPU < 30%, traffic < 100 Mb/s, request density < 200 req/s/GB

High Load: CPU > 75%, traffic > 300 Mb/s, request density > 600 req/s/GB
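The thresholds above translate directly into a classifier. The article does not say how the three metrics combine, so this sketch assumes a module is "high" when any metric exceeds its high threshold and "low" only when all metrics are under their low thresholds; the metric key names are also made up for the example.

```python
LOW = {"cpu_pct": 30, "traffic_mbps": 100, "req_per_s_per_gb": 200}
HIGH = {"cpu_pct": 75, "traffic_mbps": 300, "req_per_s_per_gb": 600}


def classify_load(metrics):
    """Classify a module as "high", "low", or "normal" (combination rule is assumed)."""
    if any(metrics[key] > HIGH[key] for key in HIGH):
        return "high"
    if all(metrics[key] < LOW[key] for key in LOW):
        return "low"
    return "normal"


print(classify_load({"cpu_pct": 20, "traffic_mbps": 50, "req_per_s_per_gb": 100}))   # low
print(classify_load({"cpu_pct": 80, "traffic_mbps": 50, "req_per_s_per_gb": 100}))   # high
print(classify_load({"cpu_pct": 50, "traffic_mbps": 150, "req_per_s_per_gb": 300}))  # normal
```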

2. Handling Abnormal Capacity

Capacity objects include single machines, modules, and SETs.

Single-Machine Management: Uses CPU affinity and multi-queue NIC features to balance load across cores.

Module Management: Leverages L5 request-weight scheduling to balance load across IPs, and ensures consistent application and configuration deployment via ZhiYun's consistency management.

SET Management: Performance testing identifies bottlenecks and keeps the capacity model reliable enough to drive critical scheduling decisions.

3. Real‑Time Module Capacity Monitoring

When the hosts (IPs) within a module carry uniform load, ZhiYun collects hardware metrics from each host, aggregates them, and provides real-time capacity indicators for the module that automated decisions can rely on.
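The aggregation step can be sketched as follows. This is a simplified example that uses CPU as the only metric; the function name, the output fields, and the sample readings are assumptions, and the 75% ceiling reuses the high-load threshold given earlier in this article.

```python
def module_capacity(host_cpu_pct, high_cpu=75.0):
    """Aggregate per-host CPU readings into module-level capacity indicators."""
    if not host_cpu_pct:
        raise ValueError("no host metrics collected")
    values = list(host_cpu_pct.values())
    mean = sum(values) / len(values)
    return {
        "hosts": len(values),
        "mean_cpu_pct": mean,
        "max_cpu_pct": max(values),
        "headroom_pct": max(0.0, high_cpu - mean),  # room left before the high-load line
        "needs_scale_out": mean > high_cpu,
    }


# Hypothetical readings keyed by host IP.
snapshot = module_capacity({"10.0.0.1": 80, "10.0.0.2": 70, "10.0.0.3": 90})
print(snapshot["mean_cpu_pct"], snapshot["needs_scale_out"])  # 80.0 True
```

The mean is only a fair summary when load is spread evenly across hosts, which is why the uniformity condition matters before aggregating.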

Conclusion

With the support of Tencent Cloud and the ZhiYun platform, the SNG operations team delivered robust, behind‑the‑scenes support for the "Military Photo" campaign, showcasing the power of modern, automated operations.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
