How Tencent Scales Automated Operations with Package Management and CMDB
This article outlines Tencent's automated operations framework, covering the evolution of its package management system, multi‑center organizational structures, CMDB resource imaging, process automation, version control, and release management, while sharing practical lessons and pitfalls from real‑world deployments.
Yang LIDong is a ten‑year operations veteran, currently responsible for backend operations of QQ data and relationship chain at Tencent ZhiYun. He has experience in large‑scale backend support for farming and advertising services, and in developing operations service systems, architecture optimization, and automation.
1. Our Story
Rapid growth brings operational pressure: fast‑growing development teams, diverse frameworks, and varying coding habits create challenges for operations staff.
Hiring additional ops engineers leads to skill gaps, communication issues, and script management difficulties, especially when developers are busy with business features.
Even with popular solutions, adoption can be hindered by lack of developer cooperation.
Examples: massive photo upload spikes requiring thousands of devices in hours; QQ red‑packet activity supporting 800 modules across ~30,000 devices within two weeks.
2. The Evolution of Our Package Management System
Package management is introduced in the context of three organizational models:
Centralized: a single development leader enforces a unified framework.
Multi‑center: multiple departments develop different products but share similar environment requirements, leading to standard or efficient models.
Discrete: highly heterogeneous environments require tool‑based operations.
The optimization process is incremental; there is no one‑size‑fits‑all low‑cost solution.
Our delivery pipeline progresses from package management → SPP framework components → naming service & scheduling → CMDB resources, imaging, striping, automation → data bank & intelligent operations.
Key components:
File Management: Packages group related files (Admin, Conf, Log, Client) to provide a consistent structure, though real‑world complexity is higher.
Process Management: Packages include start/stop scripts with precise matching to avoid accidental kills; a self‑monitoring mechanism checks processes every three minutes and restarts them if needed, using a “flag” system to control auto‑start behavior.
Version Management: Frequent product iterations lead to hundreds of versions; we use diff‑based incremental downloads, storing only changed files, achieving fast upgrades and zero‑network rollback by using local diffs.
Instance Management: Instance views map each IP to its program and version; safeguards include command line shielding and reduction of operational objects to improve efficiency.
Release Management: We employ canary, gray, and blue‑green deployments, limiting each cluster to at most two concurrent versions and using release plans to coordinate large‑scale changes.
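The diff‑based version management described above can be illustrated with a minimal sketch. It is not Tencent's implementation: the function names (`manifest`, `make_diff`, `apply_diff`, `upgrade`) and the in‑memory dictionary standing in for a package directory are assumptions made for illustration. The idea shown is that an upgrade ships only the changed files, and the reverse diff is kept locally so that a rollback needs no network access.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file path to a hash of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def make_diff(old: dict[str, bytes], new: dict[str, bytes]) -> dict:
    """Build a diff that holds only what changed going from old to new."""
    old_m, new_m = manifest(old), manifest(new)
    # Files that are new or whose content hash differs must be written.
    changed = {p: new[p] for p in new if old_m.get(p) != new_m[p]}
    # Files present in old but absent from new must be deleted.
    removed = sorted(p for p in old if p not in new)
    return {"write": changed, "delete": removed}


def apply_diff(files: dict[str, bytes], diff: dict) -> dict[str, bytes]:
    """Return a copy of files with the diff applied."""
    result = dict(files)
    for path in diff["delete"]:
        result.pop(path, None)
    result.update(diff["write"])
    return result


def upgrade(installed: dict[str, bytes], diff: dict):
    """Apply a forward diff and keep the reverse diff for local rollback."""
    upgraded = apply_diff(installed, diff)
    reverse = make_diff(upgraded, installed)  # stored on the host (assumption)
    return upgraded, reverse


if __name__ == "__main__":
    v1 = {"bin/server": b"v1", "conf/app.conf": b"port=80"}
    v2 = {"bin/server": b"v2", "conf/app.conf": b"port=80", "admin/start.sh": b"#!/bin/sh"}
    forward = make_diff(v1, v2)
    print(sorted(forward["write"]))  # only the two changed or added files
    upgraded, reverse = upgrade(v1, forward)
    print(apply_diff(upgraded, reverse) == v1)  # rollback restores v1 locally
```

In this sketch the unchanged `conf/app.conf` is never transferred, which is where the speed‑up for frequent releases would come from.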
3. CMDB Resource Imaging and Automation
Complex services resemble micro‑service architectures without standard interfaces, requiring management of permissions, configs, devices, resources, naming, and more.
Traditional documentation is replaced by a CMDB imaging model that captures both hardware and business resources, ensuring consistency by reconciling online reports with configuration stores.
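The reconciliation between online reports and the configuration store can be sketched as a comparison of two record sets. The `Instance` fields, the `reconcile` function, and the three drift categories below are hypothetical names chosen for illustration; the talk does not specify the actual CMDB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    ip: str
    package: str
    version: str


def _key(inst: Instance) -> tuple[str, str]:
    # An instance is identified by where it runs and what it runs.
    return (inst.ip, inst.package)


def reconcile(recorded: list[Instance], reported: list[Instance]) -> dict[str, list]:
    """Compare the configuration store (recorded) with agent reports (reported)."""
    rec = {_key(i): i for i in recorded}
    rep = {_key(i): i for i in reported}
    return {
        # In the store but not reporting: possibly down or decommissioned.
        "missing_online": [rec[k] for k in sorted(rec.keys() - rep.keys())],
        # Reporting but not in the store: deployed outside the platform.
        "unregistered": [rep[k] for k in sorted(rep.keys() - rec.keys())],
        # Present on both sides with different versions.
        "version_mismatch": [
            (rec[k], rep[k])
            for k in sorted(rec.keys() & rep.keys())
            if rec[k].version != rep[k].version
        ],
    }


if __name__ == "__main__":
    recorded = [Instance("10.0.0.1", "relay", "1.2"), Instance("10.0.0.2", "relay", "1.2")]
    reported = [Instance("10.0.0.1", "relay", "1.3"), Instance("10.0.0.3", "relay", "1.2")]
    for category, items in reconcile(recorded, reported).items():
        print(category, items)
```

Each non‑empty category would be a candidate for an alert or an automatic correction, depending on which side is treated as the source of truth.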
Automated scaling follows strict step sequences; high‑load triggers expansion, low‑load triggers contraction, with weight‑based decisions for bringing instances online or offline.
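A rough sketch of this scaling logic follows. The thresholds, step names, and the interpretation of "weight" as a routing weight (0 means the instance receives no traffic) are assumptions made for illustration, not details confirmed by the talk. The sketch shows a load‑based decision and a step runner that executes steps strictly in order and stops at the first failure, so an instance only receives traffic after every earlier step has succeeded.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def decide(avg_load: float, high: float = 0.80, low: float = 0.30) -> str:
    """Pick a scaling action from average load (thresholds are hypothetical)."""
    if avg_load >= high:
        return "expand"
    if avg_load <= low:
        return "contract"
    return "hold"


def run_steps(steps: list[Step]) -> list[str]:
    """Run steps in order; stop at the first failure. Returns completed step names."""
    done = []
    for name, action in steps:
        if not action():
            break
        done.append(name)
    return done


def expansion_steps(weights: dict[str, int], ip: str,
                    health_check: Callable[[], bool],
                    target_weight: int = 100) -> list[Step]:
    """Steps for bringing one new instance online."""
    def register() -> bool:
        weights[ip] = 0  # known to the router, but no traffic yet
        return True

    def go_online() -> bool:
        weights[ip] = target_weight  # start receiving traffic
        return True

    return [("register", register), ("health_check", health_check), ("go_online", go_online)]


if __name__ == "__main__":
    weights = {"10.0.0.1": 100}
    if decide(0.92) == "expand":
        print(run_steps(expansion_steps(weights, "10.0.0.2", lambda: True)))
    print(weights)
```

Contraction would mirror this: set the weight to 0 first, confirm traffic has drained, and only then remove the instance.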
Change health checks generate reports and alerts, ensuring responsible parties are notified of anomalies.
Overall, the system integrates file, process, version, and release management into a unified automated operations platform, supporting large‑scale services, AI‑assisted monitoring, and efficient DevOps collaboration.
Note: This article is compiled from Yang LIDong’s talk at GOPS 2018 Shenzhen.
