How Tencent’s Ops Teams Move Massive Workloads to the Cloud and Boost Efficiency
Tencent’s recent Operations Open Day showcased how its engineers migrated billions of users to public cloud, leveraged cloud‑native DevOps, serverless functions, and intelligent data‑center management to dramatically improve efficiency, scalability, and reliability across its massive infrastructure.
Think the 996 schedule is as demanding as it gets for internet workers? There is one role that stays on call 24/7, all year round: operations.
As a company serving billions of users, Tencent runs Asia’s largest network, server clusters, and data centers, providing cloud billing and security services. Operations are embedded in every layer to ensure continuous system uptime and product stability.
To honor operations professionals, Tencent Cloud, Tencent Technology Engineering, and CODING co‑hosted the first Tencent Operations Open Day in Shenzhen. Four experts from Tencent and CODING shared their cloud‑era operational experience with over 500 enthusiasts.
Zhou Xiaojun: From Internal Components to Cloud‑Native, “Moving the Elephant to the Cloud”
In September 2018, Tencent launched a major technical transformation: establishing a technology committee, promoting open‑source collaboration, and pushing self‑developed services to the public cloud. Zhou Xiaojun, head of the self‑developed cloud migration project, described the migration as moving an elephant onto the cloud.
The migration from private to public cloud follows five stages: planning, solution, verification, migration, and operation. The biggest challenges are adapting business to cloud‑native architectures and moving massive volumes of data.
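The five stages above can be read as a strictly ordered pipeline in which each stage must be completed before the next begins. The following is a minimal sketch of that idea, not Tencent's tooling: the `Stage` enum, the `advance` function, and the `gate_passed` flag are hypothetical names introduced only for illustration.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The five migration stages named in the talk, in order."""
    PLANNING = 1
    SOLUTION = 2
    VERIFICATION = 3
    MIGRATION = 4
    OPERATION = 5


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    if current is Stage.OPERATION:
        return None
    return Stage(current + 1)


def advance(current: Stage, gate_passed: bool) -> Stage:
    """Move forward only when the current stage's exit check has passed.

    `gate_passed` is an assumed stand-in for whatever sign-off a real
    project would require (for example, verification results).
    """
    following = next_stage(current)
    if gate_passed and following is not None:
        return following
    return current


if __name__ == "__main__":
    stage = Stage.PLANNING
    while next_stage(stage) is not None:
        stage = advance(stage, gate_passed=True)
        print(stage.name)
```

Modeling the stages as an ordered enum keeps a workload from jumping from planning straight to migration without passing verification first.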
QQ’s migration illustrates the effort: in 2017 all QQ users were on private cloud; by June 2019, 100 million QQ users were on public cloud, with a goal to migrate all users in three regions by the end of 2019. Benefits include higher R&D efficiency, better resource utilization, standardized cloud services for engineers, and the ability to export internal tools to the industry.
Zhang Hailong: Cloud‑Native DevOps Drives 200% Efficiency Growth
CODING founder Zhang Hailong emphasized that cloud‑native transforms team organization and work efficiency. Beyond moving servers to the cloud, architectural changes are needed to fully leverage scalability, monitoring, database, and caching capabilities offered by the cloud.
By extensively using PaaS and SaaS services and replacing manual operations with tools, release frequency can reach dozens of times per day, enabling rapid market response and continuous high‑quality delivery. Customers of CODING and Tencent Cloud have seen at least a 200% efficiency increase.
Zhang Yuanzhe: Tencent Cloud Function (SCF) Powers High‑Performance, Low‑Cost Mini‑Programs
Tencent Cloud launched its Function‑as‑a‑Service (FaaS) product SCF in 2017. Product manager Zhang Yuanzhe highlighted three advantages of serverless: reducing server clusters and operational complexity, shortening delivery cycles, and allowing developers to focus on business logic while operations focus on maintenance.
Using the Tencent Photo Album mini‑program as an example, the serverless approach enabled development in four weeks for a product supporting tens of millions of users, compared to an optimistic eight‑week estimate under traditional IaaS. By December 2018, the program had surpassed 100 million cumulative users and 12 million monthly active users.
Serverless also improves operations through fine‑grained management, higher system stability, and a stronger guarantee of business continuity.
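To make the "focus on business logic" point concrete, here is a minimal sketch of a function in the event‑and‑context handler style common to FaaS platforms. It is an illustration, not code from the Photo Album mini‑program: the `album_id` event field, the `list_photos` helper, and the in‑memory data are all hypothetical.

```python
import json

# Hypothetical in-memory data standing in for a real storage service.
_FAKE_ALBUMS = {
    "a1": ["beach.jpg", "sunset.jpg"],
}


def list_photos(album_id):
    """Hypothetical helper: return the photo names stored for an album."""
    return _FAKE_ALBUMS.get(album_id, [])


def main_handler(event, context):
    """Entry point: receives the trigger payload and a runtime context.

    Only business logic lives here; provisioning, scaling, and patching
    of the underlying servers are left to the platform.
    """
    album_id = event.get("album_id")  # assumed event field
    if not album_id:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "album_id is required"}),
        }
    photos = list_photos(album_id)
    return {
        "statusCode": 200,
        "body": json.dumps({"album_id": album_id, "photos": photos}),
    }


if __name__ == "__main__":
    # Local invocation; the context object is unused in this sketch.
    print(main_handler({"album_id": "a1"}, None))
```

Because the handler is a plain function, it can be exercised locally with a dictionary payload before being deployed.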
Yue Shang: Human‑Machine Collaboration and Intelligent Data‑Center Operations
Data centers are the backbone of cloud computing, and Tencent’s rapid cloud growth brings new operational challenges. The industry classifies data‑center operations into three stages: S1 (manual), S2 (semi‑automatic), and S3 (human‑machine collaborative, fully automated, and intelligent). Tencent is transitioning from S2 to S3.
Intelligent operations standardize and digitize processes, collect massive operational data, and apply data mining, analysis, and insight to build a data‑driven, scenario‑based, platform‑centric smart data‑center.
Challenges arise from hardware diversity, inconsistent protocol standards, and fragmented data across locations. Tencent addresses these through four breakthroughs:
Leading industry standards for data‑center monitoring metrics and northbound interfaces.
Developing automated acceptance tools to verify vendor data accuracy.
Building a dedicated control network to unify disparate regional networks.
Deploying an intelligent control platform that aggregates data from all data centers for unified analysis.
To date, Tencent manages over 80 IDC sites and more than 1 million servers, with over 600k measurement points and 237 TB of stored data, growing at about 40% annually. The team continues to innovate to keep pace with this rapid expansion.
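For a sense of what that growth rate implies, the short calculation below projects storage forward. It is a back‑of‑the‑envelope sketch that assumes the roughly 40% figure applies to the 237 TB of stored data and compounds once per year; the article does not state either assumption, and the function names are illustrative.

```python
import math

STORED_TB = 237.0      # stored operational data quoted in the article
ANNUAL_GROWTH = 0.40   # assumed to apply to stored data, compounding yearly


def projected_tb(years, start_tb=STORED_TB, rate=ANNUAL_GROWTH):
    """Compound growth: start * (1 + rate) ** years."""
    return start_tb * (1 + rate) ** years


def doubling_time_years(rate=ANNUAL_GROWTH):
    """Years needed for the volume to double at a fixed annual rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    for year in range(1, 4):
        print(f"year {year}: {projected_tb(year):.1f} TB")
    print(f"doubling time: {doubling_time_years():.2f} years")
```

Under these assumptions the stored volume would roughly double in about two years, which is why capacity planning has to be part of routine operations.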
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
