Large-Scale Task Scheduling Architecture of Tencent Meeting and VStation

The talk explains how VStation, Tencent’s self‑developed scheduler, integrates with TKE and uses a hybrid architecture that combines sharding with master‑worker coordination. This enabled Tencent Meeting to scale to over 100 000 hosts and one million CPU cores, cut provisioning time to under ten seconds, and handle thousands of tasks per minute through DAG‑driven automation and fault‑tolerant mechanisms.

Tencent Cloud Developer

Apr 29, 2020

This article summarizes a talk by Tencent Cloud expert engineer Li Dekai at the "Cloud+ Community Salon Online" about the architecture and challenges of the large‑scale task scheduling system behind Tencent Meeting.

During the COVID‑19 pandemic, Tencent Meeting became a critical remote collaboration tool, handling billions of user requests. In just eight days, the service expanded to over 100 000 cloud hosts and more than one million CPU cores, a record in Tencent Cloud history.

The presentation first introduces two typical cases: Tencent Meeting's rapid scale‑out during the Spring Festival and Kuaishou's large cash red‑packet campaign. Both were powered by VStation, Tencent Cloud's self‑developed large‑scale scheduling system.

VStation integrates with Tencent Kubernetes Engine (TKE) and provides APIs for fast resource provisioning. It manages tens of thousands of physical nodes, supports heterogeneous resources (CPU, GPU, memory‑optimized instances), and can schedule up to 5 000 tasks per minute.

The scheduling workflow consists of four steps: resource preparation, rapid allocation, VM initialization, and service launch. Tasks are dispatched through modules such as dispatch, scheduler, image, network, and compute. The system uses a configuration‑driven approach to generate DAGs, allowing parallel execution of independent steps and reducing the provisioning time from over a minute to under ten seconds.
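The configuration‑driven DAG idea described above can be illustrated with a minimal Python sketch. It is not VStation's implementation: the step names echo the modules named in the talk, but the dependencies between them, the PROVISION_CONFIG dictionary, and the build_waves and run helpers are all assumptions made for illustration. Steps that share a wave do not depend on each other, so they can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical configuration: each step maps to the steps it depends on.
# Step names echo modules from the talk; the dependencies are assumed.
PROVISION_CONFIG = {
    "scheduler": [],
    "image": ["scheduler"],
    "network": ["scheduler"],
    "compute": ["image", "network"],
    "service_launch": ["compute"],
}


def build_waves(config):
    """Group steps into waves; steps within a wave are mutually independent."""
    sorter = TopologicalSorter(config)
    sorter.prepare()  # raises graphlib.CycleError if the config has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


def run(config, action):
    """Run each wave's steps concurrently, one wave after another."""
    results = []
    with ThreadPoolExecutor() as pool:
        for wave in build_waves(config):
            results.extend(pool.map(action, wave))
    return results


waves = build_waves(PROVISION_CONFIG)
# "image" and "network" land in the same wave, so they can overlap in time.
```

Because the waves are derived once from static configuration, the runtime cost of building the graph is small, which matches the Q&A remark that DAGs are generated statically for low overhead.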

Challenges addressed include:

High concurrency causing throughput bottlenecks in unified scheduling architectures.

Scalability limits of single‑point designs.

Scheduling conflicts in shared‑state models.

To overcome these, VStation evolved from a unified scheduler to a two‑level scheduler and finally to a hybrid scheduling architecture that combines sharding, master‑worker coordination, and merging of tasks by scheduling factor. Tasks with similar factors are aggregated and scheduled together, which reduces the number of scheduling operations by an order of magnitude.
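A rough Python sketch of the merging and sharding ideas follows. The talk does not say which factors VStation compares or how it assigns shards, so the Task fields (zone, instance_type, image), the factor_key function, and the CRC‑based shard_for function are hypothetical choices, used only to show why merging reduces the number of scheduling decisions.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    zone: str            # assumed scheduling factor
    instance_type: str   # assumed scheduling factor
    image: str           # assumed scheduling factor


def factor_key(task):
    """Tasks with the same key can share one scheduling decision."""
    return (task.zone, task.instance_type, task.image)


def merge_by_factor(tasks):
    """Aggregate task IDs under their shared factor key."""
    groups = defaultdict(list)
    for task in tasks:
        groups[factor_key(task)].append(task.task_id)
    return dict(groups)


def shard_for(key, shard_count):
    """Map a factor key to a shard with a hash that is stable across processes."""
    digest = zlib.crc32("|".join(key).encode("utf-8"))
    return digest % shard_count


tasks = [Task(f"cpu-{i}", "zone-a", "cpu.large", "meeting-v1") for i in range(4)]
tasks += [Task(f"gpu-{i}", "zone-b", "gpu.large", "meeting-v1") for i in range(2)]

merged = merge_by_factor(tasks)
# Six tasks collapse into two groups, so the scheduler decides twice, not six times.
assignments = {key: shard_for(key, shard_count=4) for key in merged}
```

In this sketch each merged group is routed to one shard, so tasks that compete for the same kind of resource are handled by the same scheduler instance. That is one plausible way to limit the conflicts of a shared‑state model, not necessarily the one VStation uses.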

Additional optimizations include snapshot‑based VM image delivery through Tencent Cloud CBS, which enables near‑instant VM startup, and fault tolerance built on distributed locks and cold‑standby mechanisms.
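The talk does not describe how the locks or the standby switch are implemented. As a hedged illustration of the general pattern, the sketch below simulates a lease‑style lock in memory: the active node renews the lease, and an idle standby takes over only after the lease expires. The LeaseLock class, the node names, and the five‑second lease are assumptions; a real deployment would keep the lease in an external coordination service rather than in process memory.

```python
class LeaseLock:
    """In-memory stand-in for a distributed lock with a time-to-live."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, node, now):
        """Grant or renew the lease if it is free, expired, or already held by node."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


def simulate():
    """Replay a timeline in which the primary stops renewing after t=4."""
    lock = LeaseLock(ttl=5.0)
    timeline = [
        (0.0, "primary"),   # primary becomes active
        (3.0, "standby"),   # standby probes, lease still valid
        (4.0, "primary"),   # primary renews, lease now runs to t=9
        (8.0, "standby"),   # primary has crashed, but the lease has not expired
        (10.0, "standby"),  # lease expired, standby takes over
    ]
    return [(now, node, lock.acquire(node, now)) for now, node in timeline]


events = simulate()
```

Passing the clock value in explicitly keeps the simulation deterministic. The gap between the crash and the takeover is bounded by the lease length, which is the usual trade‑off when tuning this kind of failover.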

The Q&A section covers VStation’s development history, design principles, use of message queues for full‑duplex communication, cold‑standby switching, task merging efficiency, and the static generation of DAGs for low overhead.

Overall, the talk demonstrates how careful architectural evolution, modular design, and configuration‑driven automation enable massive, reliable cloud service scaling.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
