
TLinux Team's Mixed Deployment Scheme for Improving Whole-Machine CPU Utilization

Tencent’s TLinux team introduced a kernel‑level mixed‑deployment framework that adds an offline scheduling class and load‑balancing algorithm, enabling online tasks to instantly pre‑empt offline work and boosting whole‑machine CPU utilization to as high as 90% while preserving latency‑sensitive service performance.


The TLinux team at Tencent proposes a brand-new mixed-deployment (混部) solution that significantly raises whole-machine CPU utilization without affecting online services. In some scenarios utilization can reach up to 90%.

Background: Tencent operates a massive fleet of servers whose CPU utilization is often low. Filling the idle capacity with offline workloads can effectively double a machine's usable capacity and reduce operating costs.

1. Existing Mixed‑Deployment Schemes

Two main approaches are currently used:

cpuset: pins online and offline workloads to disjoint CPU cores. This works in some cases, but it caps multi-threaded performance and does not achieve true mixing, since cores reserved for one side sit idle when that side is quiet.

cgroup: uses cgroup shares and period/quota to limit the CPU time of offline groups. This helps for latency-insensitive services, but it cannot guarantee that online tasks pre-empt offline ones promptly when load spikes.

Both methods fail to solve the core problem: online workloads cannot pre-empt offline workloads in time, which rules out mixed deployment in latency-sensitive scenarios.
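For concreteness, here is how the cgroup period/quota cap works. The interface-file semantics (cpu.cfs_quota_us / cpu.cfs_period_us, with -1 meaning unlimited) are standard cgroup v1; the helper name is mine. Note the key limitation: the cap is static, so an offline group cannot soak up idle CPU beyond its quota, and within its quota it can still delay online tasks.

```python
def effective_cpus(quota_us: int, period_us: int = 100_000) -> float:
    """CPU cap implied by cgroup cpu.cfs_quota_us / cpu.cfs_period_us.

    A quota of -1 means unlimited; otherwise the group may run for
    quota_us microseconds in every period_us window.
    """
    if quota_us < 0:
        return float("inf")
    return quota_us / period_us

# An offline group allowed 50 ms of runtime per 100 ms period gets at
# most half a core, no matter how idle the rest of the machine is.
print(effective_cpus(50_000))   # 0.5
print(effective_cpus(200_000))  # 2.0 (up to two full cores)
```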

2. TLinux Team's Mixed‑Deployment Scheme

The new framework adds kernel‑level support for online‑offline mixing, including a dedicated offline scheduling class, load‑balancing optimizations, bandwidth limiting, and user‑space interfaces.

Problem 1 – Online Pre-empting Offline: Under stock CFS, a waking task pre-empts the running one only when two conditions hold (its virtual runtime is sufficiently smaller, and the running task has exceeded the minimum scheduling granularity), so offline tasks are not evicted immediately. TLinux introduces an offline scheduling class whose priority sits below CFS but above idle, allowing any CFS (online) task to pre-empt offline tasks immediately.
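The effect of adding a lower scheduling class can be illustrated with a toy model. The class names and the pick_next helper below are illustrative, not the kernel's actual API; the point is that class priority is checked before any within-class (vruntime) comparison, so an online task always wins over an offline one.

```python
from dataclasses import dataclass

# Scheduling-class priority order described in the text:
# CFS (online) > offline class > idle. Lower number = higher priority.
CLASS_PRIO = {"cfs": 0, "offline": 1, "idle": 2}

@dataclass
class Task:
    name: str
    sched_class: str

def pick_next(runqueue: list) -> Task:
    # Class priority decides first; an online (CFS) task pre-empts any
    # offline task with no vruntime or min-granularity check involved.
    return min(runqueue, key=lambda t: CLASS_PRIO[t.sched_class])

rq = [Task("batch-job", "offline"), Task("web-server", "cfs")]
print(pick_next(rq).name)  # web-server
```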

Problem 2 – Efficient Use of Idle CPU by Offline Tasks: TLinux designs an offline-load calculation algorithm to estimate the remaining compute capacity on each core:

offline_load = 1 - avg/T

where avg decays by half every T milliseconds and the core's online runtime continuously feeds into avg. When a core is fully occupied by online work (100%), offline_load drops to 0, preventing any offline scheduling there. When online usage is low (e.g., 20%), offline_load is high (≈0.8), allowing offline tasks to be placed. Additionally, a queue-wait-time factor prioritizes offline tasks that have waited longer, improving their chance to capture idle CPU.

3. Evaluation Results

Extensive testing across multiple business scenarios shows the new scheme dramatically improves CPU utilization while keeping online latency unchanged:

Scenario A (latency-sensitive module a): CPU usage rose from ~15% to 60% with no increase in error rate.

Scenario B (module b): CPU usage increased from 20% to 50% while latency remained stable.

Scenario C (module c, less latency-sensitive): CPU usage reached 90% without impacting online metrics.

4. TLinux Team Overview

The TLinux team is responsible for Tencent's server operating system, kernel, distribution, and virtualization development. Their mixed-deployment solution is now integrated into the internal kernel and adopted by many business units.
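Returning to the offline_load rule from Problem 2: the decay can be modeled as a simple discrete recurrence (my own discretization for illustration; the kernel's actual accounting is continuous and more involved). Each T-ms window folds the window's online runtime into avg and halves it, so avg converges to the core's steady-state online runtime per window, and offline_load = 1 - avg/T converges to the idle fraction.

```python
def update_avg(avg: float, online_ms: float) -> float:
    # One decay period: fold in this window's online runtime, then
    # halve the sum (geometric decay with half-life T).
    return (avg + online_ms) / 2.0

def offline_load(avg: float, T: float = 1000.0) -> float:
    # Estimated spare capacity on the core: 1 when fully idle, 0 when
    # online tasks consume the entire window.
    return 1.0 - avg / T

# A core whose online tasks use 20% of each 1000 ms window converges
# to offline_load ~= 0.8, matching the example in the text.
avg = 0.0
for _ in range(20):
    avg = update_avg(avg, online_ms=200.0)
print(round(offline_load(avg), 3))  # 0.8
```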

Tags: performance optimization · mixed deployment · CPU utilization · cgroup · cpuset · Linux scheduling
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
