
TLinux Team's Mixed Deployment Scheme for Improving Whole-Machine CPU Utilization

Tencent’s TLinux team introduced a kernel‑level mixed‑deployment framework that adds an offline scheduling class and load‑balancing algorithm, enabling online tasks to instantly pre‑empt offline work and boosting whole‑machine CPU utilization to as high as 90% while preserving latency‑sensitive service performance.


The TLinux team at Tencent proposes a brand-new mixed-deployment (混部) solution that significantly raises whole-machine CPU utilization without affecting online services. In some scenarios utilization can reach up to 90%.

Background: Tencent operates a massive fleet of servers whose CPU utilization is often low. Filling the idle capacity with offline workloads can effectively double a machine's usable capacity and reduce operating costs.

1. Existing Mixed‑Deployment Schemes

Two main approaches are currently used:

cpuset: pins online and offline workloads to disjoint CPU cores. This works in some cases, but it caps multi-threaded performance and does not achieve true mixing, since cores reserved for one side sit idle when that side is quiet.

cgroup: uses cgroup shares and period/quota to limit the CPU time of offline groups. This helps for latency-insensitive services, but it cannot guarantee that online tasks pre-empt offline ones promptly when load spikes.

Both methods fail to solve the core problem: online workloads cannot pre-empt offline workloads in time, which rules out mixed deployment in latency-sensitive scenarios.
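For concreteness, here is how the cgroup period/quota cap works. The interface-file semantics (cpu.cfs_quota_us / cpu.cfs_period_us, with -1 meaning unlimited) are standard cgroup v1; the helper name is mine. Note the key limitation: the cap is static, so an offline group cannot soak up idle CPU beyond its quota, and within its quota it can still delay online tasks.

```python
def effective_cpus(quota_us: int, period_us: int = 100_000) -> float:
    """CPU cap implied by cgroup cpu.cfs_quota_us / cpu.cfs_period_us.

    A quota of -1 means unlimited; otherwise the group may run for
    quota_us microseconds in every period_us window.
    """
    if quota_us < 0:
        return float("inf")
    return quota_us / period_us

# An offline group allowed 50 ms of runtime per 100 ms period gets at
# most half a core, no matter how idle the rest of the machine is.
print(effective_cpus(50_000))   # 0.5
print(effective_cpus(200_000))  # 2.0 (up to two full cores)
```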

2. TLinux Team's Mixed‑Deployment Scheme

The new framework adds kernel‑level support for online‑offline mixing, including a dedicated offline scheduling class, load‑balancing optimizations, bandwidth limiting, and user‑space interfaces.

Problem 1 – Online Pre-empting Offline: Under stock CFS, a waking task pre-empts the running one only when two conditions hold (its virtual runtime is sufficiently smaller, and the running task has exceeded the minimum scheduling granularity), so offline tasks are not evicted immediately. TLinux introduces an offline scheduling class whose priority sits below CFS but above idle, allowing any CFS (online) task to pre-empt offline tasks immediately.
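The effect of adding a lower scheduling class can be illustrated with a toy model. The class names and the pick_next helper below are illustrative, not the kernel's actual API; the point is that class priority is checked before any within-class (vruntime) comparison, so an online task always wins over an offline one.

```python
from dataclasses import dataclass

# Scheduling-class priority order described in the text:
# CFS (online) > offline class > idle. Lower number = higher priority.
CLASS_PRIO = {"cfs": 0, "offline": 1, "idle": 2}

@dataclass
class Task:
    name: str
    sched_class: str

def pick_next(runqueue: list) -> Task:
    # Class priority decides first; an online (CFS) task pre-empts any
    # offline task with no vruntime or min-granularity check involved.
    return min(runqueue, key=lambda t: CLASS_PRIO[t.sched_class])

rq = [Task("batch-job", "offline"), Task("web-server", "cfs")]
print(pick_next(rq).name)  # web-server
```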

Problem 2 – Efficient Use of Idle CPU by Offline Tasks: TLinux designs an offline-load calculation algorithm to estimate the remaining compute capacity on each core:

offline_load = 1 - avg/T

where avg decays by half every T milliseconds and the core's online runtime continuously feeds into avg. When a core is fully occupied by online work (100%), offline_load drops to 0, preventing any offline scheduling there. When online usage is low (e.g., 20%), offline_load is high (≈0.8), allowing offline tasks to be placed. Additionally, a queue-wait-time factor prioritizes offline tasks that have waited longer, improving their chance to capture idle CPU.

3. Evaluation Results

Extensive testing across multiple business scenarios shows the new scheme dramatically improves CPU utilization while keeping online latency unchanged:

Scenario A (latency-sensitive module a): CPU usage rose from ~15% to 60% with no increase in error rate.

Scenario B (module b): CPU usage increased from 20% to 50% while latency remained stable.

Scenario C (module c, less latency-sensitive): CPU usage reached 90% without impacting online metrics.

4. TLinux Team Overview

The TLinux team is responsible for Tencent's server operating system, kernel, distribution, and virtualization development. Their mixed-deployment solution is now integrated into the internal kernel and adopted by many business units.
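Returning to the offline_load rule from Problem 2: the decay can be modeled as a simple discrete recurrence (my own discretization for illustration; the kernel's actual accounting is continuous and more involved). Each T-ms window folds the window's online runtime into avg and halves it, so avg converges to the core's steady-state online runtime per window, and offline_load = 1 - avg/T converges to the idle fraction.

```python
def update_avg(avg: float, online_ms: float) -> float:
    # One decay period: fold in this window's online runtime, then
    # halve the sum (geometric decay with half-life T).
    return (avg + online_ms) / 2.0

def offline_load(avg: float, T: float = 1000.0) -> float:
    # Estimated spare capacity on the core: 1 when fully idle, 0 when
    # online tasks consume the entire window.
    return 1.0 - avg / T

# A core whose online tasks use 20% of each 1000 ms window converges
# to offline_load ~= 0.8, matching the example in the text.
avg = 0.0
for _ in range(20):
    avg = update_avg(avg, online_ms=200.0)
print(round(offline_load(avg), 3))  # 0.8
```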

Tags: performance optimization · mixed deployment · CPU utilization · cgroup · cpuset · Linux scheduling
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
