Boost Server Utilization: TencentOS ‘Ruyi’ Mixed‑Deployment Solution Explained
This article explores how TencentOS Server’s mixed‑deployment product “Ruyi” combines cluster scheduling optimization with per‑node QoS to dramatically increase CPU utilization, cut energy costs, and improve resource isolation in large‑scale data‑center environments.
1. Why Efficiency and Cost Reduction Matter
Data‑center (IDC) construction keeps expanding, and servers account for over half of data‑center cost, yet average CPU utilization typically sits at only 10‑20%. A simple calculation shows that raising utilization from 15% to 30% across a fleet of 10,000 servers could save up to 5 billion CNY.
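As a back‑of‑envelope check of that figure (the per‑server cost below is a hypothetical value implied by the article's numbers, not a published price):

```python
# Rough check of the savings claim: doubling utilization halves the
# number of servers needed to carry the same total load.
servers = 10_000
util_before = 0.15
util_after = 0.30

# Same aggregate work at higher utilization needs proportionally fewer servers.
servers_needed = servers * util_before / util_after   # 5,000
servers_freed = servers - servers_needed              # 5,000

# Hypothetical per-server lifetime cost (hardware + power + operations);
# the article's 5 billion CNY figure implies roughly 1,000,000 CNY each.
cost_per_server_cny = 1_000_000
savings_cny = servers_freed * cost_per_server_cny
print(f"servers freed: {servers_freed:.0f}, savings: {savings_cny / 1e9:.1f} billion CNY")
```

Under these assumptions the numbers line up: 5,000 servers freed at roughly 1 million CNY each gives the quoted 5 billion CNY.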
2. What Is Mixed Deployment?
Mixed deployment (混部) mixes online latency‑sensitive workloads and offline CPU‑intensive tasks on the same cluster or server, aiming to maximize resource usage and lower operating costs.
Online workloads: high QoS demand, low average load, peak‑time resource spikes (e.g., search, payment).
Offline workloads: less latency‑sensitive, heavy CPU/IO usage (e.g., AI training).
3. Main Characteristics of Mixed Deployment
Increase resource utilization by placing more workloads on each server.
Offline tasks compete for CPU, IO, memory, and network, potentially affecting online services in unpredictable ways.
The key challenge is to boost utilization without harming online services.
4. Mainstream Mixed‑Deployment Approaches
Two major categories exist:
Cluster‑level mixed deployment.
Server‑level mixed deployment.
Both aim to raise overall resource utilization.
5. Cluster‑Level Mixed Deployment (Container‑Based)
With mature container technology, mixed deployment often means mixing online and offline containers. Isolation between containers is crucial to protect online QoS. Scheduling can be time‑based: online tasks run during peak hours, offline tasks fill idle periods.
However, this approach leaves fragments of idle time unused, and when interference does occur the resolution chain is long: the cluster scheduler must first detect the conflict, then evict or migrate the offending task, and finally reschedule it elsewhere.
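The time‑based placement idea can be sketched as an admission check. This is a minimal illustration with hypothetical peak windows, not Ruyi's actual policy:

```python
from datetime import time

# Hypothetical peak window: online services own 09:00-23:00;
# offline batch jobs are only admitted in the off-peak trough.
PEAK_START, PEAK_END = time(9, 0), time(23, 0)

def admit_offline(now: time) -> bool:
    """Return True if an offline task may be placed at wall-clock `now`."""
    in_peak = PEAK_START <= now < PEAK_END
    return not in_peak

print(admit_offline(time(3, 0)))   # True  (off-peak: offline jobs fill the idle trough)
print(admit_offline(time(12, 0)))  # False (peak: the node is reserved for online load)
```

A scheme this coarse is exactly why idle fragments remain: any capacity left unused inside the peak window is lost, which motivates finer‑grained, server‑level isolation.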
6. Server‑Level Mixed Deployment – TencentOS “Ruyi”
Early attempts based on cgroup and CFS parameter tuning proved insufficient. In 2018, TencentOS added a CPU QoS feature to its 3.10 kernel and shipped it as the "Ruyi" product, which handles container isolation at the server level, allowing more containers to be co‑located and exposing finer‑grained QoS metrics.
"Ruyi" has been deployed on thousands of servers across WeChat, gaming, advertising, and other businesses, with notable results.
7. Features and Benefits of “Ruyi”
Container‑level granularity, leveraging the widespread container ecosystem.
Hierarchical QoS: high‑priority online containers are protected while low‑priority offline containers can borrow idle resources.
Unified scheduling for CPU, IO, network, and memory across all scenarios.
Simplified scheduling strategy by handling isolation within the server.
Low interference: offline impact on online services stays below 5% (often ~1%).
Rich QoS metrics for latency and pressure, aiding higher‑level schedulers.
Exception tracing for post‑mortem analysis.
High compatibility with both cgroup v1 and v2.
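The hierarchical QoS idea above can be reduced to a strict two‑level priority rule: online containers are satisfied first, and offline containers may only borrow whatever is left idle. The function below is an illustrative simplification, not Ruyi's kernel implementation:

```python
def share_cpu(capacity: float, online_demand: float, offline_demand: float):
    """Strict two-level priority split of CPU capacity (simplified model):
    high-priority online demand is satisfied first; low-priority offline
    containers may only borrow the remaining idle capacity."""
    online = min(online_demand, capacity)
    offline = min(offline_demand, capacity - online)
    return online, offline

# Quiet period: online needs 2 of 16 cores, so offline can borrow all 10 it wants.
print(share_cpu(16, 2, 10))   # (2, 10)

# Online peak: online takes 14 cores, offline is squeezed down to 2.
print(share_cpu(16, 14, 10))  # (14, 2)
```

The point of the model: offline allocation shrinks automatically as online demand rises, so online QoS is protected without the cluster scheduler having to intervene.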
8. Real‑World Impact
Integrated into TencentOS Server 3 (5.4 kernel), “Ruyi” raises CPU utilization from below 15% to around 45%, saving nearly 200 million kWh of electricity and reducing carbon emissions by about 70 kt, contributing to carbon‑neutral goals.
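The energy and carbon figures are mutually consistent; the emission factor below is simply the value implied by dividing one by the other, not an official grid factor:

```python
# Cross-check of the article's energy and carbon figures.
energy_saved_kwh = 200e6     # ~200 million kWh of electricity saved

# kg CO2 per kWh implied by the article (70 kt / 200 GWh); actual grid
# emission factors vary by region and year.
implied_factor = 0.35

co2_saved_kt = energy_saved_kwh * implied_factor / 1e6
print(f"{co2_saved_kt:.0f} kt CO2")  # 70 kt
```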
Tencent Architect
We share technical insights on storage, computing, and access, and explore industry-leading product technologies together.