How Alibaba Cloud’s Differential SLO Boosts Kubernetes Resource Utilization
This article explains Alibaba Cloud Container Service for Kubernetes's differential SLO approach, detailing the reclaimed‑resource model, CPU burst and topology‑aware scheduling, kernel group identity, memory watermark tiering, and real‑world case studies that demonstrate significant improvements in cluster efficiency and latency‑sensitive workload performance.
Background
Alibaba Cloud has years of experience with differentiated-SLO co-location ("mixed deployment") and now offers a solution that runs heterogeneous workloads, both latency‑sensitive (LS) and best‑effort (BE), on the same node. It exploits their distinct resource and SLO characteristics to improve overall cluster utilization.
Resource Model
The reclaimed‑resource model defines three zones on a node: usage (actual consumption), buffered (reserved portion), and reclaimed (excess that can be over‑committed). The reclaimed amount equals the sum of reclaimed resources from Guaranteed/Burstable pods.
# Node
status:
  allocatable:
    # milli-core
    alibabacloud.com/reclaimed-cpu: 50000
    # bytes
    alibabacloud.com/reclaimed-memory: 50000
  capacity:
    alibabacloud.com/reclaimed-cpu: 50000
    alibabacloud.com/reclaimed-memory: 100000

ACK exposes these reclaimed metrics as standard extended resources in the node’s Node status.
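The reclaimed-resource model can be illustrated with a little arithmetic. The sketch below assumes that each Guaranteed/Burstable pod contributes its request minus its current usage minus a safety buffer; the `PodCPU` type, the function name, and the 10 % buffer are hypothetical and are not taken from ACK.

```python
from dataclasses import dataclass


@dataclass
class PodCPU:
    """CPU request and observed usage of one LS pod, in milli-cores."""
    request_milli: int
    usage_milli: int


def reclaimed_cpu_milli(pods, buffer_percent=10):
    """Sum per-pod reclaimable CPU: request - usage - buffer, floored at 0.

    buffer_percent is a hypothetical safety margin, as a percentage of request.
    """
    total = 0
    for pod in pods:
        buffer_milli = pod.request_milli * buffer_percent // 100
        total += max(0, pod.request_milli - pod.usage_milli - buffer_milli)
    return total


# One mostly idle pod and one pod running close to its request.
pods = [PodCPU(4000, 1000), PodCPU(2000, 1900)]
print(reclaimed_cpu_milli(pods))  # 2600: only the idle pod contributes
```

A pod running close to its request contributes nothing, so the reclaimed pool shrinks as LS load rises.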
Using Reclaimed Resources in Pods
A low‑priority pod requests reclaimed resources by setting the alibabacloud.com/qos label to BE (the label accepts BE or LS) and specifying alibabacloud.com/reclaimed-cpu and alibabacloud.com/reclaimed-memory in its resources section.
# Pod
metadata:
  labels:
    alibabacloud.com/qos: BE # {BE, LS}
spec:
  containers:
  - resources:
      limits:
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048
      requests:
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048

Technical Details
CPU Burst
Kubernetes enforces a CPU limit as a CFS quota of CPU time per 100 ms period. When a container’s CPU limit is 2 cores, the kernel caps its usage at 200 ms of CPU time per period; a short burst that needs more is throttled until the next period, which causes latency spikes for LS workloads. CPU Burst lets containers accumulate unused quota and spend it during bursts, reducing tail latency. ACK fully supports CPU Burst and, on kernels without native support, emulates the behavior by monitoring throttling and dynamically adjusting limits.
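The effect of accumulating quota can be seen in a toy simulation. The sketch below is a simplified model, not the kernel's implementation: each period the container may run for its quota plus any carried-over unused quota, with the carry-over capped at a burst allowance. The function name and the demand trace are made up for illustration.

```python
def simulate_cfs(demand_ms, quota_ms=200, burst_ms=0):
    """Return throttled CPU time (ms) per period for a per-period demand trace.

    Simplified model: each period the group may run for quota_ms plus any
    unused quota carried over from earlier periods, capped at burst_ms.
    Throttled work is dropped rather than deferred to keep the model small.
    """
    carried = 0
    throttled = []
    for need in demand_ms:
        budget = quota_ms + carried
        ran = min(need, budget)
        throttled.append(need - ran)
        carried = min(burst_ms, budget - ran)
    return throttled


trace = [50, 50, 350, 100]  # CPU-ms wanted in each 100 ms period (2-core limit)
print(simulate_cfs(trace))                # [0, 0, 150, 0]: the spike is throttled
print(simulate_cfs(trace, burst_ms=200))  # [0, 0, 0, 0]: saved quota absorbs it
```

Average usage is well under the 2-core limit in both runs; only the ability to spend quota saved during the idle periods differs.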
CPU Topology‑Aware Scheduling
High pod density on modern multi‑core nodes leads to CPU contention and NUMA effects. The kubelet's static CPU Manager policy only works for Guaranteed QoS pods and applies cluster‑wide, lacking fine‑grained control. ACK implements a topology‑aware scheduler, built on the scheduling framework, that supports all QoS classes, enables per‑pod core pinning, and selects the optimal node‑CPU topology across the cluster.
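As a rough illustration of per-pod core pinning, the sketch below picks CPU IDs for a pod while trying to stay inside one NUMA node. It is a generic best-fit heuristic written for this article, not ACK's scheduler logic; `pick_cpus` and its input layout are hypothetical.

```python
def pick_cpus(free_by_numa, count):
    """Pick `count` free CPU IDs, preferring a single NUMA node.

    free_by_numa maps a NUMA node ID to the list of free CPU IDs on it.
    Returns a sorted list of CPU IDs, or None if too few CPUs are free.
    """
    # Best fit: the NUMA node with the fewest free CPUs that still fits
    # the whole request, so larger free blocks stay available.
    fitting = [cpus for cpus in free_by_numa.values() if len(cpus) >= count]
    if fitting:
        return sorted(min(fitting, key=len))[:count]
    # Fallback: span NUMA nodes, starting with the one with most free CPUs
    # so the pod touches as few NUMA nodes as possible.
    chosen = []
    for cpus in sorted(free_by_numa.values(), key=len, reverse=True):
        chosen.extend(sorted(cpus)[: count - len(chosen)])
        if len(chosen) == count:
            return sorted(chosen)
    return None


free = {0: [0, 1, 2, 3], 1: [8, 9]}
print(pick_cpus(free, 2))  # [8, 9]
print(pick_cpus(free, 5))  # [0, 1, 2, 3, 8]
```

A cluster-level scheduler would run a check like this per candidate node and prefer nodes where the request fits without crossing NUMA boundaries.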
Elastic Resource Limits (Reclaimed‑Resource)
The reclaimed‑resource pool varies dynamically with LS pod usage. BE pods consume reclaimed CPU only when LS pods leave sufficient headroom; otherwise, their effective CPU share shrinks.
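The elastic behavior can be pictured as a budget that is recomputed whenever LS usage changes. The sketch below assumes a simple rule: BE pods get whatever remains under a node-level utilization ceiling after LS usage is subtracted. The 65 % ceiling and the function name are hypothetical, and ACK's actual policy may differ.

```python
def be_cpu_budget_milli(node_capacity_milli, ls_usage_milli, ceiling_percent=65):
    """CPU that BE pods may use right now, in milli-cores.

    ceiling_percent is a hypothetical node-utilization ceiling: BE pods get
    what is left under that ceiling once current LS usage is subtracted.
    """
    ceiling = node_capacity_milli * ceiling_percent // 100
    return max(0, ceiling - ls_usage_milli)


# 32-core node: BE headroom shrinks as LS usage grows.
print(be_cpu_budget_milli(32000, 8000))   # 12800
print(be_cpu_budget_milli(32000, 24000))  # 0
```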
Kernel Group Identity
Starting with kernel‑4.19.91‑24.al7, Alibaba Cloud Linux introduces Group Identity, adding a second red‑black tree for low‑priority tasks. This separates scheduling of high‑ and low‑priority tasks, minimizing wake‑up latency for high‑priority workloads and preventing low‑priority tasks from affecting them, even under SMT.
LLC and MBA Isolation
On bare‑metal nodes, ACK can dynamically adjust Last‑Level Cache (LLC) and Memory Bandwidth Allocation (MBA) for BE pods, reducing interference with LS pods.
Global Memory Watermark Tiering
When BE tasks suddenly allocate large memory, the system may hit the global wmark_min, triggering direct memory reclamation and hurting LS latency. Alibaba Cloud Linux adds a tiered global wmark_min: BE’s watermark is raised (earlier reclamation) while LS’s is lowered (delayed reclamation), preventing LS from entering the slow reclamation path.
Asynchronous Background Reclamation
ACK introduces a container‑level asynchronous reclamation mechanism using a workqueue and the memory.wmark_ratio control file (available in both cgroup v1 and v2). When a container’s memory usage exceeds the ratio, the kernel performs proactive reclamation before synchronous reclamation would occur.
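To make the ratio concrete, the sketch below computes the usage level at which background reclamation would begin for a given memory limit. It assumes the threshold is simply limit × ratio / 100 and that a ratio of 0 disables the mechanism; both are assumptions for illustration, so check the kernel documentation for the exact semantics.

```python
def async_reclaim_threshold_bytes(limit_bytes, wmark_ratio):
    """Usage level above which background reclamation would start.

    Assumes threshold = limit * ratio / 100, and that a ratio of 0 means
    the mechanism is disabled (returns None). Both are assumptions.
    """
    if not 0 <= wmark_ratio <= 100:
        raise ValueError("wmark_ratio must be between 0 and 100")
    if wmark_ratio == 0:
        return None
    return limit_bytes * wmark_ratio // 100


limit = 4 * 1024**3  # 4 GiB container memory limit
print(async_reclaim_threshold_bytes(limit, 95))  # 4080218931
```

With a ratio of 95, reclamation would start roughly 205 MiB below the limit, leaving room to free pages before the container hits the synchronous path.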
Case Studies
CPU Burst Performance
Using Apache HTTP Server as an LS workload, enabling CPU Burst on Alibaba Cloud Linux 2 reduced the 99th‑percentile response time (RT‑p99) compared with CentOS 7, eliminated CPU throttling, and kept overall pod utilization stable.
Mixed‑Workload Resource Efficiency
In a "Web + Big Data" scenario, nginx (LS) and Spark benchmark (BE) were co‑located on the same ACK node. Compared with non‑mixed baselines, the differential SLO suite kept nginx latency degradation under 5 % while increasing overall cluster CPU utilization from 49 % to 58 % and reducing Spark job total runtime by 8 %.
Conclusion
Alibaba Cloud Container Service for Kubernetes (ACK) now offers a suite of differential SLO features—reclaimed resources, CPU burst, topology‑aware scheduling, kernel group identity, memory watermark tiering, and asynchronous reclamation—that can be used independently or together. Real‑world experiments show up to 30 % higher cluster utilization and latency‑sensitive performance impact limited to less than 5 % in mixed deployments.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
