
Boosting Cluster Utilization with Alibaba's K8s Mixed Deployment and QoS Priorities

This article explains Alibaba's seven‑year experience with mixed deployment on Kubernetes, detailing how priority classes and QoS models are used to reclaim idle resources for low‑SLO workloads, improve overall cluster utilization, and maintain service‑level objectives for both online and offline pods.

Alibaba Cloud Native

Introduction

Since 2014, Alibaba has been developing its online/offline mixed-deployment (co-location) technology, which has been validated through multiple Double 11 events and is now deployed at massive scale across the group. The approach saves billions of yuan annually and raises overall cluster resource utilization to around 70%.

The technology is packaged as a plug‑in that can be installed on any standard native Kubernetes cluster, providing mixed‑deployment control and operational capabilities to improve both resource usage and user experience.

Kubernetes Native Model

In many Kubernetes environments, practitioners conflate Priority (scheduling order) with QoS (runtime resource guarantees). The native model separates these concepts: Priority determines which pod the scheduler considers first and which pods may be preempted, while the QoS classes (Guaranteed, Burstable, BestEffort), which Kubernetes derives from each pod's resource requests and limits, determine how the pod is treated at runtime, for example the order of eviction under node pressure.

Understanding this distinction is essential before introducing mixed‑deployment semantics.
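
As a minimal sketch of the native model, the manifests below keep the two concepts apart. The names online-high and web, the priority value, and the image are assumptions chosen for illustration; they do not come from Alibaba's setup.

```yaml
# Priority: consulted by the scheduler for queue ordering and preemption.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: online-high            # hypothetical name
value: 100000                  # assumed value; higher means scheduled earlier
globalDefault: false
description: "Illustrative class for latency-sensitive online services."
---
apiVersion: v1
kind: Pod
metadata:
  name: web                    # hypothetical name
spec:
  priorityClassName: online-high   # Priority is set explicitly
  containers:
  - name: app
    image: nginx:1.25          # placeholder image
    resources:
      # QoS is never set directly; Kubernetes derives it from these fields.
      # requests == limits for CPU and memory on every container => Guaranteed.
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
```

Once the pod is created, kubectl get pod web -o jsonpath='{.status.qosClass}' reports the derived class. Lowering the requests below the limits would yield Burstable, and removing both would yield BestEffort, in each case without changing the pod's priority.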

Problems Addressed by Mixed Deployment

The primary goal is to maximize cluster utilization while preserving the service-level objectives (SLOs) of all deployed applications. Online services are typically provisioned for peak load, so much of their allocated CPU and memory sits idle most of the time. Mixed deployment overcommits this idle capacity to low-SLO offline jobs, which requires SLO-aware scheduling and real-time awareness of resource usage to avoid hotspots.
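
One way to picture the overcommit is as a per-node budget. The formula below is an illustrative assumption, not Alibaba's published algorithm: A is the node's allocatable CPU, U_online the measured usage of online pods, and beta a hypothetical safety margin held back to absorb bursts.

```latex
R_{\text{reclaimed}} = \max\bigl(0,\; A - U_{\text{online}} - \beta A\bigr)
% Worked example with assumed numbers:
%   A = 32 cores, U_online = 8 cores, beta = 0.1
%   R_reclaimed = max(0, 32 - 8 - 3.2) = 20.8 cores for offline jobs
```

Because such a budget would depend on measured usage rather than on requests, it shrinks as online load rises, which is why the scheduler needs up-to-date usage data.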

When node-level utilization becomes high, offline jobs can be preempted to protect online SLOs. Kernel-level cgroup isolation backs this up by enforcing strict resource boundaries between online and offline pods.
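
This article does not specify how the plug-in detects and handles node pressure. For comparison only, the native kubelet offers a related mechanism through node-pressure eviction thresholds. The fragment below is a sketch of that native configuration with assumed threshold values.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: the kubelet evicts pods immediately once one is crossed.
evictionHard:
  memory.available: "500Mi"    # assumed value
  nodefs.available: "10%"      # assumed value
# Soft thresholds: eviction starts only after the grace period has elapsed.
evictionSoft:
  memory.available: "1Gi"      # assumed value
evictionSoftGracePeriod:
  memory.available: "1m30s"    # assumed value
```

Native eviction reacts to memory, disk, and PID pressure and has no CPU signal. When it selects victims, the kubelet first considers whether a pod's usage exceeds its requests and then its priority, so pods without requests tend to be evicted first.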

Application Level Model

Pods are classified into three custom QoS classes:

LSR – Latency‑Sensitive Reserved

LS – Latency‑Sensitive

BE – Best‑Effort (used for reclaimed resources)

The class is declared explicitly via pod annotations and labels, which are then mapped to both scheduling Priority and runtime QoS.

apiVersion: v1
kind: Pod
metadata:
  name: be-example             # placeholder name
  annotations:
    alibabacloud.com/qosClass: BE # one of {LSR, LS, BE}
  labels:
    alibabacloud.com/qos: BE # one of {LSR, LS, BE}
spec:
  containers:
  - name: worker               # placeholder container name
    image: busybox:1.36        # placeholder image
    resources:
      limits:
        alibabacloud.com/reclaimed-cpu: 1000    # milli-cores; 1000 = 1 core
        alibabacloud.com/reclaimed-memory: 2048 # bytes; quantity suffixes such as Ki, Mi, Gi are also accepted
      requests:                                 # for extended resources, requests must equal limits
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048

The BE class uses the extended-resource mechanism (alibabacloud.com/reclaimed-cpu and alibabacloud.com/reclaimed-memory) to request reclaimed capacity, while LSR and LS use the standard cpu and memory resource fields.
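
In Kubernetes, extended resources are advertised in a node's status, and the scheduler places a pod only on a node that still has enough of the named resource. A hypothetical excerpt of kubectl get node <node-name> -o yaml might look like the following. All quantities are invented for illustration, and how the plug-in computes and publishes them is not covered in this article.

```yaml
status:
  capacity:
    cpu: "32"
    memory: 131072Mi
    alibabacloud.com/reclaimed-cpu: "12000"            # milli-cores (assumed value)
    alibabacloud.com/reclaimed-memory: "34359738368"   # bytes, i.e. 32 Gi (assumed value)
  allocatable:
    cpu: "31"
    memory: 126976Mi
    alibabacloud.com/reclaimed-cpu: "12000"
    alibabacloud.com/reclaimed-memory: "34359738368"
```

Keeping reclaimed capacity in separate resource names means BE requests are accounted for independently of the cpu and memory that LSR and LS pods request.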

These classes also influence network QoS, ensuring that low‑priority offline tasks do not monopolise bandwidth.

Scheduling Behavior

Both Priority and QoS classes affect the scheduler and the kubelet runtime. High‑SLO workloads (typically LSR or LS) receive higher scheduling priority and stronger QoS guarantees, while BE pods are scheduled later and can be pre‑empted when node pressure rises.
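
To make the BE side concrete, the sketch below pairs a low PriorityClass with a BE pod. The class name mixed-batch, its value, and the pod details are assumptions for illustration; the article does not list the PriorityClass objects Alibaba actually uses.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mixed-batch            # hypothetical name
value: 5000                    # assumed; only needs to be lower than the online classes
preemptionPolicy: Never        # pods of this class wait instead of preempting others
globalDefault: false
description: "Illustrative low priority for BE offline jobs."
---
apiVersion: v1
kind: Pod
metadata:
  name: offline-job            # hypothetical name
  annotations:
    alibabacloud.com/qosClass: BE
  labels:
    alibabacloud.com/qos: BE
spec:
  priorityClassName: mixed-batch
  containers:
  - name: worker
    image: busybox:1.36        # placeholder image
    command: ["sleep", "3600"]
    resources:
      requests:
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048
      limits:
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048
```

With preemptionPolicy set to Never, the pod is still ordered by its priority value in the scheduling queue and can itself be preempted by higher-priority pods, but it never triggers preemption of others.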

Quota, Waterline, and Multi‑Tenant Isolation

Beyond per‑pod priority, production deployments also enforce node‑level waterline thresholds, tenant‑specific quotas, and OS‑level isolation (cgroup, memory‑waterline, etc.) to guarantee SLOs across multiple tenants. These mechanisms are mentioned for completeness and will be detailed in future articles.
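
Until those details are published, native Kubernetes already offers a simple form of tenant quota. The sketch below caps the reclaimed resources a namespace may request; the namespace, the object name, and all limits are hypothetical, and Alibaba's own quota mechanism may work differently.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: offline-quota          # hypothetical name
  namespace: tenant-a          # hypothetical tenant namespace
spec:
  hard:
    pods: "200"                                          # assumed cap on pod count
    # Quota on extended resources is expressed with the "requests." prefix only.
    requests.alibabacloud.com/reclaimed-cpu: "16000"     # 16 cores in milli-cores
    requests.alibabacloud.com/reclaimed-memory: 64Gi
```

The API server rejects any new pod in the namespace whose requests would push the total above these values, which bounds how much reclaimed capacity one tenant can consume.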

Related Solutions and References

Alibaba Cloud exposes the mixed-deployment capabilities through the ACK Agile edition and the CloudNative Stack (CNStack) family, combined with Anolis OS from the OpenAnolis community, forming an end-to-end cloud-native data-center solution.

Technical reference documents:

https://kubernetes.io/docs/concepts/scheduling-eviction/

https://kubernetes.io/docs/concepts/workloads/pods/disruptions/

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/

https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass

https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/

https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/
