How Koordinator Brings Cloud‑Native Mixed‑Workload Scheduling to Scale
Koordinator, an open‑source cloud‑native scheduler born from Alibaba's massive production practice, enables efficient, low‑cost mixed‑workload orchestration on Kubernetes by addressing integration challenges, offering a resource‑oversell model, and providing zero‑intrusion deployment for both cloud and on‑prem environments.
Mixed‑Workload Scheduling
Mixed‑workload scheduling (known in Chinese as “混部”, literally “mixed deployment”, and often called colocation) coordinates heterogeneous workloads, such as latency‑sensitive services, batch analytics, and AI jobs, on the same cluster to smooth demand peaks and valleys. Filling the idle capacity of one workload type with tasks from another raises overall resource utilization, lowers operational cost, and improves service stability.
Alibaba’s Production Experience
Alibaba began experimenting with containers in 2011 and started dedicated mixed‑workload research in 2016. After several architecture upgrades, the current cloud‑native system runs tens of millions of CPU cores, achieves an average CPU utilization above 50 %, and saves billions of RMB in resource costs during large‑scale events such as Double‑11.
Koordinator Project
Koordinator is an open‑source project that packages Alibaba’s large‑scale mixed‑workload practices as extensions to vanilla Kubernetes. It provides a low‑entry‑cost, high‑efficiency scheduling layer for cloud‑native environments, enabling enterprises to reap the same utilization and stability benefits without extensive custom development.
Zero‑Intrusion Design
Koordinator does not modify upstream Kubernetes code. Installation is a one‑click deployment of the Koordinator controller and scheduler components (e.g., via Helm or a static manifest). When colocation is disabled, the cluster behaves exactly like a standard Kubernetes installation.
Colocation Profile – Workload Integration Layer
Workloads are integrated through a lightweight Colocation Profile expressed as ordinary YAML files. The profile describes the desired co‑location behavior (e.g., resource‑oversell settings, priority class) and is applied without changing the original Deployment/Job specifications. Today Koordinator ships a Spark‑task example; future extensions will cover Flink, Hadoop, AI, and media workloads.
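The idea of applying a profile without editing the workload's own spec can be sketched in a few lines of Python. This is only an illustration of the pattern, not Koordinator's implementation: the profile keys, the `qos-class` label, the `batch-low` priority class, and the helper names `matches` and `apply_profile` are all assumptions made for the example.

```python
import copy

# Hypothetical profile. The keys loosely mirror what a colocation profile
# might express; they are illustrative assumptions, not a real schema.
PROFILE = {
    "selector": {"app-type": "spark"},     # which Pods the profile targets
    "qos_class": "BE",                     # assumed best-effort QoS value
    "priority_class_name": "batch-low",    # hypothetical priority class
    "scheduler_name": "koord-scheduler",   # assumed scheduler name
}


def matches(profile, pod):
    """True if every selector label of the profile is present on the Pod."""
    labels = pod.get("metadata", {}).get("labels", {})
    return all(labels.get(k) == v for k, v in profile["selector"].items())


def apply_profile(profile, pod):
    """Return a patched copy of the Pod; the submitted spec stays untouched."""
    if not matches(profile, pod):
        return pod
    patched = copy.deepcopy(pod)
    patched["metadata"].setdefault("labels", {})["qos-class"] = profile["qos_class"]
    patched["spec"]["priorityClassName"] = profile["priority_class_name"]
    patched["spec"]["schedulerName"] = profile["scheduler_name"]
    return patched


# A Spark executor Pod as the user would submit it, with no colocation fields.
spark_pod = {
    "metadata": {"name": "spark-exec-1", "labels": {"app-type": "spark"}},
    "spec": {"containers": [{"name": "executor", "image": "spark:example"}]},
}

patched_pod = apply_profile(PROFILE, spark_pod)
```

The point of the design is that `spark_pod` is never mutated: colocation settings live in the profile and are layered on at admission time, so the same job definition runs unchanged on a cluster without the profile.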
Resource‑Oversell Model
The core scheduling model classifies resources into four bands:
limit – the amount of resources requested by high‑priority Pods. Despite the name, this corresponds to the Kubernetes Pod request, not the Kubernetes limit.
usage – the actual consumption measured over time.
short‑term reservation – predicted unused capacity based on recent usage (seconds to minutes). This band is allocated to short‑lived batch tasks.
long‑term reservation – predicted unused capacity over a longer horizon (hours). It is used for longer‑lived, less latency‑sensitive jobs.
Predictions are derived from historical usage curves; the scheduler then places low‑priority Pods into the reservation space while preserving enough headroom for high‑priority services. The model works together with priority preemption, load‑aware placement, interference detection, and QoS guarantees to form a production‑grade mixed‑workload scheduler.
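A minimal Python sketch of the arithmetic behind the two reservation bands follows. It assumes the prediction is simply the peak usage seen in a trailing window plus a fixed safety margin; the real predictor is not described here, and the function name `reclaimable`, the sample values, and the 10 % margin are hypothetical.

```python
def reclaimable(limit, usage_samples, window, safety_margin=0.1):
    """Estimate capacity that can be lent to low-priority Pods.

    limit          -- resources requested by high-priority Pods (CPU cores)
    usage_samples  -- historical usage in cores, oldest first
    window         -- how many of the most recent samples to consider
    safety_margin  -- fraction of limit held back as headroom (assumption)
    """
    recent = usage_samples[-window:]
    predicted_peak = max(recent)       # naive predictor: peak of the window
    headroom = limit * safety_margin   # kept free for high-priority bursts
    return max(0.0, limit - predicted_peak - headroom)


limit = 16.0                                # cores requested by services
usage = [9.0, 8.0, 10.0, 6.0, 5.0, 4.0]     # observed usage, oldest first

# Short horizon: only the latest samples, so more capacity looks free.
short_term = reclaimable(limit, usage, window=3)   # about 8.4 cores

# Long horizon: the whole history, so the earlier peak of 10 cores counts.
long_term = reclaimable(limit, usage, window=6)    # about 4.4 cores
```

With this simple peak-based predictor the long-term band is never larger than the short-term one, because a longer window can only see an equal or higher peak. That matches the intuition in the text: short-lived batch tasks can use capacity that is free right now, while longer-lived jobs should only rely on capacity that has stayed free over hours.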
Scheduling Features
Priority preemption to protect latency‑critical services.
Load‑aware node selection based on real‑time usage.
Interference detection that avoids colocating noisy batch jobs with sensitive services.
QoS‑aware enforcement ensuring SLO compliance.
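Two of the features above, load‑aware node selection and interference avoidance, can be illustrated with a small filter‑then‑score routine in Python. This is a sketch under stated assumptions rather than Koordinator's scheduler plugin: the `Node` fields, the `pick_node` helper, and the 65 % utilization threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: float            # total cores on the node
    cpu_usage: float               # cores in use right now (real-time usage)
    hosts_sensitive: bool = False  # runs a latency-sensitive service


def pick_node(nodes, pod_cpu, noisy=False, max_utilization=0.65):
    """Choose the least-loaded node that passes both filters.

    Filter 1 (load-aware): projected utilization must stay under the threshold.
    Filter 2 (interference): a noisy Pod is kept away from sensitive services.
    """
    candidates = []
    for node in nodes:
        projected = (node.cpu_usage + pod_cpu) / node.cpu_capacity
        if projected > max_utilization:
            continue
        if noisy and node.hosts_sensitive:
            continue
        candidates.append((projected, node.name))
    if not candidates:
        return None                # nothing fits; the Pod stays pending
    return min(candidates)[1]      # lowest projected utilization wins


nodes = [
    Node("node-a", cpu_capacity=32, cpu_usage=8, hosts_sensitive=True),
    Node("node-b", cpu_capacity=32, cpu_usage=12),
    Node("node-c", cpu_capacity=32, cpu_usage=24),
]

quiet_choice = pick_node(nodes, pod_cpu=4)               # "node-a"
noisy_choice = pick_node(nodes, pod_cpu=4, noisy=True)   # "node-b"
```

Scoring on measured usage rather than on requested resources is what makes the placement load‑aware: a node whose Pods request a lot but use little still looks attractive, which is exactly the capacity that colocation aims to fill.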
Key Benefits
Higher overall CPU utilization (above 50 % on average across more than 10 million cores in Alibaba's production clusters).
Stable execution of latency‑sensitive services while simultaneously running batch workloads.
Unified architecture for both cloud and on‑premises environments, reducing operational overhead.
Reduced resource cost through systematic oversell and reclamation.
Open‑Source Resources
Source code: https://github.com/koordinator-sh/koordinator
Reference paper on the original Borg system that inspired Kubernetes and Koordinator: https://research.google/pubs/pub49065/
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
