
How Koordinator Brings Cloud‑Native Mixed‑Workload Scheduling to Scale

Koordinator, an open‑source cloud‑native scheduling system built on Alibaba's large‑scale production experience, brings efficient, low‑cost mixed‑workload orchestration to Kubernetes. It simplifies workload integration, offers a resource‑oversell model, and deploys without modifying Kubernetes in both cloud and on‑premises environments.

Alibaba Cloud Native

Apr 6, 2022

Mixed‑Workload Scheduling

Mixed‑workload scheduling (often called colocation) runs heterogeneous workloads, such as latency‑sensitive services, batch analytics, and AI jobs, on the same cluster to smooth out demand peaks and valleys. Filling one workload type's idle capacity with another type's tasks raises overall resource utilization and lowers operating cost, while priority and isolation mechanisms keep latency‑sensitive services stable.
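A small worked example makes the peak‑and‑valley arithmetic concrete. It assumes a hypothetical online service whose hourly CPU demand swings between 10 and 100 cores on a cluster sized for its 100‑core peak, and it greedily backfills batch work into idle hours while keeping 10 cores of headroom free. The `ONLINE` curve, the headroom, the batch volume, and the `backfill` helper are all illustrative assumptions, not Alibaba or Koordinator data.

```python
# Worked example of "valley filling" with hypothetical numbers (not Alibaba data).

# Hourly CPU demand (cores) of a latency-sensitive online service over one day.
ONLINE = [20, 15, 10, 10, 12, 20, 40, 60, 80, 90, 95, 100,
          95, 90, 85, 80, 85, 90, 95, 90, 70, 50, 35, 25]

CAPACITY = max(ONLINE)  # cluster sized for the online peak (100 cores)
HEADROOM = 10           # cores kept free for sudden online spikes (assumed)


def utilization(demand: list[int], capacity: int) -> float:
    """Average fraction of capacity in use across all hours."""
    return sum(demand) / (len(demand) * capacity)


def backfill(online: list[int], capacity: int, headroom: int,
             batch_core_hours: int) -> tuple[list[int], int]:
    """Greedily fill each hour's idle capacity (minus headroom) with batch work.

    Returns the combined hourly demand and the batch core-hours left unscheduled.
    """
    ceiling = capacity - headroom
    remaining = batch_core_hours
    combined = []
    for used in online:
        idle = max(0, ceiling - used)   # online work always keeps priority
        batch = min(idle, remaining)
        remaining -= batch
        combined.append(used + batch)
    return combined, remaining


if __name__ == "__main__":
    mixed, leftover = backfill(ONLINE, CAPACITY, HEADROOM, batch_core_hours=800)
    print(f"online only: {utilization(ONLINE, CAPACITY):.0%}")  # online only: 60%
    print(f"with batch: {utilization(mixed, CAPACITY):.0%}")    # with batch: 91%
    print(f"batch core-hours left over: {leftover}")            # batch core-hours left over: 57
```

Batch work never pushes an hour above the 90‑core ceiling, and hours where the online service alone reaches it get no batch work at all. In a real cluster, load is not known in advance, which is why keeping that guarantee requires priority, load awareness, and isolation rather than a fixed schedule.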

Alibaba’s Production Experience

Alibaba began experimenting with containers in 2011 and started dedicated mixed‑workload research in 2016. After several architecture upgrades, its current cloud‑native colocation system spans more than 10 million CPU cores, keeps average CPU utilization above 50%, and has saved billions of RMB in resource costs while supporting large‑scale events such as Double‑11.

Koordinator Project

Koordinator is an open‑source project that packages Alibaba's large‑scale mixed‑workload practices as extensions to vanilla Kubernetes. It provides an easy‑to‑adopt, efficient scheduling layer for cloud‑native environments, letting enterprises gain similar utilization and stability benefits without extensive custom development.

Zero‑Intrusion Design

Koordinator does not modify upstream Kubernetes code. Installation is a one‑step deployment of the Koordinator components, including its controller and scheduler (e.g., via Helm or a static manifest). When Koordinator's features are disabled, the cluster behaves exactly like a standard Kubernetes installation.

Colocation Profile – Workload Integration Layer

Workloads are integrated through a lightweight Colocation Profile written as an ordinary YAML file. The profile describes the desired colocation behavior (e.g., resource‑oversell settings and priority class) and takes effect without any change to the original Deployment or Job specifications. Today Koordinator ships an example for Spark jobs; future extensions will cover Flink, Hadoop, AI, and media workloads.
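As we understand Koordinator's design, the profile is a `ClusterColocationProfile` custom resource, and a webhook in Koordinator's manager mutates matching Pods as they are created, which is why the original Deployment or Job spec stays untouched. The Python sketch that follows imitates that mutation step on a plain dict. Everything in it is hypothetical: `ColocationProfile`, `apply_profile`, and `to_millicores` are illustrative names, and the label keys, the `BE` QoS class, the `koord-batch` priority class, the `koord-scheduler` name, and the `kubernetes.io/batch-cpu` / `kubernetes.io/batch-memory` resource names reflect our reading of the Koordinator docs rather than a verified API, so check them against the project's current reference.

```python
import copy
from dataclasses import dataclass

# Extended-resource names for reclaimed (batch) capacity. To our understanding
# these match Koordinator's docs; treat them as assumptions and verify.
BATCH_CPU = "kubernetes.io/batch-cpu"        # expressed in millicores
BATCH_MEMORY = "kubernetes.io/batch-memory"  # same quantity format as memory


@dataclass(frozen=True)
class ColocationProfile:
    """Hypothetical, simplified stand-in for a colocation profile."""
    selector: dict[str, str]   # labels a Pod must carry (exact match)
    qos_class: str             # e.g. "BE" for best-effort batch work
    priority_class_name: str
    scheduler_name: str


def to_millicores(quantity: str) -> int:
    """Parse simple CPU quantities such as "500m", "2" or "0.5" (no other suffixes)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def apply_profile(profile: ColocationProfile, pod: dict) -> dict:
    """Return a mutated copy of a matching Pod; non-matching Pods pass through as-is."""
    labels = pod.get("metadata", {}).get("labels", {})
    if any(labels.get(k) != v for k, v in profile.selector.items()):
        return pod
    mutated = copy.deepcopy(pod)  # never modify the submitted object
    mutated.setdefault("metadata", {}).setdefault("labels", {})[
        "koordinator.sh/qosClass"] = profile.qos_class
    spec = mutated.setdefault("spec", {})
    spec["priorityClassName"] = profile.priority_class_name
    spec["schedulerName"] = profile.scheduler_name
    # Move CPU/memory onto batch resources so the Pod consumes reclaimed capacity.
    for container in spec.get("containers", []):
        for section in ("requests", "limits"):
            res = container.get("resources", {}).get(section)
            if not res:
                continue
            if "cpu" in res:
                res[BATCH_CPU] = str(to_millicores(res.pop("cpu")))
            if "memory" in res:
                res[BATCH_MEMORY] = res.pop("memory")
    return mutated


# Example profile for Spark executors (label key and values are illustrative).
SPARK_PROFILE = ColocationProfile(
    selector={"koordinator.sh/enable-colocation": "true"},
    qos_class="BE",
    priority_class_name="koord-batch",
    scheduler_name="koord-scheduler",
)
```

For example, `apply_profile(SPARK_PROFILE, pod)` on a Pod labeled `koordinator.sh/enable-colocation: "true"` that requests `cpu: 500m` comes back with `kubernetes.io/batch-cpu: "500"` and the `BE` QoS label, while a Pod without that label is returned unchanged.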

Resource‑Oversell Model

The core scheduling model is built around four lines that track a node's resources over time:

limit – the amount of resources requested by high‑priority Pods, corresponding to their Kubernetes request.

usage – the resources those Pods actually consume, which fluctuates over time.

short‑term reservation – an estimate of future usage derived from usage over a relatively recent window. The gap between this line and limit is allocated but unused capacity that can run short‑lived batch Pods.

long‑term reservation – a similar estimate based on a longer history. The gap between it and limit is smaller than the short‑term gap but more stable, so it suits longer‑lived, less latency‑sensitive Pods.

Because both reservations are predicted from historical usage curves, the scheduler can place low‑priority Pods in the space between each reservation and limit while leaving enough headroom for high‑priority services. The model works together with priority preemption, load‑aware placement, interference detection, and QoS guarantees to form a production‑grade mixed‑workload scheduler.
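A minimal Python sketch of this model follows. It assumes each reservation is a prediction of future usage, so the capacity lent to low‑priority Pods is limit minus reservation, which is how we read Koordinator's resource‑model description. The predictor is a stand‑in chosen for illustration (peak usage over a window plus a safety margin), not Koordinator's actual estimator, and `NodeResourceModel`, `predict_reservation`, `build_model`, and the millicore units are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NodeResourceModel:
    """Resource lines for one node, in CPU millicores (hypothetical units)."""
    limit: int                   # sum of high-priority Pod requests
    short_term_reservation: int  # predicted usage from a recent window
    long_term_reservation: int   # predicted usage from a longer history

    @property
    def short_term_reclaimable(self) -> int:
        """Allocated-but-unused capacity that short-lived batch Pods may borrow."""
        return max(0, self.limit - self.short_term_reservation)

    @property
    def long_term_reclaimable(self) -> int:
        """Smaller but steadier capacity for longer-lived low-priority Pods."""
        return max(0, self.limit - self.long_term_reservation)


def predict_reservation(usage: Sequence[int], window: int,
                        margin_pct: int, limit: int) -> int:
    """Hypothetical estimator: peak of the last `window` samples plus a margin, capped at `limit`."""
    if window <= 0 or not usage:
        raise ValueError("need a positive window and at least one usage sample")
    peak = max(usage[-window:])
    return min(limit, peak * (100 + margin_pct) // 100)


def build_model(limit: int, usage: Sequence[int], short_window: int,
                long_window: int, margin_pct: int = 10) -> NodeResourceModel:
    """Derive both reservation lines from one usage history (oldest sample first)."""
    return NodeResourceModel(
        limit=limit,
        short_term_reservation=predict_reservation(usage, short_window, margin_pct, limit),
        long_term_reservation=predict_reservation(usage, long_window, margin_pct, limit),
    )


if __name__ == "__main__":
    # 10 cores requested; the last three samples are quiet, an earlier one peaked at 7 cores.
    model = build_model(10_000, [6000, 7000, 3000, 4000, 3000, 2000],
                        short_window=3, long_window=6)
    print(model.short_term_reclaimable, model.long_term_reclaimable)  # 5600 2300
```

Because the long window contains the short one, its peak can only be equal or higher, so long‑term reclaimable capacity never exceeds short‑term capacity. That is the trade‑off between the two lines: less capacity, but less likely to be clawed back when the online service spikes.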

Scheduling Features

Priority preemption to protect latency‑critical services.

Load‑aware node selection based on real‑time usage.

Interference detection that avoids colocating noisy batch jobs with sensitive services.

QoS‑aware enforcement that helps services meet their service‑level objectives (SLOs).
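Load‑aware node selection can be sketched as a filter‑then‑score pass over live node metrics. In this hypothetical Python version, a node is filtered out if adding the incoming Pod's request to its measured usage would cross a per‑resource threshold, and the remaining nodes are ranked by a weighted share of free CPU and memory. `NodeMetrics`, the thresholds, and the weights are illustrative assumptions, not Koordinator's plugin configuration or defaults; a production scheduler may estimate a new Pod's usage differently and would also need to discard stale metrics.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeMetrics:
    """Real-time usage reported for one node (hypothetical shape)."""
    name: str
    cpu_used: float      # cores
    cpu_capacity: float  # cores
    mem_used: float      # GiB
    mem_capacity: float  # GiB


# Illustrative values only; not Koordinator defaults.
USAGE_THRESHOLDS = {"cpu": 0.65, "memory": 0.95}  # max projected usage fraction
RESOURCE_WEIGHTS = {"cpu": 1.0, "memory": 1.0}


def projected_usage(node: NodeMetrics, pod_cpu: float, pod_mem: float) -> dict[str, float]:
    """Fraction of each resource in use if the Pod lands on this node."""
    return {
        "cpu": (node.cpu_used + pod_cpu) / node.cpu_capacity,
        "memory": (node.mem_used + pod_mem) / node.mem_capacity,
    }


def fits(node: NodeMetrics, pod_cpu: float, pod_mem: float) -> bool:
    """Filter: reject nodes whose projected usage would cross a threshold."""
    usage = projected_usage(node, pod_cpu, pod_mem)
    return all(usage[r] <= USAGE_THRESHOLDS[r] for r in usage)


def score(node: NodeMetrics, pod_cpu: float, pod_mem: float) -> float:
    """Score 0-100: weighted share of each resource left free (higher is better)."""
    usage = projected_usage(node, pod_cpu, pod_mem)
    total = sum(RESOURCE_WEIGHTS.values())
    return 100 * sum(RESOURCE_WEIGHTS[r] * (1 - usage[r]) for r in usage) / total


def select_node(nodes: Sequence[NodeMetrics], pod_cpu: float, pod_mem: float) -> Optional[str]:
    """Return the best-scoring feasible node name, or None if nothing fits."""
    candidates = [n for n in nodes if fits(n, pod_cpu, pod_mem)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, pod_cpu, pod_mem)).name


NODES = [
    NodeMetrics("node-a", cpu_used=20, cpu_capacity=32, mem_used=50, mem_capacity=128),
    NodeMetrics("node-b", cpu_used=8, cpu_capacity=32, mem_used=100, mem_capacity=128),
    NodeMetrics("node-c", cpu_used=14, cpu_capacity=32, mem_used=40, mem_capacity=128),
]

if __name__ == "__main__":
    print(select_node(NODES, pod_cpu=2, pod_mem=4))  # node-c
```

Here node-a is filtered out because its CPU would reach about 69%, and node-c beats node-b because node-b's memory is already heavily used even though its CPU is idle; weighting both resources is what keeps a single busy dimension from being ignored.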

Key Benefits

Average CPU utilization above 50% across more than 10 million cores in Alibaba's production clusters.

Stable execution of latency‑sensitive services while simultaneously running batch workloads.

Unified architecture for both cloud and on‑premises environments, reducing operational overhead.

Reduced resource cost through systematic oversell and reclamation.

Open‑Source Resources

Source code: https://github.com/koordinator-sh/koordinator

Reference paper on Google's Borg cluster manager, the system that inspired Kubernetes:

https://research.google/pubs/pub49065/



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


