How Koordinator Revolutionizes Cloud‑Native Mixed‑Workload Scheduling
Koordinator, an open-source cloud-native scheduler launched in April 2022, co-locates heterogeneous workloads on Kubernetes through zero-intrusion plugins, fine-grained resource oversubscription, QoS-aware scheduling, and a flexible descheduler framework, with the aim of raising resource utilization while protecting the performance of latency-sensitive services.
Project Origin and Evolution
Since its debut in April 2022, Koordinator has gone through eight releases and attracted contributions from engineers at Alibaba, Xiaomi, Xiaohongshu, iQIYI, and 360. On November 3, 2022, at the Hangzhou Cloud Native Conference, Alibaba Cloud announced the official launch of Koordinator 1.0.
Motivation and Vision
Koordinator addresses the need to coordinate diverse Kubernetes workloads (batch, real-time, AI, and big-data jobs) so they can share nodes efficiently. It builds on the mixed-workload concepts pioneered by Google's Borg system and on the widespread adoption of Kubernetes as the industry standard for container orchestration.
Key Challenges in Mixed‑Workload Deployment
Application Integration – How to onboard workloads onto a mixed‑workload platform.
Stable and Efficient Execution – How to keep applications running reliably and with high performance.
Design Principles
Koordinator delivers a zero‑intrusion solution:
No changes to native Kubernetes components; functionality is added via plugins.
No modifications to workload operators; policies are configured declaratively.
Seamless, lossless upgrade from the default scheduler to Koordinator.
Architecture Overview
The architecture provides complete mixed-workload orchestration, resource scheduling, isolation, and performance tuning. It reclaims resources that sit idle on each node and reallocates them to Pods that can actually use them, boosting overall utilization while isolation mechanisms keep latency-sensitive services protected.
Large‑Scale Production Experience
Alibaba has run mixed-workload scheduling at scale since 2016, reaching CPU utilization above 50% and substantially reducing compute costs for the 2022 Double 11 shopping festival. Koordinator draws on three generations of Alibaba's internal architecture and aims to provide a vendor-neutral, community-driven solution for enterprises.
Supported Workload Types
Fine-grained resource orchestration for services with strict performance and tail-latency requirements.
Intelligent oversubscription that runs low‑priority tasks on allocated but unused resources.
The oversubscription model distinguishes among the resources a node has allocated, the portion actually in use, and the remainder that can be oversold, which lets batch (MapReduce-like) and real-time jobs coexist on the same nodes.
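As a rough illustration of the allocated / used / oversellable split, the sketch below computes how much CPU could be offered to low-priority Pods on one node. The `NodeCPU` type, the field names, and the 10% safety margin are assumptions made for this example; they are not taken from Koordinator's code.

```python
from dataclasses import dataclass


@dataclass
class NodeCPU:
    capacity_milli: int   # total CPU on the node, in millicores
    allocated_milli: int  # CPU requested by latency-sensitive Pods
    used_milli: int       # CPU those Pods are actually consuming


def oversellable_milli(node: NodeCPU, safety_margin_pct: int = 10) -> int:
    """Allocated-but-unused CPU that could be lent to low-priority Pods.

    A margin (a percentage of node capacity, hypothetical default 10%) is
    held back so that a sudden spike in the latency-sensitive Pods does
    not immediately collide with the borrowed capacity.
    """
    margin = node.capacity_milli * safety_margin_pct // 100
    idle_within_allocation = node.allocated_milli - node.used_milli
    return max(0, idle_within_allocation - margin)


node = NodeCPU(capacity_milli=32000, allocated_milli=24000, used_milli=8000)
print(oversellable_milli(node))
```

With these made-up numbers, 16 of the 24 allocated cores are idle, and after holding back the margin most of them can back batch Pods. A real implementation would also account for system daemons and would refresh the figure as usage changes.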
Zero‑Intrusion Integration Details
Plugin‑based extensions keep the Kubernetes core untouched.
Configuration‑driven policies avoid changes to operators.
Gradual scheduler upgrade preserves existing Pod scheduling semantics.
Feature Deep Dives
Enhanced Coscheduling
Supports strict and non-strict modes as well as multi-role AI jobs (e.g., TensorFlow with PS and Worker roles), satisfying All-or-Nothing requirements: either every Pod the job needs is scheduled, or none is.
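The All-or-Nothing rule for a multi-role job can be sketched as a single admission check: the job starts only if every role reaches its minimum member count. The `Role` type and `admit_job` function are hypothetical names for this illustration, and the difference between strict and non-strict modes is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Role:
    name: str
    min_members: int  # Pods of this role that must start together
    schedulable: int  # Pods of this role that currently fit on some node


def admit_job(roles: List[Role]) -> bool:
    """Admit the job only if every role can reach its minimum size."""
    return all(role.schedulable >= role.min_members for role in roles)


job = [Role("ps", min_members=2, schedulable=2),
       Role("worker", min_members=4, schedulable=3)]
print(admit_job(job))
```

Here the parameter servers fit but one worker does not, so nothing is admitted and no resources are held by a job that cannot make progress.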
Enhanced ElasticQuota Scheduling
Compatible with the community ElasticQuota CRD.
Tree‑structured quota management aligns with organizational hierarchies.
Shared‑weight fairness and optional quota borrowing.
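One way to picture guaranteed quota, weighted sharing, and borrowing among sibling quota groups is the sketch below. Each group first receives its guarantee, and leftover capacity is then handed out in proportion to weight, capped by each group's maximum and by what it actually requests. The `Quota` fields and the `divide` function are assumptions for this example, not the ElasticQuota CRD schema or Koordinator's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Quota:
    name: str
    min: int      # guaranteed amount
    max: int      # upper bound, including borrowed capacity
    weight: int   # share of spare capacity relative to siblings (must be > 0)
    request: int  # amount the group's Pods currently ask for


def divide(total: int, children: List[Quota]) -> Dict[str, int]:
    """Split a parent's capacity among its child quota groups."""
    # Step 1: each child gets its guarantee, but never more than it requests.
    grant = {c.name: min(c.min, c.request) for c in children}
    spare = total - sum(grant.values())

    def wants_more(c: Quota) -> bool:
        return grant[c.name] < min(c.max, c.request)

    # Step 2: lend spare capacity by weight until it runs out or nobody wants it.
    active = [c for c in children if wants_more(c)]
    while spare > 0 and active:
        total_weight = sum(c.weight for c in active)
        distributed = 0
        for c in active:
            share = spare * c.weight // total_weight
            room = min(c.max, c.request) - grant[c.name]
            give = min(share, room)
            grant[c.name] += give
            distributed += give
        if distributed == 0:  # only an integer-rounding remainder is left
            break
        spare -= distributed
        active = [c for c in active if wants_more(c)]
    return grant


teams = [Quota("team-a", min=20, max=80, weight=1, request=70),
         Quota("team-b", min=20, max=40, weight=1, request=40),
         Quota("team-c", min=20, max=100, weight=2, request=10)]
print(divide(100, teams))
```

In this example team-c asks for less than its guarantee, so the unused part is borrowed by team-a and team-b. For a tree of quotas, the same function could be applied level by level, using each child's grant as the total for its own children.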
Fine‑Grained Device Scheduling
GPU sharing, exclusive use, and oversubscription.
Percentage‑based GPU allocation.
Multi‑GPU scheduling and NVLink‑aware placement (in progress).
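To make percentage-based allocation concrete, the following sketch places a request expressed as a percentage of one GPU. Requests below 100 share a card, and requests that are a multiple of 100 take whole idle cards. The function name, the best-fit policy, and the whole-card rule above 100% are assumptions for illustration; NVLink topology is not considered.

```python
from typing import Dict, List, Optional


def place_gpu_request(free_pct: Dict[int, int],
                      request_pct: int) -> Optional[List[int]]:
    """Return the GPU indices chosen for the request, or None if it does not fit.

    free_pct maps GPU index -> free share in percent and is updated in place.
    """
    if request_pct <= 0:
        return None
    if request_pct < 100:
        # Shared use: best fit, i.e. the GPU with the least free share that still fits.
        candidates = [g for g, free in free_pct.items() if free >= request_pct]
        if not candidates:
            return None
        chosen = min(candidates, key=lambda g: (free_pct[g], g))
        free_pct[chosen] -= request_pct
        return [chosen]
    if request_pct % 100 != 0:
        return None  # assumption: beyond one GPU, only whole GPUs are granted
    needed = request_pct // 100
    idle = sorted(g for g, free in free_pct.items() if free == 100)
    if len(idle) < needed:
        return None
    for g in idle[:needed]:
        free_pct[g] = 0
    return idle[:needed]


gpus = {0: 100, 1: 50, 2: 100}
print(place_gpu_request(gpus, 30))
print(place_gpu_request(gpus, 200))
```

Best fit keeps fully idle cards available for exclusive or multi-GPU requests, which is why the 30% request lands on the partially used card rather than on an empty one.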
Differentiated SLOs
Implements Priority & QoS tiers, CPU Suppress, CPU Burst, memory‑threshold eviction, and CPU‑satisfaction‑based eviction to protect latency‑sensitive services while allowing best‑effort workloads to utilize spare capacity.
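The idea behind CPU Suppress can be sketched as a budget calculation: best-effort Pods may use whatever remains under a node-level utilization threshold after latency-sensitive and system usage are subtracted. The function name, parameters, and the 65% threshold in the example are assumptions for illustration, not values confirmed by the source.

```python
def best_effort_cpu_budget_milli(capacity_milli: int,
                                 threshold_pct: int,
                                 latency_sensitive_used_milli: int,
                                 system_used_milli: int) -> int:
    """CPU (millicores) that best-effort Pods may consume right now.

    The budget shrinks as latency-sensitive usage grows and never drops
    below zero.
    """
    ceiling = capacity_milli * threshold_pct // 100
    budget = ceiling - latency_sensitive_used_milli - system_used_milli
    return max(0, budget)


print(best_effort_cpu_budget_milli(32000, 65, 12000, 1000))
```

Recomputing this budget periodically and applying it to the best-effort Pods' CPU limits is what lets them soak up spare capacity while yielding as soon as latency-sensitive load rises.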
Resource Isolation with RDT
Uses Intel Resource Director Technology (RDT) to partition the L3 cache and to throttle memory bandwidth through Memory Bandwidth Allocation (MBA), preventing noisy-neighbor interference between co-located workloads.
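On Linux, L3 partitions are expressed as contiguous bitmasks of cache ways written to the resctrl `schemata` file. The sketch below turns a percentage range into such a mask. The helper names, the percentage-range input, and the 12-way cache in the example are assumptions for illustration; only the `L3:<cache id>=<hex mask>` line format comes from the kernel's resctrl interface.

```python
def l3_way_mask(num_ways: int, start_pct: int, end_pct: int) -> str:
    """Hex bitmask covering the cache ways in [start_pct, end_pct) of the L3."""
    if num_ways <= 0:
        raise ValueError("num_ways must be positive")
    if not 0 <= start_pct < end_pct <= 100:
        raise ValueError("need 0 <= start_pct < end_pct <= 100")
    first = num_ways * start_pct // 100      # round the start down
    last = (num_ways * end_pct + 99) // 100  # round the end up (exclusive)
    width = last - first                     # always at least one way
    mask = ((1 << width) - 1) << first       # contiguous run of set bits
    return format(mask, "x")


def l3_schemata_line(cache_id: int, mask: str) -> str:
    """Format one schemata entry for a single L3 cache domain."""
    return f"L3:{cache_id}={mask}"


# Hypothetical split on a 12-way cache: best-effort Pods get the lower 30%.
print(l3_schemata_line(0, l3_way_mask(12, 0, 30)))
print(l3_schemata_line(0, l3_way_mask(12, 0, 100)))
```

Giving best-effort workloads a narrow range and latency-sensitive workloads the full range limits how much of the cache a noisy neighbor can evict.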
QoS‑Aware Scheduling and Rescheduling
Load-aware scheduling filters out overloaded nodes, while the descheduler framework offers plugin-based custom strategies, safe Pod migration via the PodMigrationJob CRD, and reservation-first eviction, which reserves resources for the replacement Pod before the original is evicted so that the migrated workload is not starved.
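The load-aware filtering step can be sketched in a few lines: nodes at or above a utilization threshold are dropped, and the rest are ranked from least to most loaded. The function name and the 65% default threshold are hypothetical choices for this example.

```python
from typing import Dict, List


def filter_and_rank_nodes(cpu_util_pct: Dict[str, float],
                          threshold_pct: float = 65.0) -> List[str]:
    """Drop overloaded nodes, then order the rest by ascending utilization."""
    eligible = {name: util for name, util in cpu_util_pct.items()
                if util < threshold_pct}
    # Ties are broken by node name so the result is deterministic.
    return sorted(eligible, key=lambda name: (eligible[name], name))


print(filter_and_rank_nodes({"node-a": 80.0, "node-b": 40.0, "node-c": 64.9}))
```

Unlike request-based scheduling, this decision uses measured utilization, so a node whose Pods request little but consume a lot is still recognized as busy.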
Reservation API
Allows resources to be reserved in advance without modifying Pod specs, supporting future workload needs, PaaS scaling, rolling updates, defragmentation, and safe rescheduling.
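A reservation can be thought of as capacity held on a node for Pods that match some owner criteria. The sketch below matches an incoming Pod against open reservations by labels and size. The `Reservation` type, its fields, and the label-based matching are assumptions for illustration and do not reproduce the actual Reservation CRD.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reservation:
    node: str
    cpu_milli: int                # capacity held back on the node
    owner_labels: Dict[str, str]  # a Pod must carry all of these labels
    consumed: bool = False


def claim_reservation(reservations: List[Reservation],
                      pod_labels: Dict[str, str],
                      pod_cpu_milli: int) -> Optional[Reservation]:
    """Return the first open reservation the Pod may use and mark it consumed."""
    for r in reservations:
        if r.consumed or pod_cpu_milli > r.cpu_milli:
            continue
        if all(pod_labels.get(k) == v for k, v in r.owner_labels.items()):
            r.consumed = True
            return r
    return None


held = [Reservation("node-1", 4000, {"app": "web"})]
match = claim_reservation(held, {"app": "web", "tier": "frontend"}, 2000)
print(match.node if match else None)
```

Because the match is driven by the reservation rather than by the Pod, the Pod spec stays unchanged, which fits the rolling-update and rescheduling cases listed above.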
Future Roadmap
The community plans to extend mixed‑workload support to Hadoop YARN and other big‑data frameworks, improve interference detection, and continue standardizing mixed‑workload capabilities across vendors.
Alibaba Cloud Native
