How Koordinator Improves Efficiency and Stability for Cloud‑Native Mixed Workloads

This article explains how Alibaba Cloud's open‑source Koordinator system tackles mixed‑workload challenges by introducing priority and QoS models, resource overcommit, load‑aware scheduling, fine‑grained CPU orchestration, and upcoming features such as GPU scheduling and resource recommendation, all illustrated with architecture diagrams and code examples.

Alibaba Cloud Native

Jun 16, 2022

Background and Motivation

In April 2022 Alibaba Cloud released the open‑source Koordinator project, which has since gone through four versions. It helps enterprises improve efficiency and stability, and reduce cost, when running mixed online and offline workloads on Kubernetes clusters. Mixed‑workload deployment (or "co‑location") means running multiple container types, such as online services and batch jobs, on the same node or in the same cluster to raise overall resource utilization.

Data‑center utilization has historically been low (averaging under 10% in 2011), and the rapid growth of big‑data workloads drives the need for better resource management. Surveys showed that more than 77% of enterprises planned to migrate half of their big‑data applications to Kubernetes by the end of 2021, which makes mixed‑workload placement a common practice.

Koordinator Architecture

Koordinator extends the native Kubernetes control plane along two dimensions: a central control layer (scheduler extensions, SLO‑Controller, Recommender, and the Colocation Profile Webhook) and a node‑side layer (Koordlet and Koord Runtime Proxy). The node‑side components handle fine‑grained resource management and QoS enforcement.

Key Mechanisms

Priority Model : Four levels (Product, Mid, Batch, Free) defined via standard PriorityClass, with resource capacity reported as extended resources on nodes.

QoS Model : Three QoS classes (System, Latency Sensitive, Best Effort) with sub‑classes for latency‑sensitive workloads, applied via pod annotations.

Resource Overcommit : Reclaims unused CPU/memory from online pods and reallocates it to lower‑priority batch jobs. Example node status and pod annotations are shown below.

# node info
allocatable:
  koordinator.sh/batch-cpu: 50k # milli‑core
  koordinator.sh/batch-memory: 50Gi

# pod info
annotations:
  koordinator.sh/resource-limit: {cpu: "5k"}
resources:
  requests:
    koordinator.sh/batch-cpu: 5k
    koordinator.sh/batch-memory: 5Gi
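
To make the overcommit idea concrete, here is a minimal sketch of how a node's reclaimable batch CPU could be derived. It is a simplified model, not Koordinator's actual formula: the `NodeUsage` fields, the reserved safety margin, and the sample numbers are all assumptions chosen so the result lines up with the `50k` milli‑cores in the node example above.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical per-node measurements, all in milli-cores."""
    allocatable_milli_cpu: int   # CPU the node offers to pods
    reserved_milli_cpu: int      # assumed safety margin kept for online spikes
    system_used_milli_cpu: int   # measured usage of system daemons
    online_used_milli_cpu: int   # measured usage of online (non-batch) pods


def batch_cpu_allocatable(node: NodeUsage) -> int:
    """Return the CPU that could be lent to batch pods, never negative."""
    reclaimable = (
        node.allocatable_milli_cpu
        - node.reserved_milli_cpu
        - node.system_used_milli_cpu
        - node.online_used_milli_cpu
    )
    return max(0, reclaimable)


node = NodeUsage(
    allocatable_milli_cpu=96_000,
    reserved_milli_cpu=20_000,
    system_used_milli_cpu=4_000,
    online_used_milli_cpu=22_000,
)
print(batch_cpu_allocatable(node))  # 50000
```

Because the value is based on measured usage rather than on requests, it shrinks as online pods get busier, which is why batch capacity is exposed as a separate extended resource rather than as ordinary CPU.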

Load‑Aware Scheduling : Scheduler plugin filters nodes with high load and prefers nodes with lower utilization, using metrics reported by Koordlet.
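
The filter‑then‑prefer behavior can be sketched in a few lines. This is an illustration only: the function names, the 65% CPU threshold, and the linear scoring rule are assumptions, and the real plugin works on metrics reported by Koordlet rather than on a plain dictionary.

```python
from typing import Dict, List, Optional


def filter_nodes(cpu_util: Dict[str, float], threshold: float = 0.65) -> List[str]:
    """Keep only nodes whose CPU utilization is below the (assumed) threshold."""
    return sorted(name for name, util in cpu_util.items() if util < threshold)


def score(util: float) -> int:
    """Score a node from 0 to 100; lower utilization gives a higher score."""
    return round((1.0 - util) * 100)


def pick_node(cpu_util: Dict[str, float], threshold: float = 0.65) -> Optional[str]:
    """Filter overloaded nodes, then return the best-scoring remaining node."""
    candidates = filter_nodes(cpu_util, threshold)
    if not candidates:
        return None  # every node is too loaded; the pod stays pending
    return max(candidates, key=lambda name: score(cpu_util[name]))


utilization = {"node-a": 0.82, "node-b": 0.40, "node-c": 0.55}
print(filter_nodes(utilization))  # ['node-b', 'node-c']
print(pick_node(utilization))     # node-b
```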

ClusterColocationProfile CRD : Enables one‑click activation of co‑location for selected namespaces or workloads via a mutating webhook.

apiVersion: config.koordinator.sh/v1alpha1
kind: ClusterColocationProfile
metadata:
  name: colocation-profile-example
spec:
  namespaceSelector:
    matchLabels:
      koordinator.sh/enable-colocation: "true"
  selector:
    matchLabels:
      sparkoperator.k8s.io/launched-by-spark-operator: "true"
  qosClass: BE
  priorityClassName: koord-batch
  koordinatorPriority: 1000
  schedulerName: koord-scheduler
  labels:
    koordinator.sh/mutated: "true"
  annotations:
    koordinator.sh/intercepted: "true"
  patch:
    spec:
      terminationGracePeriodSeconds: 30

Applying the profile and labeling a namespace enables Spark jobs submitted via Spark Operator to be automatically co‑located with latency‑sensitive pods.

$ kubectl apply -f profile.yaml
$ kubectl label ns spark-job koordinator.sh/enable-colocation=true
$ # Submit a Spark job; Pods created by Spark Operator will be co‑located.
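
Conceptually, the webhook checks the two selectors and, on a match, rewrites the Pod before it is stored. The sketch below mimics that flow on plain dictionaries. The `PROFILE` layout and function names are hypothetical, only a subset of the profile fields is handled, and the `koordinator.sh/qosClass` label key used to carry the QoS class is an assumption rather than something stated in this article.

```python
import copy

# Values mirror the ClusterColocationProfile above; the dict layout is hypothetical.
PROFILE = {
    "namespace_selector": {"koordinator.sh/enable-colocation": "true"},
    "pod_selector": {"sparkoperator.k8s.io/launched-by-spark-operator": "true"},
    "qos_class": "BE",
    "priority_class_name": "koord-batch",
    "scheduler_name": "koord-scheduler",
    "labels": {"koordinator.sh/mutated": "true"},
    "annotations": {"koordinator.sh/intercepted": "true"},
    "patch_spec": {"terminationGracePeriodSeconds": 30},
}


def selector_matches(selector, labels):
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def mutate_pod(pod, namespace_labels, profile=PROFILE):
    """Return a mutated copy of the pod if both selectors match, else the pod itself."""
    pod_labels = pod.get("metadata", {}).get("labels", {})
    if not selector_matches(profile["namespace_selector"], namespace_labels):
        return pod
    if not selector_matches(profile["pod_selector"], pod_labels):
        return pod

    mutated = copy.deepcopy(pod)
    metadata = mutated.setdefault("metadata", {})
    labels = metadata.setdefault("labels", {})
    labels.update(profile["labels"])
    labels["koordinator.sh/qosClass"] = profile["qos_class"]  # assumed label key
    metadata.setdefault("annotations", {}).update(profile["annotations"])

    spec = mutated.setdefault("spec", {})
    spec["priorityClassName"] = profile["priority_class_name"]
    spec["schedulerName"] = profile["scheduler_name"]
    spec.update(profile["patch_spec"])
    return mutated


spark_pod = {
    "metadata": {
        "name": "spark-driver",
        "labels": {"sparkoperator.k8s.io/launched-by-spark-operator": "true"},
    },
    "spec": {},
}
result = mutate_pod(spark_pod, {"koordinator.sh/enable-colocation": "true"})
print(result["spec"]["schedulerName"])  # koord-scheduler
```

A Pod in a namespace without the `koordinator.sh/enable-colocation` label is returned untouched, which is what makes the rollout opt‑in per namespace.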

QoS Enhancements

CPU Suppress : Dynamically shares idle CPU from online pods with batch pods, throttling batch pods when online load rises.

Resource‑Satisfaction Eviction : Evicts low‑priority batch pods when their CPU satisfaction ratio (the CPU actually available to them relative to what they requested) falls below a threshold while their CPU utilization exceeds 90%.

CPU Burst : Lets a container accumulate the CPU quota it leaves unused and spend it in short bursts above its limit, which reduces throttling and tail latency for latency‑sensitive pods.

Group Identity : Uses kernel‑level group identity to give online pods priority over batch pods sharing the same physical core.

Memory QoS : Adjusts cgroup memory settings to protect node stability while improving the performance of memory‑sensitive workloads.
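
The first two mechanisms in this list boil down to simple arithmetic, sketched below. Treat it as a rough model under stated assumptions: the 65% node threshold and the 0.6 satisfaction floor are illustrative defaults, the function names are hypothetical, and only the 90% utilization figure comes from the text.

```python
def be_cpu_budget(node_total, threshold_percent, online_used, system_used):
    """CPU (milli-cores) batch pods may use right now, never negative.

    The budget shrinks automatically as online or system usage grows,
    which is the suppress behavior described above.
    """
    budget = node_total * threshold_percent / 100 - online_used - system_used
    return max(0.0, budget)


def should_evict(be_request, be_budget, be_used,
                 satisfaction_floor=0.6, usage_ceiling=0.9):
    """Decide whether batch pods are starved enough to warrant eviction."""
    if be_request <= 0 or be_budget <= 0:
        # Nothing meaningful to compare; a real agent needs its own rule here.
        return False
    satisfaction = be_budget / be_request   # how much of the request is granted
    utilization = be_used / be_budget       # how hard the granted CPU is used
    return satisfaction < satisfaction_floor and utilization > usage_ceiling


budget = be_cpu_budget(node_total=64_000, threshold_percent=65,
                       online_used=20_000, system_used=2_000)
print(budget)                                  # 19600.0
print(should_evict(40_000, budget, 18_500))    # True
print(should_evict(40_000, budget, 10_000))    # False
```

Requiring both conditions avoids evicting batch pods that are squeezed but mostly idle: eviction only helps when the pods are short on CPU and are actually trying to use what they have.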

Fine‑Grained CPU Orchestration

Koordinator introduces detailed CPU orchestration policies (e.g., SameCore, Spread) tailored to the three LS sub‑classes (LSE, LSR, LS). These policies are compatible with Kubernetes CPUManager and NUMA Topology Manager, allowing safe gradual adoption.
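
To show how the two policy names differ, the sketch below picks logical CPUs from a toy topology. The topology (four physical cores with two hyper‑threads each) and the selection rules are assumptions made for illustration; they are not taken from Koordinator's implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical topology: each tuple lists the logical CPUs of one physical core.
CORES: List[Tuple[int, ...]] = [(0, 4), (1, 5), (2, 6), (3, 7)]


def same_core(cores: Sequence[Tuple[int, ...]], count: int) -> List[int]:
    """Fill whole physical cores first, so sibling threads go to the same pod."""
    flat = [cpu for siblings in cores for cpu in siblings]
    if count > len(flat):
        raise ValueError("not enough logical CPUs")
    return flat[:count]


def spread(cores: Sequence[Tuple[int, ...]], count: int) -> List[int]:
    """Take one thread per physical core before reusing any core."""
    width = max((len(siblings) for siblings in cores), default=0)
    order = [siblings[i] for i in range(width)
             for siblings in cores if i < len(siblings)]
    if count > len(order):
        raise ValueError("not enough logical CPUs")
    return order[:count]


print(same_core(CORES, 4))  # [0, 4, 1, 5]
print(spread(CORES, 4))     # [0, 1, 2, 3]
```

In this model `same_core` keeps a pod's threads on as few physical cores as possible, so no other pod shares those cores, while `spread` gives each thread its own physical core at the cost of leaving the sibling threads open to other pods.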

Resource Reservation

The upcoming Reservation CRD lets users pre‑allocate resources for anticipated spikes, scaling events, or safe re‑scheduling, without modifying existing Kubernetes APIs.

kind: Reservation
metadata:
  name: my-reservation
  namespace: default
spec:
  template: ... # copy of the Pod spec
  resourceOwners:
    controller:
      apiVersion: apps/v1
      kind: Deployment
      name: deployment-5b8df84dd
  timeToLiveInSeconds: 300
  nodeName: node-1
status:
  phase: Available
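
One way to read the fields above: a reservation holds capacity on a node for a specific owner until its time‑to‑live runs out. The sketch below models just that lookup. The `Reservation` class, its field names, and the matching rule are hypothetical simplifications of the CRD, which was still a proposal when this article was written.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    """Hypothetical in-memory view of a Reservation object."""
    name: str
    node_name: str
    owner_kind: str
    owner_name: str
    ttl_seconds: int
    created_at: float  # creation time in seconds


def is_expired(reservation: Reservation, now: float) -> bool:
    """A reservation stops holding capacity once its TTL has elapsed."""
    return now - reservation.created_at >= reservation.ttl_seconds


def find_reservation(reservations: List[Reservation], owner_kind: str,
                     owner_name: str, now: float) -> Optional[Reservation]:
    """Return the first live reservation owned by the given controller."""
    for reservation in reservations:
        if is_expired(reservation, now):
            continue
        if (reservation.owner_kind, reservation.owner_name) == (owner_kind, owner_name):
            return reservation
    return None


held = [Reservation("my-reservation", "node-1", "Deployment",
                    "deployment-5b8df84dd", ttl_seconds=300, created_at=1000.0)]
match = find_reservation(held, "Deployment", "deployment-5b8df84dd", now=1100.0)
print(match.node_name if match else None)  # node-1
```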

Future Roadmap

Version 0.5 will add fine‑grained CPU orchestration and resource reservation. Planned features for later releases include GPU scheduling, gang scheduling, elastic quota, and a profile‑based resource recommendation engine that analyzes historical usage to suggest optimal request/limit settings.
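
Since the recommendation engine is only planned, the following is a guess at the general shape of such a feature rather than a description of it: take a percentile of historical usage as the request and add headroom over the observed peak for the limit. The nearest‑rank percentile, the 90th‑percentile choice, and the 15% headroom are all assumptions.

```python
import math
from typing import Dict, List


def percentile(samples: List[int], pct: int) -> int:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no usage samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(samples: List[int], request_pct: int = 90,
              headroom_percent: int = 15) -> Dict[str, int]:
    """Suggest request/limit (milli-cores) from historical CPU usage."""
    request = percentile(samples, request_pct)
    limit = math.ceil(max(samples) * (100 + headroom_percent) / 100)
    return {"request": request, "limit": limit}


# Hypothetical CPU usage history in milli-cores, with one spike.
history = [200, 220, 250, 240, 900, 230, 210, 260, 245, 235]
print(recommend(history))  # {'request': 260, 'limit': 1035}
```

Using a percentile for the request keeps a single spike from inflating the steady‑state reservation, while the limit still leaves room for that spike.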
