
Koordinator v1.5.0 Release: New Features and Enhancements

Koordinator v1.5.0, the 13th major release since its open‑source debut, introduces pod‑level NUMA alignment, Terway network QoS, core scheduling, and numerous performance and stability improvements, while also being accepted as a CNCF Sandbox project and outlining future roadmap plans.

Alibaba Cloud Infrastructure

Background

Koordinator is an open-source project born from Alibaba's years of experience in container scheduling. Since its first public release in April 2022, it has iterated through many versions, providing mixed-workload orchestration, resource scheduling, isolation, and performance tuning for Kubernetes clusters.

The v1.5.0 release is the 13th major version, contributed by engineers from Alibaba, Ant Group, Intel, Xiaohongshu, Xiaomi, iQIYI, 360, Youzan and others. It adds several new capabilities such as pod‑level NUMA alignment, Terway network QoS, and Core Scheduling.

Koordinator recently passed the CNCF TOC vote and has been accepted as a CNCF Sandbox project, further strengthening its integration with the cloud-native ecosystem.

Version Feature Highlights

Pod‑Level NUMA Alignment Policy

Previously, NUMA alignment required node‑level labeling, which added management overhead. v1.5.0 introduces pod‑level NUMA policies, allowing users to set policies directly on Pods.

apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  annotations:
    scheduling.koordinator.sh/numa-topology-spec: |
      {"numaTopologyPolicy": "SingleNUMANode"}
spec:
  containers:
    - name: container-1
      image: nginx
      resources:
        requests:
          cpu: 1
        limits:
          cpu: 1

The pod-level policy can also declare whether the NUMA node used by a SingleNUMANode pod may be shared with pods using other NUMA strategies, avoiding resource contention between them.

apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  annotations:
    scheduling.koordinator.sh/numa-topology-spec: |
      {"numaTopologyPolicy": "SingleNUMANode", "singleNUMANodeExclusive": "Required"}
spec:
  containers:
    - name: container-1
      image: nginx
      resources:
        requests:
          cpu: 1
        limits:
          cpu: 1

Terway Network QoS

Koordinator now integrates with the Terway CNI to provide network QoS, enforcing bandwidth limits per pod or per QoS class. It supports dynamic adjustment of limits at runtime and whole-machine limits that span multiple NICs.

# unit: bps
resource-qos-config: |
  {
    "clusterStrategy": {
      "policies": {"netQOSPolicy":"terway-qos"},
      "lsClass": {
        "networkQOS": {
          "enable": true,
          "ingressRequest": "50M",
          "ingressLimit": "100M",
          "egressRequest": "50M",
          "egressLimit": "100M"
        }
      },
      "beClass": {
        "networkQOS": {
          "enable": true,
          "ingressRequest": "10M",
          "ingressLimit": "200M",
          "egressRequest": "10M",
          "egressLimit": "200M"
        }
      }
    }
  }
system-config: |
  {
    "clusterStrategy": {
      "totalNetworkBandwidth": "600M"
    }
  }

Pod‑level bandwidth limits can be set via annotations (see the Terway QoS documentation for details).
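As a rough sketch of what such an annotation could look like (the annotation key `koordinator.sh/networkQOS` and its field names are assumptions here; verify them against the Terway QoS documentation before use):

```yaml
# Hypothetical example -- the annotation key and fields below are assumptions;
# consult the Terway QoS documentation for the exact protocol.
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  annotations:
    koordinator.sh/networkQOS: |
      {"IngressLimit": "10M", "EgressLimit": "20M"}
spec:
  containers:
    - name: container-1
      image: nginx
```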

Core Scheduling

v1.5.0 enables container-level Core Scheduling, which ensures that only mutually trusted tasks share the SMT siblings of a physical core, mitigating side-channel attacks and improving CPU QoS in multi-tenant environments.

# Example of the slo-controller-config ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-controller-config
  namespace: koordinator-system
data:
  resource-qos-config: |
    {
      "clusterStrategy": {
        "policies": {"cpuPolicy": "coreSched"},
        "lsClass": {
          "cpuQOS": {"enable": true, "coreExpeller": true, "schedIdle": 0}
        },
        "beClass": {
          "cpuQOS": {"enable": true, "coreExpeller": false, "schedIdle": 1}
        }
      }
    }

Pods can be assigned to Core Scheduling groups via a label protocol, and the feature builds on Anolis OS kernel support.
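A minimal sketch of the label protocol might look like the following; the label key `koordinator.sh/core-sched-group-id` is an assumption here, so check the v1.5.0 release documentation for the exact key:

```yaml
# Hypothetical example -- the label key below is an assumption;
# see the Koordinator Core Scheduling docs for the exact protocol.
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    koordinator.sh/core-sched-group-id: "group-a"
spec:
  containers:
    - name: container-1
      image: nginx
```

Pods carrying the same group ID are scheduled into the same core-scheduling group, so their tasks may share physical cores while being kept apart from tasks in other groups.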

Other Enhancements

Reservation Restricted mode now supports annotation‑based control.

Coscheduling implements fair queueing for gang‑scheduled pods.

NRI mode adds reconnection mechanisms.

Various bug fixes and performance improvements, including an upgrade of the Kubernetes dependency to 1.28.

Future Plans

Scheduling performance optimization with benchmark guides.

Joint allocation of heterogeneous resources (GPU + high‑performance NICs) for AI large‑model training.

Job‑level preemption support.

Load‑aware scheduling enhancements for inflight pods.

Fine‑grained last‑level cache and memory‑bandwidth isolation using NRI.

Acknowledgements

The project thanks its many contributors, new maintainers, and the CNCF community for support. New community members are welcomed via DingTalk group 33383887.
