
Koordinator vs Crane: Which Scheduler Optimizes Kubernetes Resource Usage?

This article examines how native Kubernetes scheduling, based solely on resource Requests, leads to waste and load imbalance; compares the architectures of the open-source crane-scheduler and koord-scheduler; walks through practical Koordinator configuration; and provides step-by-step testing procedures for load-aware scheduling.

Ops Development Stories

Background

The native Kubernetes scheduler considers only resource Requests when placing Pods. In production, Requests often differ greatly from actual usage, causing resource waste and load imbalance across Nodes.
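A quick illustration of how far Requests can drift from reality on a single node (the numbers below are hypothetical, chosen only to show the gap the native scheduler never sees):

```shell
# Hypothetical node: Pods request far more CPU than they actually use.
requested_mcpu=3200     # sum of Pod CPU Requests (millicores)
actual_mcpu=600         # actual CPU usage reported by metrics
allocatable_mcpu=4000   # Node allocatable CPU

# The native scheduler balances on request-based utilization only:
echo "request-based: $(( requested_mcpu * 100 / allocatable_mcpu ))%"   # 80%
# Real utilization, which it never sees:
echo "actual:        $(( actual_mcpu * 100 / allocatable_mcpu ))%"      # 15%
```

To the scheduler this node looks 80% full and gets skipped, even though four out of five allocatable cores sit idle.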

Open‑source solution comparison: crane‑scheduler vs koord‑scheduler

crane‑scheduler architecture

Prerequisite: Prometheus must be installed; crane-scheduler fetches node utilization data from it.

koord‑scheduler architecture

Metrics are collected by koordlet, a DaemonSet component that stores the data locally on each node in an embedded Prometheus TSDB.

Comparison

Metric collection period: crane relies on an external Prometheus (default 30 s scrape interval, coarse-grained); koordlet runs on every node and samples into a local Prometheus TSDB (default 1 s, fine-grained).

Value types: crane provides avg and max; koord provides avg, p50, p90, p95, and p99.

Offline colocation support: crane does not support it; koord supports both online Pods (LSE/LSR/LS) and offline Pods (BE).

hotValue resource estimation: both support it.

Utilization denominator: crane uses the host's total resources (unreasonable, since reserved capacity is not schedulable); koord uses the Node's allocatable resources (reasonable).
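The denominator difference matters in practice. With hypothetical numbers for a node that reserves capacity for the system and kubelet:

```shell
# Hypothetical node: 8 cores total, 1 core reserved for system/kubelet.
total_mcpu=8000
reserved_mcpu=1000
allocatable_mcpu=$(( total_mcpu - reserved_mcpu ))   # 7000

used_mcpu=4000
# crane-style denominator (host total) understates pressure on schedulable capacity:
echo "host-total basis:  $(( used_mcpu * 100 / total_mcpu ))%"         # 50%
# koord-style denominator (allocatable) reflects what Pods actually contend for:
echo "allocatable basis: $(( used_mcpu * 100 / allocatable_mcpu ))%"   # 57%
```

The same workload reads as 50% against the host total but 57% against allocatable; with a threshold set near the real saturation point, only the allocatable basis triggers it at the right time.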

Overall, koord‑scheduler is the better fit and was chosen.

Koordinator usage practice

Add a UsageAggregatedDuration of 18 h:

<code>kubectl -n koordinator-system edit cm slo-controller-config</code>
<code>data:
  colocation-config: |
    {
      "enable": true,
      "metricAggregatePolicy": {
        "durations": [
          "5m",
          "10m",
          "30m",
          "18h"
        ]
      }
    }
</code>
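Since the colocation-config value is a JSON string embedded in YAML, it is easy to break with a stray comma. A small sanity check before saving the ConfigMap (a sketch using python3's stdlib JSON parser; the heredoc mirrors the config above):

```shell
# Validate the colocation-config JSON fragment before applying the ConfigMap.
cat <<'EOF' | python3 -m json.tool >/dev/null && echo "valid JSON"
{
  "enable": true,
  "metricAggregatePolicy": {
    "durations": ["5m", "10m", "30m", "18h"]
  }
}
EOF
```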

Modify the Prometheus TSDB retention duration in koordlet:

<code>kubectl -n koordinator-system edit ds koordlet</code>
<code>      containers:
      - args:
        - -addr=:9316
        - -cgroup-root-dir=/host-cgroup/
        - --logtostderr=true
        - --tsdb-retention-duration=18h
</code>
Use promtool inside the Pod to view the stored data:

<code>./promtool tsdb list /metric-data/</code>

Update threshold trigger rules (requires restarting koord‑scheduler):

<code>kubectl -n koordinator-system edit cm koord-scheduler-config</code>
<code>            aggregated:
              usageThresholds:
                cpu: 55
                memory: 85
              usageAggregationType: "p99"
              scoreAggregationType: "p99"
            estimatedScalingFactors:
              cpu: 85
              memory: 70
</code>
<code>kubectl -n koordinator-system rollout restart deployment koord-scheduler</code>
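The thresholds above mean a Node whose aggregated p99 CPU utilization exceeds 55% (or p99 memory above 85%) is filtered out for new Pods. A minimal sketch of that comparison, with hypothetical numbers (the real logic lives in the scheduler's load-aware plugin):

```shell
# Hypothetical node stats vs. the usageThresholds configured above.
p99_cpu_pct=60        # assumed p99 CPU utilization of a hot node
cpu_threshold=55      # usageThresholds.cpu from koord-scheduler-config

if [ "$p99_cpu_pct" -gt "$cpu_threshold" ]; then
  echo "filtered: p99 CPU ${p99_cpu_pct}% > ${cpu_threshold}%"
else
  echo "schedulable"
fi
```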

Since Nodes on public cloud may already use the cloud vendor's own scheduler, only the scheduler in the IDC data center is modified, and a mutating webhook is added so the change can be rolled back quickly if needed.

Activation method (label namespace):

<code>kubectl label ns ${NsName} koordinator-injection=enabled</code>

Rollback method:

<code>kubectl label ns ${NsName} koordinator-injection-</code>
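Conceptually, once a namespace carries the label, the webhook rewrites each new Pod's schedulerName. A hypothetical sketch of the JSONPatch such a webhook returns (the actual patch produced by the customized code may differ), validated with python3's stdlib parser:

```shell
# Hypothetical JSONPatch body a mutating webhook would return to point
# Pods at koord-scheduler.
patch='[{"op": "replace", "path": "/spec/schedulerName", "value": "koord-scheduler"}]'
echo "$patch" | python3 -m json.tool > /dev/null && echo "patch OK"
```

Removing the label stops the webhook from mutating new Pods, so existing workloads fall back to the default scheduler on their next rollout.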

Source code: https://github.com/koordinator-sh/koordinator

Customized code: https://github.com/clay-wangzhi/koordinator

Quick deployment of customized code:

<code>git clone https://github.com/clay-wangzhi/koordinator
cd koordinator/manifests
kubectl apply -f setup/
kubectl apply -f koordlet/
kubectl apply -f koord-scheduler/
kubectl apply -f koord-manager/
</code>

Testing

1) Identify high‑load Nodes:

<code>kubectl top node | sort -nk 3
kubectl get nodemetrics.slo.koordinator.sh</code>

2) Label a high‑load Node and several normal Nodes:

<code>kubectl label node ${NodeName} test=true</code>

3) Label the application namespace to enable the mutating webhook (which sets SchedulerName to koord-scheduler):

<code>kubectl label ns ${NsName} koordinator-injection=enabled</code>

4) Add node affinity and pod anti‑affinity to an application, matching the number of replicas to the number of labeled Nodes:

<code>spec:
  replicas: 4
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: appid
                operator: In
                values:
                - $(AppidName)
            topologyKey: kubernetes.io/hostname
</code>

5) Verify the result: because replicas match the number of labeled Nodes and the pod anti-affinity is hard, one Pod must land on the high-load Node. That Pod should stay Pending, with a scheduling event showing it was filtered out by the load-aware plugin, indicating the configuration works.

Reference links:

Crane‑Scheduler: Real‑world workload‑aware scheduler design and implementation – https://cloud.tencent.com/developer/article/2296515?areaId=106005

Koordinator load‑aware scheduling – https://koordinator.sh/zh-Hans/docs/user-manuals/load-aware-scheduling

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
