Koordinator vs Crane: Which Scheduler Optimizes Kubernetes Resource Usage?
The article examines how native Kubernetes scheduling based solely on resource requests leads to waste and imbalance, compares the open‑source crane‑scheduler and koord‑scheduler architectures, explains practical configuration of Koordinator, and provides step‑by‑step testing procedures to achieve load‑aware scheduling.
Background
The native Kubernetes scheduler considers only resource Requests, which in production often differ greatly from actual usage, causing resource waste and load imbalance across nodes.
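The gap between the two views can be made concrete with a toy calculation (all numbers below are hypothetical):

```python
# Toy illustration (hypothetical numbers): a node can look nearly full by
# Requests while being mostly idle by actual usage.
pods = [
    # (name, cpu_request_millicores, actual_cpu_usage_millicores)
    ("api", 2000, 300),
    ("worker", 2000, 1800),
    ("cache", 1000, 100),
]
node_allocatable_m = 8000  # an 8-core node

requested = sum(req for _, req, _ in pods)  # what the native scheduler sees
used = sum(use for _, _, use in pods)       # what the node actually does

print(f"by requests: {requested / node_allocatable_m:.1%} allocated")
print(f"by usage:    {used / node_allocatable_m:.1%} busy")
```

The native scheduler packs nodes by the first number; load-aware scheduling closes the gap by also looking at the second.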
Open‑source solution comparison: crane‑scheduler vs koord‑scheduler
crane‑scheduler architecture
Prerequisite: Prometheus must be installed; crane-scheduler pulls node utilization metrics from it.
koord‑scheduler architecture
Metrics are collected by koordlet, a DaemonSet component that stores the data in a local Prometheus TSDB on each node.
Comparison
Metric collection period: crane relies on an external Prometheus (default 30 s, coarse-grained); koordlet runs on every node with a local Prometheus TSDB (default 1 s).
Value types: crane provides avg and max; koord provides avg, p50, p90, p95, and p99.
Colocation (online/offline mixing) support: not supported by crane; koord supports online Pods (LSE/LSR/LS) and offline Pods (BE).
hotValue resource estimation: supported by both.
Utilization denominator: crane uses the host's total resources (unreasonable); koord uses the Node's allocatable resources (reasonable).
Overall, koord‑scheduler is chosen.
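The value-type difference matters in practice. A minimal nearest-rank percentile sketch over a made-up bursty CPU trace (koordlet's real aggregation may differ) shows how avg hides spikes that p99 captures:

```python
import math

# Hypothetical trace: 100 one-second CPU% samples, mostly idle with short bursts.
samples = [20] * 95 + [90] * 5

def percentile(data, p):
    """Nearest-rank percentile: the ceil(p * n / 100)-th smallest sample."""
    s = sorted(data)
    rank = math.ceil(p * len(s) / 100)
    return s[max(rank, 1) - 1]

avg = sum(samples) / len(samples)
print(f"avg={avg:.1f}  p50={percentile(samples, 50)}  p99={percentile(samples, 99)}")
```

The average (23.5%) suggests a quiet node, while p99 (90%) exposes the bursts — which is why koord's percentile values give the scheduler a more honest picture than crane's avg/max.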
Koordinator usage practice
Add a UsageAggregatedDuration of 18 h:
<code>kubectl -n koordinator-system edit cm slo-controller-config</code>
<code>data:
  colocation-config: |
    {
      "enable": true,
      "metricAggregatePolicy": {
        "durations": [
          "5m",
          "10m",
          "30m",
          "18h"
        ]
      }
    }
</code>
Modify the Prometheus TSDB storage (retention) duration in koordlet:
<code>kubectl -n koordinator-system edit ds koordlet</code>
<code>containers:
- args:
  - -addr=:9316
  - -cgroup-root-dir=/host-cgroup/
  - --logtostderr=true
  - --tsdb-retention-duration=18h
</code>
Use promtool inside the koordlet Pod to inspect the stored data:
<code>./promtool tsdb list /metric-data/</code>
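One consistency rule worth checking here (a sketch using the values from this article, not a built-in Koordinator check): the TSDB retention must be at least as long as the longest aggregation window, otherwise the 18 h aggregate has no data to cover it.

```python
# Sanity check: TSDB retention must cover the longest aggregation duration.
# Values taken from the configs above; the parser is a minimal sketch that
# only understands the "Nm"/"Nh" strings used in this article.
def to_minutes(d: str) -> int:
    """Parse simple duration strings like '30m' or '18h' into minutes."""
    value, unit = int(d[:-1]), d[-1]
    return value * 60 if unit == "h" else value

durations = ["5m", "10m", "30m", "18h"]   # metricAggregatePolicy.durations
retention = "18h"                          # --tsdb-retention-duration

assert to_minutes(retention) >= max(to_minutes(d) for d in durations), \
    "retention shorter than the longest aggregation window"
print("retention covers all aggregation windows")
```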
Update threshold trigger rules (requires restarting koord‑scheduler):
<code>kubectl -n koordinator-system edit cm koord-scheduler-config</code>
<code>aggregated:
  usageThresholds:
    cpu: 55
    memory: 85
  usageAggregationType: "p99"
  scoreAggregationType: "p99"
estimatedScalingFactors:
  cpu: 85
  memory: 70
</code>
<code>kubectl -n koordinator-system rollout restart deployment koord-scheduler</code>
Because public-cloud nodes may be handled by the cloud provider's own scheduler, only the IDC (data-center) scheduler configuration is modified, and a mutating webhook is added so the change can be rolled back quickly if needed.
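The webhook manifest itself is not shown in the article; a minimal sketch of what such a MutatingWebhookConfiguration could look like follows (the webhook name, backing service, and path are hypothetical — only the koordinator-injection=enabled namespace selector comes from the activation step below):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: koordinator-scheduler-injection        # hypothetical name
webhooks:
  - name: schedulername.koordinator.example.com  # hypothetical
    # Only mutate Pods in namespaces labeled koordinator-injection=enabled,
    # so removing the label is an immediate rollback.
    namespaceSelector:
      matchLabels:
        koordinator-injection: enabled
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        name: koord-webhook                    # hypothetical service
        namespace: koordinator-system
        path: /mutate-scheduler-name           # hypothetical path
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore  # fail open: Pods still schedule if the webhook is down
```

The webhook backend would patch spec.schedulerName to koord-scheduler for matching Pods; with failurePolicy: Ignore, unlabeling the namespace (or deleting the webhook) rolls everything back without blocking Pod creation.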
Activation method (label namespace):
<code>kubectl label ns ${NsName} koordinator-injection=enabled</code>
Rollback method:
<code>kubectl label ns ${NsName} koordinator-injection-</code>
Source code: https://github.com/koordinator-sh/koordinator
Customized code: https://github.com/clay-wangzhi/koordinator
Quick deployment of customized code:
<code>git clone https://github.com/clay-wangzhi/koordinator
cd koordinator/manifests
kubectl apply -f setup/
kubectl apply -f koordlet/
kubectl apply -f koord-scheduler/
kubectl apply -f koord-manager/
</code>
Testing
1) Identify high‑load Nodes:
<code>kubectl top node | sort -nk 3
kubectl get nodemetrics.slo.koordinator.sh</code>
2) Label a high-load Node and several normal Nodes:
<code>kubectl label node ${NodeName} test=true</code>
3) Label the application namespace to enable the mutating webhook (which sets the Pod's SchedulerName to koord-scheduler):
<code>kubectl label ns ${NsName} koordinator-injection=enabled</code>
4) Add node affinity and pod anti-affinity to an application, matching the number of replicas to the number of labeled Nodes:
<code>spec:
  replicas: 4
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: test
                    operator: In
                    values:
                      - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: appid
                    operator: In
                    values:
                      - ${AppidName}
              topologyKey: kubernetes.io/hostname
</code>
5) Verify the result: one Pod should stay in Pending with a scheduling-failure reason containing the expected load-threshold message, indicating the configuration works — the high-load Node is filtered out by the usage thresholds, and the hostname anti-affinity prevents that replica from landing on any of the remaining Nodes.
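Why exactly one Pod stays Pending can be traced with a small simulation (node readings are made up; the real decision is made by koord-scheduler's load-aware filter using the thresholds configured earlier):

```python
# Sketch of the load-aware filter decision with the thresholds configured
# above (cpu: 55, memory: 85, aggregated at p99). Node readings are hypothetical.
USAGE_THRESHOLDS = {"cpu": 55, "memory": 85}  # percent of Node allocatable

nodes = {
    # node: p99 utilization over the aggregation window, in percent
    "node-hot": {"cpu": 78, "memory": 60},  # the labeled high-load Node
    "node-a":   {"cpu": 30, "memory": 40},
    "node-b":   {"cpu": 25, "memory": 50},
    "node-c":   {"cpu": 35, "memory": 45},
}

def passes_filter(usage):
    """A node stays schedulable only if every resource is under its threshold."""
    return all(usage[r] < USAGE_THRESHOLDS[r] for r in USAGE_THRESHOLDS)

schedulable = [name for name, usage in nodes.items() if passes_filter(usage)]
print("schedulable:", schedulable)
# With 4 replicas, hostname anti-affinity, and only 3 schedulable Nodes,
# one replica necessarily remains Pending.
```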
Reference links:
Crane‑Scheduler: Real‑world workload‑aware scheduler design and implementation – https://cloud.tencent.com/developer/article/2296515?areaId=106005
Koordinator load‑aware scheduling – https://koordinator.sh/zh-Hans/docs/user-manuals/load-aware-scheduling
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.