How Extending the Kubernetes Scheduler Cut Manual Interventions by 80%
This article describes how Zhongtong tackled severe pod-scheduling imbalance in its production Kubernetes clusters by extending the scheduler through the Scheduling Framework. Custom Filter and Score plugins balanced CPU load across nodes, raised overall utilization from roughly 30% to 50%, and cut manual scheduling interventions by about 80%.
While containerizing workloads at Zhongtong, we observed severe scheduling imbalance in the production Kubernetes cluster: at peak load, CPU usage on some nodes exceeded 80% while others stayed around 20%, causing latency spikes and heavy manual intervention.
kube-scheduler Overview
The kube-scheduler is a core Kubernetes component that continuously watches the API server for Pods without a NodeName and assigns them to suitable nodes based on a series of algorithms and policies.
Scheduling Phases
The scheduling process consists of three main phases:
Predicates (pre-selection): filters out nodes that do not satisfy the Pod's requirements.
Priorities (scoring): scores the remaining nodes and selects the highest-scoring one.
Bind: updates the Pod resource with the chosen node's name.
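The three phases above can be sketched as a simplified loop. This is a minimal illustration with plain structs standing in for the real API objects; the types and function names here are invented for the sketch, not kube-scheduler's actual code:

```go
package main

import "fmt"

// Node is a simplified stand-in for a cluster node (illustrative only).
type Node struct {
	Name       string
	FreeCPU    int // free millicores
	FreeMemory int // free MiB
}

// Pod is a simplified stand-in for a pending pod and its resource requests.
type Pod struct {
	Name      string
	CPUReq    int
	MemoryReq int
}

// predicates: filter out nodes that cannot satisfy the pod's requests.
func predicates(pod Pod, nodes []Node) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.FreeCPU >= pod.CPUReq && n.FreeMemory >= pod.MemoryReq {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// priorities: score the feasible nodes (here: most free CPU wins)
// and pick the best. Assumes at least one feasible node.
func priorities(nodes []Node) Node {
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.FreeCPU > best.FreeCPU {
			best = n
		}
	}
	return best
}

func main() {
	pod := Pod{Name: "web-1", CPUReq: 500, MemoryReq: 512}
	nodes := []Node{
		{"node-a", 300, 2048}, // filtered out: not enough free CPU
		{"node-b", 1500, 4096},
		{"node-c", 800, 1024},
	}
	chosen := priorities(predicates(pod, nodes))
	// Bind phase: the real scheduler writes NodeName back to the API server.
	fmt.Printf("bind %s -> %s\n", pod.Name, chosen.Name) // bind web-1 -> node-b
}
```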
Common predicate algorithms include volume checks, node pressure (memory, disk, PID), general checks (requests, hostname, hostPort, nodeSelector), taint‑toleration, volumeBinding, and pod affinity/anti‑affinity.
Typical priority algorithms include SelectorSpreadPriority, InterPodAffinityPriority, LeastRequestedPriority, BalancedResourceAllocation (balancing resource requests across CPU and memory), NodeAffinityPriority, ImageLocalityPriority, and others.
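As one concrete example, LeastRequestedPriority favors nodes with the lowest requested-to-capacity ratio. A sketch of its classic formula, the average of `(capacity - requested) * 100 / capacity` over CPU and memory (function name and layout are ours, not upstream code):

```go
package main

import "fmt"

// leastRequestedScore computes the classic LeastRequestedPriority formula:
// the average of (capacity - requested) * 100 / capacity over CPU and memory.
func leastRequestedScore(cpuReq, cpuCap, memReq, memCap int64) int64 {
	score := func(req, cap int64) int64 {
		if cap == 0 || req > cap {
			return 0
		}
		return (cap - req) * 100 / cap
	}
	return (score(cpuReq, cpuCap) + score(memReq, memCap)) / 2
}

func main() {
	// A node with 4000m CPU / 8192Mi memory, of which 1000m / 2048Mi
	// are already requested: both dimensions score 75.
	fmt.Println(leastRequestedScore(1000, 4000, 2048, 8192)) // 75
}
```

Note this scores *requests*, not actual usage, which is exactly why a heavily-used node with few declared requests can still attract new pods.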
Extending the Scheduler
Kubernetes provides two extension methods:
Scheduler extender: a webhook-based mechanism for adding filter and score logic out of process (deprecated after v1.24).
Scheduling Framework : introduced in v1.15, offers a pluggable set of extension points for easier and maintainable custom scheduling logic.
The framework defines extension points such as QueueSort, PreFilter, Filter, PostFilter, PreScore, Score, NormalizeScore, Reserve, Permit, PreBind, Bind, PostBind, and Unreserve. Each point allows custom plugins to influence the scheduling decision.
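Each extension point is a Go interface that a plugin implements. A minimal sketch of the Score point's shape (the real interface lives in `k8s.io/kubernetes/pkg/scheduler/framework`; the local `ScorePlugin` stand-in, the static usage map, and the plugin name below are all ours to keep the example self-contained):

```go
package main

import "fmt"

// ScorePlugin mirrors the shape of the Scheduling Framework's Score
// extension point; the real interface takes a context, CycleState,
// *v1.Pod, and node name, and returns (int64, *framework.Status).
type ScorePlugin interface {
	Name() string
	Score(podName, nodeName string) (int64, error)
}

// NodeUsage maps node name -> observed CPU utilization percentage.
// In a real plugin this would come from a metrics source, not a static map.
type NodeUsage map[string]int64

// LoadAwareScore prefers nodes with lower observed CPU utilization.
type LoadAwareScore struct {
	usage NodeUsage
}

func (p *LoadAwareScore) Name() string { return "LoadAwareScore" }

// Score returns 100 minus the node's CPU utilization, clamped at 0.
func (p *LoadAwareScore) Score(podName, nodeName string) (int64, error) {
	u, ok := p.usage[nodeName]
	if !ok {
		return 0, fmt.Errorf("no usage data for node %s", nodeName)
	}
	s := 100 - u
	if s < 0 {
		s = 0
	}
	return s, nil
}

func main() {
	var plugin ScorePlugin = &LoadAwareScore{usage: NodeUsage{"node-a": 85, "node-b": 20}}
	for _, n := range []string{"node-a", "node-b"} {
		s, _ := plugin.Score("web-1", n)
		fmt.Printf("%s scores %s: %d\n", plugin.Name(), n, s) // node-a: 15, node-b: 80
	}
}
```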
Implementation and Results
We implemented a custom scheduler using the Scheduling Framework: a Filter plugin excludes overloaded nodes outright, and a Score plugin ranks the remaining nodes by actual memory and CPU usage rather than declared requests. With these in place, the cluster's overall CPU utilization rose from ~30% to ~50%, and manual scheduling interventions dropped by about 80%.
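The core logic of that Filter/Score pair can be sketched as below. The 80% overload threshold is hypothetical (the article does not state the cutoff we used), and the plain maps stand in for the framework's plugin plumbing and metrics pipeline:

```go
package main

import "fmt"

// Hypothetical utilization threshold above which a node is considered
// overloaded; the actual production cutoff is not given in this article.
const cpuOverloadThreshold = 80

// filterOverloaded mimics the Filter plugin: nodes whose observed CPU
// utilization exceeds the threshold are rejected outright.
func filterOverloaded(usage map[string]int64) []string {
	var feasible []string
	for node, cpu := range usage {
		if cpu <= cpuOverloadThreshold {
			feasible = append(feasible, node)
		}
	}
	return feasible
}

// scoreByUsage mimics the Score plugin: lower actual utilization -> higher score.
func scoreByUsage(usage map[string]int64, nodes []string) map[string]int64 {
	scores := make(map[string]int64, len(nodes))
	for _, n := range nodes {
		scores[n] = 100 - usage[n]
	}
	return scores
}

func main() {
	usage := map[string]int64{"node-a": 85, "node-b": 20, "node-c": 55}
	feasible := filterOverloaded(usage) // node-a (85%) is excluded
	fmt.Println(scoreByUsage(usage, feasible))
}
```

The effect is the rebalancing described above: hot nodes stop receiving pods at all, and among the rest, the least-loaded node wins.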
Future articles will dive into the code and architectural details of this extension.
Zhongtong Tech
Integrating industry and IT for digital efficiency, and advancing Zhongtong Express's high-quality development through digitalization. This is the official channel of Zhongtong's tech team, sharing internal tech insights, product news, job openings, and event updates. Stay tuned!