Understanding Kubernetes kube‑scheduler Architecture, Workflow, and Plugin Development
This article explains the role of kube‑scheduler in Kubernetes, details its scheduling process, describes the plugin‑based framework with extension points such as PreEnqueue, Filter and Bind, and provides complete code examples and deployment instructions for building custom scheduler plugins.
kube-scheduler is one of the core components of Kubernetes responsible for assigning Pods to the most suitable Nodes based on configurable algorithms and policies, thereby improving cluster resource utilization.
Scheduling Process
By default the built‑in scheduler satisfies most workloads, but real‑world scenarios often require custom constraints, such as limiting a Pod to a specific set of Nodes or reserving certain Nodes for particular applications.
The scheduler continuously watches the API Server for Pods whose PodSpec.NodeName is empty, creates a binding for each, and repeats the cycle.
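This watch-and-bind loop can be sketched in simplified Go. The `Pod` type and `pickNode` helper below are illustrative stand-ins, not the real client-go API (in the real scheduler, the assignment is a Binding subresource call against the API Server):

```go
package main

import "fmt"

// Pod is a stand-in for the real v1.Pod, reduced to the fields the loop needs.
type Pod struct {
	Name     string
	NodeName string // empty means the Pod has not been scheduled yet
}

// scheduleOne mirrors one scheduling cycle: pick a Node for a pending Pod
// and record the decision by filling in NodeName.
func scheduleOne(pod *Pod, pickNode func(*Pod) (string, bool)) bool {
	if pod.NodeName != "" {
		return false // already scheduled, nothing to do
	}
	node, ok := pickNode(pod)
	if !ok {
		return false // no feasible Node; the Pod stays Pending
	}
	pod.NodeName = node // in the real scheduler this is a Binding API call
	return true
}

func main() {
	pods := []*Pod{{Name: "web"}, {Name: "db", NodeName: "node-1"}}
	pick := func(*Pod) (string, bool) { return "node-2", true }
	for _, p := range pods {
		if scheduleOne(p, pick) {
			fmt.Printf("bound %s to %s\n", p.Name, p.NodeName)
		}
	}
	// prints: bound web to node-2  (db is skipped, it is already bound)
}
```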
The process appears simple, yet production environments must address fairness, resource efficiency, and custom policy enforcement.
- How to guarantee fairness across heterogeneous Nodes?
- How to ensure every Node receives its allocated resources?
- How to maximize overall cluster utilization?
- How to maintain high scheduling performance at scale?
- Can users define their own scheduling strategies?
The scheduler is implemented as a plugin-based architecture under kubernetes/pkg/scheduler. The entry point is pkg/scheduler/scheduler.go, and the binary is built from cmd/kube-scheduler/scheduler.go.
Scheduling consists of three main stages:
- Pre-filter (Predicates): filters out Nodes that do not satisfy mandatory constraints.
- Score (Priorities): assigns a numeric score to each remaining Node.
- Binding: selects the highest-scoring Node and creates the binding.
During the Predicates stage, the scheduler iterates over all Nodes, discarding those that fail mandatory rules. If no Node matches, the Pod stays in Pending until a suitable Node appears.
In the Priorities stage, the scheduler ranks the filtered Nodes; the Node with the highest score is chosen for binding.
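The two stages can be sketched in plain Go. `Node`, `feasibleNodes`, and `pickNode` below are illustrative stand-ins for the scheduler's real types, with free CPU as the only predicate and priority:

```go
package main

import "fmt"

// Node is a stand-in for the scheduler's NodeInfo.
type Node struct {
	Name    string
	FreeCPU int // millicores still available
}

// feasibleNodes plays the Predicates stage: drop every Node that
// cannot satisfy the Pod's CPU request.
func feasibleNodes(nodes []Node, requestCPU int) []Node {
	var out []Node
	for _, n := range nodes {
		if n.FreeCPU >= requestCPU {
			out = append(out, n)
		}
	}
	return out
}

// pickNode plays the Priorities stage: score each feasible Node
// (here simply by free CPU) and return the highest-scoring one.
func pickNode(nodes []Node) (string, bool) {
	if len(nodes) == 0 {
		return "", false // nothing feasible: the Pod stays Pending
	}
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.FreeCPU > best.FreeCPU {
			best = n
		}
	}
	return best.Name, true
}

func main() {
	nodes := []Node{{"a", 500}, {"b", 2000}, {"c", 100}}
	name, ok := pickNode(feasibleNodes(nodes, 400))
	fmt.Println(name, ok) // prints: b true ("c" fails filtering, "b" wins scoring)
}
```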
Scheduler Framework
The framework defines a set of extension points that users can implement to inject custom logic. The main extension points are:
- PreEnqueue: runs before a Pod enters the active scheduling queue.
- QueueSort: orders Pods in the scheduling queue.
- PreFilter: performs early validation of a Pod and can precompute state for later phases.
- Filter: eliminates Nodes that cannot run the Pod.
- PostFilter: runs only when no Node passed filtering; the default implementation attempts preemption.
- PreScore / Score / NormalizeScore: assign and normalize scores for the remaining Nodes.
- Reserve / Unreserve: reserve resources on the chosen Node before binding, and release them if a later phase fails.
- Permit: can approve, deny, or delay (make wait) the binding.
- PreBind: runs just before the binding operation.
- Bind: performs the actual binding of the Pod to a Node.
- PostBind: runs after a successful bind, for cleanup.
Plugins are registered through KubeSchedulerConfiguration. An example configuration enabling a custom reserve and preBind plugin named foo while disabling baz looks like:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      reserve:
        enabled:
          - name: foo
          - name: bar
        disabled:
          - name: baz
      preBind:
        enabled:
          - name: foo
        disabled:
          - name: baz
    pluginConfig:
      - name: foo
        args: |
          arbitrary content that the foo plugin can parse
```

The order of plugin execution follows these rules:
- If an extension point has no custom plugin enabled, the default plugin for that point is used.
- If custom plugins are enabled, the default plugin runs first, followed by the custom plugins in the order listed under enabled.
- A disabled default plugin can be re-enabled in the enabled list, which also allows changing its position in the call order.
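These ordering rules can be expressed as a small sketch. The `executionOrder` helper below is illustrative, not part of the framework:

```go
package main

import "fmt"

// executionOrder applies the rules above: default plugins that are not
// disabled run first, then the explicitly enabled plugins in list order.
func executionOrder(defaults, enabled, disabled []string) []string {
	off := map[string]bool{}
	for _, d := range disabled {
		off[d] = true
	}
	var order []string
	for _, p := range defaults {
		if !off[p] {
			order = append(order, p)
		}
	}
	return append(order, enabled...)
}

func main() {
	// "baz" is a disabled default plugin; "foo" and "bar" are custom plugins.
	fmt.Println(executionOrder(
		[]string{"default", "baz"}, // default plugins for the extension point
		[]string{"foo", "bar"},     // enabled list from the configuration
		[]string{"baz"},            // disabled list from the configuration
	)) // prints: [default foo bar]
}
```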
To add a new plugin, implement the required interfaces (e.g., PreFilterPlugin, FilterPlugin, PreBindPlugin) and register it with WithPlugin when constructing the scheduler command.
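A minimal main package for such a scheduler might look like the following sketch. It assumes the plugin package shown below and depends on the kubernetes repository itself, so it is for illustration rather than a drop-in build:

```go
package main

import (
	"os"

	"k8s.io/component-base/cli"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	// hypothetical import path for the plugin package defined below
	"simple-scheduler/pkg/plugins"
)

func main() {
	// Build the standard kube-scheduler command with the custom
	// plugin registered under its configured name.
	command := app.NewSchedulerCommand(
		app.WithPlugin(plugins.Name, plugins.New),
	)
	os.Exit(cli.Run(command))
}
```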
Example plugin implementation (simplified):
```go
package plugins

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog/v2"
	"k8s.io/kubernetes/pkg/scheduler/framework"

	"simple-scheduler/pkg/scheduler/apis/config"
	"simple-scheduler/pkg/scheduler/apis/config/validation"
)

// Name is the plugin name referenced in KubeSchedulerConfiguration.
const Name = "sample-plugin"

// Sample implements the PreFilter, Filter, and PreBind extension points.
type Sample struct {
	args   *config.SampleArgs
	handle framework.Handle
}

func (s *Sample) Name() string { return Name }

func (s *Sample) PreFilter(ctx context.Context, cycleState *framework.CycleState, pod *v1.Pod) (*framework.PreFilterResult, *framework.Status) {
	klog.V(3).Infof("prefilter pod: %v", pod.Name)
	return nil, nil // a nil status means success
}

// PreFilterExtensions is required by the PreFilterPlugin interface;
// returning nil signals the plugin has no AddPod/RemovePod hooks.
func (s *Sample) PreFilterExtensions() framework.PreFilterExtensions {
	return nil
}

func (s *Sample) Filter(ctx context.Context, cycleState *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	klog.V(3).Infof("filter pod: %v, node: %v", pod.Name, nodeInfo.Node().Name)
	return framework.NewStatus(framework.Success, "")
}

func (s *Sample) PreBind(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) *framework.Status {
	// Look up the chosen Node in the scheduler's snapshot before binding.
	nodeInfo, err := s.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
	if err != nil {
		return framework.NewStatus(framework.Error, fmt.Sprintf("prebind get node %s info error: %s", nodeName, err.Error()))
	}
	klog.V(3).Infof("prebind node info: %+v", nodeInfo.Node())
	return framework.NewStatus(framework.Success, "")
}

// New is the plugin factory registered with the scheduler command.
func New(fpArgs runtime.Object, fh framework.Handle) (framework.Plugin, error) {
	args, ok := fpArgs.(*config.SampleArgs)
	if !ok {
		return nil, fmt.Errorf("got args of type %T, want *SampleArgs", fpArgs)
	}
	if err := validation.ValidateSamplePluginArgs(*args); err != nil {
		return nil, err
	}
	return &Sample{args: args, handle: fh}, nil
}
```

After building the binary, create the necessary RBAC objects, a ConfigMap containing the scheduler configuration, and a Deployment that runs the custom scheduler. Pods can then select this scheduler by setting schedulerName: sample-scheduler in their spec.
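Assuming the custom scheduler is deployed under the name sample-scheduler, a Pod opts into it like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  schedulerName: sample-scheduler  # handled by the custom scheduler instead of the default
  containers:
    - name: app
      image: nginx:latest
```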
From Kubernetes v1.17 onward, the entire pre‑filter and priority logic is plugin‑based, making the scheduler framework the recommended extension point for custom scheduling behavior.
For further details and the full source code, refer to the GitHub repository cnych/sample-scheduler-framework.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.