Understanding Kubernetes kube‑scheduler Architecture, Workflow, and Plugin Development
This article explains the role of kube‑scheduler in Kubernetes, details its scheduling process, describes the plugin‑based framework with extension points such as PreEnqueue, Filter and Bind, and provides complete code examples and deployment instructions for building custom scheduler plugins.
kube-scheduler is one of the core components of Kubernetes responsible for assigning Pods to the most suitable Nodes based on configurable algorithms and policies, thereby improving cluster resource utilization.
Scheduling Process
By default the built‑in scheduler satisfies most workloads, but real‑world scenarios often require custom constraints, such as limiting a Pod to a specific set of Nodes or reserving certain Nodes for particular applications.
The scheduler continuously watches the API Server for Pods whose PodSpec.NodeName is empty, creates a binding for each, and repeats the cycle.
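This watch-and-bind loop can be sketched in simplified Go. The `Pod` type and `pickNode` helper below are illustrative stand-ins, not the real client-go API (in the real scheduler, the assignment is a Binding subresource call against the API Server):

```go
package main

import "fmt"

// Pod is a stand-in for the real v1.Pod, reduced to the fields the loop needs.
type Pod struct {
	Name     string
	NodeName string // empty means the Pod has not been scheduled yet
}

// scheduleOne mirrors one scheduling cycle: pick a Node for a pending Pod
// and record the decision by filling in NodeName.
func scheduleOne(pod *Pod, pickNode func(*Pod) (string, bool)) bool {
	if pod.NodeName != "" {
		return false // already scheduled, nothing to do
	}
	node, ok := pickNode(pod)
	if !ok {
		return false // no feasible Node; the Pod stays Pending
	}
	pod.NodeName = node // in the real scheduler this is a Binding API call
	return true
}

func main() {
	pods := []*Pod{{Name: "web"}, {Name: "db", NodeName: "node-1"}}
	pick := func(*Pod) (string, bool) { return "node-2", true }
	for _, p := range pods {
		if scheduleOne(p, pick) {
			fmt.Printf("bound %s to %s\n", p.Name, p.NodeName)
		}
	}
	// prints: bound web to node-2  (db is skipped, it is already bound)
}
```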
The process appears simple, yet production environments must address fairness, resource efficiency, and custom policy enforcement.
- How to guarantee fairness across heterogeneous Nodes?
- How to ensure every Node receives its allocated resources?
- How to maximize overall cluster utilization?
- How to maintain high scheduling performance at scale?
- Can users define their own scheduling strategies?
The scheduler is implemented as a plugin-based architecture under kubernetes/pkg/scheduler. The entry point is pkg/scheduler/scheduler.go, and the binary is built from cmd/kube-scheduler/scheduler.go.
Scheduling consists of three main stages:
- Pre-filter (Predicates): filters out Nodes that do not satisfy mandatory constraints.
- Score (Priorities): assigns a numeric score to each remaining Node.
- Binding: selects the highest-scoring Node and creates the binding.
During the Predicates stage, the scheduler iterates over all Nodes, discarding those that fail mandatory rules. If no Node matches, the Pod stays in Pending until a suitable Node appears.
In the Priorities stage, the scheduler ranks the filtered Nodes; the Node with the highest score is chosen for binding.
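The two stages can be sketched in plain Go. `Node`, `feasibleNodes`, and `pickNode` below are illustrative stand-ins for the scheduler's real types, with free CPU as the only predicate and priority:

```go
package main

import "fmt"

// Node is a stand-in for the scheduler's NodeInfo.
type Node struct {
	Name    string
	FreeCPU int // millicores still available
}

// feasibleNodes plays the Predicates stage: drop every Node that
// cannot satisfy the Pod's CPU request.
func feasibleNodes(nodes []Node, requestCPU int) []Node {
	var out []Node
	for _, n := range nodes {
		if n.FreeCPU >= requestCPU {
			out = append(out, n)
		}
	}
	return out
}

// pickNode plays the Priorities stage: score each feasible Node
// (here simply by free CPU) and return the highest-scoring one.
func pickNode(nodes []Node) (string, bool) {
	if len(nodes) == 0 {
		return "", false // nothing feasible: the Pod stays Pending
	}
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.FreeCPU > best.FreeCPU {
			best = n
		}
	}
	return best.Name, true
}

func main() {
	nodes := []Node{{"a", 500}, {"b", 2000}, {"c", 100}}
	name, ok := pickNode(feasibleNodes(nodes, 400))
	fmt.Println(name, ok) // prints: b true ("c" fails filtering, "b" wins scoring)
}
```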
Scheduler Framework
The framework defines a set of extension points that users can implement to inject custom logic. The main extension points are:
- PreEnqueue: runs before a Pod enters the active scheduling queue.
- QueueSort: orders Pods in the scheduling queue.
- PreFilter: performs early validation of a Pod and can precompute state for later phases.
- Filter: eliminates Nodes that cannot run the Pod.
- PostFilter: runs only when no Node passed filtering; the default implementation attempts preemption.
- PreScore / Score / NormalizeScore: assign and normalize scores for the remaining Nodes.
- Reserve / Unreserve: reserve resources on the chosen Node before binding, and release them if a later phase fails.
- Permit: can approve, deny, or delay (make wait) the binding.
- PreBind: runs just before the binding operation.
- Bind: performs the actual binding of the Pod to a Node.
- PostBind: runs after a successful bind, for cleanup.
Plugins are registered through KubeSchedulerConfiguration. An example configuration enabling a custom reserve and preBind plugin named foo while disabling baz looks like:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      reserve:
        enabled:
          - name: foo
          - name: bar
        disabled:
          - name: baz
      preBind:
        enabled:
          - name: foo
        disabled:
          - name: baz
    pluginConfig:
      - name: foo
        args: |
          arbitrary content that the foo plugin can parse
```

The order of plugin execution follows these rules:
- If an extension point has no custom plugin enabled, the default plugin for that point is used.
- If custom plugins are enabled, the default plugin runs first, followed by the custom plugins in the order listed under enabled.
- A disabled default plugin can be re-enabled in the enabled list, which also allows changing its position in the call order.
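These ordering rules can be expressed as a small sketch. The `executionOrder` helper below is illustrative, not part of the framework:

```go
package main

import "fmt"

// executionOrder applies the rules above: default plugins that are not
// disabled run first, then the explicitly enabled plugins in list order.
func executionOrder(defaults, enabled, disabled []string) []string {
	off := map[string]bool{}
	for _, d := range disabled {
		off[d] = true
	}
	var order []string
	for _, p := range defaults {
		if !off[p] {
			order = append(order, p)
		}
	}
	return append(order, enabled...)
}

func main() {
	// "baz" is a disabled default plugin; "foo" and "bar" are custom plugins.
	fmt.Println(executionOrder(
		[]string{"default", "baz"}, // default plugins for the extension point
		[]string{"foo", "bar"},     // enabled list from the configuration
		[]string{"baz"},            // disabled list from the configuration
	)) // prints: [default foo bar]
}
```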
To add a new plugin, implement the required interfaces (e.g., PreFilterPlugin, FilterPlugin, PreBindPlugin) and register it with WithPlugin when constructing the scheduler command.
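A minimal main package for such a scheduler might look like the following sketch. It assumes the plugin package shown below and depends on the kubernetes repository itself, so it is for illustration rather than a drop-in build:

```go
package main

import (
	"os"

	"k8s.io/component-base/cli"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	// hypothetical import path for the plugin package defined below
	"simple-scheduler/pkg/plugins"
)

func main() {
	// Build the standard kube-scheduler command with the custom
	// plugin registered under its configured name.
	command := app.NewSchedulerCommand(
		app.WithPlugin(plugins.Name, plugins.New),
	)
	os.Exit(cli.Run(command))
}
```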
Example plugin implementation (simplified):
```go
package plugins

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog/v2"
	"k8s.io/kubernetes/pkg/scheduler/framework"

	"simple-scheduler/pkg/scheduler/apis/config"
	"simple-scheduler/pkg/scheduler/apis/config/validation"
)

// Name is the plugin name referenced in KubeSchedulerConfiguration.
const Name = "sample-plugin"

// Sample implements the PreFilter, Filter, and PreBind extension points.
type Sample struct {
	args   *config.SampleArgs
	handle framework.Handle
}

func (s *Sample) Name() string { return Name }

func (s *Sample) PreFilter(ctx context.Context, cycleState *framework.CycleState, pod *v1.Pod) (*framework.PreFilterResult, *framework.Status) {
	klog.V(3).Infof("prefilter pod: %v", pod.Name)
	return nil, nil // a nil status means success
}

// PreFilterExtensions is required by the PreFilterPlugin interface;
// returning nil signals the plugin has no AddPod/RemovePod hooks.
func (s *Sample) PreFilterExtensions() framework.PreFilterExtensions {
	return nil
}

func (s *Sample) Filter(ctx context.Context, cycleState *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	klog.V(3).Infof("filter pod: %v, node: %v", pod.Name, nodeInfo.Node().Name)
	return framework.NewStatus(framework.Success, "")
}

func (s *Sample) PreBind(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) *framework.Status {
	// Look up the chosen Node in the scheduler's snapshot before binding.
	nodeInfo, err := s.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
	if err != nil {
		return framework.NewStatus(framework.Error, fmt.Sprintf("prebind get node %s info error: %s", nodeName, err.Error()))
	}
	klog.V(3).Infof("prebind node info: %+v", nodeInfo.Node())
	return framework.NewStatus(framework.Success, "")
}

// New is the plugin factory registered with the scheduler command.
func New(fpArgs runtime.Object, fh framework.Handle) (framework.Plugin, error) {
	args, ok := fpArgs.(*config.SampleArgs)
	if !ok {
		return nil, fmt.Errorf("got args of type %T, want *SampleArgs", fpArgs)
	}
	if err := validation.ValidateSamplePluginArgs(*args); err != nil {
		return nil, err
	}
	return &Sample{args: args, handle: fh}, nil
}
```

After building the binary, create the necessary RBAC objects, a ConfigMap containing the scheduler configuration, and a Deployment that runs the custom scheduler. Pods can then select this scheduler by setting schedulerName: sample-scheduler in their spec.
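Assuming the custom scheduler is deployed under the name sample-scheduler, a Pod opts into it like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  schedulerName: sample-scheduler  # handled by the custom scheduler instead of the default
  containers:
    - name: app
      image: nginx:latest
```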
From Kubernetes v1.17 onward, the entire pre‑filter and priority logic is plugin‑based, making the scheduler framework the recommended extension point for custom scheduling behavior.
For further details and the full source code, refer to the GitHub repository cnych/sample-scheduler-framework.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.