Mastering Kubernetes Pod Scheduling: Affinity, Anti‑Affinity, and Custom Schedulers
This article explains how the Kubernetes scheduler assigns Pods to nodes. It covers the scheduler's goals of fairness, efficient resource use, performance, and flexibility; the predicate and priority phases and their common algorithms; node and Pod affinity rules; custom scheduler configuration; and practical YAML examples.
Kubernetes Pod Scheduling Overview
The Scheduler is the Kubernetes component that assigns Pods to cluster nodes. It must balance fairness, efficient resource use, scheduling performance, and flexibility.
Running as a separate service, the Scheduler continuously watches the API server and creates a binding for each Pod whose spec.nodeName is empty.
Scheduling Process
The process consists of two main steps: filtering nodes that do not satisfy requirements (the predicate phase) and ranking the remaining nodes (the priority phase). The node with the highest priority is selected; any error aborts the operation.
Predicate Algorithms
PodFitsResources – node must have enough free CPU/memory for the Pod.
PodFitsHost – if nodeName is set, the node name must match.
PodFitsHostPort – requested ports must not conflict with existing ports on the node.
PodSelectorMatches – node labels must match the Pod’s nodeSelector.
NoDiskConflict – volumes must not conflict unless both are read‑only.
If no node passes the predicate phase, the Pod remains in Pending and is repeatedly re‑evaluated.
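The predicate phase can be sketched as a chain of boolean filters over candidate nodes. The dictionaries and helper names below are illustrative stand-ins, not the scheduler's actual Go types:

```python
# Illustrative sketch of the predicate (filtering) phase; the node/pod
# dictionaries and helper names are hypothetical, not Kubernetes API types.

def pod_fits_resources(pod, node):
    # Node must have enough free CPU (millicores) and memory (MiB).
    return (node["free_cpu"] >= pod["cpu"] and
            node["free_mem"] >= pod["mem"])

def pod_fits_host(pod, node):
    # If the Pod pins a node name, it must match.
    return pod.get("node_name") in (None, node["name"])

def pod_fits_host_port(pod, node):
    # Requested host ports must not collide with ports already in use.
    return not (set(pod.get("host_ports", [])) & set(node["used_ports"]))

PREDICATES = [pod_fits_resources, pod_fits_host, pod_fits_host_port]

def feasible_nodes(pod, nodes):
    # A node survives only if every predicate passes; if the result is
    # empty, the Pod would stay Pending and be re-evaluated later.
    return [n for n in nodes if all(p(pod, n) for p in PREDICATES)]

nodes = [
    {"name": "node1", "free_cpu": 500, "free_mem": 1024, "used_ports": {80}},
    {"name": "node2", "free_cpu": 2000, "free_mem": 4096, "used_ports": set()},
]
pod = {"cpu": 1000, "mem": 2048, "host_ports": [80]}
print([n["name"] for n in feasible_nodes(pod, nodes)])  # ['node2']
```

Here node1 fails both the resource and host-port checks, so only node2 advances to the priority phase.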
Priority Algorithms
LeastRequestedPriority – nodes with lower CPU/memory usage receive higher weight.
BalancedResourceAllocation – nodes with balanced CPU and memory usage receive higher weight (used together with the previous rule).
ImageLocalityPriority – nodes that already have the required container image receive higher weight.
The scheduler computes a final score by aggregating the weighted priorities.
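The scoring step can be sketched as follows. The formulas mirror the classic LeastRequested and BalancedResourceAllocation ideas (scores on a 0–10 scale), but the data shapes and equal weights are illustrative assumptions:

```python
# Sketch of the priority (scoring) phase. Scores are on a 0..10 scale,
# in the spirit of LeastRequested / BalancedResourceAllocation; the
# node records and the weights are illustrative, not scheduler internals.

def least_requested(node):
    # More free capacity -> higher score, averaged over CPU and memory.
    cpu = (node["cpu_cap"] - node["cpu_req"]) * 10 / node["cpu_cap"]
    mem = (node["mem_cap"] - node["mem_req"]) * 10 / node["mem_cap"]
    return (cpu + mem) / 2

def balanced_allocation(node):
    # The closer the CPU and memory usage fractions, the higher the score.
    cpu_frac = node["cpu_req"] / node["cpu_cap"]
    mem_frac = node["mem_req"] / node["mem_cap"]
    return 10 - abs(cpu_frac - mem_frac) * 10

def final_score(node, weights=((least_requested, 1), (balanced_allocation, 1))):
    # Aggregate each priority function's score times its weight.
    return sum(fn(node) * w for fn, w in weights)

a = {"cpu_cap": 4000, "cpu_req": 1000, "mem_cap": 8192, "mem_req": 2048}
b = {"cpu_cap": 4000, "cpu_req": 3000, "mem_cap": 8192, "mem_req": 1024}
best = max([a, b], key=final_score)  # node "a": lightly and evenly loaded
```

Node `a` wins: it is both less loaded and more evenly loaded than `b`, so both priority functions favor it.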
Custom Scheduler
You can define a custom scheduler by setting spec.schedulerName in the Pod spec. Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduler-test
  labels:
    name: example-scheduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-test   # container names must be lowercase RFC 1123 labels
    image: nginx:v1
```
Affinity Scheduling Methods
Kubernetes supports three main scheduling methods: affinity (including node and pod affinity), taint/toleration, and fixed scheduling strategies. This article focuses on affinity.
Node Affinity
Node affinity is defined in pod.spec.affinity.nodeAffinity and has two policies:
requiredDuringSchedulingIgnoredDuringExecution (hard) – the Pod must be scheduled on nodes that satisfy the rule; otherwise it stays Pending.
preferredDuringSchedulingIgnoredDuringExecution (soft) – the scheduler prefers nodes that satisfy the rule but will fall back to other nodes if none match.
Hard Policy Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    name: node-affinity-pod
spec:
  containers:
  - name: with-node-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - testcentos7
```
This policy means the Pod must not be placed on the node named testcentos7.
Creating the Pod on a single‑master, single‑node cluster results in Pending because no suitable node exists.
Soft Policy Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-preferred
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-preferred
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - testcentos7
```
This policy prefers the node testcentos7; if no such node exists, the Pod is scheduled elsewhere.
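The weight field feeds into the scoring phase: for each candidate node, the scheduler adds the weight of every preferred term that node satisfies to its total score. A minimal sketch, where the label data and the In-only operator support are simplifying assumptions:

```python
# Sketch of how preferredDuringSchedulingIgnoredDuringExecution weights
# contribute to a node's score: each satisfied preference adds its weight.
# Only the "In" operator is handled here, for brevity.

def preference_score(node_labels, preferences):
    score = 0
    for pref in preferences:
        key, op = pref["key"], pref["operator"]
        values = pref.get("values", [])
        if op == "In" and node_labels.get(key) in values:
            score += pref["weight"]
    return score

prefs = [{"key": "kubernetes.io/hostname", "operator": "In",
          "values": ["testcentos7"], "weight": 100}]
print(preference_score({"kubernetes.io/hostname": "testcentos7"}, prefs))  # 100
print(preference_score({"kubernetes.io/hostname": "other-node"}, prefs))   # 0
```

A node that misses every preference simply scores 0 here; it remains schedulable, which is exactly what makes the policy "soft".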
Combined Hard and Soft Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-node
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-affinity-node
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - hello
```
The Pod must avoid node k8s-node2 but prefers nodes whose source label equals hello.
Operator Types
In – label value is in the list.
NotIn – label value is not in the list.
Gt – label value is greater than the specified number.
Lt – label value is less than the specified number.
Exists – label key exists.
DoesNotExist – label key does not exist.
If multiple nodeSelectorTerms are defined, satisfying any one term is sufficient; all matchExpressions within a term must be satisfied.
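These matching semantics (terms OR-ed, expressions within a term AND-ed) can be sketched directly; the label values and term data below are illustrative:

```python
# Sketch of nodeSelectorTerms evaluation: terms are OR-ed together,
# matchExpressions inside one term are AND-ed. Operators follow the
# table above; Gt/Lt compare the label value as an integer.

def match_expression(labels, expr):
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "Gt":
        return key in labels and int(labels[key]) > int(values[0])
    if op == "Lt":
        return key in labels and int(labels[key]) < int(values[0])
    raise ValueError(f"unknown operator {op}")

def node_matches(labels, terms):
    # Any one term is sufficient; every expression in that term must hold.
    return any(all(match_expression(labels, e) for e in t) for t in terms)

terms = [
    [{"key": "disktype", "operator": "In", "values": ["ssd"]},
     {"key": "cpu-count", "operator": "Gt", "values": ["4"]}],
    [{"key": "gpu", "operator": "Exists"}],
]
print(node_matches({"disktype": "ssd", "cpu-count": "8"}, terms))  # True
print(node_matches({"disktype": "hdd"}, terms))                    # False
```

A node with an ssd disk and more than 4 CPUs matches via the first term; a node with any gpu label matches via the second, regardless of the first.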
Pod Affinity and Anti‑Affinity
Pod affinity and anti-affinity are defined in pod.spec.affinity.podAffinity and pod.spec.affinity.podAntiAffinity. Like node affinity, each has a hard policy (requiredDuringSchedulingIgnoredDuringExecution) and a soft policy (preferredDuringSchedulingIgnoredDuringExecution).
Pod Affinity Hard Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
```
The Pod must be scheduled on the same node as a Pod labeled app=nginx.
Pod Anti‑Affinity Hard Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: required-pod2
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
```
This Pod must not share a node with any Pod labeled app=nginx; on a single-node cluster it remains Pending.
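The topologyKey decides the granularity of "together": with kubernetes.io/hostname the domain is a single node. The hard anti-affinity check can be sketched as follows, with illustrative records in place of real Kubernetes API objects:

```python
# Sketch of hard pod anti-affinity: a candidate node is rejected if any
# existing Pod in the same topology domain matches the label selector.
# The pod/node records here are illustrative, not Kubernetes API objects.

def violates_anti_affinity(candidate_node, existing_pods, selector_key,
                           selector_values, topology_key="hostname"):
    domain = candidate_node[topology_key]
    for pod in existing_pods:
        if (pod["node"][topology_key] == domain and
                pod["labels"].get(selector_key) in selector_values):
            return True
    return False

node1 = {"hostname": "node1"}
node2 = {"hostname": "node2"}
running = [{"node": node1, "labels": {"app": "nginx"}}]

# node1 already hosts an app=nginx Pod, so it is excluded; node2 is fine.
print(violates_anti_affinity(node1, running, "app", ["nginx"]))  # True
print(violates_anti_affinity(node2, running, "app", ["nginx"]))  # False
```

With a broader topologyKey such as a zone label, the whole zone containing a matching Pod would be excluded, not just the one node.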
Pod Anti‑Affinity Soft Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-preferred
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-preferred
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname
```
Soft anti-affinity prefers not to place the Pod on the same node as Pods labeled app=pod-2, but will still co-locate them if no other node is available.
In summary, Kubernetes scheduling combines predicate filtering, priority ranking, and affinity/anti-affinity rules to control where Pods run, supporting both strict (hard) constraints and soft preferences, and can be extended with custom schedulers.
MaGe Linux Operations