Mastering Kubernetes Pod Scheduling: Affinity, Anti‑Affinity, and Custom Schedulers
This article explains how the Kubernetes scheduler assigns Pods to nodes. It covers the scheduler's goals of fairness, efficient resource use, performance, and flexibility; the predicate and priority phases and their common algorithms; node and Pod affinity rules; custom scheduler configuration; and practical YAML examples.
Kubernetes Pod Scheduling Overview
The Scheduler is the Kubernetes component that assigns Pods to cluster nodes. It must balance fairness, efficient resource use, scheduling performance, and flexibility.
Running as a separate service, the Scheduler continuously watches the API server and creates a binding for each Pod whose spec.nodeName is empty.
Scheduling Process
The process consists of two main steps: filtering nodes that do not satisfy requirements (the predicate phase) and ranking the remaining nodes (the priority phase). The node with the highest priority is selected; any error aborts the operation.
Predicate Algorithms
PodFitsResources – node must have enough free CPU/memory for the Pod.
PodFitsHost – if nodeName is set, the node name must match.
PodFitsHostPort – requested ports must not conflict with existing ports on the node.
PodSelectorMatches – node labels must match the Pod’s nodeSelector.
NoDiskConflict – volumes must not conflict unless both are read‑only.
If no node passes the predicate phase, the Pod remains in Pending and is repeatedly re‑evaluated.
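The predicate phase can be sketched as a chain of boolean filters over candidate nodes. The dictionaries and helper names below are illustrative stand-ins, not the scheduler's actual Go types:

```python
# Illustrative sketch of the predicate (filtering) phase; the node/pod
# dictionaries and helper names are hypothetical, not Kubernetes API types.

def pod_fits_resources(pod, node):
    # Node must have enough free CPU (millicores) and memory (MiB).
    return (node["free_cpu"] >= pod["cpu"] and
            node["free_mem"] >= pod["mem"])

def pod_fits_host(pod, node):
    # If the Pod pins a node name, it must match.
    return pod.get("node_name") in (None, node["name"])

def pod_fits_host_port(pod, node):
    # Requested host ports must not collide with ports already in use.
    return not (set(pod.get("host_ports", [])) & set(node["used_ports"]))

PREDICATES = [pod_fits_resources, pod_fits_host, pod_fits_host_port]

def feasible_nodes(pod, nodes):
    # A node survives only if every predicate passes; if the result is
    # empty, the Pod would stay Pending and be re-evaluated later.
    return [n for n in nodes if all(p(pod, n) for p in PREDICATES)]

nodes = [
    {"name": "node1", "free_cpu": 500, "free_mem": 1024, "used_ports": {80}},
    {"name": "node2", "free_cpu": 2000, "free_mem": 4096, "used_ports": set()},
]
pod = {"cpu": 1000, "mem": 2048, "host_ports": [80]}
print([n["name"] for n in feasible_nodes(pod, nodes)])  # ['node2']
```

Here node1 fails both the resource and host-port checks, so only node2 advances to the priority phase.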
Priority Algorithms
LeastRequestedPriority – nodes with lower CPU/memory usage receive higher weight.
BalancedResourceAllocation – nodes with balanced CPU and memory usage receive higher weight (used together with the previous rule).
ImageLocalityPriority – nodes that already have the required container image receive higher weight.
The scheduler computes a final score by aggregating the weighted priorities.
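The scoring step can be sketched as follows. The formulas mirror the classic LeastRequested and BalancedResourceAllocation ideas (scores on a 0–10 scale), but the data shapes and equal weights are illustrative assumptions:

```python
# Sketch of the priority (scoring) phase. Scores are on a 0..10 scale,
# in the spirit of LeastRequested / BalancedResourceAllocation; the
# node records and the weights are illustrative, not scheduler internals.

def least_requested(node):
    # More free capacity -> higher score, averaged over CPU and memory.
    cpu = (node["cpu_cap"] - node["cpu_req"]) * 10 / node["cpu_cap"]
    mem = (node["mem_cap"] - node["mem_req"]) * 10 / node["mem_cap"]
    return (cpu + mem) / 2

def balanced_allocation(node):
    # The closer the CPU and memory usage fractions, the higher the score.
    cpu_frac = node["cpu_req"] / node["cpu_cap"]
    mem_frac = node["mem_req"] / node["mem_cap"]
    return 10 - abs(cpu_frac - mem_frac) * 10

def final_score(node, weights=((least_requested, 1), (balanced_allocation, 1))):
    # Aggregate each priority function's score times its weight.
    return sum(fn(node) * w for fn, w in weights)

a = {"cpu_cap": 4000, "cpu_req": 1000, "mem_cap": 8192, "mem_req": 2048}
b = {"cpu_cap": 4000, "cpu_req": 3000, "mem_cap": 8192, "mem_req": 1024}
best = max([a, b], key=final_score)  # node "a": lightly and evenly loaded
```

Node `a` wins: it is both less loaded and more evenly loaded than `b`, so both priority functions favor it.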
Custom Scheduler
You can define a custom scheduler by setting spec.schedulerName in the Pod spec. Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduler-test
  labels:
    name: example-scheduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-test   # container names must be lowercase RFC 1123 labels
    image: nginx:v1
```
Affinity Scheduling Methods
Kubernetes supports three main scheduling methods: affinity (including node and pod affinity), taint/toleration, and fixed scheduling strategies. This article focuses on affinity.
Node Affinity
Node affinity is defined in pod.spec.affinity.nodeAffinity and has two policies:
requiredDuringSchedulingIgnoredDuringExecution (hard) – the Pod must be scheduled on nodes that satisfy the rule; otherwise it stays Pending.
preferredDuringSchedulingIgnoredDuringExecution (soft) – the scheduler prefers nodes that satisfy the rule but will fall back to other nodes if none match.
Hard Policy Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    name: node-affinity-pod
spec:
  containers:
  - name: with-node-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - testcentos7
```
This policy means the Pod must not be placed on the node named testcentos7.
Creating the Pod on a single‑master, single‑node cluster results in Pending because no suitable node exists.
Soft Policy Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-preferred
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-preferred
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - testcentos7
```
This policy prefers the node testcentos7; if no such node exists, the Pod is scheduled elsewhere.
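The weight field feeds into the scoring phase: for each candidate node, the scheduler adds the weight of every preferred term that node satisfies to its total score. A minimal sketch, where the label data and the In-only operator support are simplifying assumptions:

```python
# Sketch of how preferredDuringSchedulingIgnoredDuringExecution weights
# contribute to a node's score: each satisfied preference adds its weight.
# Only the "In" operator is handled here, for brevity.

def preference_score(node_labels, preferences):
    score = 0
    for pref in preferences:
        key, op = pref["key"], pref["operator"]
        values = pref.get("values", [])
        if op == "In" and node_labels.get(key) in values:
            score += pref["weight"]
    return score

prefs = [{"key": "kubernetes.io/hostname", "operator": "In",
          "values": ["testcentos7"], "weight": 100}]
print(preference_score({"kubernetes.io/hostname": "testcentos7"}, prefs))  # 100
print(preference_score({"kubernetes.io/hostname": "other-node"}, prefs))   # 0
```

A node that misses every preference simply scores 0 here; it remains schedulable, which is exactly what makes the policy "soft".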
Combined Hard and Soft Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-node
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-affinity-node
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - hello
```
The Pod must avoid node k8s-node2 but prefers nodes whose source label equals hello.
Operator Types
In – label value is in the list.
NotIn – label value is not in the list.
Gt – label value is greater than the specified number.
Lt – label value is less than the specified number.
Exists – label key exists.
DoesNotExist – label key does not exist.
If multiple nodeSelectorTerms are defined, satisfying any one term is sufficient; all matchExpressions within a term must be satisfied.
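These matching semantics (terms OR-ed, expressions within a term AND-ed) can be sketched directly; the label values and term data below are illustrative:

```python
# Sketch of nodeSelectorTerms evaluation: terms are OR-ed together,
# matchExpressions inside one term are AND-ed. Operators follow the
# table above; Gt/Lt compare the label value as an integer.

def match_expression(labels, expr):
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "Gt":
        return key in labels and int(labels[key]) > int(values[0])
    if op == "Lt":
        return key in labels and int(labels[key]) < int(values[0])
    raise ValueError(f"unknown operator {op}")

def node_matches(labels, terms):
    # Any one term is sufficient; every expression in that term must hold.
    return any(all(match_expression(labels, e) for e in t) for t in terms)

terms = [
    [{"key": "disktype", "operator": "In", "values": ["ssd"]},
     {"key": "cpu-count", "operator": "Gt", "values": ["4"]}],
    [{"key": "gpu", "operator": "Exists"}],
]
print(node_matches({"disktype": "ssd", "cpu-count": "8"}, terms))  # True
print(node_matches({"disktype": "hdd"}, terms))                    # False
```

A node with an ssd disk and more than 4 CPUs matches via the first term; a node with any gpu label matches via the second, regardless of the first.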
Pod Affinity and Anti‑Affinity
Pod affinity and anti-affinity are defined in pod.spec.affinity.podAffinity and pod.spec.affinity.podAntiAffinity. Like node affinity, each has a hard policy (requiredDuringSchedulingIgnoredDuringExecution) and a soft policy (preferredDuringSchedulingIgnoredDuringExecution).
Pod Affinity Hard Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
```
The Pod must be scheduled on the same node as a Pod labeled app=nginx.
Pod Anti‑Affinity Hard Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: required-pod2
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
```
This Pod must not share a node with any Pod labeled app=nginx; on a single-node cluster it remains Pending.
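The topologyKey decides the granularity of "together": with kubernetes.io/hostname the domain is a single node. The hard anti-affinity check can be sketched as follows, with illustrative records in place of real Kubernetes API objects:

```python
# Sketch of hard pod anti-affinity: a candidate node is rejected if any
# existing Pod in the same topology domain matches the label selector.
# The pod/node records here are illustrative, not Kubernetes API objects.

def violates_anti_affinity(candidate_node, existing_pods, selector_key,
                           selector_values, topology_key="hostname"):
    domain = candidate_node[topology_key]
    for pod in existing_pods:
        if (pod["node"][topology_key] == domain and
                pod["labels"].get(selector_key) in selector_values):
            return True
    return False

node1 = {"hostname": "node1"}
node2 = {"hostname": "node2"}
running = [{"node": node1, "labels": {"app": "nginx"}}]

# node1 already hosts an app=nginx Pod, so it is excluded; node2 is fine.
print(violates_anti_affinity(node1, running, "app", ["nginx"]))  # True
print(violates_anti_affinity(node2, running, "app", ["nginx"]))  # False
```

With a broader topologyKey such as a zone label, the whole zone containing a matching Pod would be excluded, not just the one node.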
Pod Anti‑Affinity Soft Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-preferred
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-preferred
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname
```
Soft anti-affinity prefers not to place the Pod on the same node as Pods labeled app=pod-2, but will still co-locate them if no other node is available.
In summary, Kubernetes scheduling combines predicate filtering, priority ranking, and affinity/anti-affinity rules to control where Pods run, supporting both strict (hard) constraints and soft preferences, and can be extended with custom schedulers.
MaGe Linux Operations