Mastering Kubernetes Node Affinity, Taints, and Tolerations: A Practical Guide
This article explains Kubernetes node and pod relationships through a story, then details how to use nodeSelector, nodeAffinity, podAffinity, taints, and tolerations with concrete kubectl commands and YAML examples, helping you control pod scheduling and eviction behavior.
Preface
This article uses a simple story to illustrate the love‑hate relationship between nodes and pods in Kubernetes, then introduces the key scheduling concepts.
In the story, nodes are men and pods are women: with no constraints, the many pods are spread fairly evenly across three well‑provisioned nodes, much as the default scheduler spreads the replicas of a Deployment (a DaemonSet instead places exactly one pod on every eligible node).
nodeSelector
Assign a pod to a specific node by labeling the node and referencing that label in the pod spec.
$ kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   20h   v1.19.7
k8s-node01   Ready    <none>   20h   v1.19.7
k8s-node02   Ready    <none>   20h   v1.19.7
Label a node:
$ kubectl label nodes k8s-node01 disktype=ssd
node/k8s-node01 labeled
Verify the label:
$ kubectl get nodes --show-labels
... k8s-node01 Ready <none> 20h v1.19.7 ... disktype=ssd ...
Create a Deployment whose pods select that label:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ngx
  template:
    metadata:
      labels:
        app: ngx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine-arm64
      nodeSelector:
        disktype: ssd
Kubernetes Affinity
Affinity determines where pods can be scheduled. It has two main types: nodeAffinity (node‑level) and podAffinity (pod‑level).
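Because podAffinity is named here without an example of its own, the following is a minimal sketch of a pod‑level rule. It assumes that pods labeled app: cache are already running somewhere in the cluster; that label and the Deployment name ngx-near-cache are hypothetical and do not come from the original article. The rule asks the scheduler to place each replica on a node (topologyKey kubernetes.io/hostname) that already runs at least one matching pod.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx-near-cache            # hypothetical name, for illustration only
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ngx
  template:
    metadata:
      labels:
        app: ngx
    spec:
      affinity:
        podAffinity:
          # Hard rule: schedule only where a matching pod already runs.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - cache           # assumed label on the target pods
            # "Same place" means the same node, identified by its hostname label.
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx:alpine-arm64
```

With a required rule like this, the replicas stay Pending if no pod labeled app=cache is running. Using podAntiAffinity with the same structure expresses the opposite intent: keep away from nodes that run matching pods.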
nodeAffinity
Node affinity can be hard (required) or soft (preferred).
requiredDuringSchedulingIgnoredDuringExecution – hard strategy
preferredDuringSchedulingIgnoredDuringExecution – soft strategy
Hard strategy example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ngx
  template:
    metadata:
      labels:
        app: ngx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
      containers:
      - name: nginx
        image: nginx:alpine-arm64
Soft strategy example (prefer nodes labeled disktype=hdd, but schedule on any other suitable node if none is available):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ngx
  template:
    metadata:
      labels:
        app: ngx
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - hdd
      containers:
      - name: nginx
        image: nginx:alpine-arm64
Taints and Tolerations
Taints are node attributes that repel pods; tolerations are pod attributes that allow pods to ignore specific taints.
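To make the pairing concrete, here is a small sketch of both sides. The first document is an abbreviated view of how a taint appears in a Node object (roughly what kubectl get node k8s-node01 -o yaml would show under spec); the second is a pod that tolerates it. The taint dedicated=gpu:NoSchedule and the pod name tolerant-pod are hypothetical values chosen for illustration.

```yaml
# Node side (abbreviated): the taint repels pods that do not tolerate it.
apiVersion: v1
kind: Node
metadata:
  name: k8s-node01
spec:
  taints:
  - key: dedicated          # hypothetical taint key
    value: gpu              # hypothetical taint value
    effect: NoSchedule
---
# Pod side: a toleration with the same key, value, and effect matches the taint.
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod        # hypothetical name
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
  containers:
  - name: nginx
    image: nginx:alpine-arm64
```

A pod without this toleration would not be scheduled onto k8s-node01 while the taint is present.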
Taints
View taints on a node:
$ kubectl describe nodes k8s-master | grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Set a taint (prevent scheduling):
$ kubectl taint node k8s-master key1=value1:NoSchedule
Set a taint with PreferNoSchedule (soft block):
$ kubectl taint node k8s-master key2=value2:PreferNoSchedule
Set a taint with NoExecute (also evict running pods that do not tolerate it):
$ kubectl taint node k8s-master key3=value3:NoExecute
Delete a taint (by key and effect, or every taint with a given key):
$ kubectl taint node k8s-master key1:NoSchedule-
$ kubectl taint node k8s-master key2-
Tolerations
Allow a pod to tolerate a taint:
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
Using the Exists operator (no value needed):
tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"
By default, Kubernetes adds tolerations for the built‑in not-ready and unreachable taints to pods (300 s each); written out explicitly, they look like this:
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300
To have a pod evicted as soon as its node is tainted as not-ready or unreachable, set tolerationSeconds: 0 in the pod spec:
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 0
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 0
The cluster‑wide default toleration times can be changed with the kube-apiserver flags --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds.
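The toleration fragments above belong under spec.template.spec of a workload. As a sketch of how they sit in a full manifest, the Deployment below combines a toleration with a nodeSelector. It assumes one node carries both the label disktype=ssd and the taint key1=value1:NoSchedule; that combination, the name ngx-dedicated, and the 60‑second value are assumptions made for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx-dedicated             # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ngx-dedicated
  template:
    metadata:
      labels:
        app: ngx-dedicated
    spec:
      # Attract: only nodes labeled disktype=ssd are candidates.
      nodeSelector:
        disktype: ssd
      tolerations:
      # Permit: the pods may be scheduled onto nodes tainted key1=value1:NoSchedule.
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
      # Stay bound for 60 s (illustrative value) after the node is marked
      # not-ready, instead of the default 300 s.
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 60
      containers:
      - name: nginx
        image: nginx:alpine-arm64
```

The two mechanisms are complementary: a toleration only allows a pod onto a tainted node and does not steer it there, so dedicating a node to a workload usually takes a taint on the node plus a nodeSelector or nodeAffinity rule on the pod.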
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.