Interview Question: How Does Kubernetes Schedule Pods to Nodes? (Full Answer)
The article explains Kubernetes pod scheduling in detail, covering how the kube‑scheduler filters and scores nodes, binds the pod, and how kubelet launches containers, plus common reasons for pods staying Pending and useful troubleshooting commands.
Overview
The pod scheduling process consists of eight steps: user creates pod, pod information is stored in the API server and etcd, kube‑scheduler detects unscheduled pods, filters unsuitable nodes, scores the remaining nodes, selects the highest‑scoring node, binds the pod to that node, and kubelet on the target node creates and runs the containers.
Step 1 – Pod creation does not immediately run
When a pod is created, it is only written to the API server and persisted in etcd; it usually has no node assigned yet (the NODE column shows <none> in kubectl get pod -o wide). The component that finds a machine for the pod is the kube‑scheduler, which continuously watches for pods without a node and selects the most suitable node for each one.
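As a minimal sketch of what "no node assigned" means at the object level: an unscheduled pod simply has no spec.nodeName. The snippet assumes pod objects shaped like the items returned by kubectl get pods -o json; the helper name unscheduled_pods and the sample data are hypothetical.

```python
def unscheduled_pods(pods):
    """Return the names of pods that have no node assigned yet."""
    return [
        p["metadata"]["name"]
        for p in pods
        if not p.get("spec", {}).get("nodeName")
    ]


# Hypothetical sample data, trimmed to the fields used here.
sample = [
    {"metadata": {"name": "web-1"}, "spec": {"nodeName": "node-1"}},
    {"metadata": {"name": "web-2"}, "spec": {}},
]

print(unscheduled_pods(sample))  # ['web-2']
```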
Step 2 – The essence of scheduling
From all nodes in the cluster, filter those that can run the pod and then choose the most suitable one.
The process is divided into three phases:
Pending pod
↓
Filter: which nodes can run it?
↓
Score: which nodes are better suited to run it?
↓
Bind: bind the pod to the target node
Step 3 – Filtering unsuitable nodes
The scheduler first filters nodes. Nodes are excluded for several reasons:
1. Insufficient resources
If a pod requests resources (e.g., cpu: "2", memory: "4Gi"), the scheduler checks each node's remaining allocatable resources, that is, the node's allocatable capacity minus the requests of the pods already assigned to it. Nodes lacking enough CPU or memory are filtered out. The scheduler looks at requests, not limits.
2. Node marked unschedulable
A node can be cordoned (kubectl cordon node-1); new pods will not be scheduled onto it.
3. NodeSelector mismatch
If a pod specifies nodeSelector (e.g., disk: ssd), only nodes with the matching label are considered. Nodes without the label are filtered.
4. NodeAffinity not satisfied
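NodeAffinity is a more expressive form of NodeSelector. For example, a pod may require node-role=app through a requiredDuringSchedulingIgnoredDuringExecution rule; nodes without that label are filtered out. Preferred rules (preferredDuringSchedulingIgnoredDuringExecution) do not filter nodes; they only influence scoring.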
5. Taint/Toleration mismatch
Nodes can have taints (e.g., kubectl taint nodes node-1 dedicated=gpu:NoSchedule) that repel pods without matching tolerations. For a pod that lacks the matching toleration, the tainted node is filtered out.
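The filtering checks above can be sketched as a single function that returns the first reason a node is rejected. This is a simplified model, not the scheduler's actual code: the Node and Pod classes, the field names, and the sample cluster are hypothetical, tolerations are matched by exact (key, value, effect) equality rather than by the real Equal/Exists operators, and NodeAffinity is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int      # allocatable CPU minus requests already placed, in millicores
    free_mem_mi: int     # same for memory, in MiB
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)  # (key, value, effect) tuples
    unschedulable: bool = False


@dataclass
class Pod:
    cpu_request_m: int
    mem_request_mi: int
    node_selector: dict = field(default_factory=dict)
    tolerations: set = field(default_factory=set)  # simplified: exact taint tuples


def filter_reason(pod, node):
    """Return why the node is rejected, or None if it passes every check."""
    if node.unschedulable:
        return "unschedulable"
    if pod.cpu_request_m > node.free_cpu_m:
        return "insufficient cpu"
    if pod.mem_request_mi > node.free_mem_mi:
        return "insufficient memory"
    if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
        return "nodeSelector mismatch"
    blocking = {t for t in node.taints if t[2] in ("NoSchedule", "NoExecute")}
    if blocking - pod.tolerations:
        return "untolerated taint"
    return None


# Hypothetical cluster: only node-3 survives filtering for this pod.
nodes = [
    Node("node-1", 4000, 8192, {"disk": "ssd"}, {("dedicated", "gpu", "NoSchedule")}),
    Node("node-2", 1000, 8192, {"disk": "ssd"}),
    Node("node-3", 4000, 8192, {"disk": "ssd"}),
    Node("node-4", 4000, 8192, {"disk": "hdd"}),
]
pod = Pod(cpu_request_m=2000, mem_request_mi=4096, node_selector={"disk": "ssd"})

reasons = {n.name: filter_reason(pod, n) for n in nodes}
candidates = [name for name, reason in reasons.items() if reason is None]
print(reasons)
print(candidates)  # ['node-3']
```

Only requests appear in the model, which mirrors the point that limits play no part in filtering.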
Step 4 – Scoring the remaining nodes
After filtering, the scheduler scores the candidate nodes based on factors such as resource availability, image locality, affinity preferences, even pod distribution, and topology constraints. The node with the highest score is selected; ties are broken arbitrarily.
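The scoring step can be illustrated with a weighted sum of per-factor scores. This is a sketch under stated assumptions: the two scoring functions, their weights (2 and 1), and the node dictionaries are hypothetical stand-ins for the scheduler's real scoring plugins, which cover more factors than shown here.

```python
import random


def least_allocated(node):
    """Score 0-100: higher when a larger share of the node's CPU is still free."""
    return 100 * node["free_cpu_m"] // node["allocatable_cpu_m"]


def image_locality(node, image):
    """Score 100 if the image is already cached on the node, else 0."""
    return 100 if image in node["images"] else 0


def pick_node(nodes, image, rng=random):
    """Return (chosen node name, all scores); ties are broken at random."""
    scores = {
        n["name"]: 2 * least_allocated(n) + 1 * image_locality(n, image)
        for n in nodes
    }
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return rng.choice(winners), scores


# Hypothetical candidates that already passed filtering.
scored_candidates = [
    {"name": "node-2", "allocatable_cpu_m": 4000, "free_cpu_m": 1000,
     "images": {"nginx:1.27"}},
    {"name": "node-3", "allocatable_cpu_m": 4000, "free_cpu_m": 3200,
     "images": set()},
]

chosen, scores = pick_node(scored_candidates, "nginx:1.27")
print(scores)  # {'node-2': 150, 'node-3': 160}
print(chosen)  # node-3
```

Here node-2 already has the image cached, but node-3 has enough free CPU to outweigh that under the assumed weights.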
Step 5 – Binding the pod to the chosen node
The scheduler sends a binding request to the API server, setting spec.nodeName to the selected node (e.g., nodeName: node-1). Once bound, the scheduler’s job is done.
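For illustration, the binding request carries a small Binding object that names the pod and the target node; it is posted to the pod's binding subresource (/api/v1/namespaces/<namespace>/pods/<pod-name>/binding). The sketch below only builds that body and does not contact a cluster; the helper name binding_body and the pod name web-2 are hypothetical.

```python
import json


def binding_body(pod_name, node_name):
    """Build the Binding object that assigns one pod to one node."""
    return {
        "apiVersion": "v1",
        "kind": "Binding",
        "metadata": {"name": pod_name},
        "target": {"apiVersion": "v1", "kind": "Node", "name": node_name},
    }


body = binding_body("web-2", "node-1")
print(json.dumps(body, indent=2))
```

After the API server accepts the binding, the pod's spec.nodeName reads node-1, and the kubelet on that node takes over.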
Step 6 – What kubelet does after binding
Each node runs a kubelet that watches for pods assigned to it. When a new pod appears, kubelet performs the following actions:
Mount volumes
↓
Configure network
↓
Pull image
↓
Create container
↓
Start container
In short, kube‑scheduler decides which node a pod runs on; kubelet runs the pod on that node.
Step 7 – Why some pods stay Pending
Understanding the scheduling flow helps diagnose why a pod remains in Pending. Common reasons:
1. Insufficient resources
The pod’s CPU or memory request cannot be satisfied by any node. kubectl describe pod <pod-name> shows events like “Insufficient cpu” or “Insufficient memory”.
2. Node label mismatch
NodeSelector or NodeAffinity constraints have no matching nodes.
3. Missing toleration for a node taint
A node’s taint blocks the pod without a matching toleration.
4. PVC binding failure
If the pod depends on a PersistentVolumeClaim that cannot be bound, scheduling may fail.
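The four causes above usually show up as fragments of the FailedScheduling event message. As a rough sketch, the helper below maps such fragments to causes. The substrings are assumptions based on commonly seen messages and their exact wording differs between Kubernetes versions; the classify helper and the sample message are hypothetical.

```python
# Substrings of FailedScheduling messages mapped to the causes listed above.
# Wording varies by Kubernetes version, so treat these as assumptions.
PATTERNS = [
    ("Insufficient cpu", "insufficient resources"),
    ("Insufficient memory", "insufficient resources"),
    ("didn't match Pod's node affinity/selector", "node label mismatch"),
    ("untolerated taint", "missing toleration"),
    ("didn't tolerate", "missing toleration"),
    ("unbound immediate PersistentVolumeClaims", "PVC binding failure"),
]


def classify(message):
    """Return the distinct causes whose marker text appears in the message."""
    causes = []
    for needle, cause in PATTERNS:
        if needle in message and cause not in causes:
            causes.append(cause)
    return causes


# Hypothetical event message, shaped like the Events output of kubectl describe pod.
msg = ("0/3 nodes are available: 1 Insufficient cpu, "
       "2 node(s) had untolerated taint {dedicated: gpu}.")

print(classify(msg))  # ['insufficient resources', 'missing toleration']
```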
Step 8 – Common troubleshooting commands
kubectl describe pod <pod-name> (focus on the Events section)
kubectl describe node <node-name> (view node resources)
kubectl get nodes --show-labels (view node labels)
kubectl describe node <node-name> | grep Taints (view node taints)
Linux Cloud-Native Ops Stack
Focused on practical internet operations, sharing server monitoring, troubleshooting, automated deployment, and cloud-native tech insights. From Linux basics to advanced K8s, from ops tools to architecture optimization, helping engineers avoid pitfalls, grow quickly, and become your tech companion.
