Why Did My K8s Node Crash After a CPU Upgrade? Uncovering Static CPU Manager Policies
This article explains how upgrading a Kubernetes node's CPU while using the static CPU manager policy can cause kubelet startup failures, details the underlying CPU manager architecture and topology handling, and provides a simple two‑step recovery method.
Background
When a Kubernetes cluster runs out of capacity, the usual fix is to add nodes, but an existing node can also be scaled up. In a recent incident, a node's CPU was upgraded and the node was rebooted, after which kubelet failed to start and a flood of pod eviction alerts followed.
Symptoms
After the reboot, network tests confirmed connectivity to the API server and load balancer. The kubelet logs showed the following error:
E1121 23:43:52.644552 23453 policy_static.go:158] "Static policy invalid state, please drain node and remove policy state file" err="current set of available CPUs \"0-7\" doesn't match with CPUs in state \"0-3\""
E1121 23:43:52.644569 23453 cpu_manager.go:230] "Policy start error" err="current set of available CPUs \"0-7\" doesn't match with CPUs in state \"0-3\""
E1121 23:43:52.644587 23453 kubelet.go:1431] "Failed to start ContainerManager" err="start cpu manager error: current set of available CPUs \"0-7\" doesn't match with CPUs in state \"0-3\""
The key parameter involved is --cpu-manager-policy. When it is set to static, kubelet creates a cpu_manager_state file under the directory specified by --root-dir (/var/lib/kubelet by default).
Example content of cpu_manager_state:
{ "policyName": "static", "defaultCpuSet": "0-7", "checksum": 14413152 }
If the node's CPU configuration is changed while using the static policy, kubelet reads this file, compares it with the current CPU set, and refuses to start when they differ.
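The startup check described above can be sketched in Go as follows. This is a simplified illustration rather than kubelet's actual code: the names checkpoint, parseCPUList, sameCPUs, and validateState are hypothetical, and the sketch compares only defaultCpuSet, whereas the real check also accounts for CPUs already assigned to containers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// checkpoint mirrors two of the fields shown in the example state file.
type checkpoint struct {
	PolicyName    string `json:"policyName"`
	DefaultCPUSet string `json:"defaultCpuSet"`
}

// parseCPUList expands a CPU list such as "0-3,8" into sorted CPU IDs.
func parseCPUList(s string) ([]int, error) {
	seen := map[int]bool{}
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		if part == "" {
			continue
		}
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad CPU list %q: %w", s, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad CPU list %q: %w", s, err)
			}
		}
		for id := first; id <= last; id++ {
			seen[id] = true
		}
	}
	ids := make([]int, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids, nil
}

// sameCPUs reports whether two sorted ID slices are identical.
func sameCPUs(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// validateState returns an error when the CPUs recorded in the state file
// differ from the CPUs currently present on the node.
func validateState(stateJSON []byte, currentCPUs string) error {
	var cp checkpoint
	if err := json.Unmarshal(stateJSON, &cp); err != nil {
		return fmt.Errorf("unreadable state file: %w", err)
	}
	recorded, err := parseCPUList(cp.DefaultCPUSet)
	if err != nil {
		return err
	}
	current, err := parseCPUList(currentCPUs)
	if err != nil {
		return err
	}
	if !sameCPUs(recorded, current) {
		return fmt.Errorf("current set of available CPUs %q doesn't match with CPUs in state %q",
			currentCPUs, cp.DefaultCPUSet)
	}
	return nil
}

func main() {
	// State written while the node still had 4 CPUs.
	state := []byte(`{"policyName":"static","defaultCpuSet":"0-3"}`)
	fmt.Println(validateState(state, "0-3")) // unchanged node
	fmt.Println(validateState(state, "0-7")) // node upgraded to 8 CPUs
}
```

The second call reproduces the situation in the incident: the file still records CPUs 0-3 while the node now exposes 0-7, so the check fails.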
Principle Analysis
The following official documentation describes the CPU management policies:
https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/cpu-management-policies/
Below is an overview of the CPU Manager architecture.
CPU Manager Architecture
When a container in a Guaranteed QoS pod requests an integer number of CPUs, the CPU Manager allocates exclusive CPUs according to CPU topology (CPU affinity), in the following priority order:
If the container's required logical CPUs are at least the number of logical CPUs in a whole socket, the entire socket is allocated.
If the remaining request is at least the number of logical CPUs in a single core, the whole core is allocated.
Any leftover logical CPUs are chosen based on the following ordered list:
Available CPUs on the same socket.
Available CPUs on the same core.
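To make the priority order above concrete, here is a toy Go sketch. It is not the kubelet implementation: the cpu type, takeByPriority, and exampleTopology are hypothetical names, the machine layout is invented, and the last step simply takes leftover CPUs in socket/core order instead of ranking them the way the real accumulator does.

```go
package main

import (
	"fmt"
	"sort"
)

// cpu is one logical CPU (hardware thread) and its place in the topology.
type cpu struct{ id, socket, core int }

// takeByPriority picks `want` CPUs from `free`: whole sockets first, then
// whole cores, then single threads. perSocket and perCore give the number
// of logical CPUs in a full socket and a full core.
func takeByPriority(free []cpu, perSocket, perCore, want int) ([]int, error) {
	if want > len(free) {
		return nil, fmt.Errorf("not enough cpus available to satisfy request")
	}
	// Sort a copy so CPUs of the same socket and core are adjacent.
	sorted := append([]cpu(nil), free...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.socket != b.socket {
			return a.socket < b.socket
		}
		if a.core != b.core {
			return a.core < b.core
		}
		return a.id < b.id
	})
	taken := map[int]bool{}
	var result []int
	// takeGroups claims fully free groups of exactly `size` CPUs while the
	// remaining request is at least `size`.
	takeGroups := func(size int, key func(cpu) [2]int) {
		for start := 0; start < len(sorted); {
			end := start
			for end < len(sorted) && key(sorted[end]) == key(sorted[start]) {
				end++
			}
			group := sorted[start:end]
			if len(group) == size && want-len(result) >= size && !taken[group[0].id] {
				for _, c := range group {
					taken[c.id] = true
					result = append(result, c.id)
				}
			}
			start = end
		}
	}
	// 1. Whole sockets.
	takeGroups(perSocket, func(c cpu) [2]int { return [2]int{c.socket, -1} })
	// 2. Whole cores.
	takeGroups(perCore, func(c cpu) [2]int { return [2]int{c.socket, c.core} })
	// 3. Single threads (simplified: plain socket/core order).
	for _, c := range sorted {
		if len(result) == want {
			break
		}
		if !taken[c.id] {
			taken[c.id] = true
			result = append(result, c.id)
		}
	}
	return result, nil
}

// exampleTopology builds a hypothetical machine with 2 sockets, 2 cores per
// socket, and 2 threads per core (8 logical CPUs).
func exampleTopology() []cpu {
	var cpus []cpu
	for id := 0; id < 8; id++ {
		cpus = append(cpus, cpu{id: id, socket: id / 4, core: (id % 4) / 2})
	}
	return cpus
}

func main() {
	all := exampleTopology()
	got, _ := takeByPriority(all, 4, 2, 7)
	fmt.Println(got) // [0 1 2 3 4 5 6]: one socket, one core, one thread
	// With CPU 0 already in use, a 2-CPU request gets the fully free core.
	got, _ = takeByPriority(all[1:], 4, 2, 2)
	fmt.Println(got) // [2 3]
}
```

The second request shows why whole cores come before single threads: CPU 1 is free, but its sibling is not, so the sketch prefers the untouched core.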
Reference code:
pkg/kubelet/cm/cpumanager/cpu_assignment.go
func takeByTopology(topo *topology.CPUTopology, availableCPUs cpuset.CPUSet, numCPUs int) (cpuset.CPUSet, error) {
	acc := newCPUAccumulator(topo, availableCPUs, numCPUs)
	if acc.isSatisfied() {
		return acc.result, nil
	}
	if acc.isFailed() {
		return cpuset.NewCPUSet(), fmt.Errorf("not enough cpus available to satisfy request")
	}
	// Algorithm: topology-aware best-fit
	// 1. Acquire whole sockets if needed
	for _, s := range acc.freeSockets() {
		if acc.needs(acc.topo.CPUsPerSocket()) {
			glog.V(4).Infof("[cpumanager] takeByTopology: claiming socket [%d]", s)
			acc.take(acc.details.CPUsInSocket(s))
			if acc.isSatisfied() {
				return acc.result, nil
			}
		}
	}
	// 2. Acquire whole cores if needed
	for _, c := range acc.freeCores() {
		if acc.needs(acc.topo.CPUsPerCore()) {
			glog.V(4).Infof("[cpumanager] takeByTopology: claiming core [%d]", c)
			acc.take(acc.details.CPUsInCore(c))
			if acc.isSatisfied() {
				return acc.result, nil
			}
		}
	}
	// 3. Acquire single threads, preferring partially-allocated cores on the same sockets
	for _, c := range acc.freeCPUs() {
		glog.V(4).Infof("[cpumanager] takeByTopology: claiming CPU [%d]", c)
		if acc.needs(1) {
			acc.take(cpuset.NewCPUSet(c))
		}
		if acc.isSatisfied() {
			return acc.result, nil
		}
	}
	return cpuset.NewCPUSet(), fmt.Errorf("failed to allocate cpus")
}
Discover CPU Topology
cAdvisor obtains CPU topology by reading /proc/cpuinfo and /sys/devices/system/cpu/cpu. Relevant structures are defined in vendor/github.com/google/cadvisor/info/v1/machine.go:
type MachineInfo struct {
	NumCores int `json:"num_cores"`
	// Machine Topology
	Topology []Node `json:"topology"`
}
type Node struct {
	Id     int     `json:"node_id"`
	Memory uint64  `json:"memory"`
	Cores  []Core  `json:"cores"`
	Caches []Cache `json:"caches"`
}
The function GetTopology builds this information from the host.
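As a rough illustration of the kind of parsing involved, the Go sketch below groups logical CPUs by socket and core from /proc/cpuinfo-style text. It is not cAdvisor's implementation: logicalCPU, parseCPUInfo, and countPhysicalCores are hypothetical names, and the field names (processor, physical id, core id) are the ones found on x86 Linux, which is an assumption about the host.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// logicalCPU records where one "processor" entry sits in the topology.
type logicalCPU struct{ ID, Socket, Core int }

// parseCPUInfo extracts processor, physical id, and core id from
// /proc/cpuinfo-style text (x86 field names assumed).
func parseCPUInfo(text string) ([]logicalCPU, error) {
	var cpus []logicalCPU
	cur := logicalCPU{ID: -1}
	flush := func() {
		if cur.ID >= 0 {
			cpus = append(cpus, cur)
		}
		cur = logicalCPU{ID: -1}
	}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			flush() // a blank line ends one processor block
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		var dst *int
		switch key {
		case "processor":
			dst = &cur.ID
		case "physical id":
			dst = &cur.Socket
		case "core id":
			dst = &cur.Core
		default:
			continue // ignore fields that do not describe topology
		}
		n, err := strconv.Atoi(val)
		if err != nil {
			return nil, fmt.Errorf("field %q: %w", key, err)
		}
		*dst = n
	}
	flush() // the last block may not end with a blank line
	return cpus, sc.Err()
}

// countPhysicalCores returns the number of distinct (socket, core) pairs.
func countPhysicalCores(cpus []logicalCPU) int {
	seen := map[[2]int]bool{}
	for _, c := range cpus {
		seen[[2]int{c.Socket, c.Core}] = true
	}
	return len(seen)
}

func main() {
	// Hand-written sample: two threads on core 0 and one thread on core 1.
	sample := "processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0\n\n" +
		"processor\t: 1\nphysical id\t: 0\ncore id\t\t: 0\n\n" +
		"processor\t: 2\nphysical id\t: 0\ncore id\t\t: 1\n"
	cpus, err := parseCPUInfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(cpus), "logical CPUs on", countPhysicalCores(cpus), "physical cores")
}
```

On a real host the same function could be fed the contents of /proc/cpuinfo. cAdvisor's own entry point, with its body elided in the source, has the following signature: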
func GetTopology(sysFs sysfs.SysFs, cpuinfo string) ([]info.Node, int, error) {
	nodes := []info.Node{}
	// ...
	return nodes, numCores, nil
}
Pod Creation Process
When a pod is created under the static policy, the following steps occur:
KubeRuntime calls the container runtime to create the container.
The container is handed to the CPU Manager.
The CPU Manager processes the container according to the static policy.
It selects the best CPU set from the shared pool based on topology.
The chosen CPUs are recorded in the checkpoint state and removed from the shared pool.
The CPU set is applied to the container via the CRI UpdateContainerResources call.
KubeRuntime finally starts the container.
Reference code:
pkg/kubelet/cm/cpumanager/cpu_manager.go
func (m *manager) AddContainer(pod *v1.Pod, container *v1.Container, containerID string) {
	m.Lock()
	defer m.Unlock()
	if cset, exists := m.state.GetCPUSet(string(pod.UID), container.Name); exists {
		m.lastUpdateState.SetCPUSet(string(pod.UID), container.Name, cset)
	}
	m.containerMap.Add(string(pod.UID), container.Name, containerID)
}
Pod Deletion Process
When a container managed by the CPU Manager is deleted, the flow is:
KubeRuntime calls the CPU Manager to handle the static policy cleanup.
The CPU set allocated to the container is returned to the shared pool.
KubeRuntime removes the container via the runtime.
The CPU Manager asynchronously updates the shared pool for other containers.
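The bookkeeping behind the creation and deletion flows can be sketched in Go as below. This is a toy model under stated assumptions, not kubelet code: the state type and its methods are hypothetical, CPUs are picked by lowest ID rather than by topology, and nothing is persisted to a checkpoint file.

```go
package main

import (
	"fmt"
	"sort"
)

// state is a toy stand-in for the CPU Manager's checkpointed state.
type state struct {
	shared      map[int]bool     // CPUs still in the shared pool
	assignments map[string][]int // container ID -> exclusively assigned CPUs
}

// newState starts with every CPU in the shared pool.
func newState(numCPUs int) *state {
	s := &state{shared: map[int]bool{}, assignments: map[string][]int{}}
	for id := 0; id < numCPUs; id++ {
		s.shared[id] = true
	}
	return s
}

// sharedPool returns the shared pool as a sorted slice.
func (s *state) sharedPool() []int {
	ids := make([]int, 0, len(s.shared))
	for id := range s.shared {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

// addContainer moves n CPUs from the shared pool to the container and
// records the assignment. It takes the lowest free IDs; the real policy
// chooses by topology.
func (s *state) addContainer(containerID string, n int) ([]int, error) {
	if _, exists := s.assignments[containerID]; exists {
		return nil, fmt.Errorf("container %q already has CPUs", containerID)
	}
	pool := s.sharedPool()
	if n > len(pool) {
		return nil, fmt.Errorf("not enough cpus available to satisfy request")
	}
	picked := pool[:n]
	for _, id := range picked {
		delete(s.shared, id)
	}
	s.assignments[containerID] = picked
	return picked, nil
}

// removeContainer returns the container's CPUs to the shared pool.
func (s *state) removeContainer(containerID string) {
	for _, id := range s.assignments[containerID] {
		s.shared[id] = true
	}
	delete(s.assignments, containerID)
}

func main() {
	s := newState(4)
	cpus, _ := s.addContainer("c1", 2)
	fmt.Println(cpus, s.sharedPool()) // [0 1] [2 3]
	s.removeContainer("c1")
	fmt.Println(s.sharedPool()) // [0 1 2 3]
}
```

In this model the shared pool shrinks when a container receives exclusive CPUs and grows back when it is removed, which is the state that other containers are later reconciled against.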
Reference code:
pkg/kubelet/cm/cpumanager/cpu_manager.go
func (m *manager) RemoveContainer(containerID string) error {
	m.Lock()
	defer m.Unlock()
	err := m.policyRemoveContainerByID(containerID)
	if err != nil {
		klog.ErrorS(err, "RemoveContainer error")
		return err
	}
	return nil
}
Solution
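To recover the node after the mismatch (draining it first where possible, as the kubelet error message advises, because the state file also holds existing exclusive CPU assignments):
Delete the existing cpu_manager_state file from the kubelet --root-dir.
Restart kubelet so that it regenerates the state file for the new CPU set.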
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
