
Why Did My K8s Node Crash After a CPU Upgrade? Uncovering Static CPU Manager Policies

This article explains how upgrading a Kubernetes node's CPU while using the static CPU manager policy can cause kubelet startup failures, details the underlying CPU manager architecture and topology handling, and provides a simple two‑step recovery method.

Open Source Linux

Nov 29, 2022


Background

When a Kubernetes cluster runs out of capacity, the usual fix is to add nodes; another option is to give existing nodes more resources. In a recent incident, a node's CPU count was raised from 4 to 8 and the node was rebooted. Afterwards kubelet failed to start, and the node's pods triggered a flood of eviction alerts.

Symptoms

After the reboot, network tests confirmed connectivity to the API server and load balancer. The kubelet logs showed the following error:

E1121 23:43:52.644552   23453 policy_static.go:158] "Static policy invalid state, please drain node and remove policy state file" err="current set of available CPUs \"0-7\" doesn't match with CPUs in state \"0-3\""
E1121 23:43:52.644569   23453 cpu_manager.go:230] "Policy start error" err="current set of available CPUs \"0-7\" doesn't match with CPUs in state \"0-3\""
E1121 23:43:52.644587   23453 kubelet.go:1431] "Failed to start ContainerManager" err="start cpu manager error: current set of available CPUs \"0-7\" doesn't match with CPUs in state \"0-3\""

The key parameter involved is --cpu-manager-policy. When it is set to static, kubelet creates a cpu_manager_state file under the directory specified by --root-dir (/var/lib/kubelet by default).

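Example content of cpu_manager_state on an 8-CPU node (on the affected node, the file written before the upgrade recorded "0-3" instead):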

{ "policyName": "static", "defaultCpuSet": "0-7", "checksum": 14413152 }

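On startup, kubelet reads this file and checks that the CPUs it records (the shared default set plus any CPUs assigned exclusively to containers) exactly match the CPUs it currently detects. If the node's CPU configuration has changed while the static policy is in use, this check fails and kubelet refuses to start: here it detected "0-7" while the file still recorded "0-3".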
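The validation described above can be sketched in Go. This is a simplified, self-contained illustration rather than kubelet's code: `stateFile`, `parseCPUSet`, `sameSet`, and `checkState` are hypothetical names, the `entries` field (per-pod, per-container exclusive assignments) is an assumption about the file layout beyond the example shown, and the checksum is decoded but not verified.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// stateFile mirrors the JSON in cpu_manager_state (simplified).
// Entries is assumed to map pod UID -> container name -> assigned CPU list.
type stateFile struct {
	PolicyName    string                       `json:"policyName"`
	DefaultCPUSet string                       `json:"defaultCpuSet"`
	Entries       map[string]map[string]string `json:"entries,omitempty"`
	Checksum      uint64                       `json:"checksum"` // not verified here
}

// parseCPUSet turns a Linux-style CPU list such as "0-3" or "0,2-5" into a set.
func parseCPUSet(s string) (map[int]bool, error) {
	set := map[int]bool{}
	if s = strings.TrimSpace(s); s == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, err
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, err
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid CPU range %q", part)
		}
		for c := start; c <= end; c++ {
			set[c] = true
		}
	}
	return set, nil
}

// sameSet reports whether two CPU sets contain exactly the same IDs.
func sameSet(a, b map[int]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for c := range a {
		if !b[c] {
			return false
		}
	}
	return true
}

// checkState fails when the CPUs recorded in the state (default set plus all
// exclusive assignments) differ from the CPUs detected now, e.g. "0-7".
func checkState(raw []byte, detected string) error {
	var st stateFile
	if err := json.Unmarshal(raw, &st); err != nil {
		return err
	}
	known, err := parseCPUSet(st.DefaultCPUSet)
	if err != nil {
		return err
	}
	for _, containers := range st.Entries {
		for _, list := range containers {
			assigned, err := parseCPUSet(list)
			if err != nil {
				return err
			}
			for c := range assigned {
				known[c] = true
			}
		}
	}
	current, err := parseCPUSet(detected)
	if err != nil {
		return err
	}
	if !sameSet(known, current) {
		return fmt.Errorf("detected CPUs %q do not match the CPUs recorded in the state file", detected)
	}
	return nil
}

func main() {
	stale := []byte(`{"policyName":"static","defaultCpuSet":"0-3","checksum":14413152}`)
	fmt.Println(checkState(stale, "0-7") != nil) // true: state says 0-3, node now has 0-7
	fresh := []byte(`{"policyName":"static","defaultCpuSet":"0,3-7","entries":{"pod-a":{"app":"1-2"}},"checksum":1}`)
	fmt.Println(checkState(fresh, "0-7")) // <nil>
}
```

The second example passes because the default set (0, 3-7) and the exclusive assignment (1-2) together cover exactly 0-7; the first reproduces the incident.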

Principle Analysis

The following official documentation describes the CPU management policies:

https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/cpu-management-policies/

Below is an overview of the CPU Manager architecture.

CPU Manager Architecture

When a container in a Guaranteed QoS pod requests an integer number of CPUs, the CPU Manager assigns it exclusive CPUs according to the CPU topology (CPU affinity), with the following priority:

If the container's required logical CPUs are at least the number of logical CPUs in a whole socket, the entire socket is allocated.

If the remaining request is at least the number of logical CPUs in a single core, the whole core is allocated.

Any leftover logical CPUs are chosen based on the following ordered list:

Available CPUs on the same socket.

Available CPUs on the same core.

Reference code:

pkg/kubelet/cm/cpumanager/cpu_assignment.go

func takeByTopology(topo *topology.CPUTopology, availableCPUs cpuset.CPUSet, numCPUs int) (cpuset.CPUSet, error) {
    acc := newCPUAccumulator(topo, availableCPUs, numCPUs)
    if acc.isSatisfied() {
        return acc.result, nil
    }
    if acc.isFailed() {
        return cpuset.NewCPUSet(), fmt.Errorf("not enough cpus available to satisfy request")
    }
    // Algorithm: topology-aware best-fit
    // 1. Acquire whole sockets if needed
    for _, s := range acc.freeSockets() {
        if acc.needs(acc.topo.CPUsPerSocket()) {
            glog.V(4).Infof("[cpumanager] takeByTopology: claiming socket [%d]", s)
            acc.take(acc.details.CPUsInSocket(s))
            if acc.isSatisfied() {
                return acc.result, nil
            }
        }
    }
    // 2. Acquire whole cores if needed
    for _, c := range acc.freeCores() {
        if acc.needs(acc.topo.CPUsPerCore()) {
            glog.V(4).Infof("[cpumanager] takeByTopology: claiming core [%d]", c)
            acc.take(acc.details.CPUsInCore(c))
            if acc.isSatisfied() {
                return acc.result, nil
            }
        }
    }
    // 3. Acquire single threads, preferring to fill partially-allocated cores on the same sockets as the whole cores already taken in this allocation
    for _, c := range acc.freeCPUs() {
        glog.V(4).Infof("[cpumanager] takeByTopology: claiming CPU [%d]", c)
        if acc.needs(1) {
            acc.take(cpuset.NewCPUSet(c))
        }
        if acc.isSatisfied() {
            return acc.result, nil
        }
    }
    return cpuset.NewCPUSet(), fmt.Errorf("failed to allocate cpus")
}

Discover CPU Topology

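kubelet does not probe the CPUs itself; it takes the topology from cAdvisor, which obtains it by reading /proc/cpuinfo and /sys/devices/system/cpu/cpu. The topology is detected anew each time kubelet starts, which is why kubelet saw "0-7" after the reboot. The relevant structures are defined in vendor/github.com/google/cadvisor/info/v1/machine.go: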

type MachineInfo struct {
    NumCores int `json:"num_cores"`
    // Machine Topology
    Topology []Node `json:"topology"`
}

type Node struct {
    Id     int     `json:"node_id"`
    Memory uint64  `json:"memory"`
    Cores  []Core  `json:"cores"`
    Caches []Cache `json:"caches"`
}

The function GetTopology builds this information from the host.

func GetTopology(sysFs sysfs.SysFs, cpuinfo string) ([]info.Node, int, error) {
    nodes := []info.Node{}
    // ...
    return nodes, numCores, nil
}

Pod Creation Process

When a pod is created under the static policy, the following steps occur:

KubeRuntime calls the container runtime to create the container.

The container is handed to the CPU Manager.

The CPU Manager processes the container according to the static policy.

It selects the best CPU set from the shared pool based on topology.

The chosen CPUs are recorded in the checkpoint state and removed from the shared pool.

The CPU set is applied to the container via the CRI UpdateContainerResources call.

KubeRuntime finally starts the container.

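Reference code (in recent kubelet versions the CPU set is chosen earlier, at pod admission, by the manager's Allocate method; AddContainer, shown below, only records an assignment that already exists in the state):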

pkg/kubelet/cm/cpumanager/cpu_manager.go

func (m *manager) AddContainer(pod *v1.Pod, container *v1.Container, containerID string) {
    m.Lock()
    defer m.Unlock()
    if cset, exists := m.state.GetCPUSet(string(pod.UID), container.Name); exists {
        m.lastUpdateState.SetCPUSet(string(pod.UID), container.Name, cset)
    }
    m.containerMap.Add(string(pod.UID), container.Name, containerID)
}

Pod Deletion Process

When a container managed by the CPU Manager is deleted, the flow is:

KubeRuntime calls the CPU Manager to handle the static policy cleanup.

The CPU set allocated to the container is returned to the shared pool.

KubeRuntime removes the container via the runtime.

The CPU Manager asynchronously updates the CPU sets of the containers still running in the shared pool, so they can use the returned CPUs.

Reference code:

pkg/kubelet/cm/cpumanager/cpu_manager.go

func (m *manager) RemoveContainer(containerID string) error {
    m.Lock()
    defer m.Unlock()
    err := m.policyRemoveContainerByID(containerID)
    if err != nil {
        klog.ErrorS(err, "RemoveContainer error")
        return err
    }
    return nil
}

Solution

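To recover the node after the mismatch (the kubelet error message itself advises draining the node before removing the state file):

Delete the existing cpu_manager_state file (/var/lib/kubelet/cpu_manager_state with the default --root-dir).

Restart kubelet. With no state file present, it builds the shared pool from the CPUs it now detects and writes a new state file.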


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

