How a runc 1.1.5 Bug Miswired CPU Binding and Triggered K8s Outages
A recent host‑level deployment on Kubernetes nodes triggered widespread service timeouts. runc 1.1.5 passed incorrect CPU‑binding masks to systemd, so containers ended up sharing cores, which inflated load and starved workloads. The problem was uncovered with Perfetto and BPF tracing and resolved by upgrading to a runc release that contains the fix.
Root Cause
During a routine systemd‑managed component update on a Kubernetes node, latency‑sensitive services that were pinned to specific CPUs began timing out. The underlying reason was that runc 1.1.5 supplied an erroneous CPU‑binding mask to systemd, which then wrote the wrong cpuset.cpus configuration to the container cgroup.
Investigation Process
2.1 Why did CPU load rise?
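After reproducing the issue on a test host running 90 containers, each pinned to two cores, we observed that CPU utilization dropped sharply after the change, while CPU idle time rose and both context‑switch and soft‑interrupt counts fell, indicating that many cores were idle. Load therefore rose not because there was more work to do, but because runnable tasks were queuing on a smaller set of cores.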
2.2 Why did CPU usage drop?
Perfetto traces of the CFS scheduler showed that after the update, tasks on several cores were switched to idle despite the containers continuously receiving traffic. BPF monitoring revealed that different containers were being assigned the same CPU list after the update.
2.3 Who altered the cpuset?
A custom BPF program hooked cpuset_write_resmask and showed that both runc and systemd write to cpuset.cpus. runc writes the correct list from kubelet, but systemd writes an incorrect one.
2.4 Where did the wrong parameters come from?
The call chain that sets cpuset.cpus is: container creation request → kubelet → containerd → containerd‑shim‑runc‑v2 → runc → systemd → cgroup. The BPF data showed that runc received the correct mask, while systemd received a malformed one, likely due to a bug in runc 1.1.5 that omitted a required byte‑order reversal.
The problematic version was identified as runc v1.1.5. The release notes for v1.1.6 include a fix for this issue, and upgrading eliminated the erroneous CPU binding.
Technical Details of the Bug‑Fix
The fix reverses the byte order of the CPU mask before handing it to systemd. The relevant code snippet:
// fit cpuset parsing order in systemd
for l, r := 0, len(ret)-1; l < r; l, r = l+1, r-1 {
	ret[l], ret[r] = ret[r], ret[l]
}
Without this reversal, a mask that should have been 00001111 00010000 (representing CPUs 0‑3 and 12) reached systemd with its two bytes swapped, as 00010000 00001111, and was interpreted as CPUs 4 and 8‑11, causing multiple containers to compete for the same cores.
Conclusion and Takeaways
runc wrote the correct cpuset.cpus value itself but handed systemd an incorrect CPU‑binding mask; because systemd writes first and runc's write follows, the container initially kept the right configuration.
Running systemctl daemon-reload during deployments forces systemd to rewrite the wrong mask, overwriting the correct setting and leading to core contention, elevated load, and service timeouts.
The incident spanned K8s, containerd, runc, and systemd, highlighting the importance of version tracking, bug‑watching, and low‑level tracing tools (Perfetto, BPF) in cloud‑native operations.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
