
Why Do Kubernetes Nodes Suddenly Turn NotReady? Uncovering a Systemd Cookie Overflow Bug

This article walks through a low‑probability NotReady issue in Alibaba Cloud Kubernetes clusters, detailing how an overflow of systemd's D‑Bus message cookie past the 32‑bit limit caused container runtime failures, the step‑by‑step debugging process, and the eventual upstream fix.

Alibaba Cloud Native

Problem Overview

In Alibaba Cloud Kubernetes clusters, a small number of customers experience nodes entering the NotReady state, preventing the master from scheduling new Pods or retrieving running Pod information. The issue recurs roughly once or twice per month and can only be resolved by restarting systemd, which is risky in production.
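
A quick way to see the symptom from the control plane (a minimal sketch, assuming working kubectl access to the affected cluster):

```bash
# An affected node shows NotReady in the STATUS column
kubectl get nodes

# Narrow the output to unhealthy nodes
kubectl get nodes | grep -w NotReady
```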

Key Kubernetes Concepts

Kubernetes clusters consist of Master and Worker nodes. The Master runs control‑plane components (scheduler, controller‑manager), while Workers run user workloads. Each node runs a kubelet agent that communicates with the control plane and manages containers via the container runtime.
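
The node conditions that the kubelet reports to the control plane are what eventually flip to NotReady; a hedged example of inspecting them (the node name is hypothetical):

```bash
# The Ready condition, reported by kubelet, is what marks a node NotReady;
# its Reason/Message fields usually name the failing subsystem (e.g. PLEG)
kubectl describe node cn-hangzhou.i-worker-0001 | grep -A 12 'Conditions:'
```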

Investigating the NotReady State

First, the kubelet service was checked with systemctl status kubelet and appeared healthy. Examining its logs via journalctl -u kubelet revealed an error indicating that the container runtime was not working and that the PLEG (Pod Lifecycle Event Generator) was unhealthy.
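
The two checks from this step, sketched as commands (the time window and log filter are illustrative, not from the original report):

```bash
# The kubelet unit itself reports active/running...
systemctl status kubelet

# ...but its log contains the runtime and PLEG health-check failures
journalctl -u kubelet --since "1 hour ago" | grep -iE 'PLEG|container runtime'
```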

Understanding PLEG

PLEG is the component inside kubelet that watches the container runtime (here, the Docker daemon) for container state changes and, in doing so, monitors the runtime's health. It prefers an event‑driven ("interrupt") approach over pure polling to reduce overhead, but in practice uses both mechanisms: it periodically relists all containers from the runtime, and if that relist stalls for too long, kubelet reports PLEG as unhealthy, which is what pushes the node into NotReady.
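
A quick way to confirm that the runtime, not kubelet, is the stalled side is to time a few trivial Docker calls on the node (this assumes Docker is the runtime, as in this cluster):

```bash
# A healthy daemon answers these almost instantly; on an affected node
# they hang until the client times out
time docker version
time docker ps
```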

Docker Daemon Analysis

Since PLEG reported a runtime problem, the Docker daemon was examined next. Sending kill -USR1 <pid_of_docker> makes the daemon dump all of its goroutine stacks to a file under /var/run/docker. The dump showed a goroutine blocked on a mutex while handling an HTTP API request, pointing further down the runtime's internal call chain.
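
A sketch of this step; the daemon process name and the exact dump file name depend on the Docker version, so treat them as assumptions:

```bash
# Ask the Docker daemon (dockerd on recent versions) to dump goroutine stacks
kill -USR1 "$(pidof dockerd)"

# The dump is written under Docker's runtime directory, typically named
# /var/run/docker/goroutine-stacks-<timestamp>.log
ls -lt /var/run/docker/ | head
```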

containerd Analysis

Docker 1.11+ splits the engine into the docker daemon, containerd, containerd‑shim, and runC. Using kill -SIGUSR1 <pid_of_containerd> dumped containerd's goroutine stacks into its log (the messages log on this system). The stacks highlighted a goroutine that had started a process via runC and was waiting on it, indicating that runC was the next place to look.
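
The equivalent step for containerd; the log location varies by distribution, and /var/log/messages matches the setup described here:

```bash
# containerd also dumps its goroutine stacks on SIGUSR1, but writes them
# into its own log instead of a separate file
kill -SIGUSR1 "$(pidof containerd)"

# Locate the dump in the system log; on journald-only systems use
#   journalctl -u containerd
grep -n 'goroutine' /var/log/messages | tail
```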

D-Bus Investigation

runC communicates with systemd over D‑Bus. Stracing the stuck runC process showed it hanging while writing to D‑Bus, to a destination whose name begins with org.free. The busctl command was used to list all names on the system bus; the entry for org.freedesktop.systemd1 showed an unusually large Name value, suggesting that some D‑Bus data structure was close to exhaustion.
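
A sketch of the tracing and bus inspection; the runC PID and the strace filter are illustrative:

```bash
# Trace the stuck runC process; it was blocked writing to the D-Bus socket
strace -f -p 21468 -e trace=write,sendmsg

# List names on the system bus and inspect systemd's connection
busctl list --no-pager | grep -i systemd
busctl status org.freedesktop.systemd1
```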

Systemd Debugging

Restarting the dbus‑daemon did not fix the issue, but systemctl daemon-reexec (which re‑executes systemd itself) cleared the problem, implicating systemd rather than D‑Bus. A core dump showed a single systemd thread waiting on a D‑Bus event. Live debugging with GDB attached to systemd then showed the function sd_bus_message_seal returning EOPNOTSUPP because the internal message cookie counter had climbed past 0xffffffff: the classic D‑Bus wire protocol carries the cookie in a 32‑bit serial field, so once the counter left that range, every new message failed to seal, breaking communication between runC and systemd.
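
The mitigation and the debugging steps, sketched; attaching gdb to PID 1 pauses systemd while attached and needs matching debuginfo packages installed:

```bash
# Workaround used in production: re-execute systemd in place, which
# re-establishes its D-Bus connection state (including the cookie counter)
systemctl daemon-reexec

# For diagnosis: capture a core of PID 1 for offline analysis, or attach live
gcore 1
gdb -p 1
```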

Fix

The upstream fix changes the cookie handling: the cookie now stays within the 32‑bit range for both D‑Bus protocol versions, and when the counter reaches 0xffffffff, the next value cycles back to 0x80000000, with the high bit marking the overflow state. In that state the code also checks that a candidate cookie is not still attached to an outstanding request, avoiding collisions.
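
To make the cycling rule concrete, here is a toy shell model of it, not systemd's actual C code:

```bash
# Increment a cookie, but once the 32-bit range is exhausted, cycle back to
# 0x80000000 (high bit set) instead of growing past 0xffffffff
next_cookie() {
  local c=$1
  if (( c >= 0xffffffff )); then
    echo $(( 0x80000000 ))
  else
    echo $(( c + 1 ))
  fi
}

printf '0x%x\n' "$(next_cookie $(( 0xfffffffe )))"   # 0xffffffff
printf '0x%x\n' "$(next_cookie $(( 0xffffffff )))"   # 0x80000000 (cycled)
```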

Conclusion

The bug appears in roughly two clusters per month and requires a systemd restart to recover, posing a serious operational risk. The issue was reported to both the systemd and runC communities; a patch has been accepted by Red Hat and will be delivered via a future systemd upgrade.

Tags: cloud-native, Kubernetes, systemd, DBus, PLEG, Node NotReady
Written by Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
