
Why Does a Kubernetes Node Stay Ready Only 3 Minutes After Restart?

This article examines a recurring Kubernetes node NotReady issue where nodes become ready for only three minutes after a kubelet restart, detailing the underlying PLEG mechanism, component interactions, and diagnostic steps to resolve the problem.

Linux Cloud Computing Practice

Anyone who follows cloud computing knows Docker and Kubernetes have risen to prominence, and major public cloud providers now offer managed Kubernetes services. Kubernetes is powerful and highly extensible, often seen as the ultimate cloud‑native solution.

The author, a senior Alibaba Cloud technical expert, has compiled a practical guide covering theory and hands‑on practice, including cluster control, scaling, and image pulling.

Preface

Alibaba Cloud provides its own Kubernetes container‑cluster product. With the rapid increase of Kubernetes deployments, some users have sporadically observed nodes entering a NotReady state.

Typically, one to two customers encounter this issue each month. After a node becomes NotReady, the cluster master cannot control the node—no new Pods can be scheduled, and real‑time information about running Pods cannot be retrieved.

Although this particular problem has since been fixed upstream in systemd, node‑readiness issues still surface for other reasons, so the diagnostic approach remains relevant.

Problem Phenomenon

The symptom is a node turning NotReady again after about 20 days. Restarting the node temporarily resolves it, but the issue reappears.

Restarting the kubelet makes the node Ready for only three minutes before it reverts to NotReady.

The Big Picture Behind Node Readiness

Four core components affect node readiness: the etcd database, the API Server, the node controller, and the kubelet running on each node.

The kubelet acts both as a cluster controller—periodically fetching Pod specs from the API Server and managing Pod lifecycles—and as a node‑status monitor, reporting node conditions back to the API Server.

The kubelet reports node health through the NodeStatus mechanism, which relies heavily on the Pod Lifecycle Event Generator (PLEG). PLEG periodically checks container status and reports changes as events to the kubelet’s sync loop. If PLEG fails to complete a check within its timeout, NodeStatus marks the node as NotReady.

Three‑Minute Ready Window

After restarting kubelet, the node remains Ready for exactly three minutes before becoming NotReady again. This aligns with PLEG’s default timeout of three minutes: if a PLEG check does not finish within that period, NodeStatus reports the node as NotReady.

The official PLEG diagram shows two interacting processes:

- As a controller, the kubelet fetches Pod spec changes from the API Server and creates or terminates Pods accordingly.
- PLEG periodically checks container status and feeds the resulting events back into the kubelet’s sync loop.

PLEG runs every second, and each check has a three‑minute timeout. When the kubelet restarts, the first PLEG check often hangs, causing the three‑minute timeout to expire and the node to be marked NotReady.

Tags: cloud-native, Kubernetes, cluster management, PLEG, NodeReady
Written by

Linux Cloud Computing Practice

Welcome to Linux Cloud Computing Practice. We offer high-quality articles on Linux, cloud computing, DevOps, networking and related topics. Dive in and start your Linux cloud computing journey!
