Why Do Kubernetes Nodes Stay Ready for Only Three Minutes? A Deep Dive into PLEG
This article analyzes a recurring NotReady issue in Alibaba Cloud Kubernetes clusters, explaining how kubelet, NodeStatus, and the Pod Lifecycle Events Generator (PLEG) interact, why nodes become Ready for only three minutes after a kubelet restart, and how the underlying timeout mechanisms cause the problem.
Preface
Alibaba Cloud provides its own Kubernetes container cluster service. As the number of clusters has grown, a small share of users have run into a rare "Node NotReady" condition.
Problem Phenomenon
The issue manifests as nodes entering the NotReady state. Restarting the node temporarily resolves the problem, but after about 20 days the issue reappears. Restarting the kubelet makes the node Ready for only three minutes before it becomes NotReady again.
Overall Logic
Four core components affect node readiness in a Kubernetes cluster:
etcd (the cluster's key‑value store)
API Server (cluster entry point)
Node controller
kubelet (runs on each node)
The kubelet plays two roles. As a controller, it fetches pod specifications from the API Server and manages pod execution. As a node-status monitor, it reports node conditions back to the API Server.
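To make the reporting role concrete, the sketch below reads the Ready condition from a Node object as the API Server returns it (for example via `kubectl get node <name> -o json`). The `status.conditions` layout follows the standard Node schema; the sample node, its timestamp, and its message text are illustrative values, not output from the clusters discussed here.

```python
import json

# Illustrative Node object, trimmed to the fields used below.
# All values are made up for this sketch.
SAMPLE_NODE_JSON = """
{
  "metadata": {"name": "node-1"},
  "status": {
    "conditions": [
      {"type": "MemoryPressure", "status": "False"},
      {"type": "Ready", "status": "False",
       "reason": "KubeletNotReady",
       "message": "PLEG is not healthy",
       "lastTransitionTime": "2024-01-01T00:03:00Z"}
    ]
  }
}
"""


def ready_condition(node: dict) -> dict:
    """Return the condition of type "Ready", or an Unknown placeholder."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return {"type": "Ready", "status": "Unknown"}


def is_ready(node: dict) -> bool:
    """A node counts as Ready only when the Ready condition is "True"."""
    return ready_condition(node)["status"] == "True"


if __name__ == "__main__":
    node = json.loads(SAMPLE_NODE_JSON)
    cond = ready_condition(node)
    print(node["metadata"]["name"], "Ready =", cond["status"], "-", cond.get("message", ""))
```

Condition statuses are the strings "True", "False", and "Unknown" rather than booleans, which is why the comparison is against `"True"`.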
NodeStatus and PLEG
kubelet uses the NodeStatus mechanism to periodically report node health. A key factor in determining readiness is the Pod Lifecycle Events Generator (PLEG). PLEG periodically checks the state of containers on the node and generates events for the kubelet's main sync loop. If PLEG fails to complete its checks within a timeout, NodeStatus marks the node as NotReady.
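The relationship between PLEG and NodeStatus can be sketched as a set of health checks folded into a single Ready condition. This is a simplified model, not kubelet source code: `HealthCheck`, `node_ready_condition`, and the check names are hypothetical, and the real kubelet considers more inputs than the two shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HealthCheck:
    """One named probe; fn returns (ok, detail). Hypothetical type."""
    name: str
    fn: Callable[[], Tuple[bool, str]]


def node_ready_condition(checks: List[HealthCheck]) -> Tuple[str, str]:
    """Fold all checks into a (status, message) pair.

    A single failing check, PLEG included, is enough to report NotReady.
    """
    failures = []
    for check in checks:
        ok, detail = check.fn()
        if not ok:
            failures.append(f"{check.name} is not healthy: {detail}")
    if failures:
        return "False", "; ".join(failures)
    return "True", "all checks passed"


if __name__ == "__main__":
    checks = [
        HealthCheck("runtime", lambda: (True, "")),
        HealthCheck("PLEG", lambda: (False, "check did not finish within the timeout")),
    ]
    print(node_ready_condition(checks))
```

The point of the model is that a healthy container runtime does not rescue the node: one stalled PLEG check alone flips the reported status.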
PLEG Timing
By default, PLEG runs a check every second, and each check has a three-minute timeout. If a check does not finish within three minutes, NodeStatus treats the node as NotReady.
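These two defaults can be modeled as a timestamp comparison. The sketch assumes the health check compares the time since the last completed check against the timeout. `PlegClock` and its method names are hypothetical; only the one-second interval and the three-minute timeout come from the text above.

```python
RELIST_PERIOD_S = 1          # PLEG check interval from the text
RELIST_THRESHOLD_S = 3 * 60  # PLEG check timeout from the text


class PlegClock:
    """Tracks when the last PLEG check completed (hypothetical model)."""

    def __init__(self, start_s: float) -> None:
        # Assumption: the clock starts counting when PLEG starts.
        self.last_completed_s = start_s

    def check_completed(self, now_s: float) -> None:
        """Record that a check finished at now_s."""
        self.last_completed_s = now_s

    def healthy(self, now_s: float) -> bool:
        """Healthy while the last completed check is within the timeout."""
        return (now_s - self.last_completed_s) <= RELIST_THRESHOLD_S


if __name__ == "__main__":
    pleg = PlegClock(start_s=0)
    # Checks complete every second for the first minute, then hang.
    for t in range(RELIST_PERIOD_S, 61, RELIST_PERIOD_S):
        pleg.check_completed(t)
    for t in (60, 200, 240, 241):
        print(t, pleg.healthy(t))
```

In this run the last completed check is at t=60, so the model stays healthy through t=240 and turns unhealthy at t=241, three minutes after checks stopped finishing.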
Three‑Minute Ready Window
In the failing case, the first PLEG check after a kubelet restart does not finish successfully. The node therefore stays Ready only until the three-minute timeout expires, after which NodeStatus reports it as NotReady.
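The window can be reproduced with a small timeline simulation. It assumes the health clock starts at kubelet startup and that a hung first check never completes. `first_not_ready_second` and the 10-second status sampling interval are illustrative choices for this sketch, not values from the article.

```python
from typing import Optional

THRESHOLD_S = 180  # three-minute PLEG timeout from the text


def first_not_ready_second(
    restart_s: int,
    first_completed_s: Optional[int],
    report_every_s: int = 10,
    horizon_s: int = 600,
) -> Optional[int]:
    """Return the first status report time at which the node is NotReady.

    restart_s: when kubelet restarted (health clock starts here, by assumption).
    first_completed_s: when the first PLEG check finished, or None if it hangs.
    Returns None if the node stays Ready for the whole horizon.
    """
    for now in range(restart_s, restart_s + horizon_s + 1, report_every_s):
        last_ok = restart_s
        if first_completed_s is not None and first_completed_s <= now:
            # Simplification: once the first check finishes, later checks
            # are assumed to keep finishing on schedule.
            last_ok = now
        if now - last_ok > THRESHOLD_S:
            return now
    return None


if __name__ == "__main__":
    print(first_not_ready_second(0, None))                    # hung first check
    print(first_not_ready_second(0, None, report_every_s=1))  # finer sampling
    print(first_not_ready_second(0, 5))                       # healthy case
```

With a hung first check, the model reports NotReady at the first sample after the 180-second mark (190 with 10-second sampling, 181 with one-second sampling). When the first check finishes at t=5, it never reports NotReady.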
