
Why Do Kubernetes Nodes Stay Ready for Only Three Minutes After a Kubelet Restart? A Deep Dive into PLEG

This article analyzes a recurring NotReady issue in Alibaba Cloud Kubernetes clusters, explaining how kubelet, NodeStatus, and the Pod Lifecycle Events Generator (PLEG) interact, why nodes become Ready for only three minutes after a kubelet restart, and how the underlying timeout mechanisms cause the problem.

MaGe Linux Operations

Preface

Alibaba Cloud provides its own Kubernetes container cluster service. As the number of clusters has grown, a small share of users have run into a rare condition in which nodes become NotReady.

Problem Phenomenon

The issue shows up as nodes entering the NotReady state. Rebooting the node clears the problem temporarily, but it reappears after about 20 days. Restarting only the kubelet brings the node back to Ready for about three minutes, after which it becomes NotReady again.

Overall Logic

Four core components affect node readiness in a Kubernetes cluster:

etcd (the cluster's key‑value store)

API Server (cluster entry point)

Node controller

kubelet (runs on each node)

kubelet plays two roles. As a controller, it fetches pod specifications from the API Server and manages pod execution on the node. As a node‑status monitor, it reports node conditions back to the API Server.

NodeStatus and PLEG

kubelet uses the NodeStatus mechanism to periodically report node health. A key factor in determining readiness is the Pod Lifecycle Events Generator (PLEG). PLEG periodically checks the state of containers on the node and generates events for the kubelet's main sync loop. If PLEG fails to complete its checks within a timeout, NodeStatus marks the node as NotReady.
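The "check container state, then generate events" step can be pictured as a diff between two snapshots. The following is only a minimal sketch: the snapshot format, the `relist` function, and the event names are illustrative assumptions, not the kubelet's actual data structures.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative event names (assumption); the real kubelet defines its own types.
STARTED = "ContainerStarted"
DIED = "ContainerDied"
REMOVED = "ContainerRemoved"


def relist(previous: Dict[str, str], current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Diff two {container_id: state} snapshots and return (container_id, event) pairs."""
    events: List[Tuple[str, str]] = []
    for cid in sorted(set(previous) | set(current)):
        old: Optional[str] = previous.get(cid)
        new: Optional[str] = current.get(cid)
        if old == new:
            continue  # nothing changed for this container
        if new == "running":
            events.append((cid, STARTED))
        elif new == "exited":
            events.append((cid, DIED))
        elif new is None:
            events.append((cid, REMOVED))
    return events


before = {"a": "running", "b": "running"}
after = {"a": "running", "b": "exited", "c": "running"}
events = relist(before, after)
print(events)  # [('b', 'ContainerDied'), ('c', 'ContainerStarted')]
```

In this picture, a check that hangs while collecting the `current` snapshot never returns, so no events are produced and the check is never recorded as complete.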

PLEG Timing

By default, PLEG starts a check every second, and each check is subject to a three‑minute timeout. If a check does not finish within three minutes, NodeStatus treats the node as NotReady.
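One way to model these two numbers is a health predicate that compares the time of the last completed check against the timeout. This is a hedged sketch: the constant and function names are hypothetical, and treating the timeout as "time since the last completed check" is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# Values taken from the text above; the names are hypothetical.
RELIST_PERIOD = timedelta(seconds=1)
RELIST_THRESHOLD = timedelta(minutes=3)


def pleg_is_healthy(last_relist: datetime, now: datetime) -> bool:
    """Healthy while the last completed check is no older than the threshold."""
    return now - last_relist <= RELIST_THRESHOLD


start = datetime(2024, 1, 1, 12, 0, 0)
print(pleg_is_healthy(start, start + timedelta(seconds=2)))             # True
print(pleg_is_healthy(start, start + timedelta(minutes=3, seconds=1)))  # False

# Number of one-second periods that fit inside the three-minute threshold.
missed_periods = RELIST_THRESHOLD // RELIST_PERIOD
print(missed_periods)  # 180
```

The gap between the one‑second period and the three‑minute threshold means a single slow check is tolerated; only a check that stays stuck for the full threshold flips the node to NotReady.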

Three‑Minute Ready Window

On an affected node, the first PLEG check after a kubelet restart does not finish successfully. Because no check has yet exceeded the timeout, the kubelet initially reports the node as Ready. Once the three‑minute timeout expires without a completed check, it reports NotReady.
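The timeline can be reproduced with a small simulation. This is a sketch under stated assumptions: the kubelet restarts at t = 0, the first check hangs forever, and status is reported every 10 seconds (`report_every_s` is a hypothetical parameter chosen for illustration, as is the function name).

```python
from typing import List, Tuple


def simulate_restart(
    threshold_s: int = 180,
    report_every_s: int = 10,
    duration_s: int = 240,
) -> List[Tuple[int, str]]:
    """Return (seconds_since_restart, status) for each status report."""
    last_relist_s = 0  # assumption: the health clock starts at kubelet start
    timeline: List[Tuple[int, str]] = []
    for now_s in range(0, duration_s + 1, report_every_s):
        # The first check never completes, so last_relist_s never advances.
        healthy = now_s - last_relist_s <= threshold_s
        timeline.append((now_s, "Ready" if healthy else "NotReady"))
    return timeline


timeline = simulate_restart()
first_not_ready = next(t for t, status in timeline if status == "NotReady")
print(first_not_ready)  # 190
```

Under these assumptions every report up to the 180‑second mark says Ready, and the first report after the threshold says NotReady, which matches the three‑minute window observed after each kubelet restart.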

Tags: cloud-native, Kubernetes, Alibaba Cloud, Kubelet, PLEG, Node NotReady

