Kubernetes Node Lifecycle: Registration, Heartbeat, and Health Monitoring
This article walks through the Kubernetes Node lifecycle: registration, status updates, lease management, heartbeats, and health monitoring. It covers how the kubelet interacts with the API server, the relevant configuration options, and how the node controller handles failures and graceful node shutdown.
The Node is one of the core components of Kubernetes. Its lifecycle can be summarized as registration, running, and termination, and this article introduces the key events in each phase.
Node Registration
Each node must run a kubelet. After the kubelet starts, it sends a registration request to the kube-apiserver, which creates a new Node resource object.
The registerNode field in the kubelet configuration (or the command-line flag --register-node) defaults to true and controls automatic node registration. Set it to false if you want to manage registration manually.
The node name (nodename) is determined as follows:
If a cloud provider is configured, the name is supplied by the cloud provider.
Otherwise, the machine’s hostname is used, which can be overridden with the kubelet’s --hostname-override option.
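The precedence above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the kubelet's actual code; the function name resolve_node_name and its parameters are hypothetical.

```python
import socket
from typing import Optional


def resolve_node_name(cloud_provider_name: Optional[str] = None,
                      hostname_override: Optional[str] = None) -> str:
    """Pick a node name using the precedence described in the text.

    Both parameters are hypothetical stand-ins: cloud_provider_name is the
    name a configured cloud provider would supply, hostname_override is the
    value passed via --hostname-override.
    """
    if cloud_provider_name:
        # A configured cloud provider decides the name.
        return cloud_provider_name
    if hostname_override:
        # No cloud provider: the override replaces the machine hostname.
        return hostname_override
    # Fall back to the machine's own hostname.
    return socket.gethostname()
```

For example, resolve_node_name(None, "worker-1") returns "worker-1", while calling it with no arguments returns the local hostname.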
Registering a node essentially creates a new node resource; the kubelet then collects node status information and submits it. Re‑submitting the registration has no adverse effect.
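Why repeated registration is harmless can be shown with a small model. The sketch below uses a hypothetical in-memory FakeAPIServer class rather than a real Kubernetes client, and assumes that an "already exists" result on create is simply tolerated before the status is submitted.

```python
from typing import Dict


class FakeAPIServer:
    """Hypothetical in-memory stand-in for the Node API (not a real client)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}

    def create_node(self, name: str) -> bool:
        """Create the node object; return False if it already exists."""
        if name in self.nodes:
            return False
        self.nodes[name] = {"metadata": {"name": name}, "status": {}}
        return True


def register_node(api: FakeAPIServer, name: str, status: dict) -> None:
    """Create the node if needed, then submit the collected status."""
    api.create_node(name)  # an "already exists" result is not an error
    api.nodes[name]["status"] = status


api = FakeAPIServer()
register_node(api, "node-a", {"ready": True})
register_node(api, "node-a", {"ready": True})  # second call changes nothing
```

After both calls the store still holds exactly one object named node-a, which is the sense in which re-registration has no adverse effect.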
Node Heartbeat Mechanism
The node heartbeat consists of two parts: updating the .status field and updating the corresponding lease object.
The kubelet’s nodeStatusUpdateFrequency (or the flag --node-status-update-frequency) defaults to 10 seconds and sets how often the kubelet computes the node status. The kubelet sends a request to the API server to update .status when the status has changed, or when no update has been sent for nodeStatusReportFrequency (default 5 minutes).
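The decision of when to post .status can be written as a single predicate. This is a minimal sketch with hypothetical names; report_interval_seconds stands for whichever reporting interval applies (in current kubelet versions that is nodeStatusReportFrequency, 5 minutes by default, while nodeStatusUpdateFrequency sets how often the status is computed and this check is made).

```python
def should_post_status(status_changed: bool,
                       seconds_since_last_post: float,
                       report_interval_seconds: float = 300.0) -> bool:
    """Return True when the kubelet should send a .status update.

    An update is sent if the computed status differs from the last one
    posted, or if the reporting interval has elapsed without any update.
    """
    if status_changed:
        return True
    return seconds_since_last_post >= report_interval_seconds
```

With the defaults assumed here, an unchanged node checked 10 seconds after its last post is skipped, while a changed node is posted immediately.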
Each node has a Lease object, named after the node, in the kube-node-lease namespace. The kubelet renews it every 0.25 × nodeLeaseDurationSeconds; with the default duration of 40 seconds, that is every 10 seconds.
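The arithmetic and the shape of the Lease object can be sketched as follows. The helper names are hypothetical, and the dictionary only mirrors the coordination.k8s.io/v1 Lease fields relevant here (holderIdentity, leaseDurationSeconds, renewTime); it is an illustration, not a complete manifest.

```python
from datetime import datetime, timezone

LEASE_NAMESPACE = "kube-node-lease"
RENEW_FRACTION = 0.25  # renew at a quarter of the lease duration


def lease_renew_interval(lease_duration_seconds: int = 40) -> float:
    """Seconds between renewals: 0.25 x nodeLeaseDurationSeconds."""
    return RENEW_FRACTION * lease_duration_seconds


def build_lease(node_name: str, now: datetime,
                lease_duration_seconds: int = 40) -> dict:
    """Build a Lease-shaped dict for the node; `now` must be in UTC."""
    return {
        "apiVersion": "coordination.k8s.io/v1",
        "kind": "Lease",
        "metadata": {"name": node_name, "namespace": LEASE_NAMESPACE},
        "spec": {
            "holderIdentity": node_name,
            "leaseDurationSeconds": lease_duration_seconds,
            # Each heartbeat only needs to bump this timestamp.
            "renewTime": now.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        },
    }


lease = build_lease("node-a", datetime.now(timezone.utc))
```

Because a renewal touches only a small object, it is much cheaper than rewriting the full node .status on every heartbeat.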
Node Health Monitoring
The controller-manager’s node-controller (more precisely, the node-lifecycle-controller) monitors node health. If a node’s heartbeat is missing for longer than --node-monitor-grace-period (default 40 s, raised to 50 s in v1.32), the controller sets the node’s Ready condition to Unknown and adds the node.kubernetes.io/unreachable taint so that no new pods are scheduled onto it.
If the node remains unresponsive for a further five minutes (the 300-second toleration that pods receive for this taint by default), the controller begins evicting the node’s pods via the API server.
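The two thresholds form a simple timeline, sketched below. The function and state names are hypothetical, and the model is deliberately simplified: it treats the five-minute wait as a single eviction_delay value and ignores per-pod tolerations.

```python
def node_health(seconds_since_heartbeat: float,
                grace_period: float = 40.0,
                eviction_delay: float = 300.0) -> str:
    """Classify a node by how long its heartbeat has been missing.

    grace_period mirrors --node-monitor-grace-period (40 s here; 50 s from
    v1.32). eviction_delay is the additional five minutes before eviction.
    """
    if seconds_since_heartbeat <= grace_period:
        return "healthy"
    if seconds_since_heartbeat <= grace_period + eviction_delay:
        # Marked Unknown and tainted: no new pods, existing pods stay.
        return "tainted"
    # Still silent after the extra delay: pods are evicted.
    return "evicting"
```

With these defaults, a node silent for 41 seconds is tainted, and eviction starts once the silence exceeds 340 seconds.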
A normal node shutdown follows a similar process: the node is tainted, pods are rescheduled, and the node is gracefully removed.
References:
https://kubernetes.io/docs/concepts/architecture/nodes/
https://kubernetes.io/docs/reference/node/node-status/
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
https://github.com/kubernetes/kubernetes/pull/126287
System Architect Go