Tagged articles
65 articles
Ops Community
May 13, 2026 · Operations

Kubernetes Node Failures: One‑Stop Guide to Diagnose and Fix Common Issues

This comprehensive guide walks Kubernetes operators through a step‑by‑step process for diagnosing node health problems—such as NotReady, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable—by examining node conditions, reviewing events, checking system resources, inspecting component logs, applying targeted fixes, and verifying recovery, all illustrated with real‑world commands and examples.

CNI · DiskPressure · Kubernetes
0 likes · 44 min read
MaGe Linux Operations
May 3, 2026 · Cloud Native

How to Troubleshoot Kubernetes NotReady Nodes: A Complete Step‑by‑Step Guide

This article walks Kubernetes operators through a systematic investigation of NotReady node symptoms, explaining the kubelet status mechanism, detailing each diagnostic step—from verifying node conditions with kubectl to checking kubelet, container runtime, resources, network, and certificates—and providing concrete remediation and preventive measures.

Kubernetes · NotReady · containerd
0 likes · 35 min read
Code Wrench
Nov 19, 2025 · Cloud Native

Unveiling Kubelet: How Kubernetes Brings Pods to Life with Go Concurrency

This article dissects the Kubelet component of Kubernetes, detailing its Go‑based architecture, core responsibilities, event‑driven syncLoop, PodWorkers concurrency model, syncPod creation flow, and PLEG health monitoring. It also provides practical debugging commands for production environments.

Cloud Native · Debugging · Go
0 likes · 14 min read
Code Wrench
Nov 17, 2025 · Cloud Native

Unlock Kubernetes Secrets: A Go Source Dive into Its Core Architecture

This article walks readers through Kubernetes’s fundamental architecture by dissecting its Go source code, explaining key concepts such as the API server, controllers, informers, the control loop, Kubelet, and extensibility mechanisms like CRDs and admission webhooks, complete with illustrative diagrams and code snippets.

CRD · Cloud Native · Controller
0 likes · 11 min read
Ops Development Stories
Jul 25, 2025 · Cloud Native

How Kubernetes 1.33 Enables In‑Place Pod Resizing Without Restarts

Kubernetes 1.33 introduces in‑place vertical pod resizing, allowing administrators to adjust CPU and memory resources on running containers without restarting pods, reducing downtime for stateful workloads, improving cost efficiency, and integrating with VPA, while outlining implementation details, supported runtimes, limitations, and practical demos.

In‑Place Vertical Scaling · Kubernetes · Pod Resizing
0 likes · 18 min read
IT Xianyu
Jun 6, 2025 · Cloud Native

Master Kubernetes on AlmaLinux: Step‑by‑Step Setup with Containerd, kubeadm, and More

This guide walks you through preparing three AlmaLinux servers, disabling firewalls and SELinux, installing Containerd as the CRI, adding Kubernetes repositories, installing kubeadm, kubelet and kubectl, configuring the runtime, and verifying each component so you can confidently bootstrap a production‑ready Kubernetes cluster.

AlmaLinux · Kubernetes · containerd
0 likes · 21 min read
System Architect Go
Oct 28, 2024 · Cloud Native

How Kubernetes Manages Container Images on Nodes

This article explains how Kubernetes, through the Kubelet and CRI components such as containerd and cri‑o, pulls container images, stores them on the node, and performs periodic garbage collection based on configurable age and disk‑usage thresholds.

Image Garbage Collection · Kubernetes · container-runtime
0 likes · 6 min read
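
As a rough illustration of the disk‑usage thresholds mentioned in the summary above, the sketch below models the trigger logic in Python. The 85/80 defaults mirror the kubelet settings `imageGCHighThresholdPercent` and `imageGCLowThresholdPercent`; the function name and the example figures are hypothetical, and the age‑based rules are left out.

```python
def image_gc_bytes_to_free(capacity, available, high_pct=85, low_pct=80):
    """Return how much image data to delete, in the same unit as the inputs.

    Hypothetical helper: garbage collection starts only once disk usage
    reaches high_pct, and then aims to bring usage back down to low_pct.
    """
    used_pct = 100 - (available * 100 // capacity)
    if used_pct < high_pct:
        return 0  # below the high threshold: nothing to collect
    # Free enough so that (100 - low_pct) percent of the disk is available again.
    return capacity * (100 - low_pct) // 100 - available


# Assumed example: a 100-unit image filesystem with 10 units still free.
print(image_gc_bytes_to_free(capacity=100, available=10))  # 10
```

With 20 units free, usage is 80%, which is under the high threshold, so the function returns 0.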
Infra Learning Club
Sep 27, 2024 · Cloud Native

Inside Kubelet: How Pod Admission Works

This article dissects Kubelet's Pod admission pipeline, explaining how syncLoopIteration gathers pod data, how HandlePodAdditions invokes canAdmitPod, and how six registered admit handlers—Eviction, System Allowlist, Resource Allocation, Predicate, AppArmor, and Shutdown—evaluate each pod with concrete code examples and decision logic.

Admission Handlers · Go · Kubernetes
0 likes · 14 min read
System Architect Go
Sep 7, 2024 · Cloud Native

How Kubelet, CRI, and CNI Collaborate to Launch a New Pod

When a new Pod is created, Kubelet coordinates with the CRI and CNI components to set up the sandbox, configure networking, pull images, and create and start containers. It does this through gRPC calls and command‑line interactions, and the details vary across container runtimes such as containerd, cri‑o, and Docker.

CNI · CRI · Cloud Native
0 likes · 5 min read
Infra Learning Club
Sep 5, 2024 · Cloud Native

Deep Dive into Kubelet’s DeviceManager Source Code

This article explains how Kubernetes uses the device‑plugin framework to extend resources beyond CPU and memory, details the kubelet registration and allocation workflow, and walks through the relevant source code in pkg/kubelet/cm/devicemanager that builds the OCI spec.

CDI · DRA · Device Plugin
0 likes · 5 min read
Infra Learning Club
Sep 3, 2024 · Cloud Native

How Kubelet’s VolumeManager Orchestrates Async Volume Attach, Mount, and Unmount

The article dissects Kubelet’s VolumeManager, detailing its asynchronous loops, the VolumeManager interface, how it is started from Kubelet.Run, the handling of Attach/Mount and Unmount operations during pod sync, the internal struct fields, and the plugin initialization process that together manage the full lifecycle of pod volumes.

Go · Kubernetes · Pod Lifecycle
0 likes · 10 min read
Infra Learning Club
Aug 30, 2024 · Cloud Native

Kubelet Source Dive: syncLoopIteration (Part 3) – How probeCh Is Built from Probe Managers

The article explains that the apparent probeCh in kubelet is actually three separate channels—livenessCh, readinessCh, and startupCh—managed by livenessManager, readinessManager, and startupManager, details the ProbeManager implementation that creates probe workers via AddPod, and shows how syncLoopIteration processes probe updates to adjust pod status.

Go · Kubernetes · cloud-native
0 likes · 8 min read
Infra Learning Club
Aug 27, 2024 · Cloud Native

Kubelet Source Code Deep Dive: Understanding Its Core Workflows

The article dissects the kubelet architecture, detailing its main syncLoop control cycle, auxiliary loops, and key managers such as podManager, podWorkers, evictionManager, probeManager, and runtime components, while explaining how pod updates, PLEG mechanisms, and various channels coordinate pod lifecycle and resource handling.

Cloud Native · Kubernetes · Runtime
0 likes · 9 min read
Ops Development & AI Practice
Apr 13, 2024 · Cloud Native

Decoding Kubelet: An Object‑Oriented View of Kubernetes Node Agents

This article examines Kubernetes’ Kubelet component through an object‑oriented lens, detailing its role, key responsibilities, abstract properties and methods, and illustrating implementation steps such as resource checks and pod scheduling, to show how OO abstraction clarifies complex system behavior.

Cloud Native · Kubernetes · Object-Oriented Design
0 likes · 5 min read
Open Source Linux
Mar 7, 2024 · Operations

How to Fix Disk‑Full Issues in Legacy Kubernetes Clusters Using Docker

This guide explains why old Kubernetes clusters that use Docker can run out of disk space, describes the symptoms such as pods stuck in ContainerCreating, and provides step‑by‑step commands to clean Docker files, prune images, adjust kubelet settings, and prevent future disk‑full problems.

Disk Cleanup · Garbage Collection · Operations
0 likes · 11 min read
Liangxu Linux
Feb 19, 2024 · Cloud Native

How CoreDNS and kubelet Configure /etc/resolv.conf in Kubernetes Pods

This article explains how CoreDNS runs on a Caddy‑based HTTP/2 server in Kubernetes, how kubelet injects the cluster DNS IP into each container’s /etc/resolv.conf, and how different dnsPolicy settings (Default, ClusterFirst, ClusterFirstWithHostNet, None) affect the resolv.conf configuration, including key options and examples.

CoreDNS · Kubernetes · dnsPolicy
0 likes · 6 min read
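
To make the ClusterFirst case from the summary above concrete, here is a minimal Python sketch of the resolv.conf content a pod typically ends up with. The function name, the `default` namespace, and the `10.96.0.10` DNS address are assumptions chosen for illustration; on a real node the kubelet may also append the host's own search domains, which this sketch omits.

```python
def cluster_first_resolv_conf(namespace, cluster_dns_ip,
                              cluster_domain="cluster.local", ndots=5):
    """Build an illustrative /etc/resolv.conf for dnsPolicy: ClusterFirst."""
    # Search path goes from most specific (the pod's namespace) to least specific.
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    lines = [
        f"nameserver {cluster_dns_ip}",
        "search " + " ".join(search),
        f"options ndots:{ndots}",
    ]
    return "\n".join(lines) + "\n"


# Assumed values: a pod in the "default" namespace, cluster DNS at 10.96.0.10.
print(cluster_first_resolv_conf("default", "10.96.0.10"), end="")
```

With `ndots:5`, a short name such as `my-svc` is tried against each search domain before being resolved as an absolute name.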
System Architect Go
Dec 23, 2023 · Cloud Native

What Happens Inside Kubernetes When You Create a Deployment?

This article walks through the complete Kubernetes workflow from a user‑submitted Deployment request to the creation and scheduling of the resulting Pod, detailing the roles of the control‑plane components, node services, admission webhooks, and the various plugins involved.

Cloud Native · Control Plane · Deployment
0 likes · 7 min read
Full-Stack DevOps & Kubernetes
Dec 23, 2023 · Cloud Native

Essential Kubernetes Security Practices to Safeguard Production Clusters

Learn the critical Kubernetes security measures for production environments, including RBAC access control, network policies, secret management, continuous monitoring, patch updates, API server hardening, Kubelet protection, pod security policies, and container hardening techniques, each illustrated with practical YAML examples and command snippets.

ContainerHardening · Kubernetes · NetworkPolicy
0 likes · 10 min read
Efficient Ops
Dec 4, 2023 · Cloud Native

How Does a Kubernetes Pod Get Created? Step‑by‑Step Walkthrough

This article walks through the complete Kubernetes pod creation workflow, from submitting the YAML with kubectl to the API server, storing the definition in etcd, scheduling, kubelet orchestration, container runtime delegation, CNI networking, health probing, and endpoint setup for services.

CNI · Kubernetes · Pod Lifecycle
0 likes · 3 min read
Open Source Linux
Jul 20, 2023 · Cloud Native

How to Retrieve Crash Logs of a Restarted Pod Using kubectl --previous

When a pod crashes and continuously restarts, a plain kubectl logs call shows only the current container and misses the previous container's output. Running kubectl logs with the --previous flag retrieves the logs of the last terminated instance, as the article explains with commands, file locations, and practical verification steps.

Kubernetes · kubectl · kubelet
0 likes · 7 min read
Alibaba Cloud Native
Mar 10, 2023 · Cloud Native

Uncovering the Root Causes of ACK Cluster Network Latency: kubelet, softirq, and cgroup Insights

A detailed post‑mortem explains how excessive cgroup files, kubelet's sys‑CPU usage, soft‑interrupt scheduling delays, and a buggy page‑free routine caused intermittent hundreds‑of‑milliseconds network latency in an Alibaba Cloud ACK cluster, and how targeted CPU binding and kernel patches resolved the issue.

Cloud Native · Kernel · Kubernetes
0 likes · 14 min read
Open Source Linux
Jun 16, 2022 · Cloud Native

Mastering Kubernetes Control Plane: etcd, API Server, Scheduler, and Nodes

This article explains the key Kubernetes control‑plane components—including etcd, the API Server, Controller Manager, Scheduler, as well as worker‑node components like Kubelet, kube‑proxy, and the container runtime—detailing their roles, interactions, and the underlying mechanisms such as Raft consensus and admission control.

API Server · Control Plane · Kubernetes
0 likes · 10 min read
Open Source Linux
May 12, 2022 · Cloud Native

Mastering Kubernetes Control Plane: etcd, API Server, Scheduler & More

This article explains the core components of the Kubernetes control plane—including etcd, the API Server, Controller Manager, Scheduler—as well as key worker‑node components like Kubelet, kube‑proxy, and the container runtime, detailing their roles, interactions, and essential functions.

API Server · Control Plane · Kubernetes
0 likes · 11 min read
Architecture Digest
Apr 25, 2022 · Cloud Native

Kubernetes Architecture Overview and Detailed Components

This article explains the goals, design principles, and detailed components of Kubernetes architecture, covering its control plane, API server, etcd store, scheduler, kubelet, container runtime, and kube-proxy, and summarizes how these parts work together to provide a scalable, portable, and automated container orchestration platform.

Control Plane · Kubernetes · container orchestration
0 likes · 12 min read
Efficient Ops
Mar 30, 2022 · Cloud Native

How to Fix Common Kubernetes Memory Leaks and Certificate Expiration Issues

This article walks through diagnosing and resolving two frequent Kubernetes problems—memory‑leak errors that cause "cannot allocate memory" or "no space left on device" messages, and expired cluster certificates—by checking cgroup stats, recompiling runc and kubelet, and renewing certificates with kubeadm for long‑term validity.

Kubernetes · certificate-renewal · kubeadm
0 likes · 12 min read
Alibaba Cloud Native
Feb 14, 2022 · Cloud Native

How to Overcome CPU Throttling and NUMA Bottlenecks in Cloud‑Native Containers

This article explains why container workloads suffer from CPU throttling and NUMA‑related performance loss in cloud‑native environments, examines Kubelet's CPU allocation policies, demonstrates the impact of CPU bursts and topology‑aware scheduling, and shows how Alibaba Cloud ACK mitigates these issues with concrete data.

Alibaba Cloud ACK · CPU Burst · CPU throttling
0 likes · 11 min read
Efficient Ops
Feb 8, 2022 · Information Security

Kubelet Misconfiguration Triggered a Mining Attack – What We Learned

After discovering a compromised node in our self‑built Kubernetes cluster that was being used for Monero mining, we traced the breach to empty iptables rules and a misconfigured kubelet allowing anonymous API access, then outlined firewall hardening, network isolation, and secure kubelet practices to prevent future intrusions.

Mining Attack · Security · firewall
0 likes · 6 min read
Sohu Tech Products
Dec 22, 2021 · Cloud Native

Zero‑Downtime Upgrade of Large‑Scale Kubernetes Clusters from v1.10 to v1.17

This article details the challenges, strategies, and step‑by‑step procedures for upgrading a 1,000‑node Kubernetes cluster from version 1.10 to 1.17 without service interruption, covering compatibility checks, in‑place versus replacement upgrades, container‑restart avoidance, pod eviction handling, and TCP connection issues.

CNCF · Cluster Upgrade · Kubernetes
0 likes · 22 min read
vivo Internet Technology
Dec 16, 2021 · Cloud Native

vivo Kubernetes Cluster Zero-Downtime Upgrade from v1.10 to v1.17: Practices and Solutions

Vivo’s internet team performed a zero‑downtime, in‑place upgrade of a 1,000‑node Kubernetes cluster from v1.10 to v1.17 by analyzing changelogs, backporting fixes, adjusting kubelet hash validation, adding tolerations, ensuring node labels, and using staged binary rollout, completing the process in roughly ten minutes.

Cloud Native · Cluster Upgrade · K8s migration
0 likes · 19 min read
Efficient Ops
Nov 26, 2021 · Information Security

How a Misconfigured Kubelet Led to a Crypto‑Mining Breach and What to Do

A self‑built Kubernetes cluster suffered a crypto‑mining intrusion due to empty iptables and a misconfigured kubelet, prompting a detailed post‑mortem that outlines the symptoms, root‑cause analysis, and practical hardening steps to protect similar environments.

crypto mining · firewall · incident response
0 likes · 5 min read
Java High-Performance Architecture
Oct 20, 2021 · Information Security

How a Misconfigured Kubelet Led to Crypto Mining on Our Kubernetes Node – Lessons Learned

After discovering a suspicious process on one of our self‑built Kubernetes nodes, we traced the intrusion to a misconfigured kubelet that exposed the API, allowing attackers to run a Monero mining script, and we outline the investigation steps and hardening measures to prevent similar breaches.

Kubernetes · Security · crypto mining
0 likes · 6 min read
Liangxu Linux
Aug 22, 2021 · Cloud Native

Inside Kubernetes: What Happens When You Run `kubectl run nginx`?

This article walks through the complete internal journey of a `kubectl run nginx --image=nginx --replicas=3` command, detailing how the request is validated, authenticated, authorized, processed by the API server, passed through initializers, scheduled, and finally materialized as running pods by kubelet, with code excerpts from Kubernetes v1.21.

CNI · CRI · Kubernetes
0 likes · 62 min read
Top Architect
Oct 19, 2020 · Cloud Native

Step-by-Step Guide to Installing Kubernetes v1.16.0 on CentOS 7 with Docker and Flannel

This article provides a detailed, step‑by‑step tutorial for installing Kubernetes v1.16.0 on CentOS 7 virtual machines, covering Docker‑CE installation, prerequisite system configuration, master and node setup, flannel network plugin deployment, and includes all necessary command‑line snippets and the full kube‑flannel.yml manifest.

Docker · Flannel · Kubernetes
0 likes · 20 min read
21CTO
May 19, 2020 · Cloud Native

Step‑by‑Step Guide: Build a Kubernetes Development Environment on CentOS

This tutorial walks you through setting up a complete Kubernetes development environment on CentOS, covering prerequisite installations, Docker CE setup, kubelet/kubeadm/kubectl configuration, master node initialization, network add‑ons, node joining, and common troubleshooting tips.

CentOS · Cloud Native · Kubernetes
0 likes · 14 min read
360 Zhihui Cloud Developer
Sep 17, 2019 · Cloud Native

Mastering Kubernetes Node Allocatable: Reserve Resources to Prevent Cluster Failures

Learn how Kubernetes distinguishes compressible (CPU) and non‑compressible (memory, storage) resources, why default kubelet settings can cause resource contention, and how to use the Node Allocatable feature—configuring kube‑reserved, system‑reserved, and eviction thresholds—to safely reserve resources for system daemons and avoid cluster instability.

Kubernetes · Node Allocatable · cgroups
0 likes · 9 min read
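
The reservation arithmetic behind the Node Allocatable feature described above can be sketched in a few lines of Python. This is a minimal illustration, assuming allocatable is capacity minus kube‑reserved, system‑reserved, and the hard eviction threshold; the function name and the sample figures are hypothetical.

```python
def node_allocatable(capacity_mib, kube_reserved_mib,
                     system_reserved_mib, eviction_threshold_mib):
    """Return the memory left for pods on a node, in MiB."""
    allocatable = (capacity_mib - kube_reserved_mib
                   - system_reserved_mib - eviction_threshold_mib)
    if allocatable < 0:
        # Reservations larger than the node itself indicate a misconfiguration.
        raise ValueError("reservations exceed node capacity")
    return allocatable


# Assumed node: 16 GiB memory, 1 GiB kube-reserved, 512 MiB system-reserved,
# and a 100 MiB hard eviction threshold.
print(node_allocatable(16384, 1024, 512, 100))  # 14748
```

The scheduler places pods against the allocatable figure rather than raw capacity, which is how the reservations keep system daemons from being starved.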
Tencent Cloud Developer
Apr 22, 2019 · Cloud Native

In-Place Container Upgrade in Kubernetes: Mechanism and Implementation

Kubernetes performs in‑place container upgrades by detecting spec changes, computing pod actions that selectively kill and restart only the affected containers while preserving the pod sandbox, allowing sidecar updates without full pod recreation, reducing downtime and enabling custom operators for gray‑scale rolling upgrades.

Cloud Native · Container Upgrade · Kubernetes
0 likes · 16 min read
UCloud Tech
Apr 11, 2019 · Cloud Native

Why Does a Kubernetes Pod IP Disappear? The Hidden Second Sandbox Bug

UK8S’s custom CNI plugin integrates VPC networking to give containers native cloud performance, but a bug caused kubelet to create a second sandbox container, leading to missing NETNS parameters and VPC IP leaks; the article details the investigation, root‑cause analysis, and the patch fixing the issue.

CNI · IP leak · Kubernetes
0 likes · 15 min read