Understanding Kubelet Components and CRI Architecture in Kubernetes
This article explains the internal components of the kubelet, details the CRI architecture, outlines the roles of the Runtime and Image services, and provides code examples of the gRPC API definitions, helping readers grasp how Kubernetes manages containers and images.
Kubelet Components
Kubelet itself operates using a controller pattern. Its internal workflow can be illustrated with the diagram below.
Kubelet Server exposes APIs for kube-apiserver, metrics-server, etc.; for example, kubectl exec uses the Kubelet API /exec/{token} to interact with containers.
Container Manager handles resources such as cgroups, QoS, cpuset, devices.
Volume Manager prepares storage volumes, formats disks, mounts them on the node, and passes the mount path to containers.
Eviction evicts low‑priority containers when resources are scarce, preserving high‑priority workloads.
cAdvisor provides container metrics.
Metrics and stats supply node and container measurement data; metrics‑server consumes /stats/summary for HPA scaling.
Generic Runtime Manager interacts with the CRI to manage containers and images.
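The controller pattern mentioned above comes down to comparing desired state with observed state and acting on the difference. The sketch below is a simplified illustration under that assumption, not kubelet code; the function name `reconcile` and the pod names are hypothetical.

```python
def reconcile(desired, running):
    """Return the actions needed to move `running` toward `desired`.

    Both arguments are sets of pod names: hypothetical stand-ins for the
    pod specs received from the API server and the pods observed on the node.
    """
    to_start = sorted(desired - running)  # wanted but not yet running
    to_stop = sorted(running - desired)   # running but no longer wanted
    actions = [("start", name) for name in to_start]
    actions += [("stop", name) for name in to_stop]
    return actions


if __name__ == "__main__":
    print(reconcile({"web", "db"}, {"db", "old-job"}))
```

A real kubelet repeats a step like this continuously and drives the resulting actions through the components listed above.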
The CRI defines interfaces for container and image services using Protocol Buffers over gRPC; the definitions reside in pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto for Kubernetes v1.10+.
CRI Architecture
Container Runtime Components in Kubernetes
The container runtime in Kubernetes can be divided into four parts:
Kubelet’s kubeGenericRuntimeManager manages communication with the CRI shim client, handling container and image lifecycle (source: pkg/kubelet/kuberuntime/kuberuntime_manager.go).
The CRI itself, which includes both client and server interfaces for the container runtime.
The CRI shim client held by kubelet, used to talk to the CRI shim server.
The CRI shim server, i.e., the concrete runtime implementation such as the built‑in dockershim (pkg/kubelet/dockershim) or external runtimes like cri‑containerd or rktlet.
In typical deployments each host runs a CRI shim component that translates CRI requests into calls to the underlying container engine.
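The four parts can be pictured as layers that each delegate to the next. The sketch below uses hypothetical Python classes (`FakeEngine`, `ShimServer`, `ShimClient`, `RuntimeManager`) purely to illustrate the call path. In a real cluster the client and server communicate over gRPC, and none of these class names exist in Kubernetes.

```python
class FakeEngine:
    """Stand-in for the underlying container engine."""

    def __init__(self):
        self.calls = []

    def run(self, native_request):
        self.calls.append(native_request)
        return "engine-id-%d" % len(self.calls)


class ShimServer:
    """The CRI shim server: translates CRI requests into engine calls."""

    def __init__(self, engine):
        self.engine = engine

    def handle(self, method, request):
        # The translation into a "native" request is invented for illustration.
        return self.engine.run({"op": method.lower(), "args": request})


class ShimClient:
    """The CRI shim client held by kubelet; a direct call replaces the gRPC hop."""

    def __init__(self, server):
        self.server = server

    def call(self, method, request):
        return self.server.handle(method, request)


class RuntimeManager:
    """The kubelet-side manager that drives the container lifecycle."""

    def __init__(self, client):
        self.client = client

    def start_pod(self, pod_name):
        # "RunPodSandbox" is a CRI method name; the CRI itself is the contract
        # (method names and message shapes) shared by client and server.
        return self.client.call("RunPodSandbox", {"name": pod_name})


if __name__ == "__main__":
    engine = FakeEngine()
    manager = RuntimeManager(ShimClient(ShimServer(engine)))
    print(manager.start_pod("web"), engine.calls)
```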
Implementation Details of the CRI gRPC Server
The container runtime implements a CRI gRPC server exposing RuntimeService and ImageService. The server listens on a local Unix socket while kubelet acts as a gRPC client. Both services can be combined in a single server or split into separate ones; most community runtimes use a single server.
Below is the CRI interface definition from api.proto in Kubernetes 1.20.
// Runtime service defines the public APIs for remote container runtimes
service RuntimeService {
    // Version returns the runtime name, runtime version, and runtime API version.
    rpc Version(VersionRequest) returns (VersionResponse) {}
    // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
    // the sandbox is in the ready state on success.
    rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}
    // StopPodSandbox stops any running process that is part of the sandbox and
    // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
    // If there are any running containers in the sandbox, they must be forcibly
    // terminated.
    // This call is idempotent, and must not return an error if all relevant
    // resources have already been reclaimed. kubelet will call StopPodSandbox
    // at least once before calling RemovePodSandbox. It will also attempt to
    // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
    // multiple StopPodSandbox calls are expected.
    rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}
    // RemovePodSandbox removes the sandbox. If there are any running containers
    // in the sandbox, they must be forcibly terminated and removed.
    // This call is idempotent, and must not return an error if the sandbox has
    // already been removed.
    rpc RemovePodSandbox(RemovePodSandboxRequest) returns (RemovePodSandboxResponse) {}
    // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
    // present, returns an error.
    rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}
    // ListPodSandbox returns a list of PodSandboxes.
    rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}
    // CreateContainer creates a new container in specified PodSandbox
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
    // StartContainer starts the container.
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
    // StopContainer stops a running container with a grace period (i.e., timeout).
    // This call is idempotent, and must not return an error if the container has
    // already been stopped.
    // The runtime must forcibly kill the container after the grace period is
    // reached.
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
    // RemoveContainer removes the container. If the container is running, the
    // container must be forcibly removed.
    // This call is idempotent, and must not return an error if the container has
    // already been removed.
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
    // ListContainers lists all containers by filters.
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
    // ContainerStatus returns status of the container. If the container is not
    // present, returns an error.
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
    // UpdateContainerResources updates ContainerConfig of the container.
    rpc UpdateContainerResources(UpdateContainerResourcesRequest) returns (UpdateContainerResourcesResponse) {}
    // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
    // for the container. This is often called after the log file has been
    // rotated. If the container is not running, container runtime can choose
    // to either create a new log file and return nil, or return an error.
    // Once it returns error, new container log file MUST NOT be created.
    rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}
    // ExecSync runs a command in a container synchronously.
    rpc ExecSync(ExecSyncRequest) returns (ExecSyncResponse) {}
    // Exec prepares a streaming endpoint to execute a command in the container.
    rpc Exec(ExecRequest) returns (ExecResponse) {}
    // Attach prepares a streaming endpoint to attach to a running container.
    rpc Attach(AttachRequest) returns (AttachResponse) {}
    // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
    rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}
    // ContainerStats returns stats of the container. If the container does not
    // exist, the call returns an error.
    rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
    // ListContainerStats returns stats of all running containers.
    rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}
    // PodSandboxStats returns stats of the pod. If the pod sandbox does not
    // exist, the call returns an error.
    rpc PodSandboxStats(PodSandboxStatsRequest) returns (PodSandboxStatsResponse) {}
    // ListPodSandboxStats returns stats of the pods matching a filter.
    rpc ListPodSandboxStats(ListPodSandboxStatsRequest) returns (ListPodSandboxStatsResponse) {}
    // UpdateRuntimeConfig updates the runtime configuration based on the given request.
    rpc UpdateRuntimeConfig(UpdateRuntimeConfigRequest) returns (UpdateRuntimeConfigResponse) {}
    // Status returns the status of the runtime.
    rpc Status(StatusRequest) returns (StatusResponse) {}
}
// ImageService defines the public APIs for managing images.
service ImageService {
    // ListImages lists existing images.
    rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
    // ImageStatus returns the status of the image. If the image is not
    // present, returns a response with ImageStatusResponse.Image set to
    // nil.
    rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
    // PullImage pulls an image with authentication config.
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
    // RemoveImage removes the image.
    // This call is idempotent, and must not return an error if the image has
    // already been removed.
    rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
    // ImageFSInfo returns information of the filesystem that is used to store images.
    rpc ImageFsInfo(ImageFsInfoRequest) returns (ImageFsInfoResponse) {}
}
RuntimeService
RuntimeService offers interfaces grouped into four categories:
PodSandbox management – a PodSandbox abstracts a Kubernetes Pod: it provides isolation and shared namespaces, and typically corresponds to a pause container or a lightweight VM.
Container management – create, start, stop, and delete containers within a specified PodSandbox.
Streaming APIs – Exec, Attach, and PortForward return URLs of a streaming server rather than direct container interaction.
Status APIs – query the API version and runtime status.
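To make the four categories concrete, here is a minimal in-memory sketch of a runtime. It is an illustration only: the class `FakeRuntime`, its snake_case method names, the streaming address, and the URL layout are all assumptions. The behaviours follow the API comments quoted earlier: a sandbox is ready on success, StopPodSandbox is idempotent, and Exec returns a URL rather than running anything.

```python
import itertools


class FakeRuntime:
    """In-memory stand-in for a CRI RuntimeService (illustration only)."""

    def __init__(self, streaming_base="http://127.0.0.1:10010"):
        # streaming_base is a made-up address for the streaming server.
        self._ids = itertools.count(1)
        self.sandboxes = {}   # sandbox id -> state
        self.containers = {}  # container id -> {"sandbox": ..., "state": ...}
        self.streaming_base = streaming_base

    # PodSandbox management
    def run_pod_sandbox(self, name):
        sandbox_id = "sb-%d" % next(self._ids)
        self.sandboxes[sandbox_id] = "SANDBOX_READY"  # ready on success
        return sandbox_id

    def stop_pod_sandbox(self, sandbox_id):
        # Idempotent: an unknown or already stopped sandbox is not an error.
        for container in self.containers.values():
            if container["sandbox"] == sandbox_id:
                container["state"] = "CONTAINER_EXITED"  # forcibly terminated
        if sandbox_id in self.sandboxes:
            self.sandboxes[sandbox_id] = "SANDBOX_NOTREADY"

    # Container management
    def create_container(self, sandbox_id, image):
        if self.sandboxes.get(sandbox_id) != "SANDBOX_READY":
            raise RuntimeError("sandbox %s is not ready" % sandbox_id)
        container_id = "ctr-%d" % next(self._ids)
        self.containers[container_id] = {
            "sandbox": sandbox_id, "image": image, "state": "CONTAINER_CREATED"}
        return container_id

    def start_container(self, container_id):
        self.containers[container_id]["state"] = "CONTAINER_RUNNING"

    # Streaming API: hands back a URL; the command itself is not run here.
    def exec_url(self, container_id, cmd):
        return "%s/exec/%s" % (self.streaming_base, container_id)

    # Status API
    def version(self):
        return {"runtime_name": "fake", "runtime_api_version": "v1alpha2"}


if __name__ == "__main__":
    rt = FakeRuntime()
    sb = rt.run_pod_sandbox("web")
    ctr = rt.create_container(sb, "nginx")
    rt.start_container(ctr)
    print(rt.exec_url(ctr, ["sh"]))
    rt.stop_pod_sandbox(sb)
    rt.stop_pod_sandbox(sb)  # a second call is harmless
    print(rt.sandboxes, rt.containers)
```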
ImageService
ImageService manages images through five operations:
List images.
Pull an image to the local host.
Inspect image status.
Remove a local image.
Query image storage usage and related information.
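Each operation corresponds to one ImageService RPC (ListImages, PullImage, ImageStatus, RemoveImage, ImageFsInfo); with dockershim, the shim translates these calls into Docker API requests.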
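The five operations can be sketched with a small in-memory class. `FakeImageService` and its method names are hypothetical, and no registry is contacted. Two behaviours are taken from the API comments quoted earlier: ImageStatus returns an empty result for a missing image instead of an error, and RemoveImage is idempotent.

```python
class FakeImageService:
    """In-memory stand-in for a CRI ImageService (illustration only)."""

    def __init__(self):
        self.images = {}  # image reference -> size in bytes

    def list_images(self):
        return sorted(self.images)

    def pull_image(self, ref, size_bytes=0):
        # A real runtime would download from a registry; here we only record it.
        self.images[ref] = size_bytes
        return ref

    def image_status(self, ref):
        # An absent image yields None rather than an error.
        if ref not in self.images:
            return None
        return {"ref": ref, "size_bytes": self.images[ref]}

    def remove_image(self, ref):
        # Idempotent: removing an image that is already gone is not an error.
        self.images.pop(ref, None)

    def image_fs_info(self):
        return {"used_bytes": sum(self.images.values())}


if __name__ == "__main__":
    svc = FakeImageService()
    svc.pull_image("nginx:1.21", size_bytes=100)
    print(svc.list_images(), svc.image_status("busybox"), svc.image_fs_info())
    svc.remove_image("nginx:1.21")
    svc.remove_image("nginx:1.21")  # second removal is a no-op
```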
CRI Initialization
The most relevant manager is the Generic Runtime Manager, which handles the CRI shim client. Dockershim remains in the kubelet code as the most stable built‑in runtime; “remote” refers to external CRI implementations such as containerd or CRI‑O.
If kubelet selects dockershim, it initializes and starts the dockershim server (which also initializes the CNI network plugin).
When using an external runtime, a CRI shim must be installed on each host (e.g., the shim for containerd). Since containerd 1.1 the shim is integrated as a plugin, so kubelet can connect directly with --container-runtime-endpoint=unix:///run/containerd/containerd.sock.
The kubelet then creates a CRI shim client to communicate with the shim server.
Finally, the Generic Runtime Manager is initialized; all subsequent container and image operations go through this manager’s CRI shim client.
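The initialization order described above can be summarized in a short sketch. The function `init_runtime` and its step strings are hypothetical, and the dockershim socket path is an assumed default used only for illustration. The code mirrors the prose; it is not the kubelet's actual startup code.

```python
# Assumed default dockershim socket, for illustration only.
DOCKERSHIM_SOCKET = "unix:///var/run/dockershim.sock"


def init_runtime(container_runtime, remote_endpoint=""):
    """Return the ordered initialization steps for the chosen runtime."""
    steps = []
    if container_runtime == "docker":
        # Built-in path: kubelet itself starts the shim server.
        steps.append("start dockershim server (also initializes CNI)")
        endpoint = DOCKERSHIM_SOCKET
    elif container_runtime == "remote":
        # External path: the shim server is already running on the host.
        if not remote_endpoint:
            raise ValueError("remote runtime requires an endpoint")
        endpoint = remote_endpoint
    else:
        raise ValueError("unsupported runtime: %s" % container_runtime)
    steps.append("create CRI shim client for %s" % endpoint)
    steps.append("initialize Generic Runtime Manager")
    return steps


if __name__ == "__main__":
    for step in init_runtime("remote", "unix:///run/containerd/containerd.sock"):
        print(step)
```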
Key kubelet startup flags related to CRI:
--container-runtime: selects the runtime (docker, remote, rkt). Default is docker (dockershim).
--runtime-cgroups: cgroup configuration for the runtime.
--docker-endpoint: socket for the Docker daemon (effective only when --container-runtime=docker).
--pod-infra-container-image: image for the pause container (default k8s.gcr.io/pause:3.5).
--image-pull-progress-deadline: timeout for pulling images (default 1m).
--experimental-dockershim and --experimental-dockershim-root-directory: enable dockershim and set its root directory.
--container-runtime-endpoint: Unix socket of the external runtime (e.g., unix:///run/containerd/containerd.sock).
--image-service-endpoint: endpoint for the image service (defaults to the value of --container-runtime-endpoint when not set).
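As a small illustration of how the two endpoint flags relate, the sketch below parses a unix:// endpoint and applies the fallback documented for the kubelet, where an unset image service endpoint reuses the container runtime endpoint. The helper names `socket_path` and `resolve_endpoints` are hypothetical.

```python
from urllib.parse import urlparse


def socket_path(endpoint):
    """Extract the filesystem path from a unix:// endpoint."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "unix":
        raise ValueError("expected a unix:// endpoint, got %r" % endpoint)
    return parsed.path


def resolve_endpoints(container_runtime_endpoint, image_service_endpoint=""):
    """Return (runtime socket, image socket), applying the fallback."""
    # An unset image endpoint reuses the runtime endpoint.
    image_endpoint = image_service_endpoint or container_runtime_endpoint
    return socket_path(container_runtime_endpoint), socket_path(image_endpoint)


if __name__ == "__main__":
    print(resolve_endpoints("unix:///run/containerd/containerd.sock"))
```

With a single endpoint, both services resolve to the same socket, which matches the common case of one server exposing both RuntimeService and ImageService.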
Supported CRI Backends
CRI was introduced as an Alpha feature in Kubernetes 1.5 and supports multiple runtimes such as Docker, containerd, CRI‑O, Frakti, and Pouch. The diagram below shows their relationship with kubelet.
Impact of Deprecating Docker
Ordinary Kubernetes users see no impact; production clusters can switch from Docker to containerd with minimal effort.
Docker‑built images remain usable because the Image Spec is runtime‑agnostic; developers can continue using Docker to build and push images.
Workloads that mount the Docker socket (Docker‑in‑Docker) are affected; alternatives include Kaniko, Img, or Buildah, or running a Docker daemon as a DaemonSet/sidecar.
Mixed clusters with both Docker and containerd nodes can coexist by using node‑label‑based scheduling.
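For the mixed-cluster case, node-label-based scheduling can be expressed with a `nodeSelector` in the Pod spec. The sketch below builds such a manifest as a Python dictionary. The label key `example.com/container-runtime`, the image name, and the helper `pod_for_runtime` are hypothetical, and the nodes would need to carry that label beforehand.

```python
import json


def pod_for_runtime(name, image, runtime):
    """Build a Pod manifest pinned to nodes labeled with the given runtime."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Hypothetical label; an operator must label the nodes first.
            "nodeSelector": {"example.com/container-runtime": runtime},
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    manifest = pod_for_runtime("builder", "example.com/builder:latest", "docker")
    print(json.dumps(manifest, indent=2))
```

Workloads that still depend on the Docker socket would select the Docker nodes this way, while everything else can be scheduled onto containerd nodes.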
Preview
Future posts will explore low‑level implementations such as runc and shim, and how container management APIs are exposed.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
