Demystifying Kubernetes Runtime: From Docker to containerd, CRI‑O, and Secure VM‑Based Containers
This article explains the evolution and inner workings of Kubernetes container runtimes, detailing the roles of CRI, OCI, Docker, containerd, CRI‑O, and strong‑isolation solutions like Kata, gVisor, and Firecracker, and why the dockershim path was still the prevalent default when the article was written.
Typical Kubernetes Runtime Architecture
When the kubelet creates a container through the Docker‑based path, the process follows six steps:
The kubelet, acting as a CRI client, calls the CRI (gRPC) interface exposed by dockershim to request container creation.
dockershim translates the request into a call the Docker daemon understands.
The Docker daemon forwards the request to containerd, which now handles container operations.
containerd spawns a containerd‑shim process to act as the container’s parent, preserving state and file descriptors.
The shim invokes runC, the OCI reference implementation, to actually start the container according to the OCI RuntimeSpec.
After runC exits, the shim remains as the container’s parent, reporting status to containerd and cleaning up child processes.
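The call chain above can be traced with a small simulation. This is only a sketch: the function names and the `Trace` helper are hypothetical, and nothing here talks to a real kubelet, Docker daemon, or containerd. It simply records which component hands the request to which.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects the hops a create-container request passes through."""
    hops: list = field(default_factory=list)

    def record(self, component: str, action: str) -> None:
        self.hops.append(f"{component}: {action}")


def runc(trace: Trace, container_id: str) -> None:
    # runC starts the container from the OCI bundle and then exits.
    trace.record("runC", f"start {container_id} per OCI RuntimeSpec, then exit")


def containerd_shim(trace: Trace, container_id: str) -> None:
    trace.record("containerd-shim", f"invoke runC for {container_id}")
    runc(trace, container_id)
    # The shim outlives runC and stays on as the container's parent.
    trace.record("containerd-shim", f"remain as parent of {container_id}")


def containerd(trace: Trace, container_id: str) -> None:
    trace.record("containerd", f"spawn containerd-shim for {container_id}")
    containerd_shim(trace, container_id)


def docker_daemon(trace: Trace, container_id: str) -> None:
    trace.record("dockerd", "forward request to containerd")
    containerd(trace, container_id)


def dockershim(trace: Trace, container_id: str) -> None:
    trace.record("dockershim", "translate CRI request into a Docker API call")
    docker_daemon(trace, container_id)


def kubelet_create_container(container_id: str) -> list:
    """Entry point: the kubelet acts as the CRI client."""
    trace = Trace()
    trace.record("kubelet", "send CRI CreateContainer over gRPC")
    dockershim(trace, container_id)
    return trace.hops
```

Calling `kubelet_create_container("web-1")` returns the hops in order, which makes the length of the Docker‑based path easy to see: five components sit between the kubelet and the container process.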
Container Runtime History
Early Kubernetes used a simple model where kubelet directly called the Docker daemon, which in turn used libcontainer to run containers. Political concerns led to the creation of the Open Container Initiative (OCI) to prevent a single vendor from controlling the runtime standard. Docker contributed runC as OCI’s reference implementation.
Subsequently, projects like CoreOS’s rkt attempted to become a native Kubernetes runtime, prompting the SIG‑node team to introduce the Container Runtime Interface (CRI) in Kubernetes 1.5, allowing any runtime that implements the CRI to be plugged in via a shim.
Docker later split its responsibilities, moving container operations to containerd and keeping the Docker daemon for higher‑level orchestration. The dockershim code was embedded in kubelet, making the Docker‑based path the default in production.
OCI and CRI Specifications
OCI defines two core specifications:
ImageSpec: describes the layout of a container image as a compressed filesystem bundle.
RuntimeSpec: enumerates the lifecycle operations (create, start, kill, delete) and their expected behavior.
runC implements the RuntimeSpec, enabling any OCI‑compliant tool to run containers through it.
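The lifecycle side of the RuntimeSpec can be pictured as a small state machine. The sketch below is a toy model, not runC's implementation: the class name `OCIContainer` is hypothetical, the status strings follow the spec's `created`, `running`, and `stopped` states, and the operation that signals the process is called `kill`, as in the spec. It assumes, as a simplification, that `kill` terminates the process immediately.

```python
class OCIContainer:
    """Toy model of the container lifecycle in the OCI RuntimeSpec."""

    def __init__(self, container_id: str) -> None:
        self.id = container_id
        self.status = None  # no container exists before `create`

    def _require(self, op: str, allowed: set) -> None:
        # Reject operations that are not valid in the current state.
        if self.status not in allowed:
            raise RuntimeError(f"cannot {op} container in state {self.status!r}")

    def create(self) -> None:
        self._require("create", {None})
        self.status = "created"

    def start(self) -> None:
        self._require("start", {"created"})
        self.status = "running"

    def kill(self) -> None:
        # Simplification: assume the signal ends the process at once.
        self._require("kill", {"created", "running"})
        self.status = "stopped"

    def delete(self) -> None:
        self._require("delete", {"stopped"})
        self.status = None
```

The point of the model is that each operation is only legal in certain states; for example, `start` on a container that was never created raises an error instead of silently doing nothing.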
CRI is a lightweight set of gRPC interfaces that expose three groups of operations:
Container management (create, start, stop, etc.).
Image management (pull, remove, list).
PodSandbox management, which creates the shared namespace environment for a pod.
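To make the three groups concrete, here is an in‑memory stand‑in for a CRI runtime. It is a sketch under stated assumptions: `FakeCRIRuntime` is a hypothetical class, not a gRPC server, and its snake_case methods only mirror a handful of the CRI RPCs (PullImage, ListImages, RemoveImage, RunPodSandbox, CreateContainer, StartContainer, StopContainer). The ID format and state strings are invented for illustration.

```python
import itertools


class FakeCRIRuntime:
    """In-memory stand-in for a CRI runtime; no gRPC, no real containers."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.images = set()    # pulled image references
        self.sandboxes = {}    # sandbox id -> list of container ids
        self.containers = {}   # container id -> state string

    # --- Image management ---
    def pull_image(self, ref: str) -> str:
        self.images.add(ref)
        return ref

    def list_images(self) -> list:
        return sorted(self.images)

    def remove_image(self, ref: str) -> None:
        self.images.discard(ref)

    # --- PodSandbox management ---
    def run_pod_sandbox(self, pod_name: str) -> str:
        # The sandbox is the shared environment the pod's containers join.
        sandbox_id = f"sandbox-{pod_name}-{next(self._ids)}"
        self.sandboxes[sandbox_id] = []
        return sandbox_id

    # --- Container management ---
    def create_container(self, sandbox_id: str, image: str) -> str:
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox {sandbox_id}")
        if image not in self.images:
            raise ValueError(f"image {image} has not been pulled")
        container_id = f"container-{next(self._ids)}"
        self.containers[container_id] = "created"
        self.sandboxes[sandbox_id].append(container_id)
        return container_id

    def start_container(self, container_id: str) -> None:
        self.containers[container_id] = "running"

    def stop_container(self, container_id: str) -> None:
        self.containers[container_id] = "exited"
```

A caller playing the kubelet's role would run the sandbox, pull the image, create the container inside the sandbox, and then start it. Note that containers are always created against an existing sandbox, which is why PodSandbox management is its own group.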
Compatible Projects
Implementations that satisfy the OCI spec include runC, Kata Containers, gVisor, and the Rust‑based railcar. Projects that implement the CRI include Docker (via dockershim), containerd (with CRI‑containerd), CRI‑O, and Frakti.
The runtime stack can be viewed as three layers:
Orchestration API → Container API (CRI‑runtime) → Kernel API (OCI‑runtime)
This abstraction explains the emergence of lightweight CRI‑runtimes that aim to replace Docker and of “strong‑isolation” containers that run each workload in a separate VM.
containerd and CRI‑O
In containerd 1.0, CRI support was provided by a separate process called CRI‑containerd.
containerd 1.1 integrated the CRI logic directly into the main daemon as a plugin, eliminating the extra process.
CRI‑O is a purpose‑built Kubernetes runtime that directly implements both OCI and CRI. Its shim component (conmon) plays the same role as containerd‑shim.
While CRI‑O and a direct containerd integration are cleaner than the dockershim path, at the time of writing they lacked extensive production validation, and Docker via dockershim remained the default for most clusters. (dockershim was later removed from the kubelet in Kubernetes 1.24.)
Strong‑Isolation Containers: Kata, gVisor, Firecracker
Kubernetes struggles with true multi‑tenant isolation because the API server is a single instance and the default OCI runtime (runC) shares the host kernel.
VM‑based containers address the kernel‑sharing issue. Kata Containers, derived from runV and Clear Containers, run each pod inside a lightweight VM, preserving the OCI image workflow while providing hardware‑level isolation.
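gVisor runs a user‑space kernel process called the “Sentry”. It intercepts the application’s syscalls, using KVM or ptrace as the interception mechanism, and services them itself instead of passing them straight to the host kernel, avoiding a full VM. Firecracker builds a microVM using Rust‑implemented device emulation on top of KVM, aiming for minimal overhead.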
When using Kata, the CRI RunPodSandbox call creates a VM, and subsequent CreateContainer calls launch containers inside that VM, preserving the pod’s shared namespaces via the infra (pause) container.
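The mapping from CRI calls to VMs can be sketched as follows. This is an illustration only: `SandboxVM` and `VMBasedRuntime` are hypothetical names, the "guest kernel" is just a string label, and no hypervisor is involved. It shows the one property that matters here: one VM per pod, with every container of that pod placed inside it.

```python
class SandboxVM:
    """Stand-in for the lightweight VM that backs a single pod."""

    def __init__(self, pod_name: str) -> None:
        self.pod_name = pod_name
        # Each VM has its own guest kernel, separate from the host's.
        self.guest_kernel = f"guest-kernel-of-{pod_name}"
        self.containers = []


class VMBasedRuntime:
    """Toy model of a VM-based runtime behind the CRI calls."""

    def __init__(self) -> None:
        self.vms = {}  # sandbox id -> SandboxVM

    def run_pod_sandbox(self, pod_name: str) -> str:
        # RunPodSandbox: boot one VM for the whole pod.
        self.vms[pod_name] = SandboxVM(pod_name)
        return pod_name

    def create_container(self, sandbox_id: str, name: str) -> str:
        # CreateContainer: place the container inside the pod's existing VM.
        vm = self.vms[sandbox_id]
        vm.containers.append(name)
        return vm.guest_kernel
```

In this model, two containers created in the same sandbox report the same guest kernel, while containers in different pods never share one. That is the contrast with runC, where every container on the node shares the host kernel.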
In summary, the Kubernetes runtime ecosystem offers a spectrum from the widely‑deployed dockershim to lightweight CRI‑runtimes and VM‑based isolation solutions; the choice depends on operational requirements and the level of security isolation needed.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)