How Linux Containers Use Namespaces and cgroups for Isolation – A Deep Dive
This article explains how Linux containers achieve lightweight virtualization through namespaces and cgroups for resource isolation, describes Docker image layering with overlayfs, and walks through container engine internals such as containerd, its shim architecture, and the start/exec workflows.
Introduction
Linux containers provide lightweight virtualization: processes share the host kernel but are isolated from one another by two kernel mechanisms. Namespaces restrict what a process can see, and cgroups (control groups) limit how much CPU, memory, and I/O it can consume. The article uses Docker as a concrete example to illustrate container images, engines, and runtime architecture.
Namespace and cgroup based resource isolation
The article covers seven Linux namespaces that are relevant to containers. Docker uses the first six (mount, UTS, PID, network, user, IPC). When the article was written, the seventh, the cgroup namespace, was supported by runC but not used directly by Docker; newer Docker releases expose it through the --cgroupns option. Each namespace isolates a specific aspect of the process environment:
Mount namespace isolates the filesystem view, exposing only the container image and volumes bound with -v.
UTS namespace isolates hostname and domain.
PID namespace gives the container its own process ID space, so the container's init process runs as PID 1.
Network namespace provides an independent network stack.
User namespace maps container UIDs/GIDs to host IDs, so root inside the container can correspond to an unprivileged user on the host.
IPC namespace isolates inter‑process communication primitives.
Cgroup namespace virtualizes the view of the cgroup hierarchy: a process sees its own cgroup as the root, which hides host cgroup paths from the container.
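The list above can be checked on a running system, because the kernel exposes each process's namespace memberships as symbolic links under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux host with procfs mounted; the helper names (parse_ns_link, read_namespaces, shared_namespaces) are made up for illustration and are not part of any library.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def read_namespaces(pid="self"):
    """Return {entry name: inode} for one process, or {} without procfs."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for name in os.listdir(ns_dir):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry; skip it
        result[name] = inode
    return result


def shared_namespaces(a, b):
    """Namespace entries for which two processes hold the same inode."""
    return sorted(k for k in a if k in b and a[k] == b[k])
```

Two processes are in the same namespace of a given type when their inode numbers match, so comparing read_namespaces() for a host shell and for a containerized process shows which of the namespaces in the list were actually unshared.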
New namespaces are created with the clone or unshare system calls, and an existing namespace is joined with setns. The article's example uses unshare to create a new PID namespace and verifies the result with ps, which shows the bash process as PID 1.
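The PID 1 behaviour can be reproduced in a few lines. This is a sketch under stated assumptions rather than the article's own example: it calls the libc unshare wrapper through ctypes, uses the CLONE_NEW* flag values from <linux/sched.h>, and pairs the PID namespace with a user namespace so that it may work without root on hosts that allow unprivileged user namespaces. The function names are hypothetical.

```python
import ctypes
import os

# CLONE_NEW* flag values from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,
    "cgroup": 0x02000000,
    "uts": 0x04000000,
    "ipc": 0x08000000,
    "user": 0x10000000,
    "pid": 0x20000000,
    "net": 0x40000000,
}


def flags_for(kinds):
    """OR together the clone flags for the requested namespace kinds."""
    mask = 0
    for kind in kinds:
        mask |= CLONE_FLAGS[kind]
    return mask


def unshare(kinds):
    """Call unshare(2); raises OSError if the kernel refuses."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags_for(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def pid_seen_in_new_namespace():
    """Unshare user+pid namespaces, fork once, return the child's own PID.

    Note: this permanently moves the caller into a new user namespace, and
    only children (not the caller) enter the new PID namespace.
    """
    read_fd, write_fd = os.pipe()
    unshare(["user", "pid"])
    child = os.fork()
    if child == 0:
        os.close(read_fd)
        os.write(write_fd, str(os.getpid()).encode())
        os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 32)
    os.close(read_fd)
    os.waitpid(child, 0)
    return int(data)
```

If the unshare call succeeds, the first forked child becomes the init process of the new PID namespace, so pid_seen_in_new_namespace() should report 1 even though the host sees the child under an ordinary PID.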
Container image storage with overlayfs
Docker images are stored on a union filesystem, overlayfs in this article. Image layers are stacked as read-only lower layers, and each container gets a writable upper layer on top. A workdir is scratch space that overlayfs uses internally while preparing changes, and mergedir presents the unified view to the container.
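The four directories map directly onto the options of an overlay mount. The sketch below only builds the option string and the mount command; it does not run anything. The function names and every path in the usage are placeholders invented for illustration, not the paths Docker actually uses.

```python
def overlay_mount_options(lower_dirs, upper_dir, work_dir):
    """Build the -o string for an overlay mount.

    lower_dirs is ordered top-most layer first, matching the kernel's
    lowerdir= syntax, where the right-most directory is the bottom layer.
    """
    if not lower_dirs:
        raise ValueError("at least one lower directory is required")
    return "lowerdir={},upperdir={},workdir={}".format(
        ":".join(lower_dirs), upper_dir, work_dir
    )


def overlay_mount_command(lower_dirs, upper_dir, work_dir, merged_dir):
    """Return the argv for mounting the stack at merged_dir (needs root to run)."""
    opts = overlay_mount_options(lower_dirs, upper_dir, work_dir)
    return ["mount", "-t", "overlay", "overlay", "-o", opts, merged_dir]


# Usage with hypothetical paths: two image layers plus one container layer.
cmd = overlay_mount_command(
    ["/layers/app", "/layers/base"], "/ctr/upper", "/ctr/work", "/ctr/merged"
)
print(" ".join(cmd))
```

The workdir has to be an empty directory on the same filesystem as the upper directory, which is why the two usually sit side by side.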
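A read is served from the upper layer if the file exists there; otherwise it falls through to the lower layers. The first write to a file that exists only in a lower layer triggers a copy-up: the file is copied to the upper layer and modified there, leaving the lower layer unchanged. Deletions are recorded in the upper layer as whiteout files (or as opaque-directory metadata for directories), not by removing anything from the lower layers.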
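These lookup, copy-up, and whiteout rules can be modelled with plain dictionaries. The class below is a toy simulation written for this summary, not overlayfs itself: layers are dicts mapping a path to its content, and WHITEOUT is a stand-in object for the special whiteout entry the real filesystem creates.

```python
WHITEOUT = object()  # stands in for an overlayfs whiteout entry


class ToyOverlay:
    def __init__(self, lowers):
        # lowers: list of {path: content} dicts, top-most first, never modified
        self.lowers = lowers
        self.upper = {}

    def read(self, path):
        """Upper layer wins; otherwise fall through the lower layers in order."""
        if path in self.upper:
            value = self.upper[path]
            if value is WHITEOUT:
                raise FileNotFoundError(path)
            return value
        for layer in self.lowers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        """Modify an existing file; a lower-only file is copied up first."""
        current = self.read(path)
        self.upper[path] = current + data

    def delete(self, path):
        """Hide a lower file with a whiteout, or drop an upper-only file."""
        self.read(path)  # raises if the file is already absent
        if any(path in layer for layer in self.lowers):
            self.upper[path] = WHITEOUT
        else:
            del self.upper[path]


base = {"/etc/hosts": "127.0.0.1 localhost\n"}
fs = ToyOverlay([base])
fs.append("/etc/hosts", "10.0.0.2 db\n")  # copy-up, then modify in upper
fs.delete("/etc/hosts")                    # whiteout in upper; base untouched
```

After both operations the base layer still holds the original file, which is what lets many containers share one set of image layers.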
Container engine architecture (containerd)
The article examines containerd, a CNCF-hosted container runtime, and its components:
gRPC interface for upper-level services.
Storage layer handling image metadata and container state.
Task service managing container lifecycles.
Runtimes layer (e.g., runC, Kata Containers, gVisor).
Shim processes sit between containerd and the runtime. The original containerd-shim (v1) can be replaced by shim-v2 implementations, which lets each runtime ship its own shim, such as shim-runc, shim-gvisor, or shim-kata. A runtime can then fold its helper processes into that single shim, which reduces the number of components and simplifies integration.
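With shim v2, the runtime a client asks for is identified by a dotted name, and containerd derives the shim executable to launch from it. The helper below sketches that naming convention as it is commonly documented (the last two components of the name become the binary suffix); the function name is hypothetical, and the mapping should be read as an illustration of the convention, not as containerd's own code.

```python
def shim_binary_name(runtime_type):
    """Map a runtime name like 'io.containerd.runc.v2' to its shim binary."""
    parts = runtime_type.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"unexpected runtime type: {runtime_type!r}")
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


# Usage: the binary must be discoverable on containerd's PATH.
for runtime in ("io.containerd.runc.v2", "io.containerd.kata.v2"):
    print(runtime, "->", shim_binary_name(runtime))
```

Because selection comes down to a name and an executable, adding a new runtime is mostly a matter of installing its shim binary and referring to it by the matching runtime name.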
Start and exec workflows
Two example flows illustrate how containerd creates and interacts with containers:
Start flow: containerd creates the container metadata and sends a request to the task service, which talks to the shim over gRPC; the shim then invokes the runtime to launch a new container with fresh namespaces.
Exec flow: containerd forwards the exec request to the shim, which has the runtime start an additional process inside the container's existing namespaces; unlike start, no new namespaces are created.
Both flows rely on the same shim infrastructure; the distinction lies in namespace handling.
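The difference between the two flows can be made concrete with a toy model. ToyShim below is a hypothetical stand-in invented for this summary and does not mirror containerd's or any shim's API: namespaces are represented by integer IDs, start allocates a fresh set, and exec reuses the set owned by the container's init process.

```python
import itertools

NAMESPACE_KINDS = ("mnt", "uts", "pid", "net", "user", "ipc")


class ToyShim:
    """Toy model of the start and exec flows; not a real shim."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.containers = {}  # container id -> list of process records

    def start(self, container_id, argv):
        """Launch the init process in a fresh set of namespaces."""
        if container_id in self.containers:
            raise ValueError(f"container already exists: {container_id}")
        namespaces = {kind: next(self._ids) for kind in NAMESPACE_KINDS}
        init = {"argv": list(argv), "namespaces": namespaces}
        self.containers[container_id] = [init]
        return init

    def exec(self, container_id, argv):
        """Add a process that joins the init process's existing namespaces."""
        processes = self.containers[container_id]
        proc = {"argv": list(argv), "namespaces": processes[0]["namespaces"]}
        processes.append(proc)
        return proc


shim = ToyShim()
web = shim.start("web", ["nginx"])
shell = shim.exec("web", ["sh"])
db = shim.start("db", ["postgres"])
```

In this model shell shares every namespace ID with web, while db shares none, which mirrors the distinction the two flows draw.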
Conclusion
After reading, readers should understand how Linux containers achieve isolation with namespaces and cgroups, how overlayfs structures Docker image storage, and how a container engine like containerd orchestrates container lifecycles through shims and gRPC interactions.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
