Fundamental Overview of Linux cgroup Architecture and Initialization (Kernel 5.10)
The article explains Linux cgroup architecture and initialization in kernel 5.10, covering its hierarchical composition, key data structures like css_set, the two‑phase boot‑time setup, creation of cgroups, and task assignment mechanisms for both cgroup v1 and v2.
The idea behind cgroup came from Google engineers Paul Menage and Rohit Seth. As hardware grew more capable, improving machine utilization meant running different workloads on the same machine, which in turn required a way to divide resources among them. The feature was originally named "process containers"; it was renamed Control Groups in 2007 and merged into the Linux 2.6.24 kernel. cgroup can limit, account for, and isolate the physical resources (CPU, memory, I/O, etc.) used by a group of processes, and control their priority. This article gives a basic introduction to cgroup based on the Linux 5.10 source code.
1. Composition of cgroup
Using cgroup to manage resources for a process group involves several components, including hierarchies, subsystems, tasks, and cgroups themselves.
cgroup v1 supports multiple hierarchies. Subsystems attached to the same hierarchy share a single cgroup tree, so their resource controls cannot be decoupled, which may cause interference between processes. Therefore, v1 uses a forest‑like structure where each hierarchy is a separate tree.
The relationships and basic rules among cgroup, task, subsystem, and hierarchy are:
One hierarchy can attach one or more subsystems.
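A subsystem is attached to at most one hierarchy at a time. Mounting it a second time succeeds only when the mount requests exactly the same set of subsystems as the existing hierarchy, in which case that hierarchy is reused.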
For each created hierarchy, a task can belong to only one cgroup within that hierarchy, but the same task may belong to different cgroups in different hierarchies.
When a task forks, the child task inherits the parent’s cgroup, although it can later be moved to another cgroup.
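The "one cgroup per hierarchy, different cgroups across hierarchies" rule is visible from user space in /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal sketch of a parser for that format; the helper names and the sample paths are hypothetical and only serve as illustration.

```python
from typing import Dict, List, NamedTuple


class Membership(NamedTuple):
    hierarchy_id: int        # 0 denotes the cgroup v2 unified hierarchy
    controllers: List[str]   # empty for the v2 entry
    path: str                # cgroup path relative to the hierarchy root


def parse_proc_cgroup(text: str) -> List[Membership]:
    """Parse the contents of /proc/<pid>/cgroup, one entry per hierarchy."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only twice: the path itself may contain ':' characters.
        hid, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append(Membership(int(hid), names, path))
    return entries


# Hypothetical sample for one task; the cgroup paths are made up.
PROC_PID_CGROUP_SAMPLE = """\
4:memory:/batch/job1
3:cpu,cpuacct:/interactive
0::/user.slice
"""

memberships = parse_proc_cgroup(PROC_PID_CGROUP_SAMPLE)
# Exactly one cgroup path per hierarchy ID.
by_hierarchy: Dict[int, str] = {m.hierarchy_id: m.path for m in memberships}
```

On a live system the same function can be fed the text read from /proc/self/cgroup.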
1.1 Subsystem Introduction
Subsystems implement the actual resource control. Commonly used subsystems include cpu, memory, blkio, etc.
cgroup abstracts these subsystems behind a common framework and attaches hooks to the underlying resource management modules to enforce limits and priorities.
1.2 Key Data Structures
The mapping between tasks and cgroups is many‑to‑many. Linux uses two intermediate structures, css_set and cgrp_cset_link, to record this relationship efficiently.
When a process clones a child without specifying a target cgroup, the child stays in the same css_set as the parent. Each css_set may contain multiple tasks and is organized in a hash table for fast lookup.
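The cgrp_cset_link structure links css_set and cgroup objects. Each link joins one css_set to one cgroup and sits on a list in both, so the kernel can walk from a cgroup to all of its css_sets (and their tasks) and from a css_set to all of its cgroups without scanning every task.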
Key members of css_set, cgroup, and cgroup_subsys_state are illustrated in the following diagrams:
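The relationship between these structures can be sketched with a toy Python model. This is an illustration under simplifying assumptions, not the kernel layout: the class names mirror the kernel structures, but the fields are reduced to the bare minimum, the table is keyed by the combination of cgroups rather than by subsystem state pointers, and CssSetTable and tasks_in are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(eq=False)
class Cgroup:
    path: str
    cset_links: List["CgrpCsetLink"] = field(default_factory=list)


@dataclass(eq=False)
class CssSet:
    cgroups: Tuple[Cgroup, ...]                 # one cgroup per hierarchy
    tasks: List[int] = field(default_factory=list)
    cgrp_links: List["CgrpCsetLink"] = field(default_factory=list)


@dataclass(eq=False)
class CgrpCsetLink:
    cgrp: Cgroup    # the link sits on cgrp.cset_links ...
    cset: CssSet    # ... and on cset.cgrp_links


class CssSetTable:
    """Toy stand-in for the css_set hash table (assumed keying scheme)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, ...], CssSet] = {}

    def find_or_create(self, cgroups: Sequence[Cgroup]) -> CssSet:
        key = tuple(id(c) for c in cgroups)
        cset = self._table.get(key)
        if cset is None:
            cset = CssSet(tuple(cgroups))
            for cgrp in cgroups:
                link = CgrpCsetLink(cgrp, cset)
                cgrp.cset_links.append(link)
                cset.cgrp_links.append(link)
            self._table[key] = cset
        return cset


def tasks_in(cgrp: Cgroup) -> List[int]:
    """Walk cgroup -> links -> css_sets -> tasks."""
    return sorted(pid for link in cgrp.cset_links for pid in link.cset.tasks)


table = CssSetTable()
cpu_a, mem_b, mem_c = Cgroup("/cpu/a"), Cgroup("/memory/b"), Cgroup("/memory/c")
s1 = table.find_or_create([cpu_a, mem_b])
s1.tasks += [100, 101]
s2 = table.find_or_create([cpu_a, mem_c])
s2.tasks.append(200)
s3 = table.find_or_create([cpu_a, mem_b])   # same combination: reuses s1
```

Tasks that share the same combination of cgroups share one css_set, which is why a forked child can simply reuse its parent's set.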
2. cgroup Initialization
cgroup initialization consists of two phases: cgroup_init_early and cgroup_init.
2.1 cgroup_init_early
The kernel sets up cgrp_dfl_root as the default hierarchy root and init_css_set as the css_set for tasks created during early boot. The init task points at this set, so all of its child processes start out attached to it.
For subsystems that must be available during early boot (those marked early_init), their data structures are built and linked, and online_css is called to activate them (though they are not yet exposed in the filesystem).
2.2 cgroup_init
After VFS and sysfs are ready, cgroup_setup_root creates the default hierarchy under /sys/fs/cgroup and populates the root cgroup files.
Subsystems that were not already set up during early boot are initialized in the same way as described in section 2.1.
The init_css_set is attached to cgrp_dfl_root, making its tasks visible via the root cgroup.
For each subsystem, the user‑visible files are created.
The hash table for css_set is rebuilt to reflect the new subsystem state.
The cgroup and cgroup2 filesystem types are registered.
A /proc/cgroups file is created; on a fresh Ubuntu boot it shows one hierarchy with 179 cgroups and 14 enabled subsystems.
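The /proc/cgroups table has four whitespace-separated columns (subsys_name, hierarchy, num_cgroups, enabled) after a header line that starts with '#'. The following is a small sketch of reading it; the SubsysInfo type and the sample rows are assumptions for illustration, with the cgroup count borrowed from the figure quoted above and the list of subsystems truncated.

```python
from typing import List, NamedTuple


class SubsysInfo(NamedTuple):
    name: str
    hierarchy: int      # hierarchy ID the subsystem is attached to
    num_cgroups: int    # number of cgroups in that hierarchy
    enabled: bool


def parse_proc_cgroups(text: str) -> List[SubsysInfo]:
    """Parse the contents of /proc/cgroups, skipping the '#' header."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        rows.append(
            SubsysInfo(name, int(hierarchy), int(num_cgroups), enabled == "1")
        )
    return rows


# Hypothetical, truncated sample: three subsystems on a single hierarchy.
PROC_CGROUPS_SAMPLE = """\
#subsys_name\thierarchy\tnum_cgroups\tenabled
cpu\t0\t179\t1
memory\t0\t179\t1
pids\t0\t179\t1
"""

subsystems = parse_proc_cgroups(PROC_CGROUPS_SAMPLE)
hierarchies = {s.hierarchy for s in subsystems}
enabled_names = [s.name for s in subsystems if s.enabled]
```

Counting the distinct hierarchy IDs and the enabled rows gives the kind of summary mentioned above.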
3. cgroup Creation and Task Assignment
cgroup VFS is built on kernfs. The mount process creates a superblock and root directory. cgroup v1 and v2 differ: v2 supports a unified hierarchy and thread mode.
3.1 cgroup Creation
After a successful mount, a new cgroup can be created with mkdir in the mounted directory.
The kernel locates the parent cgroup, checks limits (number of descendants, depth), and calls cgroup_create to allocate the object, set reference counts, inherit freeze state, and link it into the parent’s children list.
css_populate_dir creates the generic files for the new cgroup.
For each enabled subsystem, subsystem‑specific files (e.g., dfl_cftypes or legacy_cftypes) are created based on whether the hierarchy is the default one.
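The limit check performed before cgroup_create can be modeled in a few lines. This sketch is a simplified user-space model, not kernel code: CgroupNode and mkdir are hypothetical names, the two limits correspond to the v2 files cgroup.max.descendants and cgroup.max.depth, and the EAGAIN error code reflects my reading of the 5.10 mkdir path.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class CgroupNode:
    name: str
    parent: Optional["CgroupNode"] = None
    nr_descendants: int = 0
    max_descendants: float = float("inf")   # cgroup.max.descendants
    max_depth: float = float("inf")         # cgroup.max.depth


def check_hierarchy_limits(parent: CgroupNode) -> bool:
    """Walk from the prospective parent up to the root, checking each limit."""
    level = 1
    node: Optional[CgroupNode] = parent
    while node is not None:
        if node.nr_descendants >= node.max_descendants:
            return False
        if level > node.max_depth:
            return False
        level += 1
        node = node.parent
    return True


def mkdir(parent: CgroupNode, name: str) -> CgroupNode:
    """Create a child if every ancestor's limits allow it."""
    if not check_hierarchy_limits(parent):
        raise OSError(errno.EAGAIN, "cgroup hierarchy limit reached")
    child = CgroupNode(name, parent=parent)
    node: Optional[CgroupNode] = parent
    while node is not None:          # every ancestor gains one descendant
        node.nr_descendants += 1
        node = node.parent
    return child


shallow_root = CgroupNode("root", max_depth=1)
a = mkdir(shallow_root, "a")                        # depth 1: allowed
can_nest_under_a = check_hierarchy_limits(a)        # depth 2: refused

small_root = CgroupNode("root", max_descendants=2)
mkdir(small_root, "x")
mkdir(small_root, "y")
can_add_third = check_hierarchy_limits(small_root)  # third descendant: refused
```

Because the walk visits every ancestor, a limit set high in the tree constrains all cgroups created beneath it.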
3.2 Assigning Tasks to a cgroup
In cgroup v2, every cgroup is created as a domain cgroup; it can then be switched to threaded mode by writing "threaded" to its cgroup.type file. Once a cgroup is threaded, its nearest domain ancestor becomes the accounting entity (dom_cgrp). The parent's type changes to "domain threaded" and reverts to "domain" once all of its threaded children are removed.
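These type transitions can be sketched as a toy state model. It is deliberately simplified and covers only the cases described above: it follows the kernel convention that a domain cgroup's dom_cgrp points to itself, but it ignores the "domain invalid" type and the other conditions the kernel checks. The function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Cgroup:
    name: str
    parent: Optional["Cgroup"] = None
    children: List["Cgroup"] = field(default_factory=list)
    dom_cgrp: Optional["Cgroup"] = None

    def __post_init__(self) -> None:
        # A freshly created cgroup is a domain: it is its own dom_cgrp.
        if self.dom_cgrp is None:
            self.dom_cgrp = self

    def is_threaded(self) -> bool:
        return self.dom_cgrp is not self

    def cgroup_type(self) -> str:
        if self.is_threaded():
            return "threaded"
        if any(child.is_threaded() for child in self.children):
            return "domain threaded"
        return "domain"


def mkdir(parent: Cgroup, name: str) -> Cgroup:
    child = Cgroup(name, parent=parent)
    parent.children.append(child)
    return child


def make_threaded(cgrp: Cgroup) -> None:
    """Model of writing "threaded" to cgroup.type."""
    if cgrp.parent is None:
        raise ValueError("the root cgroup cannot be made threaded")
    # The nearest domain ancestor becomes the accounting entity.
    cgrp.dom_cgrp = cgrp.parent.dom_cgrp


def rmdir(cgrp: Cgroup) -> None:
    if cgrp.parent is not None:
        cgrp.parent.children.remove(cgrp)


root = Cgroup("root")
app = mkdir(root, "app")
workers = mkdir(app, "workers")
type_before = app.cgroup_type()
make_threaded(workers)
type_during = app.cgroup_type()
workers_type = workers.cgroup_type()
rmdir(workers)
type_after = app.cgroup_type()
```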
Both cgroup v1 and v2 expose control files that accept task IDs to move tasks into the target cgroup: cgroup.procs and tasks in v1, cgroup.procs and cgroup.threads in v2.
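From user space, moving a task amounts to writing its ID to one of these files. The helper below is a minimal sketch: move_task is a hypothetical name, and the demonstration writes into a temporary directory rather than a real cgroup mount, so it only shows the file interaction and does not migrate anything.

```python
import os
import tempfile


def move_task(cgroup_dir: str, pid: int, control_file: str = "cgroup.procs") -> str:
    """Write `pid` to a control file of the cgroup at `cgroup_dir`.

    Returns the path that was written. On a real cgroup mount the kernel
    treats the write as a request to migrate the task.
    """
    path = os.path.join(cgroup_dir, control_file)
    with open(path, "w") as f:
        f.write(str(pid))
    return path


# Dry run against a temporary directory instead of a real cgroup mount,
# so the write simply lands in an ordinary file.
with tempfile.TemporaryDirectory() as fake_cgroup:
    written_path = move_task(fake_cgroup, 1234)
    with open(written_path) as f:
        written = f.read()
```

On a real v2 mount, the same call with control_file="cgroup.threads" would target a single thread instead of a whole process; both writes normally require suitable permissions on the cgroup.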
Typical flow for moving a task via cgroup.procs (v2):
Obtain the destination cgroup from user input.
Find the task’s current cgroup using task_cgroup_from_root, which relies on cset_cgroup_from_root.
Call cgroup_attach_task which invokes cgroup_migrate_add_src to record source cset structures.
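cgroup_migrate_prepare_dst looks up or creates the destination cset for each source cset and records which subsystems the move affects, so that only their attach callbacks are invoked.
cgroup_migrate_add_task moves the task’s cg_list onto the mg_tasks list of its source cset, marking the task as pending migration.
cgroup_migrate_execute performs the actual migration: it runs the can_attach checks of the affected subsystems, moves each task with css_set_move_task, which executes rcu_assign_pointer(task->cgroups, to_cset), and then calls the attach callbacks to complete the move.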
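The steps above can be sketched as a toy simulation. It is a heavily simplified model, not the kernel implementation: it assumes a single hierarchy with one css_set per cgroup, represents task->cgroups as a dictionary from PID to css_set, and reduces the can_attach callbacks to a single predicate. The function names only echo their kernel counterparts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)
class CssSet:
    cgroup: str
    tasks: List[int] = field(default_factory=list)
    mg_tasks: List[int] = field(default_factory=list)   # pending migration
    mg_dst_cset: Optional["CssSet"] = None


@dataclass
class MigrationContext:
    src_csets: List[CssSet] = field(default_factory=list)


def migrate_add_src(src: CssSet, mgctx: MigrationContext) -> None:
    if src not in mgctx.src_csets:
        mgctx.src_csets.append(src)


def migrate_prepare_dst(dst_cgroup: str, csets: Dict[str, CssSet],
                        mgctx: MigrationContext) -> None:
    # Look up or create the destination css_set for every source.
    for src in mgctx.src_csets:
        src.mg_dst_cset = csets.setdefault(dst_cgroup, CssSet(dst_cgroup))


def migrate_add_task(pid: int, task_cset: Dict[int, CssSet]) -> None:
    # The task is parked on its *source* css_set's mg_tasks list.
    src = task_cset[pid]
    src.tasks.remove(pid)
    src.mg_tasks.append(pid)


def migrate_execute(mgctx: MigrationContext, task_cset: Dict[int, CssSet],
                    can_attach: Callable[[int], bool]) -> bool:
    pending = [pid for src in mgctx.src_csets for pid in src.mg_tasks]
    allowed = all(can_attach(pid) for pid in pending)
    for src in mgctx.src_csets:
        dst = src.mg_dst_cset if allowed else src   # a veto rolls back
        for pid in src.mg_tasks:
            dst.tasks.append(pid)
            task_cset[pid] = dst    # stands in for rcu_assign_pointer()
        src.mg_tasks.clear()
        src.mg_dst_cset = None
    return allowed


def attach_task(pid: int, dst_cgroup: str, csets: Dict[str, CssSet],
                task_cset: Dict[int, CssSet],
                can_attach: Callable[[int], bool] = lambda pid: True) -> bool:
    mgctx = MigrationContext()
    migrate_add_src(task_cset[pid], mgctx)
    migrate_prepare_dst(dst_cgroup, csets, mgctx)
    migrate_add_task(pid, task_cset)
    return migrate_execute(mgctx, task_cset, can_attach)


csets = {"/a": CssSet("/a", tasks=[10, 11]), "/b": CssSet("/b")}
task_cset = {10: csets["/a"], 11: csets["/a"]}
moved = attach_task(10, "/b", csets, task_cset)
vetoed = attach_task(11, "/b", csets, task_cset, can_attach=lambda pid: False)
```

Splitting the work into prepare and execute phases means a refusal from can_attach can be handled before any task pointer has changed, so the task simply stays in its source set.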
4. Summary
This article described the fundamental concepts and data structures of Linux cgroups. Because the structures are relatively complex, the focus was on the core components; readers are encouraged to consult the kernel source for deeper understanding. Detailed coverage of individual subsystems was omitted.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials