
How Linux cgroup Enables Fair CPU Sharing with Group Scheduling

This article explains how Linux cgroups group processes so that CPU, disk I/O, and other resources can be divided among groups, and how group scheduling shares CPU time fairly among them. It covers the underlying task_group structures, scheduling entities, the policies for real‑time and normal processes, and a practical command‑line example.

MaGe Linux Operations

cgroup and Group Scheduling

The Linux kernel has supported control groups (cgroups) since version 2.6.24. They allow processes to be grouped and resources to be allocated per group: for example, one group may get 30% of the CPU and 50% of disk I/O while another gets 10% of the CPU.

CPU is one of the resources that can be divided, leading to the concept of group scheduling.

Without group scheduling, the Linux scheduler divides CPU time among individual processes. If user A runs make -j8 and user B runs a single‑threaded make on the same machine, A has many more runnable processes and therefore receives a much larger share of CPU time.

Group scheduling solves this by first selecting a group, then a process within that group, ensuring each group receives a roughly equal share of CPU.
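
To make the difference concrete, the sketch below computes the expected CPU split under both models. It is only a rough model: it assumes every process is CPU‑bound with equal weight, that make -j8 keeps exactly 8 processes runnable, and the function names are invented for illustration.

```shell
# per_process_share USER_PROCS TOTAL_PROCS
# Percent of CPU a user gets when every process is scheduled as an equal peer.
per_process_share() {
    echo $(( 100 * $1 / $2 ))
}

# per_group_share NGROUPS
# Percent of CPU each group gets when groups are scheduled as equal peers.
per_group_share() {
    echo $(( 100 / $1 ))
}

# User A: make -j8 (assumed 8 runnable processes); user B: 1 process.
echo "process-based: A=$(per_process_share 8 9)% B=$(per_process_share 1 9)%"
echo "group-based:   A=$(per_group_share 2)% B=$(per_group_share 2)%"
```

The percentages are truncated by integer division, so the process‑based line reads 88% and 11% rather than adding up to exactly 100%.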

Related Data Structures

In the kernel, task_group structures represent scheduling groups, forming a tree that mirrors the cgroup directory hierarchy.
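
Because the tree mirrors the directory hierarchy, listing the directories under a mounted cpu hierarchy gives a view of the scheduling groups. The helper below is a small sketch; list_groups is a made‑up name, and the mount point in the comment is the one used in the example later in this article.

```shell
# list_groups ROOT
# Print every directory below ROOT (relative to ROOT), one per line.
# In a mounted cpu cgroup hierarchy each directory is one scheduling group.
list_groups() (
    cd "$1" || exit 1
    find . -type d | LC_ALL=C sort
)

# On a system with the hierarchy mounted, for example:
#   list_groups /dev/cgroup/cpu
```

The "." entry in the output stands for the root group.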

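Each task_group can contain both real‑time and normal processes, so it holds two sets of scheduling entities and runqueues, one set per scheduling class, with one entity and one runqueue per CPU in each set. A scheduling entity abstracts over groups and individual tasks: a runqueue holds entities, and each entity stands for either a single task or a whole child group.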

The root task_group has no scheduling entity; the scheduler always starts from its runqueue to pick the next entity, recursing down the tree until a runnable task is found.

Group Scheduling Policies

Group scheduling must define a priority for a task_group. For real‑time (RT) scheduling, the group's priority equals the highest priority of any task inside the group (lower numeric value means higher priority).

When a task enters or leaves a group, all ancestor groups are updated to reflect the new highest priority.
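
The rule can be illustrated with a toy model, assuming priorities are plain integers where a lower number wins. highest_prio is a hypothetical helper, not a kernel function; a parent group's priority is computed from its own tasks plus the priorities of its child groups.

```shell
# highest_prio PRIO...
# Print the numerically smallest argument, i.e. the highest priority.
highest_prio() (
    best=$1
    for p in "$@"; do
        if [ "$p" -lt "$best" ]; then
            best=$p
        fi
    done
    echo "$best"
)

# Child group holds RT tasks with priorities 30, 12 and 45.
child=$(highest_prio 30 12 45)
# Parent holds its own task (20) plus the child group.
parent=$(highest_prio 20 "$child")
echo "child=$child parent=$parent"

# After the priority-12 task leaves, both levels are recomputed.
child=$(highest_prio 30 45)
parent=$(highest_prio 20 "$child")
echo "child=$child parent=$parent"
```

In the first case the child's task at priority 12 determines the priority of both groups; once it leaves, the parent falls back to its own task at priority 20.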

Two kernel parameters control RT bandwidth system‑wide: /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us. Within each period, all RT tasks together may run for at most the configured runtime. The defaults are a period of 1,000,000 µs (1 s) and a runtime of 950,000 µs, so RT tasks may use up to 95% of CPU time, leaving at least 5% for normal tasks.
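
The ratio between the two values can be checked with a few lines of shell. This is a minimal sketch; rt_share_percent is an illustrative name, and the function treats a negative runtime as "no limit", which is how a value of -1 in sched_rt_runtime_us is interpreted.

```shell
# rt_share_percent RUNTIME_US PERIOD_US
# Print the percentage of each period that RT tasks may consume.
rt_share_percent() {
    if [ "$1" -lt 0 ]; then
        echo "unlimited"    # a runtime of -1 disables the limit
    else
        echo $(( 100 * $1 / $2 ))
    fi
}

# With the default values:
echo "RT limit: $(rt_share_percent 950000 1000000)%"

# On a live system the current values can be read from /proc:
#   rt_share_percent "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
#                    "$(cat /proc/sys/kernel/sched_rt_period_us)"
```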

Each task_group can also have its own limits, exposed in the cgroup directory as cpu.rt_runtime_us and cpu.rt_period_us, which cap the group's RT CPU share. The root group uses the system‑wide values.

When a group's runtime limit is reached, its runqueue is throttled; a periodic timer later restores it.
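
The throttle‑and‑restore cycle can be pictured with a toy timeline. This is a simplified model, not kernel code: time advances in fixed ticks, the group always has a runnable RT task, and the function name and tick size are made up for illustration.

```shell
# simulate_period PERIOD_MS RUNTIME_MS TICK_MS
# Print one character per tick: R = group runs, T = group is throttled.
simulate_period() (
    period_ms=$1; runtime_ms=$2; tick_ms=$3
    used=0; t=0; line=""
    while [ "$t" -lt "$period_ms" ]; do
        if [ "$used" -lt "$runtime_ms" ]; then
            line="${line}R"
            used=$(( used + tick_ms ))
        else
            line="${line}T"     # budget exhausted until the period ends
        fi
        t=$(( t + tick_ms ))
    done
    echo "$line"
)

# 1 s period, 300 ms runtime, 100 ms ticks
simulate_period 1000 300 100
```

With these numbers the group runs for the first three ticks and is throttled for the remaining seven; at the start of the next period the budget is refilled and the pattern repeats.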

Normal Process Group Scheduling

For normal (CFS) processes, a group's priority is expressed through a shares value (the cpu.shares file), which maps to the weight used by the CFS scheduler. The default shares value of 1024 equals the weight of a process at the default nice level (nice 0), so a group and a process competing on the same runqueue split the CPU equally unless the value is adjusted.
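
As a sketch of how shares translate into CPU time, the helpers below write a group's cpu.shares file and estimate the resulting split between sibling groups. Both function names are hypothetical, and the estimate assumes every group always has runnable work on the same CPU.

```shell
# set_shares GROUP_DIR VALUE
# Write VALUE into the group's cpu.shares file (needs root on a real mount).
set_shares() {
    echo "$2" > "$1/cpu.shares"
}

# share_percent MY_SHARES SIBLING_SHARES...
# Expected CPU percent for a group competing against the listed siblings.
share_percent() (
    mine=$1; total=0
    for s in "$@"; do
        total=$(( total + s ))
    done
    echo $(( 100 * mine / total ))
)

# Two groups with default shares split the CPU evenly.
echo "1024 vs 1024: $(share_percent 1024 1024)%"
# Doubling one group's shares gives it about two thirds.
echo "2048 vs 1024: $(share_percent 2048 1024)%"
```

On a system with the cpu hierarchy mounted as in the example that follows, set_shares /dev/cgroup/cpu/grp_a 2048 (run as root) would apply the second case.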

Example

Environment: Ubuntu 10.04, kernel 2.6.32, Intel Core2 dual‑core.

Mount a CPU‑only cgroup and create two sub‑groups:

sudo mkdir -p /dev/cgroup/cpu
sudo mount -t cgroup -o cpu cgroup /dev/cgroup/cpu
cd /dev/cgroup/cpu
sudo mkdir grp_a grp_b

Assign three shells to the groups (one to grp_a, two to grp_b) using a helper script that writes the shell PID into the group's tasks file with root privileges.
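
The helper script itself is not shown in the original; a minimal sketch of what it might look like follows. attach_pid is an assumed name, and the function simply writes one PID into the group's tasks file.

```shell
# attach_pid GROUP_DIR PID
# Move the process PID into the cgroup at GROUP_DIR.
# The tasks file takes one PID per write.
attach_pid() {
    echo "$2" > "$1/tasks"
}

# Writing to a real cgroup requires root. From an interactive shell,
# one way to move that shell itself into grp_a is:
#   sudo sh -c "echo $$ > /dev/cgroup/cpu/grp_a/tasks"
# ($$ is expanded by the calling shell before sudo runs.)
```

Processes started from a shell after it has been moved belong to the same group, so only the shell needs to be attached.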

Run a busy‑loop program bound to CPU 1 in each shell. The grp_a process consumes ~50% of the CPU, while the two grp_b processes each consume ~25%, demonstrating fair sharing.
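
The original busy‑loop program is not included. As a stand‑in, the sketch below uses a shell loop pinned with taskset; spin_on_cpu is a hypothetical wrapper, and the optional iteration count exists only so the loop can be tried without running forever.

```shell
# spin_on_cpu CPU [ITERATIONS]
# Run a busy loop pinned to CPU. Without ITERATIONS the loop never ends
# (stop it with Ctrl-C); with ITERATIONS it exits after that many rounds.
spin_on_cpu() {
    if [ -n "${2:-}" ]; then
        taskset -c "$1" sh -c \
            'i=0; while [ "$i" -lt "$1" ]; do i=$(( i + 1 )); done' sh "$2"
    else
        taskset -c "$1" sh -c 'while :; do :; done'
    fi
}

# In each of the three shells from the example:
#   spin_on_cpu 1
```

Watching top in a fourth terminal should then show the split described above, since all three loops compete for the same CPU.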

For RT processes, modify the program to use SCHED_FIFO and set cpu.rt_runtime_us for grp_a to 300000 (300 ms of runtime per 1 s period). top then shows the RT process capped at its allotted share of about 30%, even though the CPU is otherwise idle.
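
A rough way to reproduce this without modifying a program is to launch the loop under chrt, which starts a command with a real‑time policy. This swaps in chrt for the modified program described above; set_rt_runtime is a made‑up helper name, and the priority value 10 is an arbitrary choice.

```shell
# set_rt_runtime GROUP_DIR MICROSECONDS
# Write the group's RT runtime budget (needs root on a real mount).
set_rt_runtime() {
    echo "$2" > "$1/cpu.rt_runtime_us"
}

# As root, with the cpu hierarchy mounted as in the example:
#   set_rt_runtime /dev/cgroup/cpu/grp_a 300000
#
# Then, from a root shell that belongs to grp_a, start a SCHED_FIFO
# busy loop pinned to CPU 1:
#   chrt -f 10 taskset -c 1 sh -c 'while :; do :; done'
```

Setting an RT policy requires root privileges, and a SCHED_FIFO busy loop can starve normal tasks on its CPU up to the configured limit, so this is best tried on a test machine.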
