Understanding CPU Load Balancing and Scheduler Domains in Linux
This article explains the concept of CPU load balancing, the hierarchical scheduler domain and group structures in a multi‑core SoC, when and how the Linux kernel performs periodic, no‑hz, and idle load‑balancing, and outlines the step‑by‑step algorithm used to migrate tasks for balanced system performance.
Load balancing spreads work across CPUs by moving tasks from heavily loaded CPUs to lightly loaded ones, so that no CPU sits idle while another has a backlog of runnable tasks and each CPU’s task queue stays balanced.
Before discussing load balancing, it is essential to understand the CPU topology on a System‑on‑Chip (SoC), which is described using scheduling domains that represent hierarchical relationships among CPUs.
In a multi‑core SoC, clusters of cores share resources such as L2 cache; each cluster forms a multi‑core (MC) scheduling domain, while the entire chip forms a higher‑level DIE scheduling domain. Balancing across clusters incurs higher overhead, because a migrated task leaves behind the data it had warmed in the old cluster’s L2 cache and must refill the cache of the new cluster.
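On kernels built with scheduler debugging enabled, CPU scheduling domains and groups can be inspected through the procfs directory /proc/sys/kernel/sched_domain. Newer kernels export this information under debugfs instead.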
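As a rough illustration of how that directory can be read, the following sketch walks a cpuN/domainM layout and collects every file it finds. The layout and the `name` file are assumptions about a debug-enabled kernel; the helper `read_sched_domains` is hypothetical and simply returns an empty result when the directory is absent.

```python
from pathlib import Path


def read_sched_domains(root="/proc/sys/kernel/sched_domain"):
    """Return {cpu: {domain: {file_name: contents}}} for whatever exists under root.

    Assumes a cpuN/domainM/<attribute> layout; returns {} if root is missing.
    """
    result = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return result  # not exported on this kernel
    for cpu_dir in sorted(root_path.glob("cpu*")):
        domains = {}
        for dom_dir in sorted(cpu_dir.glob("domain*")):
            fields = {}
            for entry in dom_dir.iterdir():
                if not entry.is_file():
                    continue
                try:
                    fields[entry.name] = entry.read_text().strip()
                except OSError:
                    continue  # some attributes may not be readable
            domains[dom_dir.name] = fields
        result[cpu_dir.name] = domains
    return result


if __name__ == "__main__":
    for cpu, domains in read_sched_domains().items():
        for dom, fields in domains.items():
            print(cpu, dom, fields.get("name", "?"))
```

Passing a different `root` makes it possible to point the same helper at another location if the kernel exports the data elsewhere.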
Sched_domain members
| Member | Description |
| --- | --- |
| parent and child | Define the hierarchical parent‑child relationship of scheduling domains; base domains have a NULL child, top domains have a NULL parent. |
| groups | Form a circular linked list of scheduling groups; this member points to the list head. |
| min_interval and max_interval | Specify the range of time intervals for checking the domain’s balance status. |
| balance_interval | Defines the interval at which the domain performs balancing. |
| busy_factor | When a CPU is busy, the balancing interval is multiplied by this factor. |
| imbalance_pct | Watermark that triggers balancing when the domain’s imbalance exceeds this percentage. |
| level | Indicates the domain’s level within the overall hierarchy. |
| span_weight | Number of CPUs contained in the domain. |
| span | The set of CPUs that the domain covers. |
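To make the roles of balance_interval, busy_factor, and imbalance_pct concrete, here is a minimal sketch of the two checks they feed into. The function names, the `ceiling_ms` parameter, and the sample numbers are illustrative assumptions, not the kernel's actual code, which works in jiffies and applies further heuristics.

```python
def effective_balance_interval(balance_interval_ms, busy_factor, cpu_busy, ceiling_ms):
    """Stretch the interval on a busy CPU, then clamp it to [1, ceiling_ms]."""
    interval = balance_interval_ms
    if cpu_busy:
        # A busy CPU balances less often: the interval is multiplied by busy_factor.
        interval *= busy_factor
    return max(1, min(interval, ceiling_ms))


def exceeds_imbalance(busiest_load, local_load, imbalance_pct):
    """True if the busiest load exceeds the local load by more than imbalance_pct percent."""
    return 100 * busiest_load > imbalance_pct * local_load


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(effective_balance_interval(8, 32, cpu_busy=False, ceiling_ms=100))
    print(effective_balance_interval(8, 32, cpu_busy=True, ceiling_ms=100))
    print(exceeds_imbalance(130, 100, imbalance_pct=125))
```

With an imbalance_pct of 125, a remote load has to be more than 25% above the local load before the domain counts as imbalanced, which filters out small differences that would not repay the cost of a migration.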
Sched_group members
| Member | Description |
| --- | --- |
| next | Points to the next group in the circular linked list of groups within the domain. |
| group_weight | Number of CPUs in the group. |
| sgc | Computational capacity information of the group. |
| cpumask | Mask indicating which CPUs belong to the group. |
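The relationship between the two structures can be sketched with a toy model. The classes below borrow the member names from the tables but are simplified Python stand-ins, and the 8-CPU, two-cluster SoC is a hypothetical example seen from CPU 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class SchedGroup:
    cpumask: frozenset                    # CPUs that belong to this group
    next: Optional["SchedGroup"] = None   # next group in the circular list

    @property
    def group_weight(self):
        return len(self.cpumask)


@dataclass(eq=False)
class SchedDomain:
    name: str
    level: int
    groups: SchedGroup                    # head of the circular group list
    parent: Optional["SchedDomain"] = None
    child: Optional["SchedDomain"] = None

    def iter_groups(self):
        """Walk the circular list once, starting at the head."""
        group = self.groups
        while True:
            yield group
            group = group.next
            if group is self.groups:
                break

    @property
    def span(self):
        return frozenset().union(*(g.cpumask for g in self.iter_groups()))

    @property
    def span_weight(self):
        return len(self.span)


def make_ring(cpu_sets):
    """Link one group per CPU set into a circular list and return the head."""
    groups = [SchedGroup(frozenset(s)) for s in cpu_sets]
    for i, group in enumerate(groups):
        group.next = groups[(i + 1) % len(groups)]
    return groups[0]


def build_for_cpu0():
    """Domains as seen from CPU 0 on a hypothetical SoC with two 4-CPU clusters."""
    mc = SchedDomain("MC", 0, make_ring([{0}, {1}, {2}, {3}]))
    die = SchedDomain("DIE", 1, make_ring([{0, 1, 2, 3}, {4, 5, 6, 7}]))
    mc.parent, die.child = die, mc
    return mc, die
```

In this model the groups of a domain are the spans of its child-level units: the MC domain holds one group per CPU, and the DIE domain holds one group per cluster.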
CPU topology can be examined under /sys/devices/system/cpu/cpuX/topology, whose files list the CPUs that share a core, cluster, or package with cpuX; these sets correspond to the MC and DIE domains and their constituent CPUs.
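The sibling files in that directory use the kernel's CPU list notation, such as `0-3` or `0-3,8`. A small sketch of a parser follows; the helper names are hypothetical, and which `*_list` files exist depends on the kernel version and architecture, so the reader globs for them rather than naming any.

```python
from pathlib import Path


def parse_cpulist(text):
    """Convert a CPU list string such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_topology(cpu=0, root="/sys/devices/system/cpu"):
    """Return {file_name: set_of_cpus} for every *_list file of the given CPU."""
    topo_dir = Path(root) / f"cpu{cpu}" / "topology"
    if not topo_dir.is_dir():
        return {}
    return {f.name: parse_cpulist(f.read_text()) for f in sorted(topo_dir.glob("*_list"))}


if __name__ == "__main__":
    for name, cpus in read_topology(0).items():
        print(name, sorted(cpus))
```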
The load‑balancing software architecture rests on two tracking components:
CPU load tracking: aggregates load per cluster to detect inter‑cluster imbalance.
Task load tracking: evaluates whether a task fits the current CPU’s capacity and determines which tasks, and how many, to migrate.
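Both kinds of tracking can be illustrated with a short sketch. The 1280/1024 margin (a task fits if it uses less than roughly 80% of a CPU's capacity) follows the margin used by the kernel's `fits_capacity()` macro as far as I know, but treat it as an assumption to check against your kernel version; the cluster names and load values are made up.

```python
def fits_capacity(util, capacity):
    """True if util leaves about 20% headroom on a CPU of the given capacity."""
    return util * 1280 < capacity * 1024


def cluster_loads(cpu_load, clusters):
    """Sum per-CPU load into per-cluster totals.

    cpu_load: {cpu: load}; clusters: {cluster_name: iterable of cpus}.
    """
    return {name: sum(cpu_load[c] for c in cpus) for name, cpus in clusters.items()}


if __name__ == "__main__":
    # Hypothetical 4-CPU system with two clusters.
    loads = {0: 10, 1: 20, 2: 5, 3: 5}
    clusters = {"little": [0, 1], "big": [2, 3]}
    print(cluster_loads(loads, clusters))
    print(fits_capacity(400, 512), fits_capacity(420, 512))
```

Comparing the per-cluster totals reveals inter-cluster imbalance, while the capacity check indicates whether a given task is a sensible candidate for a smaller CPU.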
Balancing is triggered by scheduling events such as task wake‑up, task creation, or tick interrupts, prompting the kernel to assess imbalance and possibly migrate tasks.
Linux’s CFS scheduler provides three types of load balancers:
Periodic load balancer: driven by the scheduler tick. When a domain’s balancing interval has expired, it checks the system’s balance and moves runnable tasks from the busiest domain/group/CPU to the current CPU.
No‑hz load balancer: when a busy CPU detects idle CPUs whose ticks are stopped, it sends an IPI (through the GIC on Arm SoCs) to kick one idle CPU, which then performs balancing on behalf of all idle CPUs.
New idle load balancer: when a CPU is about to become idle, it checks whether other CPUs are overloaded and pulls tasks from busy CPUs before idling.
The fundamental load‑balancing process starts from the base domain, finds the busiest scheduling group, selects the busiest CPU’s runqueue as the source, chooses tasks with the highest load, and migrates them to the destination CPU’s runqueue.
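The steps above can be sketched as a toy simulation. Everything here is a simplifying assumption: runqueues are plain lists of task loads, a group's load is the average of its CPUs' loads, the imbalance is half the load difference between source and destination, and tasks are tried heaviest first. The real kernel code applies many more conditions, such as affinity, cache hotness, and capacity.

```python
def load_of(runqueue):
    """Total load of a runqueue, modelled as a list of task loads."""
    return sum(runqueue)


def balance_once(dst_cpu, groups, runqueues):
    """Pull tasks toward dst_cpu within one domain; return [(task_load, src_cpu)]."""
    remote = [g for g in groups if dst_cpu not in g]
    if not remote:
        return []
    # Step 1: busiest group, by average load per CPU.
    busiest_group = max(
        remote, key=lambda g: sum(load_of(runqueues[c]) for c in g) / len(g)
    )
    # Step 2: busiest CPU inside that group becomes the source runqueue.
    src_cpu = max(busiest_group, key=lambda c: load_of(runqueues[c]))
    imbalance = (load_of(runqueues[src_cpu]) - load_of(runqueues[dst_cpu])) / 2
    moved = []
    # Step 3: try the heaviest tasks first, never emptying the source runqueue.
    for task in sorted(runqueues[src_cpu], reverse=True):
        if task > imbalance or len(runqueues[src_cpu]) <= 1:
            continue
        runqueues[src_cpu].remove(task)
        runqueues[dst_cpu].append(task)
        imbalance -= task
        moved.append((task, src_cpu))
    return moved


def balance_domains(dst_cpu, domains, runqueues):
    """Walk the domains from the base level upward, balancing at each level."""
    moved = []
    for groups in domains:
        moved.extend(balance_once(dst_cpu, groups, runqueues))
    return moved


if __name__ == "__main__":
    # Hypothetical 4-CPU system: clusters (0, 1) and (2, 3), seen from CPU 0.
    runqueues = {0: [], 1: [30], 2: [50, 40, 10], 3: [20]}
    domains = [[(0,), (1,)], [(0, 1), (2, 3)]]
    print(balance_domains(0, domains, runqueues))
    print(runqueues)
```

In this example nothing moves at the base level, because CPU 1 has a single task that exceeds the computed imbalance, and one task is pulled from CPU 2 at the upper level.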
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.