CPU Power Consumption, Low‑Power Modes, and Core Control Framework
CPU power consumption comprises static leakage and dynamic switching energy. To cut both, modern SoCs combine low-power C-states with core-control isolation of idle cores: the kernel's corectl module evaluates cluster load and task counts on each scheduler tick to decide how many CPUs to activate or deactivate, with its parameters configurable via sysfs.
As CPU architectures and process technologies evolve, performance and energy efficiency improve, but the accompanying rise in peak power often offsets these gains, making CPU power optimization a major challenge for SoC vendors.
CPU power consists of two parts: static power (leakage current of MOS transistors, dependent on temperature and voltage) and dynamic power (charging/discharging of load capacitance, dependent on frequency and voltage).
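These two components follow the standard CMOS power equations: dynamic power scales with switching activity, load capacitance, voltage squared, and frequency, while static power is the product of voltage and leakage current. A minimal sketch (the parameter values used in testing are illustrative round numbers, not figures from this article):

```c
#include <assert.h>

/* Textbook CMOS power model:
 *   dynamic power  P_dyn  = a * C * V^2 * f   (a: switching activity factor)
 *   static power   P_stat = V * I_leak        (I_leak rises with temperature)
 */
static double dynamic_power_w(double activity, double cap_f,
                              double volt_v, double freq_hz)
{
    return activity * cap_f * volt_v * volt_v * freq_hz;
}

static double static_power_w(double volt_v, double leak_a)
{
    return volt_v * leak_a;
}
```

The quadratic dependence on voltage is why frequency/voltage scaling and deep idle states are so effective: lowering V attacks dynamic power quadratically and leakage linearly at the same time.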
Consider an SoC with two clusters, each containing four CPUs. The total power of a cluster is the sum of the power of cluster-level modules (e.g., the L3 cache) and the power of its CPUs (Figure 1).
CPU low‑power modes (C‑states) are defined as follows (using a Qualcomm chip as an example):
C0 – normal active state.
C1 – WFI (wait‑for‑interrupt) mode; the core clock is stopped, dynamic power drops dramatically, and exit latency is about 40 ns.
C4 – extends C1 by also turning off core logic and L1/L2 caches; exit latency is around 500 ns.
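The exit latencies above are what make state choice a trade-off: a deeper state saves more power but costs more to wake from. A hedged sketch of the selection idea, using the article's state names and latencies (the policy shown is a simplification of what a real cpuidle governor does, not the actual Qualcomm implementation):

```c
#include <assert.h>
#include <stddef.h>

/* C-state table following the article's Qualcomm example. */
struct cstate {
    const char *name;
    unsigned int exit_latency_ns;
};

static const struct cstate states[] = {
    { "C0", 0 },   /* active: no entry/exit cost */
    { "C1", 40 },  /* WFI: core clock stopped */
    { "C4", 500 }, /* core logic and L1/L2 caches also off */
};

/* Pick the deepest state whose exit latency fits the latency
 * budget for this idle period (longer expected idle -> deeper state). */
static size_t pick_cstate(unsigned int latency_budget_ns)
{
    size_t best = 0;
    for (size_t i = 1; i < sizeof(states) / sizeof(states[0]); i++)
        if (states[i].exit_latency_ns <= latency_budget_ns)
            best = i;
    return best;
}
```

A real governor also weighs predicted idle duration against the energy cost of entering and leaving the state, but the latency-budget check is the core constraint.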
When a CPU is idle, it enters a low‑power state where only static power remains. If a CPU enters C1, the power reduction is illustrated in Figure 2.
Consider a system with only CPU0 and CPU1: keeping both awake maximizes total power. If CPU0 enters WFI, its dynamic power disappears; if CPU0 is fully powered down, both its dynamic and static power disappear. The same holds for CPU1.
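The accounting can be made concrete with a small worked example (the per-CPU milliwatt figures are made-up round numbers for illustration only):

```c
#include <assert.h>

/* Hypothetical per-CPU power components, in milliwatts. */
#define DYN_MW  170  /* dynamic power while actively switching */
#define STAT_MW  30  /* static (leakage) power while powered */

enum cpu_state { ACTIVE, WFI, POWER_DOWN };

static int cpu_power_mw(enum cpu_state s)
{
    switch (s) {
    case ACTIVE:     return DYN_MW + STAT_MW; /* both components present */
    case WFI:        return STAT_MW;          /* dynamic power gone */
    case POWER_DOWN: return 0;                /* static power gone too */
    }
    return 0;
}
```

With these numbers, two active CPUs draw 400 mW; putting one into WFI drops the total to 230 mW, and fully powering it down drops it to 200 mW.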
Modern SOCs typically have eight cores divided into two or three clusters (Figure 3).
Core isolation keeps high‑power CPUs idle by preventing task scheduling on them; they can be quickly re‑enabled when performance is needed. This concept underlies the “core control” mechanism.
Core control originated from Qualcomm's mpdecision, a user-space hot-plug daemon. It later moved into the kernel as the corectl module, which offers faster load monitoring, CPU isolation instead of hot-plug (isolation latency of roughly 50–500 ns versus more than 100 ms for a hot-plug cycle), and tighter integration with the scheduler and the frequency governor.
The corectl framework works as follows: every scheduler tick (≈4 ms) it checks cluster load; if a change is required, it wakes a dedicated kernel thread to isolate or un‑isolate CPUs. Users can modify parameters via sysfs. The module provides three main interfaces:
The scheduler calls core_ctl_check during task load updates to decide whether isolation should change.
Other kernel components can call core_ctl_set_boost to keep all cores un‑isolated.
Rich sysfs nodes allow userspace to tune thresholds and limits.
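The tick-driven flow can be sketched as follows. The key design point is that the tick path stays cheap: it only compares the new evaluation against the current state and flags the worker thread, which does the actual (un)isolation. Struct and function names here are illustrative; the real core_ctl code differs in detail:

```c
#include <assert.h>
#include <stdbool.h>

/* Per-cluster bookkeeping, modeled loosely on corectl's behavior. */
struct cluster_ctl {
    int active_cpus;   /* CPUs currently un-isolated */
    int need_cpus;     /* latest result of the need evaluation */
    bool wake_worker;  /* set when the worker thread must run */
};

/* Called from the scheduler-tick path: cheap comparison only,
 * no isolation work happens here. */
static void core_ctl_check(struct cluster_ctl *c, int new_need)
{
    c->need_cpus = new_need;
    if (new_need != c->active_cpus)
        c->wake_worker = true;   /* defer heavy work to the thread */
}

/* Body of the dedicated kernel thread: apply the pending change
 * by isolating or un-isolating CPUs until the counts match. */
static void core_ctl_worker(struct cluster_ctl *c)
{
    if (!c->wake_worker)
        return;
    c->active_cpus = c->need_cpus;
    c->wake_worker = false;
}
```

Splitting check from apply keeps per-tick overhead bounded even when the isolation operation itself has to synchronize with the scheduler.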
Core control’s core logic resides in the eval_need function, which determines the required number of active CPUs (need_cpus) based on two factors: CPU loading and the number of runnable tasks.
CPU loading is calculated as the sum of task load on a runqueue divided by the CPU’s maximum capacity (Figure 7). After an initial need_cpus is derived from loading, the algorithm also considers task counts (Figure 8) using parameters such as:
nrrun: total tasks needing execution in the current cluster.
nr_prev_assist: tasks that the previous cluster needs the current cluster to help with.
new_need: the newly computed required CPU count.
max_nr: the maximum task count observed on any CPU in the cluster.
strict_nrrun: the average number of running tasks over a recent window, adjusted for core size.
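The loading side of this evaluation can be sketched as below: per-CPU load is the runqueue load sum scaled against the CPU's maximum capacity, and the initial need is the number of CPUs whose load crosses an up-threshold. Names and the percentage representation are illustrative simplifications, not the actual eval_need code:

```c
#include <assert.h>

/* Load of one CPU as a percentage of its maximum capacity:
 * (sum of task load on the runqueue) / max_capacity. */
static int cpu_load_pct(unsigned long rq_load_sum, unsigned long max_capacity)
{
    return (int)(rq_load_sum * 100 / max_capacity);
}

/* Initial need_cpus from loading alone: count the CPUs whose load
 * exceeds the up-threshold (a lower down-threshold, not shown,
 * provides hysteresis when shrinking the active set). */
static int need_from_loading(const int load_pct[], int ncpus, int up_thres)
{
    int busy = 0;
    for (int i = 0; i < ncpus; i++)
        if (load_pct[i] >= up_thres)
            busy++;
    return busy;
}
```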
Decision rules include:
If task count exceeds a high threshold, all CPUs in the cluster are activated.
If the previous cluster needs assistance beyond a threshold, the current cluster activates nr_prev_assist CPUs.
If the current cluster’s task count falls between the threshold and new_need, one additional CPU is added.
If any CPU’s runqueue holds more than four tasks, another CPU is activated to avoid overload.
new_need must be at least the cluster’s average running task count, encouraging small cores to stay active.
The final need_cpus cannot exceed the cluster’s maximum CPU count and must respect configured min_cpus / max_cpus limits.
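Putting the rules above together, the decision logic can be sketched as a single function. This is a hedged reconstruction from the description, not the real eval_need: field names, thresholds, and the exact rule ordering are assumptions.

```c
#include <assert.h>

struct need_params {
    int num_cpus;           /* CPUs physically in the cluster */
    int min_cpus, max_cpus; /* sysfs-configured limits */
    int task_thres;         /* "activate everything" task threshold */
    int assist_thres;       /* threshold for nr_prev_assist */
    int max_nr_thres;       /* per-CPU runqueue overload limit (4 in the text) */
};

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static int eval_need(const struct need_params *p, int loading_need,
                     int nrrun, int nr_prev_assist, int max_nr,
                     int strict_nrrun)
{
    int need = loading_need;

    if (nrrun >= p->task_thres)             /* high task count: all CPUs */
        return clamp(p->num_cpus, p->min_cpus, p->max_cpus);

    if (nr_prev_assist >= p->assist_thres)  /* help the previous cluster */
        need = need > nr_prev_assist ? need : nr_prev_assist;

    if (nrrun > need)                       /* tasks exceed need: +1 CPU */
        need += 1;

    if (max_nr > p->max_nr_thres)           /* overloaded runqueue: +1 CPU */
        need += 1;

    if (need < strict_nrrun)                /* at least the average task count */
        need = strict_nrrun;

    /* respect cluster size and configured min/max limits */
    return clamp(need, p->min_cpus,
                 p->max_cpus < p->num_cpus ? p->max_cpus : p->num_cpus);
}
```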
Sysfs nodes expose parameters such as min_cpus, max_cpus, busy_up_thres, busy_down_thres, and task_thres, allowing userspace to fine-tune when CPUs are turned on or off based on loading or task count.
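From userspace, tuning comes down to writing integers into these nodes. A small helper sketch follows; the path pattern assumes the layout MSM kernels typically expose (/sys/devices/system/cpu/cpuN/core_ctl/, attached to the first CPU of each cluster), so verify the node names and location on your target before relying on them:

```c
#include <stdio.h>

/* Build the sysfs path for a core_ctl node of the cluster whose
 * first CPU is first_cpu. Path layout is an assumption; check your
 * kernel's actual sysfs tree. */
static int core_ctl_path(char *buf, size_t len, int first_cpu,
                         const char *node)
{
    return snprintf(buf, len,
                    "/sys/devices/system/cpu/cpu%d/core_ctl/%s",
                    first_cpu, node);
}

/* Write an integer value to a core_ctl node; returns 0 on success.
 * Requires sufficient privileges on a real device. */
static int core_ctl_write(int first_cpu, const char *node, int value)
{
    char path[128];
    core_ctl_path(path, sizeof(path), first_cpu, node);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = fprintf(f, "%d\n", value) > 0 ? 0 : -1;
    fclose(f);
    return rc;
}
```

For example, core_ctl_write(4, "min_cpus", 2) would ask the big cluster (assuming it starts at CPU4) to keep at least two cores un-isolated.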
Reference: oppo‑opensource kernel repository (https://github.com/oppo-source/kernel_msm-4.19).
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials