
Understanding Container CPU Utilization: Accurate Measurement Methods and the Missing Nice/IRQ/SoftIRQ Metrics

This article explains how to correctly obtain CPU utilization inside containers, compares host and container metrics, describes the use of lxcfs and cgroup files (including cgroup V1/V2) for accurate measurement, and clarifies why container statistics omit nice, irq, and softirq fields.


In this article the author examines two common questions about container CPU utilization: how to correctly obtain the CPU usage of a container and why container metrics lack the nice/irq/softirq fields that appear on a physical host.

Most users rely on top to view CPU usage, but inside a container /proc/stat reflects the host’s statistics, leading to misleading results. Two main solutions are presented:

Mount a container‑specific /proc view using tools such as lxcfs, which replaces the host's pseudo‑files so commands like top report container‑level data.

Read the container's cgroup files directly (e.g., the files under /sys/fs/cgroup/cpuacct/ for cgroup V1, or /sys/fs/cgroup/cpu.stat for cgroup V2) and compute utilization from the reported nanosecond or microsecond counters.

When using cgroup files, the implementation differs between cgroup V1 and V2. For V1, cpuacct.usage, cpuacct.usage_user, and cpuacct.usage_sys report cumulative CPU time in nanoseconds (cpuacct.stat reports user and system time in USER_HZ ticks instead). Sampling a usage counter at two timestamps (t1, t2) and dividing the delta by the elapsed time and the number of CPUs yields the utilization ratio.

For cgroup V2, the cpu.stat pseudo‑file reports usage_usec, user_usec, and system_usec in microseconds. The kernel produces these values via the cpu_stat_show handler, which aggregates per‑CPU statistics (cgroup_rstat_cpu) into a global bstat structure, converts nanoseconds to microseconds, and prints them.
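Reading these counters from user space reduces to parsing key/value lines. The sketch below is mine (the helper name and sample content are illustrative); to compute utilization, sample usage_usec twice and divide the delta by elapsed microseconds times the CPU count, exactly as in V1 but in microsecond units.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCPUStat parses the key/value lines of a cgroup v2 cpu.stat file
// (e.g. "usage_usec 1234") into a map of microsecond counters.
func parseCPUStat(content string) map[string]uint64 {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	return stats
}

func main() {
	// Sample content in the format printed by cgroup_base_stat_cputime_show.
	sample := "usage_usec 5000000\nuser_usec 3000000\nsystem_usec 2000000\n"
	s := parseCPUStat(sample)
	fmt.Println(s["usage_usec"], s["user_usec"], s["system_usec"])
}
```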

Before reading either set of files, user‑space tools must detect which cgroup version is mounted. runc, for example, checks the filesystem magic number of the unified mountpoint:

//file: https://github.com/opencontainers/runc/blob/main/libcontainer/cgroups/utils.go
func IsCgroup2UnifiedMode() bool {
    isUnifiedOnce.Do(func() {
        var st unix.Statfs_t
        // statfs the unified mountpoint (/sys/fs/cgroup) and compare
        // its filesystem magic against CGROUP2_SUPER_MAGIC
        err := unix.Statfs(unifiedMountpoint, &st)
        // ...
        isUnified = st.Type == unix.CGROUP2_SUPER_MAGIC
    })
    return isUnified
}

The kernel's cgroup definition (struct cgroup) contains per‑CPU (cgroup_rstat_cpu) and global (cgroup_base_stat) statistics. The cpu.stat handler (cpu_stat_show) calls cgroup_base_stat_cputime_show, which:

Locates the cgroup object for the current pseudo‑file.

Flushes per‑CPU counters into the global bstat.

Extracts utime , stime , and total runtime.

Converts nanoseconds to microseconds and outputs the values.

void cgroup_base_stat_cputime_show(struct seq_file *seq) {
    struct cgroup *cgrp = seq_css(seq)->cgroup;
    u64 usage, utime, stime;

    /* fold all per-CPU counters into cgrp->bstat */
    cgroup_rstat_flush_hold(cgrp);
    usage = cgrp->bstat.cputime.sum_exec_runtime;
    cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime, &utime, &stime);

    /* convert nanoseconds to microseconds */
    do_div(usage, NSEC_PER_USEC);
    do_div(utime, NSEC_PER_USEC);
    do_div(stime, NSEC_PER_USEC);
    seq_printf(seq, "usage_usec %llu\nuser_usec %llu\nsystem_usec %llu\n", usage, utime, stime);
}

The aggregation of per‑CPU data occurs in cgroup_rstat_flush_locked, which iterates over each CPU and calls cgroup_base_stat_flush to add the delta values to the global bstat. The delta is computed by comparing the current counters with the previously stored ones.

static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu) {
    struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
    struct task_cputime *last_cputime = &rstatc->last_bstat.cputime;
    struct task_cputime cputime;
    struct cgroup_base_stat delta;

    /* snapshot this CPU's current counters */
    cputime = rstatc->bstat.cputime;

    /* delta since the previous flush of this CPU */
    delta.cputime.utime = cputime.utime - last_cputime->utime;
    delta.cputime.stime = cputime.stime - last_cputime->stime;
    delta.cputime.sum_exec_runtime = cputime.sum_exec_runtime - last_cputime->sum_exec_runtime;
    *last_cputime = cputime;

    /* add the delta into the cgroup-wide bstat */
    cgroup_base_stat_accumulate(&cgrp->bstat, &delta);
}

During each timer interrupt, the kernel’s scheduler calls update_process_times , which eventually reaches __cgroup_account_cputime_field . This function adds the tick’s execution time to the per‑CPU rstat_cpu fields, grouping user‑related ticks (user + nice) into utime and system‑related ticks (system + irq + softirq) into stime . Consequently, containers expose only two aggregated metrics: user (equivalent to host user + nice) and system (equivalent to host system + irq + softirq).
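The field mapping can be made concrete with a small sketch. Given the tick counters from the host's /proc/stat cpu line (user, nice, system, idle, iowait, irq, softirq, ...), the container‑level counters correspond to the sums below; the struct and helper names are mine, chosen for illustration.

```go
package main

import "fmt"

// hostCPUFields holds the tick counters from a host /proc/stat "cpu" line.
type hostCPUFields struct {
	User, Nice, System, Idle, Iowait, IRQ, SoftIRQ uint64
}

// containerView aggregates host fields the way the kernel's tick accounting
// folds them into a cgroup's counters:
//   user + nice            -> user  (utime)
//   system + irq + softirq -> system (stime)
func containerView(h hostCPUFields) (user, system uint64) {
	return h.User + h.Nice, h.System + h.IRQ + h.SoftIRQ
}

func main() {
	h := hostCPUFields{User: 100, Nice: 20, System: 50, IRQ: 5, SoftIRQ: 10}
	u, s := containerView(h)
	fmt.Println(u, s) // 120 65
}
```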

In summary, accurate container CPU utilization can be achieved either by replacing /proc/stat with lxcfs or by directly reading cgroup statistics, taking care to handle cgroup V1 vs. V2 differences. The apparent loss of nice, irq, and softirq metrics is due to the container’s aggregation of these values into the broader user and system counters.

Tags: monitoring, cloud native, container, CPU utilization, cgroup
Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
