
How Linux Calculates CPU Utilization for top – Inside /proc/stat and Kernel Timers

This article explains how Linux computes CPU utilization shown by the top command, detailing the role of /proc/stat, kernel_cpustat, timer interrupts, sampling intervals, and how user, nice, system, irq, softirq, iowait, and idle times are accumulated and reported.

Liangxu Linux

Background and Motivation

When monitoring a server, most people start with the top command to view overall CPU usage. However, the meaning of the columns (e.g., ni, wa) and the accuracy of the reported percentages are often unclear.

Design Challenge

Given a four-core server running four processes, we want a method that mimics top's output and reports CPU state at roughly one-second resolution. Simply aggregating process run times works for long-term averages but cannot capture the rapid fluctuations that top shows.

Two naive approaches were considered:

Summing the execution time of all processes and dividing by elapsed wall-clock time × 4 (one share per core) – accurate over long periods but too coarse for instantaneous snapshots.

Sampling which cores are busy at each millisecond and reporting the percentage – produces values only in 25 % steps and causes severe jitter.

To combine the strengths of both, we sample at a fine granularity (e.g., 1 ms) and aggregate those instantaneous samples over a larger window (e.g., 3 seconds) to compute a smoother average.
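
The combined approach can be sketched as a small simulation. This is only an illustration of the idea, not kernel code: the function names busy_fraction and windowed_utilization and the sample data are made up. Each sample records which of the four cores were busy at one instant, so a single sample can only land on a 25 % step, while the average over the window takes intermediate values.

```python
from typing import List, Sequence

def busy_fraction(sample: Sequence[bool]) -> float:
    """Fraction of cores that were busy in one instantaneous sample."""
    return sum(sample) / len(sample)

def windowed_utilization(samples: List[Sequence[bool]]) -> float:
    """Average the per-sample busy fractions over the whole window."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(busy_fraction(s) for s in samples) / len(samples)

# Hypothetical data: 3000 one-millisecond samples (a 3-second window), 4 cores.
# Core 0 is always busy, core 1 is busy on every third sample, cores 2-3 idle.
samples = [(True, i % 3 == 0, False, False) for i in range(3000)]

instantaneous = {busy_fraction(s) for s in samples}  # distinct per-sample values
average = windowed_utilization(samples)

print(sorted(instantaneous))        # only multiples of 0.25 can appear
print(round(average * 100, 2))      # smooth value for the whole window
```

With this input the individual samples jump between 25 % and 50 %, whereas the windowed figure settles at about 33 %.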

Where top Gets Its Data

The top command reads the pseudo‑file /proc/stat. The kernel populates this file from the kernel_cpustat per‑CPU structure.

# strace top
... 
openat(AT_FDCWD, "/proc/stat", O_RDONLY) = 4
...

The file operations for /proc/stat are defined in fs/proc/stat.c:

static int __init proc_stat_init(void)
{
    proc_create("stat", 0, NULL, &proc_stat_operations);
    return 0;
}

static const struct file_operations proc_stat_operations = {
    .open  = stat_open,
    ...
};

When the file is opened, stat_open registers show_stat as the seq_file callback. Each read of the file then runs show_stat, which iterates over every possible CPU, accumulates fields such as user, nice, system, idle, iowait, irq, and softirq, converts the nanosecond counters to clock ticks (USER_HZ units, typically 1/100 s) with nsec_to_clock_t, and prints them.

static int show_stat(struct seq_file *p, void *v)
{
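    int i;
    u64 user, nice, system, idle, iowait, irq, softirq, steal;

    /* the totals must start at zero before the per-CPU loop adds to them */
    user = nice = system = idle = iowait = irq = softirq = steal = 0;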
    for_each_possible_cpu(i) {
        struct kernel_cpustat *kcs = &kcpustat_cpu(i);
        user    += kcs->cpustat[CPUTIME_USER];
        nice    += kcs->cpustat[CPUTIME_NICE];
        system  += kcs->cpustat[CPUTIME_SYSTEM];
        idle    += get_idle_time(kcs, i);
        iowait  += get_iowait_time(kcs, i);
        irq     += kcs->cpustat[CPUTIME_IRQ];
        softirq += kcs->cpustat[CPUTIME_SOFTIRQ];
        ...
    }
    seq_put_decimal_ull(p, "cpu  ", nsec_to_clock_t(user));
    seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));
    ...
}
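
On the reading side, a user-space tool only has to split the first line of the file. The sketch below is a minimal example and not how top itself is written: the helper names parse_cpu_line and read_total_cpu are hypothetical, the sample line is invented, and the field order is the one documented in proc(5). Values are in clock ticks, so dividing by sysconf(_SC_CLK_TCK) gives seconds.

```python
import os
from typing import Dict

# Field order of the "cpu" lines, as documented in proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")

def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse one 'cpu...' line of /proc/stat into named tick counters."""
    label, *values = line.split()
    if not label.startswith("cpu"):
        raise ValueError(f"not a cpu line: {line!r}")
    # Older kernels print fewer columns; zip() stops at the shorter side.
    return dict(zip(FIELDS, map(int, values)))

def read_total_cpu(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the aggregate 'cpu' line (the first line of the file)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())

# Made-up sample line, so the example also runs where /proc/stat is absent.
sample = "cpu  4705 150 1120 16250 520 0 30 0 0 0"
counters = parse_cpu_line(sample)

ticks_per_second = os.sysconf("SC_CLK_TCK")  # USER_HZ, typically 100
print(counters["user"], counters["idle"])
print(counters["user"] / ticks_per_second, "seconds in user mode")
```

On a Linux machine, read_total_cpu() returns the live counters in the same shape.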

How the Kernel Updates Those Counters

Linux relies on a periodic timer interrupt (the scheduler tick) whose frequency is set by CONFIG_HZ. Common values are 100, 250, 300, and 1000; the system examined here uses CONFIG_HZ=1000, which means one tick every 1 ms.

# grep ^CONFIG_HZ /boot/config-5.4.56.bsk.10-amd64
CONFIG_HZ=1000

Each tick invokes update_process_times (in kernel/time/timer.c), which charges the tick to the task currently running on that CPU:

void update_process_times(int user_tick)
{
    struct task_struct *p = current;
    account_process_tick(p, user_tick);
    ...
}

The helper account_process_tick decides whether the tick occurred in user mode, kernel mode, or idle, and forwards to the appropriate accounting function:

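void account_process_tick(struct task_struct *p, int user_tick)
{
    u64 cputime;
    struct rq *rq = this_rq();
    ...
    cputime = TICK_NSEC;              /* length of one tick in nanoseconds */
    ...
    if (user_tick)                    /* the tick interrupted user-mode code */
        account_user_time(p, cputime);
    else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
        /* kernel code of a real task, or the tick interrupted other IRQ work */
        account_system_time(p, HARDIRQ_OFFSET, cputime);
    else                              /* idle task with nothing else in progress */
        account_idle_time(cputime);
}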
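
The three-way decision can be modelled in a few lines of user-space code. This is a simplified sketch: classify_tick and its boolean parameters are invented stand-ins for user_tick, p != rq->idle, and irq_count() != HARDIRQ_OFFSET, and TICK_NSEC is assumed to be exactly one second divided by HZ.

```python
from collections import Counter

HZ = 1000                           # assumed CONFIG_HZ
TICK_NSEC = 1_000_000_000 // HZ     # nanoseconds charged per tick

def classify_tick(user_tick: bool, current_is_idle_task: bool,
                  interrupted_other_irq_work: bool) -> str:
    """Return the accounting path a tick takes: 'user', 'system' or 'idle'."""
    if user_tick:
        return "user"
    if not current_is_idle_task or interrupted_other_irq_work:
        return "system"
    return "idle"

# Hypothetical sequence of five ticks on one CPU.
ticks = [
    (True, False, False),    # tick hit a task running user code
    (False, False, False),   # tick hit a task inside a system call
    (False, True, True),     # idle task, but the tick interrupted IRQ work
    (False, True, False),    # idle task, nothing else going on
    (False, True, False),
]

totals = Counter()
for tick in ticks:
    totals[classify_tick(*tick)] += TICK_NSEC

print(dict(totals))
```

Note that the whole tick is charged to a single category, whatever happened during the rest of that millisecond.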

User‑time and Nice

account_user_time adds the tick to either the user or nice field depending on the process’s nice value.

void account_user_time(struct task_struct *p, u64 cputime)
{
    int index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
    task_group_account_field(p, index, cputime);
}

Thus the nice column in top represents CPU time consumed by processes with a positive nice value.
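
A consequence that is easy to miss: only positive nice values count as nice, so a process with raised priority (negative nice value) is still reported under user. The following sketch restates the rule; user_time_index is a hypothetical name, and the range check reflects the usual -20 to 19 nice range.

```python
import os

def user_time_index(nice_value: int) -> str:
    """Name of the counter a user-mode tick is charged to."""
    if not -20 <= nice_value <= 19:
        raise ValueError("nice values range from -20 to 19")
    return "nice" if nice_value > 0 else "user"

for value in (-5, 0, 10):
    print(value, user_time_index(value))

# os.nice(0) leaves the priority unchanged and returns the current nice value.
print("this process is charged to:", user_time_index(os.nice(0)))
```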

Kernel‑time (system, irq, softirq)

account_system_time distinguishes between hard IRQ, soft IRQ, and regular kernel execution:

void account_system_time(struct task_struct *p, int hardirq_offset, u64 cputime)
{
    int index;
    ...
    if (hardirq_count() - hardirq_offset)   /* a hard IRQ other than this tick */
        index = CPUTIME_IRQ;
    else if (in_serving_softirq())          /* a softirq handler was running */
        index = CPUTIME_SOFTIRQ;
    else                                    /* ordinary kernel code */
        index = CPUTIME_SYSTEM;
    account_system_index_time(p, cputime, index);
}
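
The order of the checks matters: the hard-IRQ test comes first, so a tick that lands in a hard-IRQ handler which itself interrupted a softirq is charged to irq. A simplified model follows, in which system_time_index is a hypothetical name and nesting is counted in whole levels rather than with the kernel's preempt-count bit masks.

```python
def system_time_index(hardirq_depth: int, serving_softirq: bool,
                      tick_depth: int = 1) -> str:
    """Pick the counter for a kernel-mode tick.

    hardirq_depth counts nested hard-IRQ levels including the timer tick
    itself; tick_depth is the level contributed by the tick (always 1 here).
    """
    if hardirq_depth - tick_depth > 0:
        return "irq"        # the tick interrupted another hard-IRQ handler
    if serving_softirq:
        return "softirq"    # the tick interrupted a softirq handler
    return "system"         # the tick interrupted ordinary kernel code

print(system_time_index(hardirq_depth=1, serving_softirq=False))
print(system_time_index(hardirq_depth=1, serving_softirq=True))
print(system_time_index(hardirq_depth=2, serving_softirq=True))
```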

Idle and I/O Wait

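If the tick interrupts neither user-mode nor kernel-mode execution, the CPU was idle and the tick is added to idle. If the run queue's nr_iowait counter is non-zero, meaning at least one task that last ran on this CPU is blocked waiting for I/O, the tick is added to iowait instead.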

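void account_idle_time(u64 cputime)
{
    u64 *cpustat = kcpustat_this_cpu->cpustat;
    struct rq *rq = this_rq();

    if (atomic_read(&rq->nr_iowait) > 0)    /* some task is blocked on I/O */
        cpustat[CPUTIME_IOWAIT] += cputime;
    else
        cpustat[CPUTIME_IDLE] += cputime;
}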
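
To see how the same idle CPU ends up in two different columns, here is a toy model. CpuModel is an invented class, not a kernel structure, and TICK_NSEC assumes HZ=1000. The CPU does no work at any point in the scenario; only the nr_iowait counter decides where each tick goes.

```python
from dataclasses import dataclass, field
from typing import Dict

TICK_NSEC = 1_000_000  # one tick at HZ=1000 (assumed)

@dataclass
class CpuModel:
    """Toy stand-in for one CPU's run queue and its idle counters."""
    nr_iowait: int = 0
    cpustat: Dict[str, int] = field(
        default_factory=lambda: {"idle": 0, "iowait": 0})

    def account_idle_time(self, cputime: int) -> None:
        # Same test as the kernel excerpt: is any task blocked on I/O?
        key = "iowait" if self.nr_iowait > 0 else "idle"
        self.cpustat[key] += cputime

cpu = CpuModel()
for _ in range(3):                  # three idle ticks, no I/O outstanding
    cpu.account_idle_time(TICK_NSEC)
cpu.nr_iowait += 1                  # a task blocks waiting for a disk read
for _ in range(2):                  # still idle, but now counted as iowait
    cpu.account_idle_time(TICK_NSEC)
cpu.nr_iowait -= 1                  # the read completes
cpu.account_idle_time(TICK_NSEC)

print(cpu.cpustat)
```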

Therefore iowait is essentially idle time spent waiting for I/O completion.

Putting It All Together

The kernel continuously accumulates per-CPU counters, which are stored in nanoseconds but, with tick-based accounting, advance one tick at a time. top reads a snapshot of these counters from /proc/stat, takes a second snapshot later (e.g., 3 seconds apart), and computes each percentage as that field's delta divided by the sum of all deltas.
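
The delta calculation can be written out as follows. This is a sketch under stated assumptions rather than top's actual source: cpu_percentages is a hypothetical helper, the two snapshots are invented numbers in clock ticks, and guest time is left out of the total because the kernel already includes it in user and nice.

```python
from typing import Dict

FIELDS = ("user", "nice", "system", "idle",
          "iowait", "irq", "softirq", "steal")

def cpu_percentages(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, float]:
    """Share of each field in the CPU time elapsed between two snapshots."""
    deltas = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}

# Invented snapshots taken 3 s apart on a 4-core machine with USER_HZ=100,
# so the deltas add up to 4 * 3 * 100 = 1200 ticks.
prev = dict(user=1000, nice=0, system=500, idle=8000,
            iowait=100, irq=0, softirq=10, steal=0)
curr = dict(user=1300, nice=30, system=620, idle=8690,
            iowait=150, irq=0, softirq=20, steal=0)

pct = cpu_percentages(prev, curr)
for name in FIELDS:
    print(f"{name:8s}{pct[name]:6.1f} %")
```

Because the counters only ever grow, the absolute values carry no meaning on their own; only the difference between two reads does.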

Because the method relies on sampling, it is not 100 % precise: a whole tick is charged to whatever was running when the timer fired. With a short tick period (1 ms at CONFIG_HZ=1000) and aggregation over many samples, the reported percentages are nonetheless reliable for typical monitoring intervals of one second or more.
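
The size of the possible error is easiest to see with a deliberately contrived workload. In the sketch below everything is hypothetical: the workload is busy for 400 µs of every millisecond, and the functions is_busy, sampled_utilization, and true_utilization are made up for the illustration. When the ticks are locked to the workload's period, the sampled result depends entirely on the phase; when they drift relative to it, the samples spread across the period and the estimate approaches the true value.

```python
def is_busy(t_us: int) -> bool:
    """Hypothetical workload: busy from 100 us to 500 us of every millisecond."""
    return 100 <= t_us % 1000 < 500

def sampled_utilization(tick_period_us: int, phase_us: int,
                        duration_us: int) -> float:
    """Utilization as seen by looking only at the instants when a tick fires."""
    ticks = range(phase_us, duration_us, tick_period_us)
    return sum(is_busy(t) for t in ticks) / len(ticks)

def true_utilization(duration_us: int) -> float:
    """Exact busy share, checked at every microsecond."""
    return sum(is_busy(t) for t in range(duration_us)) / duration_us

duration = 1_000_000  # one second, in microseconds

print("true:            ", true_utilization(duration))
print("ticks at phase 0:", sampled_utilization(1000, 0, duration))
print("ticks at 300 us: ", sampled_utilization(1000, 300, duration))
print("drifting ticks:  ", round(sampled_utilization(997, 0, duration), 3))
```

Real workloads are rarely synchronized with the timer this precisely, which is why the sampled figures are usually close to the truth in practice.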

Summary

Linux’s CPU utilization statistics are generated by a timer-driven sampling mechanism that updates per-CPU kernel_cpustat fields on each tick. The top command reads these fields via the /proc/stat pseudo-file, calculates deltas between two snapshots, and presents each field as a percentage of the total: user and nice together make up user-mode time, system, irq, and softirq make up kernel-mode time, and idle and iowait make up idle time. Understanding the underlying implementation clarifies the meaning of each column and the inherent sampling-based accuracy limits.

Figure: CPU utilization summary diagram