
Understanding Linux System Performance: CPU, Memory, I/O, and Monitoring Commands

This article explains how to assess Linux system performance: examining CPU usage and load averages with top, digging into CPU and memory activity with vmstat, monitoring memory consumption and CPU cache behavior, and evaluating I/O and network performance with iostat and sar, with practical commands for each component.


1. CPU

1.1 top command

The top command can be used to observe various CPU metrics. After launching top, press 1 to display per-core details.

Key fields shown by top include:

us : user‑mode CPU usage percentage.

sy : kernel‑mode CPU usage percentage.

ni : CPU usage by user processes running with an adjusted (positive) nice value.

wa : time spent waiting for I/O devices.

hi : hardware interrupt CPU usage.

si : software interrupt CPU usage.

st : time stolen by the hypervisor (relevant for VMs).

id : idle CPU percentage.

Generally, the idle percentage (id) is the most straightforward indicator of overall CPU utilization.
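These percentages are all derived from the cumulative counters in /proc/stat. The following is a simplified Python sketch of how a tool like top computes the id figure between two samples; it is an illustration of the arithmetic, not top's actual implementation:

```python
import time

def cpu_times():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq steal ..."
    # We keep the first eight counters (user..steal) to avoid double-counting guest time.
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:9]]

def idle_percent(interval=1.0):
    a = cpu_times()
    time.sleep(interval)
    b = cpu_times()
    delta = [y - x for x, y in zip(a, b)]
    total = sum(delta)
    # idle is the fourth counter (index 3); iowait is tracked separately, as in top.
    return 100.0 * delta[3] / total if total else 0.0

if __name__ == "__main__":
    print(f"id: {idle_percent():.1f}%")
```

Because the counters are cumulative since boot, any percentage is only meaningful over a sampling interval, which is why top refreshes on a timer.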

1.2 Load average

The load average represents the average number of processes that are running or waiting for CPU time (on Linux it also counts processes blocked in uninterruptible I/O wait). top displays three values: the average over the last 1, 5, and 15 minutes.

For a single‑core system, a load of 1 means the CPU is fully utilized. For multi‑core systems, the threshold scales with the number of cores (e.g., a 4‑core machine can sustain a load of ~4 before becoming saturated).
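Python exposes the same three values through os.getloadavg(), which makes the rule of thumb above easy to check programmatically. A quick sketch (the saturation threshold here is just that rough heuristic):

```python
import os

# os.getloadavg() returns the same 1/5/15-minute figures that top displays.
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count() or 1

print(f"load average: {load1:.2f}, {load5:.2f}, {load15:.2f} over {cores} cores")

# Rule of thumb: sustained load above the core count means
# runnable work is queueing for CPU time.
if load1 > cores:
    print("likely saturated")
else:
    print("headroom available")
```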

1.3 vmstat

The vmstat command provides deeper insight into CPU and memory activity. Important columns include:

b : number of processes blocked on I/O.

cs : context switches per second.

si/so : swap‑in and swap‑out activity.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0    0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
...
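The cs column comes from the kernel's cumulative context-switch counter, which /proc/stat exposes as a single "ctxt" line. A small Python sketch (assuming a Linux /proc filesystem) that samples it over one second, the way vmstat does per interval:

```python
import time

def ctxt_switches():
    # /proc/stat exposes a cumulative "ctxt N" counter of context switches since boot.
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt"):
                return int(line.split()[1])
    return 0

before = ctxt_switches()
time.sleep(1)
after = ctxt_switches()
print(f"cs: {after - before}/s")
```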

2. Memory

2.1 Observation commands

Memory usage can be examined with top, which shows three columns of interest:

VIRT : virtual memory size (usually large, not a primary concern).

RES : resident memory actually used by the process (the main metric for monitoring).

SHR : shared memory (e.g., shared libraries).
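The VIRT and RES columns correspond to the VmSize and VmRSS fields in /proc/&lt;pid&gt;/status, so the same numbers can be read without top. A Python sketch reading them for the current process (field names follow the proc(5) format):

```python
# top's VIRT and RES columns correspond to VmSize and VmRSS in /proc/<pid>/status.
def mem_usage(pid="self"):
    wanted = {"VmSize": "VIRT", "VmRSS": "RES"}
    usage = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key = line.split(":", 1)[0]
            if key in wanted:
                usage[wanted[key]] = int(line.split()[1])  # value is in kB
    return usage

print(mem_usage())
```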

2.2 CPU cache

Because the speed gap between CPU cores and main memory is huge, modern CPUs employ multiple levels of cache: small, fast L1 and L2 caches per core, and a larger L3 cache shared across cores.

In Java, false sharing occurs when multiple threads modify variables that reside on the same cache line, causing unnecessary cache line invalidations. The cache‑line size can be queried with:

cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size

Cache sizes for each level can be read similarly:

# cat /sys/devices/system/cpu/cpu0/cache/index1/size
32K
# cat /sys/devices/system/cpu/cpu0/cache/index2/size
256K
# cat /sys/devices/system/cpu/cpu0/cache/index3/size
20480K
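Rather than cat-ing each file individually, the per-level details can be walked programmatically. A Python sketch over the same /sys hierarchy (it prints nothing if the kernel does not expose cache topology):

```python
from pathlib import Path

# Each indexN directory under cpu0/cache describes one cache (L1d, L1i, L2, L3, ...).
base = Path("/sys/devices/system/cpu/cpu0/cache")
for idx in sorted(base.glob("index*")):
    try:
        level = (idx / "level").read_text().strip()
        ctype = (idx / "type").read_text().strip()
        size = (idx / "size").read_text().strip()
        line = (idx / "coherency_line_size").read_text().strip()
    except OSError:
        continue  # some kernels hide parts of the topology
    print(f"L{level} {ctype}: {size}, cache line {line} bytes")
```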

2.3 HugePage

Linux normally manages memory in 4 KB pages. When the physical memory is large, the page‑table overhead can become a bottleneck. Using larger pages (HugePages) reduces the number of entries and can improve performance, though it may increase memory fragmentation.
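The current huge-page configuration can be checked without special tooling; a Python sketch filtering /proc/meminfo (the Huge* lines may be absent on kernels built without hugepage support):

```python
# The kernel reports huge-page configuration alongside other memory counters.
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("HugePages", "Hugepagesize", "Hugetlb")):
            print(line.rstrip())
```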

2.4 Pre‑touching memory

By default, the JVM allocates memory lazily. Enabling -XX:+AlwaysPreTouch forces the JVM to touch all pages at startup, reducing page‑fault latency during runtime at the cost of a slower startup.

3. I/O

3.1 Observation commands

I/O is typically the slowest subsystem. Disk performance can be examined with iostat -x, focusing on the %util column (values near 100 % indicate saturation) and other metrics such as await and svctm (note that svctm is deprecated in recent sysstat releases).

%util : device utilization percentage.

Device : identifier of the disk.

avgqu‑sz : average queue length (shorter is better).

await : average total time per I/O request, including queueing and service time (ideal < 5 ms).

svctm : average service time per I/O operation (deprecated; no longer trustworthy on recent kernels).
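The raw counters behind these iostat columns live in /proc/diskstats. A Python sketch printing the fields that feed the read, write, and %util figures (io_ms, the cumulative milliseconds the device spent doing I/O, is what %util is derived from over a sampling interval); field positions follow the Linux diskstats format:

```python
# iostat's per-device figures are computed from cumulative counters in /proc/diskstats.
with open("/proc/diskstats") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 13:
            continue  # very old kernels report fewer fields for partitions
        name, reads, writes, io_ms = parts[2], parts[3], parts[7], parts[12]
        if name.startswith(("loop", "ram")):
            continue  # skip pseudo-devices
        print(f"{name}: reads={reads} writes={writes} io_ms={io_ms}")
```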

3.2 Zero‑copy

Zero‑copy techniques avoid copying data between user and kernel space. For example, the sendfile system call transfers data directly from a file descriptor to a socket, eliminating an extra memory copy and reducing CPU overhead.
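A minimal Python demonstration of this using os.sendfile, which wraps the sendfile(2) system call: the file contents go from the page cache to the socket without passing through a user-space buffer. The socketpair and payload here are arbitrary choices for the sketch:

```python
import os
import socket
import tempfile

# Prepare a regular file as the source (sendfile's in_fd must support mmap).
src = tempfile.TemporaryFile()
payload = b"hello zero-copy " * 1000
src.write(payload)
src.flush()

left, right = socket.socketpair()

# Stream the whole file into the socket via sendfile(2); the kernel moves the
# data directly, so no read()/write() copy through user space is needed.
offset, remaining = 0, os.fstat(src.fileno()).st_size
while remaining:
    sent = os.sendfile(left.fileno(), src.fileno(), offset, remaining)
    offset += sent
    remaining -= sent
left.close()

received = b""
while True:
    chunk = right.recv(65536)
    if not chunk:
        break
    received += chunk
right.close()
print(f"transferred {len(received)} bytes via sendfile")
```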

4. Network

Network statistics can be gathered with sar -n DEV for interface traffic and sar -n TCP,ETCP for TCP‑level metrics.

$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
12:16:48 AM IFACE   rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
12:16:49 AM eth0    18763.00 5032.00 20686.42 478.30 0.00 0.00 0.00 0.00
...
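The per-interface counters that sar reports are the same cumulative ones exposed in /proc/net/dev; a Python sketch printing total rx/tx bytes per interface (column positions follow the /proc/net/dev header):

```python
# /proc/net/dev lists, per interface, receive counters followed by transmit counters.
with open("/proc/net/dev") as f:
    lines = f.readlines()[2:]  # skip the two header lines

for line in lines:
    iface, counters = line.split(":", 1)
    fields = counters.split()
    rx_bytes, tx_bytes = int(fields[0]), int(fields[8])
    print(f"{iface.strip()}: rx={rx_bytes} B tx={tx_bytes} B")
```

Like the CPU and disk counters, these are totals since boot, so rates such as rxkB/s require sampling twice and dividing by the interval, which is what sar does.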

5. Conclusion

These metrics provide a high‑level view of system health but are not sufficient alone to pinpoint performance bottlenecks. For deeper analysis, more advanced tools such as eBPF‑based BCC utilities are recommended.

Tags: Performance Monitoring, I/O, Linux, CPU, Memory, System Administration
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
