Understanding Linux System Performance: CPU, Memory, I/O, and Monitoring Commands
This article explains how to assess Linux system performance: examining CPU usage with top, interpreting load averages, digging deeper with vmstat, monitoring memory consumption, understanding CPU cache behavior, and evaluating I/O and network performance with iostat and sar. Practical commands and sample output are provided for each component.
1. CPU
1.1 top command
The top command can be used to observe various CPU metrics. After launching top, press 1 to display per‑core details.
Key fields shown by top include:
us: user‑mode CPU usage percentage.
sy: kernel‑mode CPU usage percentage.
ni: CPU time spent on low‑priority (niced) user processes.
id: idle CPU percentage.
wa: time spent waiting for I/O completion.
hi: hardware‑interrupt CPU usage.
si: software‑interrupt CPU usage.
st: time stolen by the hypervisor (relevant for VMs).
Generally, the idle percentage (id) is the most straightforward indicator of overall CPU utilization.
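As a rough illustration, the counters that top aggregates can also be sampled directly from /proc/stat. The sketch below derives an approximate idle percentage from two samples taken one second apart; for simplicity it ignores the iowait, irq, softirq, and steal columns:

```shell
# Sample the aggregate "cpu" line from /proc/stat twice, one second apart.
# Fields after the label are: user nice system idle ... (in jiffies).
read -r _ u1 n1 s1 i1 rest < /proc/stat
t1=$((u1 + n1 + s1 + i1))
sleep 1
read -r _ u2 n2 s2 i2 rest < /proc/stat
t2=$((u2 + n2 + s2 + i2))
# Idle share of the elapsed jiffies (simplified: omits iowait/irq/steal).
echo "idle%: $(( 100 * (i2 - i1) / (t2 - t1) ))"
```

This is essentially what top does internally, just without the per-field breakdown.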
1.2 Load average
The load average represents the average number of processes that are either runnable or blocked in uninterruptible sleep (typically waiting on I/O). top displays three values: the averages over the last 1, 5, and 15 minutes.
For a single‑core system, a load of 1 means the CPU is fully utilized. For multi‑core systems, the threshold scales with the number of cores (e.g., a 4‑core machine can sustain a load of ~4 before becoming saturated).
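A quick way to apply this rule of thumb (assuming standard coreutils) is to read /proc/loadavg directly and compare it against the core count reported by nproc:

```shell
# /proc/loadavg starts with the 1-, 5-, and 15-minute load averages.
read -r one five fifteen _ < /proc/loadavg
cores=$(nproc)
# A 1-minute load persistently above $cores suggests CPU saturation.
echo "load: $one $five $fifteen over $cores core(s)"
```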
1.3 vmstat
The vmstat command provides deeper insight into CPU and memory activity. Important columns include:
r: number of runnable processes (running or waiting for CPU).
b: number of processes blocked in uninterruptible sleep (usually waiting on I/O).
cs: context switches per second.
si/so: pages swapped in from and out to disk per second.
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0
...
2. Memory
2.1 Observation commands
Memory usage can be examined with top, which shows three columns of interest:
VIRT : virtual memory size (usually large, not a primary concern).
RES : resident memory actually used by the process (the main metric for monitoring).
SHR : shared memory (e.g., shared libraries).
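For scripting, the same numbers that top reports interactively are exposed per process in /proc/&lt;pid&gt;/status, where VmSize corresponds to VIRT and VmRSS to RES. A minimal sketch, inspecting the grep process itself:

```shell
# VmSize ~ VIRT (total virtual memory), VmRSS ~ RES (resident memory).
# /proc/self/status here refers to the grep process doing the reading.
grep -E 'VmSize|VmRSS' /proc/self/status
```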
2.2 CPU cache
Because the speed gap between CPU cores and main memory is huge, modern CPUs employ multiple levels of cache (typically per‑core L1 and L2 caches plus a shared L3). The diagram below illustrates a typical cache hierarchy.
In Java, false sharing occurs when multiple threads modify variables that reside on the same cache line, causing unnecessary cache line invalidations. The cache‑line size can be queried with:
# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
Cache sizes for each level can be read similarly:
# cat /sys/devices/system/cpu/cpu0/cache/index1/size
32K
# cat /sys/devices/system/cpu/cpu0/cache/index2/size
256K
# cat /sys/devices/system/cpu/cpu0/cache/index3/size
20480K
2.3 HugePage
Linux normally manages memory in 4 KB pages. When the physical memory is large, the page‑table overhead can become a bottleneck. Using larger pages (HugePages) reduces the number of entries and can improve performance, though it may increase memory fragmentation.
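Whether HugePages are configured on a given machine can be checked in /proc/meminfo; HugePages_Total stays at 0 unless pages have been explicitly reserved by the administrator:

```shell
# Show all HugePage-related counters (page size, reserved/free pools,
# and transparent-hugepage usage under AnonHugePages).
grep -i huge /proc/meminfo
```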
2.4 Pre‑touching memory
By default, the JVM commits heap memory lazily. Enabling -XX:+AlwaysPreTouch forces the JVM to touch every heap page at startup, paying the page‑fault cost up front rather than during runtime, at the cost of a slower startup.
3. I/O
3.1 Observation commands
I/O is typically the slowest subsystem. Disk performance can be examined with iostat , focusing on the %util column (values near 100 % indicate saturation) and other metrics such as await and svctm .
Device: identifier of the disk.
%util: device utilization percentage.
avgqu‑sz: average request queue length (shorter is better).
await: average time an I/O request spends queued plus being serviced (ideally < 5 ms).
svctm: average service time per I/O operation (deprecated in recent sysstat releases).
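Under the hood, iostat computes these figures from the kernel's raw block‑device counters. As a minimal sketch (assuming a Linux /proc filesystem), the cumulative completion counts behind those rates can be read directly:

```shell
# /proc/diskstats: field 3 is the device name, field 4 the number of
# reads completed since boot, field 8 the number of writes completed.
# iostat derives its per-interval rates by diffing these counters.
awk '{ printf "%-12s reads=%s writes=%s\n", $3, $4, $8 }' /proc/diskstats | head -5
```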
3.2 Zero‑copy
Zero‑copy techniques avoid copying data between user and kernel space. For example, the sendfile system call transfers data directly from a file descriptor to a socket, eliminating an extra memory copy and reducing CPU overhead.
4. Network
Network statistics can be gathered with sar -n DEV for interface traffic and sar -n TCP,ETCP for TCP‑level metrics.
$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx) 07/14/2015 _x86_64_ (32 CPU)
12:16:48 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
12:16:49 AM eth0 18763.00 5032.00 20686.42 478.30 0.00 0.00 0.00 0.00
...
5. Conclusion
These metrics provide a high‑level view of system health but are not sufficient alone to pinpoint performance bottlenecks. For deeper analysis, more advanced tools such as eBPF‑based BCC utilities are recommended.