Master Linux Performance: From CPU Load to Memory Optimization
This comprehensive guide explains Linux performance fundamentals: key metrics such as throughput and latency, how to interpret the load average, CPU context switching, memory management, and the most effective tools and techniques for diagnosing and optimizing system performance.
Performance Optimization
High concurrency and fast response correspond to the two core metrics of performance optimization: throughput and latency.
From the application's perspective, these metrics directly determine the end-user experience.
From the system's perspective, resource metrics such as utilization and saturation determine overall capacity.
The essence of a performance problem is that a system resource has hit a bottleneck while requests are still not processed fast enough to sustain more traffic. Performance analysis is about locating these bottlenecks and mitigating them. A typical workflow:
Select metrics to evaluate application and system performance.
Set performance goals for applications and the system.
Conduct benchmark testing.
Analyze performance to pinpoint bottlenecks.
Monitor performance and set alerts.
Different problems call for different Linux performance tools; the sections below cover the most common tools and the problems each one helps diagnose.
Understanding "Load Average"
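Load average is the average number of active processes, that is, processes in the runnable (R) or uninterruptible (D) state, over a period of time. It is not the same thing as CPU utilization: uninterruptible processes are those executing a critical kernel-mode path, most commonly waiting for a device I/O response, so they add to the load without using CPU.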
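To read these numbers programmatically, a minimal Python sketch can parse /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute load averages, and divide them by the number of logical CPUs. The helper names parse_loadavg and load_per_cpu are illustrative, not part of any library.

```python
import os

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(x) for x in fields[:3])

def load_per_cpu(path="/proc/loadavg"):
    """Load averages divided by the logical CPU count.

    A value around 1.0 means each CPU has, on average, one active task.
    """
    with open(path) as f:
        loads = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1
    return tuple(round(load / cpus, 2) for load in loads)

if __name__ == "__main__":
    print("load per CPU (1m, 5m, 15m):", load_per_cpu())
```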
When Is the Load Average Reasonable?
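In production, track the load average trend; uptime and top show the 1-, 5- and 15-minute values. Ideally the load average stays at or below the number of logical CPUs, and a significant rise calls for analysis. Keep in mind that CPU-intensive workloads raise load and CPU usage together, while I/O-intensive workloads raise load without a proportional increase in CPU usage.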
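The reasoning above can be sketched as two small heuristics: compare the 1-minute value with the 15-minute value to see which way load is moving, then use CPU utilization to guess whether high load is CPU-driven or I/O-driven. The function names and all thresholds here (10% tolerance, load above 70% of the CPU count, 70% CPU utilization) are hypothetical choices for illustration, not established limits.

```python
def load_trend(load1, load15, tolerance=0.1):
    """Compare 1-minute and 15-minute load averages.

    The 10% tolerance band is an arbitrary, illustrative choice.
    """
    if load1 > load15 * (1 + tolerance):
        return "rising"
    if load1 < load15 * (1 - tolerance):
        return "falling"
    return "steady"

def likely_bottleneck(load1, cpus, cpu_util_pct, busy_ratio=0.7, cpu_busy_pct=70.0):
    """Rough guess at what drives the load (thresholds are assumptions).

    High load with high CPU usage suggests CPU-bound work or CPU contention;
    high load with low CPU usage suggests tasks stuck waiting on I/O (D state).
    """
    if load1 < busy_ratio * cpus:
        return "normal"
    if cpu_util_pct >= cpu_busy_pct:
        return "cpu-bound"
    return "io-bound"
```

For example, likely_bottleneck(6.0, 4, 15.0) points toward I/O, because load exceeds the CPU count while the CPUs are mostly idle.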
CPU Context Switch (Part 1)
A CPU context switch saves the current task’s registers and program counter, then loads the next task’s context. Types include:
Process context switch
Thread context switch
Interrupt context switch
Process Context Switch
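A process context switch occurs when the scheduler moves a CPU from one process to another. Because a process runs in both user and kernel space, the kernel must save and restore not only its kernel-mode state and CPU registers but also user-space resources such as virtual memory and the stack, and cached address translations (TLB entries) may have to be flushed.
A system call is different. When a user-mode process invokes one, the CPU switches from user mode to kernel mode, saving the user registers and loading the kernel's; when the call returns, the registers are restored and execution resumes in user space. Because it stays within the same process, this is usually called a privileged mode switch rather than a process context switch.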
Thread Context Switch
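Threads are the basic unit of scheduling. Switching between two threads of the same process only changes thread-private data such as the stack and registers; the shared virtual memory stays in place, so it costs less than a process context switch. Switching between threads of different processes costs the same as a process switch.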
Interrupt Context Switch
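Interrupt handling runs entirely in kernel mode, so an interrupt context switch saves only the kernel-mode state the handler needs (CPU registers, kernel stack, hardware interrupt parameters) and does not touch the interrupted process's user-space state. Interrupts take priority over normal processes, so on a given CPU an interrupt context switch and a process context switch never happen at the same time.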
CPU Context Switch (Part 2)
Use vmstat to view system-wide context-switch and interrupt rates:
vmstat 5    # print a report every 5 seconds
Key fields:
cs : context switches per second
in : interrupts per second
r : length of the run queue (processes ready or running)
b : processes in uninterruptible sleep
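To collect these fields from a script, one option is to run vmstat and map its column header onto a data line. This is a minimal sketch assuming the standard procps vmstat layout (a header line starting with r and b); parse_vmstat and sample_vmstat are hypothetical helper names. vmstat's first report shows averages since boot, so the sketch asks for two reports and keeps the last one.

```python
import subprocess

def parse_vmstat(output):
    """Map vmstat column names (r, b, in, cs, ...) to values from the last data line."""
    lines = [line.split() for line in output.strip().splitlines()]
    header = next(fields for fields in lines if fields[:2] == ["r", "b"])
    return dict(zip(header, map(int, lines[-1])))

def sample_vmstat(interval=1):
    """Run 'vmstat <interval> 2'; the second report covers just that interval."""
    out = subprocess.run(["vmstat", str(interval), "2"],
                         capture_output=True, text=True, check=True).stdout
    return parse_vmstat(out)

if __name__ == "__main__":
    stats = sample_vmstat()
    print("cs/s:", stats["cs"], "in/s:", stats["in"], "r:", stats["r"], "b:", stats["b"])
```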
To inspect per-process switches, use pidstat -w, which reports voluntary (cswch/s) and involuntary (nvcswch/s) context switches per second:
pidstat -w 5
Analyzing High CPU Usage
CPU usage is the percentage of time the CPU spends on non-idle work. It can be viewed with top or ps and profiled with perf. For function-level analysis of a single process, run:
perf top -g -p <PID>
Identify hot functions (e.g., sqrt, add_function) and remove unnecessary work from them to improve throughput.
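Tools such as top derive CPU usage from the cumulative counters in the first line of /proc/stat: take two snapshots and divide each state's delta by the total delta. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal) and counts iowait as idle, since it is idle time with I/O outstanding; read_cpu_times and cpu_usage are illustrative names.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (values in clock ticks).
# guest/guest_nice are already included in user/nice, so they are not read.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' line of /proc/stat as a dict of ticks."""
    with open(path) as f:
        values = f.readline().split()[1:1 + len(FIELDS)]
    return dict(zip(FIELDS, map(int, values)))

def cpu_usage(before, after):
    """Percent of time in each state between two snapshots, plus overall usage."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:  # interval too short for any tick to elapse
        return {k: 0.0 for k in (*FIELDS, "usage")}
    pct = {k: round(100.0 * v / total, 1) for k, v in delta.items()}
    busy = total - delta["idle"] - delta["iowait"]
    pct["usage"] = round(100.0 * busy / total, 1)
    return pct

if __name__ == "__main__":
    first = read_cpu_times()
    time.sleep(1)
    print(cpu_usage(first, read_cpu_times()))
```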
When System CPU Is High but No Process Shows High Usage
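Check the run queue: there may be many processes in the Running (R) state even though none of them tops the CPU list in top. A common cause is short-lived processes, such as stress commands repeatedly launched by an application, that exit before top can sample them. Use pstree to trace their parent process and find what keeps spawning them.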
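pstree -s <PID> prints the parents of a given process. The same walk can be sketched by following the PPid field of /proc/<pid>/status until it reaches 0. This is a minimal illustration; read_status and ancestry are hypothetical helper names, and a short-lived process may exit while the walk is in progress.

```python
import os

def read_status(pid):
    """Parse /proc/<pid>/status into a dict, or return None if the process is gone."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return dict(line.rstrip("\n").split(":\t", 1) for line in f if ":\t" in line)
    except (FileNotFoundError, ProcessLookupError):
        return None

def ancestry(pid):
    """Follow PPid links from `pid` to the top of the tree; return (pid, name) pairs."""
    chain = []
    while pid:  # the topmost process reports PPid 0, which ends the walk
        status = read_status(pid)
        if status is None:
            break  # the process (or an ancestor) exited mid-walk
        chain.append((pid, status["Name"]))
        pid = int(status["PPid"])
    return chain

if __name__ == "__main__":
    print(" <- ".join(f"{name}({pid})" for pid, name in ancestry(os.getpid())))
```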
Uninterruptible and Zombie Processes
Process states:
R – Running/Runnable
D – Uninterruptible (usually I/O wait)
Z – Zombie (exited but not reaped)
S – Interruptible sleep
I – Idle (kernel threads)
T – Stopped/Traced
X – Dead
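A large number of D-state processes usually points to I/O problems. A growing number of Z-state processes means their parent is not reaping them with wait() or waitpid() (for example, in a SIGCHLD handler).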
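To watch for a build-up of D or Z processes, one approach is to count state letters across /proc/<pid>/stat. The state is the field right after the command name, which is wrapped in parentheses and may itself contain spaces or ')', so the sketch splits on the last ')'. parse_state and count_states are illustrative names.

```python
import os
from collections import Counter

def parse_state(stat_line):
    """Extract the one-letter state (R, S, D, Z, ...) from a /proc/<pid>/stat line."""
    # Field 2 is "(comm)"; split on the last ')' in case comm contains ')'.
    return stat_line.rpartition(")")[2].split()[0]

def count_states(proc="/proc"):
    """Count processes per state letter across the system."""
    counts = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited while scanning
    return counts

if __name__ == "__main__":
    states = count_states()
    print("D:", states["D"], "Z:", states["Z"], "all:", dict(states))
```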
CPU Performance Metrics
User CPU usage (processes in user space)
System CPU usage (kernel time)
I/O wait
Soft/Hard interrupt rates
Steal/Guest time (virtualized environments)
Load average (ideally no higher than the number of logical CPUs)
Context switches (voluntary vs. involuntary)
CPU cache hit rate
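Voluntary switches happen when a process gives up the CPU because it is waiting for a resource such as I/O or memory; involuntary switches happen when the scheduler forces it off, for example because its time slice expired, so a high involuntary count usually signals CPU contention. The kernel exposes both totals in /proc/<pid>/status, and pidstat -w reports their per-second rates. ctxt_switches below is a hypothetical helper that reads them.

```python
import time

def ctxt_switches(pid="self"):
    """Return (voluntary, involuntary) context-switch totals for a process.

    These cumulative counters are what pidstat -w turns into cswch/s and nvcswch/s.
    """
    counts = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
                counts[key] = int(value)
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]

if __name__ == "__main__":
    before, _ = ctxt_switches("thread-self")
    time.sleep(0.01)  # blocking in sleep gives up the CPU voluntarily
    after, _ = ctxt_switches("thread-self")
    print("voluntary switches during sleep:", after - before)
```

Passing "thread-self" reads the calling thread's own counters, which keeps the sleep experiment accurate even in a multithreaded program.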
Performance Tools Overview
Check load with uptime, then use mpstat and pidstat to locate heavy processes.
Use vmstat for context switches and interrupts, pidstat -w for per‑process switches, and pidstat -d for I/O.
For CPU‑bound issues, start with top, then drill down with perf top.
For I/O bottlenecks, check iostat and dstat, then run strace or perf against the offending process.
Memory Fundamentals
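Linux gives each process its own virtual address space, split into kernel space and user space. From low to high addresses, user space consists of five segments: the read-only segment (code and constants), the data segment (global variables), the heap (dynamically allocated memory, growing upward), the file-mapping segment (shared libraries, shared memory, and memory-mapped files), and the stack (local variables and call frames, with a fixed size limit, typically 8 MB).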
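These segments can be observed in /proc/<pid>/maps. The file does not label code and data explicitly (they appear as file-backed mappings of the executable and its libraries), so the sketch below groups mappings into rough categories and sums their sizes. classify_mappings is an illustrative name, and the grouping is a simplification.

```python
def classify_mappings(maps_text):
    """Sum mapping sizes (bytes) from /proc/<pid>/maps by rough segment type."""
    sizes = {}
    for line in maps_text.splitlines():
        # address perms offset dev inode [pathname]; pathname may contain spaces
        fields = line.split(maxsplit=5)
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""
        if path == "[heap]":
            kind = "heap"
        elif path == "[stack]":
            kind = "stack"
        elif path.startswith("/"):
            kind = "file-backed"  # executable code/data, shared libraries, mapped files
        elif path == "":
            kind = "anonymous"    # e.g. large malloc() blocks, thread stacks
        else:
            kind = "other"        # [vdso], [vvar], named anonymous regions, ...
        sizes[kind] = sizes.get(kind, 0) + (end - start)
    return sizes

if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for kind, size in sorted(classify_mappings(f.read()).items()):
            print(f"{kind:12} {size // 1024} KiB")
```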
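Allocation Strategies
glibc's malloc() obtains memory from the kernel in two ways, split at a default (adjustable) threshold of 128 KB:
brk() for small allocations (<128 KB): it moves the top of the heap (the program break). Freed memory is kept by the allocator for reuse, which avoids repeated page faults but can fragment the heap.
mmap() for large allocations (128 KB or more): it creates a separate anonymous mapping in the file-mapping segment. The memory goes back to the kernel on free(), but every new allocation triggers page faults on first access.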
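Python cannot call brk() directly, but it can create the kind of mapping glibc uses for large requests: a private anonymous mmap. The sketch below allocates 1 MiB that way, locates it in /proc/self/maps, and unmaps it; the entry has no backing file (inode 0) even though it lives in the file-mapping area. mapping_for and large_allocation_demo are hypothetical names, and this mimics malloc's large-allocation path rather than calling malloc itself.

```python
import ctypes
import mmap

def mapping_for(addr, maps_path="/proc/self/maps"):
    """Return the fields of the /proc/self/maps entry whose range contains addr."""
    with open(maps_path) as f:
        for line in f:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            if start <= addr < end:
                return fields
    return None

def large_allocation_demo(size=1 << 20):
    """Create a private anonymous mapping and return its /proc/self/maps fields."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)  # fd -1 => anonymous memory
    buf[:5] = b"hello"                  # first write faults the page in
    view = ctypes.c_char.from_buffer(buf)
    addr = ctypes.addressof(view)       # start address of the mapping
    del view                            # release the exported buffer before closing
    fields = mapping_for(addr)
    buf.close()                         # munmap(): memory goes straight back to the kernel
    return fields

if __name__ == "__main__":
    print(large_allocation_demo())      # e.g. address range, 'rw-p', offset, '00:00', '0'
```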
Reclaiming Memory
LRU cache eviction
Swapping out anonymous pages
OOM killer for runaway processes
Common commands to view memory usage:
free
top
ps -o pid,vsize,rss,cmd
Buffers and Cache
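Buffers is a cache of raw disk blocks (data read from or written to the block device directly); Cache (Cached) holds file data in the page cache. The kernel counts both as in use, since they are not part of free memory, but most of it can be reclaimed under pressure, which is why the available column in free's output is larger than the free column.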
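Both values come from /proc/meminfo. The sketch below parses that file and groups the fields roughly the way procps-ng free builds its buff/cache column, which also adds reclaimable slab (SReclaimable). parse_meminfo and buff_cache_summary are illustrative names; all values are in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key] = int(rest.split()[0])
    return info

def buff_cache_summary(m):
    """Group cache-related fields similarly to free(1); values in kB."""
    return {
        "buffers": m["Buffers"],
        "cached": m["Cached"],
        # procps-ng's free also counts reclaimable slab memory as cache
        "buff_cache": m["Buffers"] + m["Cached"] + m.get("SReclaimable", 0),
        "available": m["MemAvailable"],
    }

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(buff_cache_summary(parse_meminfo(f.read())))
```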
Detecting Memory Leaks
Use BCC’s memleak tool to trace allocations that are never freed:
/usr/share/bcc/tools/memleak -a -p $(pidof <process>)
Swap Behavior
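When physical memory is scarce, Linux swaps out anonymous pages. /proc/sys/vm/swappiness (0–100; kernels since 5.8 accept up to 200) sets how aggressively the kernel reclaims anonymous pages relative to file-backed pages; it is a relative weight, not a percentage-of-memory threshold. Even with swappiness set to 0, swapping can still occur when free memory plus file-backed pages falls below the high watermark.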
Analyzing High Swap
Start with free to confirm swap usage, then monitor with sar -r -S or cachetop. Identify processes causing heavy I/O or memory pressure, and consider adjusting swappiness or disabling swap on production nodes.
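To confirm swap usage and find the processes holding swapped-out memory, a minimal sketch can combine SwapTotal/SwapFree from /proc/meminfo with the per-process VmSwap line in /proc/<pid>/status. swap_used_kb and top_swap_users are hypothetical helper names.

```python
import os

def swap_used_kb(meminfo_text):
    """System-wide swap in use (kB): SwapTotal - SwapFree from /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            values[key] = int(rest.split()[0])
    return values["SwapTotal"] - values["SwapFree"]

def top_swap_users(proc="/proc", top=5):
    """Return up to `top` (swap_kB, pid, name) tuples, largest first, from VmSwap."""
    usage = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited or is not readable
        swap = fields.get("VmSwap")  # kernel threads have no VmSwap line
        if swap:
            usage.append((int(swap.split()[0]), int(entry), fields["Name"].strip()))
    return sorted(usage, reverse=True)[:top]

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print("swap used (kB):", swap_used_kb(f.read()))
    for kb, pid, name in top_swap_users():
        print(f"{pid:>7} {name:20} {kb} kB")
```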
Quick Memory Diagnosis Workflow
Run free and top for a high‑level view.
Use vmstat and pidstat to spot trends.
Drill down with memleak, perf, or strace for detailed analysis.
Optimization Recommendations
Prefer disabling swap; if unavoidable, lower swappiness.
Reduce dynamic allocations via memory pools or HugePages.
Leverage caches and buffers, or external caches like Redis.
Apply cgroup limits to prevent a single process from exhausting memory.
Lower /proc/<pid>/oom_score_adj (range -1000 to 1000; the older oom_adj interface is deprecated) for critical services so the OOM killer avoids them.
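As a sketch of the last recommendation, the value can be read and written through procfs. A value of -1000 exempts a process from the OOM killer entirely, and lowering the value requires privileges (CAP_SYS_RESOURCE). read_oom_score_adj and set_oom_score_adj are illustrative names, not a library API.

```python
def read_oom_score_adj(pid="self"):
    """Read a process's oom_score_adj (-1000..1000)."""
    with open(f"/proc/{pid}/oom_score_adj") as f:
        return int(f.read())

def set_oom_score_adj(pid, value):
    """Write oom_score_adj; lowering it needs CAP_SYS_RESOURCE (typically root)."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(value))

if __name__ == "__main__":
    print("current oom_score_adj:", read_oom_score_adj())
    # e.g. set_oom_score_adj(<critical service PID>, -900) when running as root
```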
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
