
Master Linux Performance: From CPU Load to Memory Optimization

This comprehensive guide explains Linux performance fundamentals, covering key metrics like throughput and latency, how to interpret average load, CPU context switching, memory management, and the most effective tools and techniques for diagnosing and optimizing system performance.

Ops Development Stories

Mar 7, 2022


Performance Optimization

High concurrency and fast response correspond to the two core metrics of performance optimization: throughput and latency.

From the application's perspective, load directly affects the end-user experience.

From the system's perspective, resource utilization and saturation determine overall capacity.

The essence of a performance problem is that a system resource has hit a bottleneck while requests are still not handled fast enough to sustain more traffic. Performance analysis means locating these bottlenecks and mitigating them. A typical optimization workflow follows these steps:

Select metrics to evaluate application and system performance.

Set performance goals for applications and the system.

Conduct benchmark testing.

Analyze performance to pinpoint bottlenecks.

Monitor performance and set alerts.

Different problems require different Linux performance tools. The most common tool choices are summarized in the Performance Tools Overview section later in this article.

Understanding "Average Load"

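Average load (the load average that uptime and top report for the last 1, 5 and 15 minutes) is the average number of processes in the runnable or uninterruptible state over a period of time. It is not the same as CPU utilization: runnable processes are using or waiting for a CPU, while uninterruptible processes are in kernel mode waiting for something that cannot be interrupted, most often I/O.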
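The word "average" hides a detail worth knowing. As far as we understand the kernel's implementation, it recomputes each value about every 5 seconds as an exponentially damped moving average: load = load * e + n * (1 - e), where n is the current number of runnable plus uninterruptible tasks and e = exp(-5 s / window) for 60, 300 and 900 second windows. The Python sketch below replays that recurrence in floating point (the kernel uses fixed-point arithmetic); damped_load is an illustrative name, not a kernel function.

```python
import math

SAMPLE_INTERVAL_S = 5  # the kernel recomputes the load averages roughly every 5 s


def damped_load(samples, window_s, initial=0.0):
    """Apply the load-average recurrence to a sequence of task counts.

    samples:  number of runnable + uninterruptible tasks at each 5 s tick.
    window_s: 60, 300 or 900 for the 1-, 5- and 15-minute averages.
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    load = initial
    for active in samples:
        # Old history decays geometrically; the newest sample gets weight (1 - decay).
        load = load * decay + active * (1.0 - decay)
    return load


# One task busy for 60 s (12 ticks) on a previously idle system:
# the 1-minute average only reaches about 0.63, not 1.0,
# because the earlier zero samples still carry weight.
one_min = damped_load([1] * 12, window_s=60)
fifteen_min = damped_load([1] * 12, window_s=900)
print(f"1-min: {one_min:.2f}, 15-min: {fifteen_min:.2f}")
```

This is why a sudden burst shows up in the 1-minute figure long before the 15-minute one, and why comparing the three values tells you whether load is rising or falling.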

When Is Average Load Reasonable?

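In production, monitor average load trends against the number of logical CPUs and investigate when load rises significantly. Load and CPU usage do not always move together: CPU-intensive workloads raise both; I/O-intensive workloads raise load (through processes in uninterruptible I/O wait) without a matching rise in CPU usage; and a large number of processes waiting for a CPU raises both load and CPU usage.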
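A quick way to apply this is to normalize the 1-minute value by the CPU count and compare it with the 15-minute value. The sketch below parses a /proc/loadavg line (three load averages, then running/total scheduling entities, then the last PID) and flags load above 70% of the CPUs. That threshold is a common rule of thumb and an assumption here, not a kernel limit, and assess_load is a hypothetical helper.

```python
def parse_loadavg(text):
    """Parse the single line of /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    running, total = (int(x) for x in fields[3].split("/"))
    return {"load1": load1, "load5": load5, "load15": load15,
            "running": running, "total": total}


def assess_load(info, ncpu, ratio_threshold=0.7):
    """Normalize by CPU count and compare the short- and long-term averages.

    ratio_threshold is an assumed rule of thumb, not a kernel-defined limit.
    """
    per_cpu = info["load1"] / ncpu
    trend = "rising" if info["load1"] > info["load15"] else "steady or falling"
    return {"per_cpu": per_cpu, "trend": trend,
            "investigate": per_cpu > ratio_threshold}


sample = "2.80 1.90 1.20 3/412 23456"  # made-up /proc/loadavg contents
print(assess_load(parse_loadavg(sample), ncpu=2))
```

On a live system you would pass the contents of /proc/loadavg and os.cpu_count() instead of the sample values.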

CPU Context Switch (Part 1)

A CPU context switch saves the current task’s registers and program counter, then loads the next task’s context. Types include:

Process context switch

Thread context switch

Interrupt context switch

Process Context Switch

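A process switch can only happen in kernel mode, but a system call is not one. When a user-mode process invokes a system call, the CPU switches from user to kernel mode, saving the user-mode registers and loading the kernel ones; after the call it restores them and returns to user space. That involves two CPU context switches, yet the same process keeps running, so it is usually called a privileged mode switch rather than a context switch. A real process context switch must additionally save and restore the process's user-space state, such as its virtual memory (page tables) and stack, which makes it more expensive; switching address spaces also reduces TLB effectiveness, slowing memory access right afterward.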

Thread Context Switch

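Threads of the same process share its virtual memory and global data, so switching between them only saves and restores each thread's private data, such as its stack and registers, while the address space stays the same. This costs less than a process switch. Switching between threads of different processes is as expensive as a process context switch.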

Interrupt Context Switch

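Interrupt handling preempts the running process. An interrupt context switch saves and restores only the kernel-mode state the handler needs (CPU registers, kernel stack, hardware interrupt parameters) and does not touch the user-space state of the interrupted process. Because interrupts have higher priority than processes, interrupt and process context switches never happen at the same time on the same CPU.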

CPU Context Switch (Part 2)

Use vmstat to view system-wide context-switch and interrupt rates:

vmstat 5    # output every 5 seconds

Key fields:

cs : context switches per second

in : interrupts per second

r : length of the run queue (processes ready or running)

b : processes in uninterruptible sleep
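The cs and in columns are rates, not totals. Linux keeps cumulative counters since boot on the ctxt and intr lines of /proc/stat (the first number after intr is the total across all interrupt sources), and a per-second rate is simply the difference between two readings divided by the interval. A minimal sketch, assuming you take the two readings yourself; the function names are illustrative.

```python
import time


def read_counters(stat_text):
    """Extract cumulative context switches and interrupts from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "ctxt":
            counters["cs"] = int(parts[1])
        elif parts[0] == "intr":
            counters["in"] = int(parts[1])  # first number: total over all interrupt sources
    return counters


def per_second(before, after, interval_s):
    """Turn two snapshots taken interval_s apart into per-second rates (like vmstat's cs/in)."""
    return {key: (after[key] - before[key]) / interval_s for key in before}


def sample_rates(interval_s=5, path="/proc/stat"):
    """Live helper for Linux: read /proc/stat twice and return the rates."""
    with open(path) as f:
        before = read_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = read_counters(f.read())
    return per_second(before, after, interval_s)
```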

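To inspect per-process switches, use pidstat -w:

pidstat -w 5

In its output, cswch/s counts voluntary context switches (the process gives up the CPU because a resource it needs, such as I/O or memory, is unavailable) and nvcswch/s counts non-voluntary ones (the scheduler forces the switch, for example when the time slice expires). Many voluntary switches suggest resource waits; many non-voluntary switches suggest CPU contention.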
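The same voluntary/non-voluntary split is exposed per process as the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines of /proc/<pid>/status. The sketch below compares two readings of that file. The classify helper and its wording are hypothetical and only encode the common reading that voluntary switches point to resource waits and non-voluntary ones to CPU contention.

```python
def ctxt_switches(status_text):
    """Return cumulative (voluntary, nonvoluntary) switches from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            values[key] = int(value.strip())
    return values["voluntary_ctxt_switches"], values["nonvoluntary_ctxt_switches"]


def classify(before, after):
    """Hypothetical helper: report which kind of switch dominated between two snapshots."""
    voluntary = after[0] - before[0]
    nonvoluntary = after[1] - before[1]
    if voluntary > nonvoluntary:
        return "mostly voluntary: the task often blocks (e.g. on I/O or locks)"
    if nonvoluntary > voluntary:
        return "mostly non-voluntary: the task is often preempted (CPU contention)"
    return "balanced"
```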

Analyzing High CPU Usage

CPU usage is the percentage of time the CPU spends on non-idle work. top, ps, and pidstat report it; perf shows which functions consume it. For function-level analysis, run:

perf top -g -p <PID>

Identify hot functions (e.g., sqrt, add_function) and remove unnecessary work from them to improve throughput.
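Tools derive CPU usage from the cumulative tick counters on the cpu line of /proc/stat: usage = 1 - (delta idle / delta total) over a sampling interval. Below is a minimal sketch of that formula. Two choices are assumptions worth flagging: iowait is counted as busy time here (tools differ on this), and guest/guest_nice are left out of the total because the kernel already includes them in user/nice.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat; returns (total, idle) in clock ticks."""
    fields = [int(x) for x in stat_line.split()[1:]]
    # Order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already counted in user/nice, so only the first 8 are summed.
    user, nice, system, idle, iowait, irq, softirq, steal = (fields + [0] * 8)[:8]
    total = user + nice + system + idle + iowait + irq + softirq + steal
    return total, idle


def cpu_usage_percent(before_line, after_line):
    """Usage = 1 - idle_delta / total_delta, as a percentage (iowait counted as busy)."""
    total0, idle0 = cpu_times(before_line)
    total1, idle1 = cpu_times(after_line)
    total_delta = total1 - total0
    if total_delta == 0:
        return 0.0
    return 100.0 * (total_delta - (idle1 - idle0)) / total_delta
```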

When System CPU Is High but No Process Shows High Usage

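Check the run queue (the r column in vmstat, or the Tasks line in top). If many processes are Running but none shows high CPU usage, the load often comes from short-lived processes that start and exit between samples, so top and pidstat never catch them; a typical example is a service that repeatedly launches stress commands. Use pstree to find the parent process that keeps spawning them.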
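Short-lived children rarely show up in one snapshot, but the parent that keeps spawning them does. As a rough complement to pstree, the sketch below counts children per parent in the output of ps -eo pid,ppid,comm. Capturing that output repeatedly and comparing the counts is left to you, and the sample data in the usage is made up.

```python
from collections import Counter


def children_per_parent(ps_output):
    """Count child processes per parent PID in `ps -eo pid,ppid,comm` output.

    Returns (ppid, parent_name, child_count) tuples, busiest parent first.
    """
    counts = Counter()
    names = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, comm = line.split(None, 2)
        names[pid] = comm
        counts[ppid] += 1
    return [(ppid, names.get(ppid, "?"), n) for ppid, n in counts.most_common()]


sample = """  PID  PPID COMMAND
    1     0 systemd
  100     1 php-fpm
  201   100 stress
  202   100 stress
  203   100 stress
"""
print(children_per_parent(sample)[0])
```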

Uninterruptible and Zombie Processes

Process states:

R – Running/Runnable

D – Uninterruptible (usually I/O wait)

Z – Zombie (exited but not reaped)

S – Interruptible sleep

I – Idle (kernel threads)

T – Stopped/Traced

X – Dead

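Many processes in D usually point to I/O problems, such as a slow or overloaded disk. A growing number of Z processes means the parent is not calling wait() or waitpid() (for example, from a SIGCHLD handler) to reap its exited children.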
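To count states without ps, you can read the third field of /proc/<pid>/stat. The only trap is that the second field, the command name in parentheses, may itself contain spaces or ')'. A sketch that handles this and tallies D and Z processes; the warning wording is ours.

```python
from collections import Counter


def state_from_stat(stat_text):
    """Return the one-letter state from /proc/<pid>/stat.

    Field 2 (the command name in parentheses) may contain spaces or ')',
    so split at the LAST ')' before reading the state field.
    """
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def summarize_states(stat_texts):
    """Tally states and flag buildups of D (uninterruptible) and Z (zombie) processes."""
    counts = Counter(state_from_stat(text) for text in stat_texts)
    warnings = []
    if counts["D"]:
        warnings.append(f"{counts['D']} in D: check disk and I/O latency")
    if counts["Z"]:
        warnings.append(f"{counts['Z']} in Z: a parent is not reaping its children")
    return counts, warnings
```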

CPU Performance Metrics

User CPU usage (processes in user space)

System CPU usage (kernel time)

I/O wait

Soft and hard interrupt rates

Steal/Guest time (virtualized environments)

Average load (ideally close to, but not above, the number of logical CPUs)

Context switches (voluntary vs. involuntary)

CPU cache hit rate

Performance Tools Overview

Check load with uptime, then use mpstat and pidstat to locate heavy processes.

Use vmstat for context switches and interrupts, pidstat -w for per‑process switches, and pidstat -d for I/O.

For CPU‑bound issues, start with top, then drill down with perf top.

For I/O bottlenecks, check iostat and dstat, then use strace or perf on the offending process.

Memory Fundamentals

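Linux gives each process its own virtual address space, split into kernel space and user space. User space consists of five segments, from low to high addresses: the read-only segment (code and constants), the data segment (global variables), the heap (dynamically allocated memory, growing upward), the memory-mapping segment (shared libraries, shared memory, and mapped files), and the stack (local variables and function call context).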
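These segments are visible in /proc/<pid>/maps, one mapping per line: address range, permissions, offset, device, inode, and an optional path such as [heap] or [stack]. The sketch below totals mapping sizes per rough category. The categories are our own simplification: for instance, shared-library code is reported as a file mapping rather than as the read-only segment.

```python
from collections import defaultdict


def region_kind(line):
    """Classify one /proc/<pid>/maps line into a rough (simplified) category."""
    fields = line.split(maxsplit=5)
    perms = fields[1]
    path = fields[5].strip() if len(fields) == 6 else ""
    if path == "[heap]":
        return "heap"
    if path == "[stack]":
        return "stack"
    if not path or path.startswith("[anon"):
        return "anonymous mapping"  # e.g. large malloc() blocks, thread stacks
    if path.startswith("["):
        return "kernel-provided"  # e.g. [vdso], [vvar]
    return "file mapping (code)" if "x" in perms else "file mapping (non-executable)"


def sizes_by_kind(maps_text):
    """Sum mapping sizes in bytes per category."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, end = (int(addr, 16) for addr in line.split()[0].split("-"))
        totals[region_kind(line)] += end - start
    return dict(totals)
```

On Linux, sizes_by_kind(open("/proc/self/maps").read()) applies it to the current process.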

Allocation Strategies

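brk() for small allocations (<128 KB): the heap top is moved up. Freed memory is cached for reuse instead of being returned to the system, which reduces page faults but can fragment memory.

mmap() for large allocations (>128 KB): memory comes from an anonymous mapping in the memory-mapping segment and is returned to the system on free, but each new allocation incurs page faults.

The 128 KB threshold is glibc malloc's default. In both cases, physical memory is allocated only on first access, through a page fault.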
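A small sketch of both ideas, under stated assumptions: expected_strategy hard-codes glibc's default 128 KB M_MMAP_THRESHOLD (glibc can raise it dynamically at runtime, so this is approximate), and touch_anonymous_region uses Python's mmap module to create a private anonymous mapping, the same kind malloc uses for large requests, writing one byte per page because pages are only backed by physical memory once touched.

```python
import mmap

# glibc's default M_MMAP_THRESHOLD. Assumption: glibc may raise it at runtime,
# so this is an approximation of what malloc would do, not a guarantee.
MMAP_THRESHOLD = 128 * 1024


def expected_strategy(size_bytes):
    """Which mechanism glibc malloc would typically use for one request of this size."""
    return "mmap" if size_bytes >= MMAP_THRESHOLD else "brk"


def touch_anonymous_region(size_bytes):
    """Create a private anonymous mapping and touch every page of it.

    Physical pages are allocated lazily, on the first write to each page
    (a page fault), so we write one byte per page and count the pages.
    """
    region = mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE)  # fileno -1: not file-backed
    try:
        pages = 0
        for offset in range(0, size_bytes, mmap.PAGESIZE):
            region[offset] = 1
            pages += 1
        return pages
    finally:
        region.close()
```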

Reclaiming Memory

LRU cache eviction

Swapping out anonymous pages

OOM killer for runaway processes
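To make the LRU eviction order concrete, here is a toy cache. It is only an illustration: the kernel does not reclaim pages this way, but approximates LRU with active and inactive page lists.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: when full, evict the entry that was used least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, most recent last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recent than "b"
cache.put("c", 3)  # evicts "b"
```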

Common commands to view memory usage:

free
top
ps -o pid,vsize,rss,cmd

Buffers and Cache

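Buffers is a temporary cache of raw disk blocks (for example, data being written to disk or read directly from a block device); Cache is the page cache holding data read from or written to files. free reports them together as buff/cache rather than as used memory, and because both can be reclaimed under pressure, free's available column is usually larger than its free column.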
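Both figures come from /proc/meminfo. The sketch below parses it and approximates what free shows. Treating the cache figure as Cached + SReclaimable matches recent procps free as far as we know, but check your version.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo into a dict of integers (kB; HugePages_* counts are unitless)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key] = int(parts[0])
    return info


def summarize(info):
    """Approximate free's view: cache = Cached + SReclaimable (assumed, see lead-in)."""
    return {
        "free_kb": info["MemFree"],
        "buffers_kb": info["Buffers"],
        "cache_kb": info["Cached"] + info.get("SReclaimable", 0),
        "available_kb": info.get("MemAvailable"),  # absent on very old kernels
    }
```

On Linux, summarize(parse_meminfo(open("/proc/meminfo").read())) gives the live values.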

Detecting Memory Leaks

Use BCC’s memleak tool to trace allocations that are never freed:

/usr/share/bcc/tools/memleak -a -p $(pidof <process>)
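Conceptually, memleak records every allocation and forgets it again when it is freed; whatever remains after a while, grouped by call stack, is the suspected leak. The toy model below records the calls by hand instead of with eBPF probes, and groups by a single call-site name instead of a full stack. All names and sample values are hypothetical.

```python
from collections import defaultdict


class AllocationTracker:
    """Toy model of a leak tracer: keep allocations that were never freed."""

    def __init__(self):
        self.live = {}  # address -> (size, call site)

    def on_alloc(self, address, size, site):
        self.live[address] = (size, site)

    def on_free(self, address):
        self.live.pop(address, None)  # ignore frees we never saw allocated

    def outstanding_by_site(self):
        """Bytes still allocated per call site, largest first."""
        totals = defaultdict(int)
        for size, site in self.live.values():
            totals[site] += size
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)


tracker = AllocationTracker()
tracker.on_alloc(0x1000, 64, "fibonacci")  # hypothetical leaking call site
tracker.on_alloc(0x2000, 64, "fibonacci")
tracker.on_alloc(0x3000, 32, "parse")
tracker.on_free(0x3000)
print(tracker.outstanding_by_site())
```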

Swap Behavior

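When physical memory is scarce, Linux can swap anonymous pages (memory not backed by a file) out to disk. /proc/sys/vm/swappiness (0–100, default 60) controls how readily it does so: higher values favor swapping anonymous pages, lower values favor reclaiming file cache. Even with swappiness set to 0, swapping can still occur once free memory plus file-backed pages falls below the high watermark.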
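The thresholds involved are per-zone watermarks (min, low and high), visible on the corresponding lines of /proc/zoneinfo. The function below is a deliberately simplified model of how free pages relative to those watermarks drive reclaim; real reclaim has more paths and heuristics, and reclaim_state is a hypothetical name.

```python
def reclaim_state(free_pages, wmark_min, wmark_low, wmark_high):
    """Simplified model: how a zone's free pages relative to its watermarks drive reclaim."""
    if free_pages < wmark_min:
        return "below min: allocations stall and reclaim memory directly"
    if free_pages < wmark_low:
        return "below low: kswapd wakes up and reclaims in the background"
    if free_pages < wmark_high:
        return "between low and high: kswapd, if running, keeps reclaiming"
    return "above high: no reclaim pressure"
```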

Analyzing High Swap

Start with free to confirm swap usage, then watch memory and swap over time with sar -r -S, and per-process page-cache hit rates with cachetop. Identify the processes causing heavy I/O or memory pressure, then consider lowering swappiness or disabling swap on production nodes.
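When swap is in use, /proc/<pid>/status reports each process's swapped-out memory on its VmSwap line, which helps identify the processes involved. A sketch that ranks processes by it; read_statuses is a hypothetical helper for a live Linux system, while the pure functions can be tested on captured text.

```python
import os


def vmswap_kb(status_text):
    """Return VmSwap in kB from /proc/<pid>/status text (0 if absent, e.g. kernel threads)."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def top_swappers(statuses, limit=5):
    """statuses: mapping pid -> status text. Returns (pid, kB) pairs, largest first."""
    usage = [(pid, vmswap_kb(text)) for pid, text in statuses.items()]
    usage = [item for item in usage if item[1] > 0]
    return sorted(usage, key=lambda item: item[1], reverse=True)[:limit]


def read_statuses(proc="/proc"):
    """Live helper (Linux only): read the status file of every visible process."""
    statuses = {}
    for name in os.listdir(proc):
        if name.isdigit():
            try:
                with open(os.path.join(proc, name, "status")) as f:
                    statuses[int(name)] = f.read()
            except OSError:
                continue  # the process exited or its status is not readable
    return statuses
```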

Quick Memory Diagnosis Workflow

Run free and top for a high‑level view.

Use vmstat and pidstat to spot trends.

Drill down with memleak, perf, or strace for detailed analysis.

Optimization Recommendations

Prefer disabling swap; if unavoidable, lower swappiness.

Reduce dynamic allocations via memory pools or HugePages.

Leverage caches and buffers, or external caches like Redis.

Apply cgroup limits to prevent a single process from exhausting memory.

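Lower /proc/<pid>/oom_score_adj (range -1000 to 1000; -1000 exempts the process) for critical services so the OOM killer is less likely to choose them. The older oom_adj interface is deprecated.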
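If you manage this from a script, the adjustment is a single write to /proc/<pid>/oom_score_adj. A minimal sketch with a range check; the proc_root parameter is our own addition so the function can be tried against a scratch directory, and lowering the value on a real system generally requires root (CAP_SYS_RESOURCE).

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write oom_score_adj for a process; -1000 disables OOM killing for it.

    proc_root is a hypothetical parameter for testing against a scratch directory.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path
```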


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
