
Master Linux Performance: Optimize CPU, Memory, and I/O with Proven Tools

This guide explains Linux performance optimization. It defines key metrics such as throughput and latency, clarifies load average, details CPU context switches, describes common performance analysis tools, and gives practical methods for diagnosing and improving CPU, memory, and I/O bottlenecks in production environments.

Open Source Linux

Part 1 Linux Performance Optimization

1 Performance Optimization

Performance Metrics

High concurrency and fast response correspond to two core metrics: throughput and latency.

At its core, a performance problem means that some system resource has hit a bottleneck while requests are still not processed fast enough to support more of them. Performance analysis aims to find these bottlenecks and mitigate them.

Metrics can be examined from two perspectives:

Application load: throughput and latency, which directly affect end-user experience.

System resources: resource utilization and saturation.

Key steps include selecting metrics, setting performance goals, conducting benchmarks, analyzing bottlenecks, and monitoring with alerts.
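The following Python sketch makes throughput and latency concrete by computing both from a list of request timings. The input shape (a list of (start, end) timestamps in seconds), the function names, and the nearest-rank percentile method are illustrative assumptions. The original guide does not prescribe them.

```python
import math


def throughput(requests: list[tuple[float, float]]) -> float:
    """Completed requests per second over the observed window."""
    if not requests:
        return 0.0
    first_start = min(start for start, _ in requests)
    last_end = max(end for _, end in requests)
    window = last_end - first_start
    return len(requests) / window if window > 0 else float("inf")


def latency_percentile(requests: list[tuple[float, float]], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of per-request latency."""
    if not requests:
        raise ValueError("no requests")
    latencies = sorted(end - start for start, end in requests)
    rank = max(1, math.ceil(pct / 100 * len(latencies)))
    return latencies[rank - 1]


if __name__ == "__main__":
    # Hypothetical sample: four requests observed over two seconds.
    sample = [(0.0, 0.1), (0.5, 0.7), (1.0, 1.05), (1.5, 2.0)]
    print(f"throughput: {throughput(sample):.1f} req/s")
    print(f"p50 latency: {latency_percentile(sample, 50) * 1000:.0f} ms")
    print(f"p99 latency: {latency_percentile(sample, 99) * 1000:.0f} ms")
```

Report percentiles alongside throughput, because a handful of slow requests can hide behind a healthy mean.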

Understanding "Load Average"

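Load average is the average number of processes in the runnable or uninterruptible state over a time interval. The kernel reports it over 1, 5, and 15 minutes. It counts processes waiting for I/O as well as those using or waiting for a CPU, so it is not directly comparable to CPU utilization.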

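Uninterruptible processes are in the middle of a critical kernel-mode path that must not be interrupted, such as waiting for a hardware device to respond to an I/O request. This state is a protection mechanism that keeps process and hardware state consistent.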

When Is the Load Average Reasonable?

Monitor load average in production and compare it against historical trends. Set alert thresholds, for example when load average exceeds 70% of the number of CPUs.

CPU-intensive workloads: load average rises together with CPU usage.

I/O-intensive workloads: waiting for I/O raises load average, but CPU usage is not necessarily high.

Many processes waiting for a CPU (heavy scheduling): both load average and CPU usage rise.
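Here is a minimal sketch of the 70% rule of thumb, assuming a Linux host. It reads /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute load averages, and compares each value with 70% of the logical CPU count. The function names and the fixed 0.7 ratio are illustrative assumptions; calibrate real thresholds against your own historical data.

```python
import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def over_threshold(load: float, n_cpus: int, ratio: float = 0.7) -> bool:
    """True when load exceeds ratio * CPU count (the 70% rule of thumb)."""
    return load > ratio * n_cpus


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        loads = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1  # logical CPUs
    for window, load in zip(("1m", "5m", "15m"), loads):
        status = "ALERT" if over_threshold(load, cpus) else "ok"
        print(f"{window:>3}: {load:5.2f} / {cpus} CPUs -> {status}")
```

Python's os.getloadavg() returns the same three numbers directly. Comparing the three windows also shows the trend: a 1-minute value well above the 15-minute value means load has risen recently.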

2 CPU

CPU Context Switch (Part 1)

A CPU context switch saves the previous task's CPU context (its registers and program counter) and loads the new task's context into those registers. It then jumps to the address in the program counter to run the new task.

Switch types:

Process context switch

Thread context switch

Interrupt context switch

Process Context Switch

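Linux splits a process's execution into kernel space and user space. A system call involves two CPU context switches. On entry, the user-mode register state is saved and kernel-mode state is loaded. When the call returns, the saved user-mode state is restored.

A system call is therefore a privilege-mode switch, not a process context switch. The same process keeps running, and user-space resources such as virtual memory are left untouched.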

Thread Context Switch

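There are two cases. If the two threads belong to different processes, the switch costs the same as a process context switch. If they belong to the same process, the shared virtual memory stays in place and only thread-private state such as the stack and registers is switched. Switching between threads of the same process therefore consumes fewer resources.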

Interrupt Context Switch

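Interrupt handling preempts the normal scheduling of processes. The handler runs entirely in kernel mode, so only kernel-mode state (CPU registers, kernel stack, hardware interrupt parameters) is saved and restored. The interrupted process's user-mode resources are untouched. On a given CPU, interrupt handling has higher priority than processes, so interrupt and process context switches never occur at the same time.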

CPU Context Switch (Part 2)

Use vmstat 5 to view overall context switches:

vmstat 5    # output every 5 seconds
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 103388 145412 511056    0    0    18    60    1    1  2  1 96  0  0
 ...

cs: context switches per second

in: interrupts per second

r: run-queue length (processes running or waiting for a CPU)

b: processes in uninterruptible sleep
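The following Python sketch turns vmstat output into one dictionary per sample row, keyed by the column names listed above. It then flags a run queue (r) longer than the number of CPUs. It assumes the procps vmstat text format; the function name and the r > CPU count check are illustrative choices.

```python
import os
import shutil
import subprocess


def parse_vmstat(output: str) -> list[dict[str, int]]:
    """Turn `vmstat` text output into one dict per sample row.

    Column names come from the header line that starts with "r", so extra
    columns printed by other procps versions are picked up automatically.
    """
    rows = [line.split() for line in output.strip().splitlines()]
    header = next(cols for cols in rows if cols and cols[0] == "r")
    return [
        dict(zip(header, map(int, cols)))
        for cols in rows
        if cols and cols[0].isdigit() and len(cols) == len(header)
    ]


if __name__ == "__main__" and shutil.which("vmstat"):
    # The first row is an average since boot; the second covers the last second.
    out = subprocess.run(["vmstat", "1", "2"], capture_output=True, text=True, check=True).stdout
    latest = parse_vmstat(out)[-1]
    cpus = os.cpu_count() or 1
    print(f"cs/s={latest['cs']} in/s={latest['in']} r={latest['r']} b={latest['b']}")
    if latest["r"] > cpus:
        print(f"run queue ({latest['r']}) exceeds CPU count ({cpus}): CPU contention likely")
```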

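Use pidstat -w 5 to see per-process context switches. cswch/s counts voluntary switches, where a process gives up the CPU because a resource it needs (such as I/O or memory) is unavailable. nvcswch/s counts involuntary switches, where the scheduler forcibly takes the CPU away, for example when the time slice expires.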

pidstat -w 5
14:51:16 UID PID   cswch/s nvcswch/s Command
14:51:21 0   1     0.80    0.00    systemd
 ...
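The kernel exposes per-task counters for the same two quantities as voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/<pid>/status. The minimal sketch below samples them twice and divides by the interval to get rates comparable to cswch/s and nvcswch/s. It assumes Linux, and the helper names are hypothetical.

```python
import os
import time


def ctxt_switches(status_text: str) -> tuple[int, int]:
    """Return (voluntary, nonvoluntary) totals from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)  # int() tolerates the surrounding whitespace
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]


def switch_rates(pid: int, interval: float = 5.0) -> tuple[float, float]:
    """Per-second voluntary and involuntary switch rates for one PID."""
    path = f"/proc/{pid}/status"
    with open(path) as f:
        vol0, invol0 = ctxt_switches(f.read())
    time.sleep(interval)
    with open(path) as f:
        vol1, invol1 = ctxt_switches(f.read())
    return (vol1 - vol0) / interval, (invol1 - invol0) / interval


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    cswch, nvcswch = switch_rates(os.getpid(), interval=1.0)
    print(f"cswch/s={cswch:.2f} nvcswch/s={nvcswch:.2f}")
```

Per-thread counters live under /proc/<pid>/task/<tid>/status, the level of detail that pidstat -wt reports.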

What to Do When an Application Hits 100% CPU?

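Linux is a multitasking system. It divides CPU time into short time slices and uses a periodic timer interrupt (the tick, whose frequency is set by the kernel's HZ setting) to rotate tasks. To get CPU usage, sample /proc/stat twice and compute 1 - (idle time delta / total time delta) over the interval.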
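The /proc/stat calculation can be sketched in Python as follows. The sketch assumes the standard layout of the aggregate cpu line: user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice, all in clock ticks. It treats iowait as idle time, which some tools handle differently.

```python
import os
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle, total) ticks from a "cpu" line of /proc/stat.

    idle includes iowait. guest and guest_nice are left out of the total
    because the kernel already counts them inside user and nice.
    """
    fields = [int(x) for x in line.split()[1:]]
    user, nice, system, idle, iowait, irq, softirq, steal = (fields + [0] * 8)[:8]
    idle_all = idle + iowait
    total = user + nice + system + idle_all + irq + softirq + steal
    return idle_all, total


def cpu_usage(before: str, after: str) -> float:
    """Percent CPU busy between two samples: 100 * (1 - idle delta / total delta)."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1 - (idle1 - idle0) / elapsed)


def read_cpu_line() -> str:
    with open("/proc/stat") as f:
        return f.readline()  # the first line aggregates all CPUs


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)
    print(f"CPU usage over the last second: {cpu_usage(first, read_cpu_line()):.1f}%")
```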

Tools: top and ps show per-process CPU usage, and perf (perf top, perf record, perf report) pinpoints the hot functions inside a process:

perf top -g -p <PID>

When System CPU Is High but No High‑CPU Process Is Visible

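Look closely at processes in the Running (R) state. Short-lived processes may finish before top can sample them. Examples are processes that keep crashing and restarting, or that another process launches repeatedly via exec. execsnoop catches such processes as they start. Use pstree to find their parent process (e.g., stress commands launched by php-fpm).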
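Short-lived processes can appear and exit between two refreshes of top. One cheap way to confirm that kind of churn is the processes counter in /proc/stat, which counts forks since boot. This check is an added assumption, not a step from the original guide. A high fork rate with no long-running culprit points toward execsnoop and pstree. The helper names below are hypothetical.

```python
import os
import time


def forks_since_boot(stat_text: str) -> int:
    """Read the "processes" counter (forks since boot) from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("processes "):
            return int(line.split()[1])
    raise ValueError('no "processes" line found')


def fork_rate(before: str, after: str, interval: float) -> float:
    """New tasks created per second between two /proc/stat samples."""
    return (forks_since_boot(after) - forks_since_boot(before)) / interval


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(1)
    with open("/proc/stat") as f:
        second = f.read()
    print(f"forks/s: {fork_rate(first, second, 1.0):.0f}")
```

The counter is incremented for every clone, so thread creation also shows up in this rate.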

Uninterruptible and Zombie Processes

Process states:

R – Running/Runnable

D – Uninterruptible (usually I/O)

Z – Zombie (exited but not reaped)

S – Interruptible sleep

I – Idle (kernel threads)

T – Stopped/Traced

X – Dead

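A growing number of uninterruptible (D) processes usually indicates an I/O performance problem. Zombie (Z) processes appear when a parent does not reap its exited children with wait() or waitpid(). Each zombie holds a PID, so large numbers of them can exhaust the PID space.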
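The following Python sketch finds which processes are in the D or Z state and who their parents are, the same question pstree answers. It scans /proc/<pid>/stat, where state and ppid are the first two fields after the parenthesized command name. The helper names and the report format are hypothetical.

```python
import os
from collections import Counter


def parse_stat(text: str) -> tuple[int, str, str, int]:
    """Return (pid, comm, state, ppid) from /proc/<pid>/stat text.

    comm is wrapped in parentheses and may itself contain spaces or ")",
    so split on the last ")" instead of on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    state, ppid = tail.split()[:2]
    return int(pid), comm, state, int(ppid)


def snapshot(proc: str = "/proc") -> dict[int, tuple[str, str, int]]:
    """Map pid -> (comm, state, ppid) for every process visible in /proc."""
    table = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and read()
        table[pid] = (comm, state, ppid)
    return table


def suspects(table: dict[int, tuple[str, str, int]]) -> tuple[Counter, list[str]]:
    """Count states and describe each D or Z process together with its parent."""
    states = Counter(state for _, state, _ in table.values())
    lines = []
    for pid, (comm, state, ppid) in sorted(table.items()):
        if state in ("D", "Z"):
            parent = table.get(ppid, ("?", "", 0))[0]
            lines.append(f"{state} {comm}[{pid}] parent {parent}[{ppid}]")
    return states, lines


if __name__ == "__main__" and os.path.isdir("/proc"):
    counts, details = suspects(snapshot())
    print(dict(counts))
    print("\n".join(details) or "no D or Z processes")
```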

CPU Performance Indicators

CPU usage (user, system, iowait, soft/hard IRQ, steal/guest)

Load average (ideally close to, but not above, the number of logical CPUs)

Context switches (voluntary vs involuntary)

CPU cache hit rate (L1/L2/L3)

Performance Tools

Load average: uptime, then mpstat and pidstat to find the processes responsible.

Context switches: vmstat → pidstat -w → thread-level pidstat -wt.

High CPU process: top → perf top.

High system CPU without a visible culprit: re-examine top, focus on Running processes, then use perf record or execsnoop.

Uninterruptible/Zombie cases: top → pstree → source inspection.

Soft‑IRQ spikes: top, /proc/softirqs, sar, tcpdump.

3 Memory

How Linux Memory Works

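Linux gives each process its own isolated virtual address space, split into kernel space and user space. Physical memory is allocated only when it is actually accessed. The kernel records virtual-to-physical mappings in page tables, which the MMU (memory management unit) uses for address translation.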

When a page is not present, a page‑fault occurs, the kernel allocates a physical page, updates the page table, and resumes the process.

To keep page tables small and reduce TLB misses, Linux uses multi-level page tables and HugePages.
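Demand paging is easy to observe. The sketch below maps an anonymous region with Python's mmap module, writes one byte per page, and counts the resulting minor page faults via getrusage(). It assumes a Linux host. With transparent huge pages enabled, the kernel may serve large regions with fewer, larger faults.

```python
import mmap
import resource

PAGE_SIZE = mmap.PAGESIZE


def minor_faults() -> int:
    """Minor page faults this process has taken so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def faults_for_new_mapping(size: int) -> int:
    """Map size bytes of anonymous memory, write to every page, count the faults."""
    # fileno=-1 makes Python add MAP_ANONYMOUS; MAP_PRIVATE mirrors what malloc
    # does for large blocks. Only virtual addresses exist at this point.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        before = minor_faults()
        for offset in range(0, size, PAGE_SIZE):
            region[offset] = 1  # first touch faults the page in (or a huge page under THP)
        return minor_faults() - before
    finally:
        region.close()


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    print(f"{size // PAGE_SIZE} pages touched, {faults_for_new_mapping(size)} minor faults")
```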

Virtual Memory Layout

Read‑only segment (code, constants)

Data segment (globals)

Heap (dynamic allocation, grows upward)

Memory‑mapped region (shared libraries, mmap, grows downward)

Stack (local variables and function-call context; its size limit is fixed, typically 8 MiB, see ulimit -s)
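You can inspect this layout for any process in /proc/<pid>/maps. The Python sketch below parses that file and prints the heap, the stack, and the total mapped size. The "[anonymous]" label for unnamed mappings and the helper name are assumptions made for readability.

```python
import os


def parse_maps(text: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, perms, name) for each line of /proc/<pid>/maps."""
    regions = []
    for line in text.splitlines():
        parts = line.split(maxsplit=5)  # the pathname may contain spaces
        if len(parts) < 5:
            continue
        start, end = (int(addr, 16) for addr in parts[0].split("-"))
        name = parts[5].strip() if len(parts) == 6 else "[anonymous]"
        regions.append((start, end, parts[1], name))
    return regions


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        regions = parse_maps(f.read())
    for start, end, perms, name in regions:
        if name in ("[heap]", "[stack]"):
            print(f"{name:8} {start:#x}-{end:#x} {perms} {(end - start) // 1024} KiB")
    total = sum(end - start for start, end, _, _ in regions)
    print(f"total mapped (roughly top's VIRT): {total // 1024} KiB")
```

The [stack] entry usually looks much smaller than 8 MiB because the kernel grows the stack on demand up to the limit.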

Allocation and Release

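brk(): small allocations (under 128 KB, glibc's default mmap threshold) are served by moving the program break to grow the heap. Freed memory is cached by the allocator for reuse instead of being returned to the kernel.

mmap(): large allocations (over 128 KB) get a private anonymous mapping in the memory-mapped region. That memory is returned to the kernel on free(), so each new allocation takes fresh page faults.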

How to View Memory Usage

free: overall system memory usage. top/ps: per-process VIRT (virtual memory size), RES (resident memory), SHR (shared memory), and %MEM (share of total physical memory).

Buffers and Cache

Buffers cache raw disk block data, while Cache holds file data (the page cache). Both are used for reads as well as writes, and both speed up I/O.
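Buffers and Cache are reported in /proc/meminfo. The minimal Python sketch below extracts them. It follows the free(1) man page in adding SReclaimable to Cached for the cache figure. Falling back to MemFree on kernels without MemAvailable is an assumption.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo keys to their numeric value (KiB for lines ending in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def buffers_and_cache(info: dict[str, int]) -> dict[str, int]:
    """Buffers, page cache, and available memory in KiB."""
    return {
        "buffers": info["Buffers"],
        "cache": info["Cached"] + info.get("SReclaimable", 0),
        "available": info.get("MemAvailable", info["MemFree"]),
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        summary = buffers_and_cache(parse_meminfo(f.read()))
    for name, kib in summary.items():
        print(f"{name:>9}: {kib / 1024:.1f} MiB")
```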

Improving Cache Efficiency

Install bcc to get cachestat (system-wide cache hit and miss counts) and cachetop (per-process). Use pcstat to see how much of a given file is resident in the page cache.

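# Prerequisite: a Go toolchain is installed
export GOPATH=~/go
export PATH=~/go/bin:$PATH
# Go 1.17 and later: module-aware install
go install github.com/tobert/pcstat/pcstat@latest
# Older Go releases (GOPATH mode) used:
# go get golang.org/x/sys/unix
# go get github.com/tobert/pcstat/pcstat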

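Example: use dd to generate a 512 MiB file, then drop the caches as root (echo 3 > /proc/sys/vm/drop_caches). Read the file with dd while cachetop reports the cache hit rate, and compare the first (cold) read with a second (warm) read.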
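Two parts of that experiment are easy to sketch in Python: the hit ratio that cachestat and cachetop report (hits divided by all page-cache accesses), and a cold-versus-warm read of one file. Instead of dropping every cache via /proc/sys/vm/drop_caches, the sketch uses posix_fadvise with POSIX_FADV_DONTNEED. This substitution is an assumption: it is a Linux hint that evicts only that file's clean pages, needs no root, and the kernel may keep some pages anyway.

```python
import os
import sys
import time


def hit_ratio(hits: int, misses: int) -> float:
    """Cache hit ratio: hits divided by all page-cache accesses."""
    total = hits + misses
    return hits / total if total else 0.0


def evict_from_cache(path: str) -> None:
    """Hint the kernel to drop this file's cached pages (Linux, no root needed)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # flush dirty pages first; the hint mainly drops clean pages
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # len 0 = to end of file
    finally:
        os.close(fd)


def timed_read(path: str, chunk: int = 1 << 20) -> float:
    """Read the whole file sequentially and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. the 512 MiB file generated with dd
    evict_from_cache(target)
    cold = timed_read(target)
    warm = timed_read(target)
    print(f"cold read: {cold:.3f}s, warm read: {warm:.3f}s")
```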

O_DIRECT Bypassing Cache

In the case study, an application running in a container opens its file with O_RDONLY|O_DIRECT. cachetop shows a low cache hit rate and reads are slow, which confirms that direct I/O bypasses the page cache.

Memory Leaks

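Heap memory problems come in two forms. In a leak, allocated memory is never freed and usage keeps growing until the OOM killer may step in. In an out-of-bounds access, code touches addresses outside an allocation and the program crashes.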

Detect leaks with bcc's memleak tool:

/usr/share/bcc/tools/memleak -a -p $(pidof app)
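memleak shows the allocation stacks behind a leak. A cheaper first check is to sample VmRSS from /proc/<pid>/status and look for steady growth. This heuristic is an added assumption, not part of the original workflow. The growth threshold and sample counts below are arbitrary, and rising RSS can also come from legitimate caching inside the application.

```python
import os
import time


def vm_rss_kib(status_text: str) -> int:
    """Resident set size in KiB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line (kernel threads have none)")


def sample_rss(pid: int, count: int = 10, interval: float = 1.0) -> list[int]:
    """Read VmRSS count times, interval seconds apart."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            samples.append(vm_rss_kib(f.read()))
        if i < count - 1:
            time.sleep(interval)
    return samples


def looks_like_leak(samples: list[int], min_growth_kib: int = 1024) -> bool:
    """Heuristic: RSS never shrinks and grows by at least min_growth_kib overall."""
    if len(samples) < 3:
        return False
    never_shrinks = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_shrinks and samples[-1] - samples[0] >= min_growth_kib


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    rss = sample_rss(os.getpid(), count=3, interval=0.2)
    print(rss, "-> possible leak" if looks_like_leak(rss) else "-> no steady growth")
```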

Swap Usage

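When memory is tight, the kernel reclaims memory. File-backed pages can simply be dropped (after any dirty data is written back). Anonymous pages, such as heap memory, can only be reclaimed by swapping them out. Swap activity can be observed with free, sar -r -S, and vmstat. The vm.swappiness parameter ranges from 0 to 100 (Linux 5.8 and later accept up to 200). It sets how strongly the kernel prefers reclaiming anonymous pages via swap over dropping file pages; higher values mean more aggressive swapping.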
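/proc/vmstat records swap traffic as pswpin and pswpout, the pages swapped in and out since boot. The minimal Python sketch below turns two samples into per-second rates, the same figures sar -W prints, and also reads the current vm.swappiness. The helper names are hypothetical.

```python
import os
import time


def parse_counters(vmstat_text: str) -> dict[str, int]:
    """Parse /proc/vmstat ("name value" per line) into a dict."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before: str, after: str, interval: float) -> tuple[float, float]:
    """Pages swapped in and out per second between two /proc/vmstat samples."""
    b, a = parse_counters(before), parse_counters(after)
    return (
        (a["pswpin"] - b["pswpin"]) / interval,
        (a["pswpout"] - b["pswpout"]) / interval,
    )


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    with open("/proc/sys/vm/swappiness") as f:
        print("vm.swappiness =", f.read().strip())
    with open("/proc/vmstat") as f:
        first = f.read()
    time.sleep(1)
    with open("/proc/vmstat") as f:
        second = f.read()
    pin, pout = swap_rates(first, second, 1.0)
    print(f"pswpin/s={pin:.0f} pswpout/s={pout:.0f}")
```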

Analyzing High Swap

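If no swap is configured, create and enable it first, for example a swap file set up with fallocate, mkswap, and swapon. Then generate I/O load with dd, which fills the buffer cache, and watch it with sar and cachetop. Read /proc/zoneinfo to compare each memory zone's free pages against its min/low/high watermarks.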
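The watermark comparison can be sketched in Python. The sketch assumes the /proc/zoneinfo layout in which each "Node N, zone NAME" block lists pages free followed by min, low, and high. When free pages fall below low, the kernel wakes kswapd to reclaim in the background. Below min, ordinary allocations can no longer be satisfied from free pages and must reclaim directly. The helper names and messages are hypothetical.

```python
import os


def parse_zoneinfo(text: str) -> dict[str, dict[str, int]]:
    """Map "node/zone" to its free pages and min/low/high watermarks (in pages)."""
    zones: dict[str, dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Node" and "zone" in parts:
            current = f"{parts[1].rstrip(',')}/{parts[-1]}"  # e.g. "0/Normal"
            zones[current] = {}
        elif current is None:
            continue
        elif parts[:2] == ["pages", "free"]:
            zones[current]["free"] = int(parts[2])
        elif len(parts) == 2 and parts[0] in ("min", "low", "high"):
            # Exact match skips per-CPU pageset lines such as "high:  378".
            zones[current][parts[0]] = int(parts[1])
    return zones


def pressure(zone: dict[str, int]) -> str:
    """Classify a zone's free pages against its watermarks."""
    if zone["free"] < zone["min"]:
        return "below min: direct reclaim"
    if zone["free"] < zone["low"]:
        return "below low: kswapd woken to reclaim"
    return "ok"


if __name__ == "__main__" and os.path.exists("/proc/zoneinfo"):
    with open("/proc/zoneinfo") as f:
        for name, zone in parse_zoneinfo(f.read()).items():
            if all(key in zone for key in ("free", "min", "low")):
                print(f"{name:>12}: free={zone['free']} min={zone['min']} "
                      f"low={zone['low']} high={zone.get('high')} -> {pressure(zone)}")
```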

Memory Performance Tools

Use free, top, vmstat, pidstat for a broad view, then drill down with memleak, cachestat, perf, and numactl for NUMA‑aware analysis.

Original source: https://www.ctq6.cn/ (author: mikelLam)

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
