
Mastering Server Performance: CPU, Memory, and I/O Optimization Techniques

This guide explains how to assess and improve server performance by understanding key metrics such as CPU utilization, memory management, and disk I/O, introducing fundamental concepts, analysis methodologies like 5W2H and load‑characteristic summarization, and recommending practical Linux tools such as uptime, vmstat, top, perf, and DTrace for comprehensive troubleshooting.

21CTO

Aug 16, 2017


21CTO Community Introduction: Keeping a platform stable is only part of the job; server performance optimization is just as essential. The DBA team at Tencent Interactive compiled this material as a guide to performance tuning.

Overview

What Is Performance?

The simplest metric for performance is time. CPU usage represents the proportion of time spent computing, while disk usage reflects time spent on I/O. When CPU or disk usage reaches 100%, requests may be delayed or time out. Effective performance analysis starts by identifying what slows response time, typically focusing on CPU and I/O because applications are either CPU‑bound or I/O‑bound. Memory issues often manifest as CPU or I/O bottlenecks, and severe memory problems can affect functionality.
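To make "CPU usage is a proportion of time" concrete, here is a minimal sketch that derives utilization from two snapshots of the aggregate `cpu` line in Linux's `/proc/stat`. The function names and the sample counter values are illustrative assumptions, and counting iowait as idle time is a choice made here, not something the article prescribes.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    # Assumption: iowait is counted as idle time (the CPU is not computing).
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # guest and guest_nice are already included in user and nice, so skip them.
    total = sum(values[:8])
    return total - idle, total


def cpu_utilization(before, after):
    """Percentage of elapsed CPU time spent busy between two snapshots."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


# Made-up snapshots; on a live system, read the first line of /proc/stat twice.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1600 0 700 8100 600 0 0 0 0 0"
print(cpu_utilization(before, after))  # 80.0
```

The counters are cumulative since boot, so only the difference between two samples says anything about the current load.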

Performance optimization is not isolated; it must also consider functionality, security, and other aspects.

Foundations of Performance Analysis

Solid knowledge of the following is required:

Operating System – how the OS manages resources such as CPU and I/O, including the file system type, disk type, and RAID configuration.

System Programming Techniques – how to use system resources (buffered I/O, Direct I/O, synchronous vs. asynchronous, multi‑process, multi‑thread).

Application Layer – database engine, indexes, replication, configuration parameters, backup, high‑availability, etc.

Performance Analysis Methodology

Common frameworks include Pyramid Thinking, 5W2H, and McKinsey’s seven‑step method. Applying 5W2H yields:

What – describe the phenomenon.

When – when it occurs.

Why – root cause.

Where – location of the issue.

How much – resource consumption and potential savings.

How to do – solution approach.

A framework gives direction, but concrete methods are still needed. Brendan Gregg’s book describes techniques such as the USE method (utilization, saturation, errors), workload characterization (load‑characteristic summarization), performance monitoring, static performance tuning, latency analysis, and the tools method. Workload characterization is especially useful, while static and dynamic tracing help reveal problems.
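As a rough illustration of the utilization, saturation, and errors checklist mentioned above, the sketch below walks a set of resources and reports which of the three signals is raised for each. The metric names, the sample numbers, and the 70% utilization limit are hypothetical; real values would come from tools such as vmstat or iostat.

```python
def use_check(resources, util_limit=70.0):
    """Return (resource, signal) pairs that deserve a closer look.

    `resources` maps a resource name to a dict with the keys
    'utilization' (percent), 'saturation' (queued work) and 'errors' (count).
    """
    findings = []
    for name, metrics in resources.items():
        # Errors first: they are the cheapest signal to interpret.
        if metrics["errors"] > 0:
            findings.append((name, "errors"))
        if metrics["saturation"] > 0:
            findings.append((name, "saturation"))
        if metrics["utilization"] >= util_limit:
            findings.append((name, "utilization"))
    return findings


# Hypothetical measurements for three resources.
sample = {
    "cpu": {"utilization": 95.0, "saturation": 3, "errors": 0},
    "disk": {"utilization": 40.0, "saturation": 0, "errors": 2},
    "network": {"utilization": 20.0, "saturation": 0, "errors": 0},
}
for resource, signal in use_check(sample):
    print(resource, signal)
```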

CPU

Understanding the CPU

Key concepts include processor, core, hardware thread, CPU cache, clock frequency, CPI, IPC, instruction set, utilization, user vs. kernel time, scheduler, run queue, preemption, multi‑process, multi‑thread, and word length.

From an application’s point of view, the part that matters most is how the kernel CPU scheduler behaves and performs.

Thread State Analysis

Threads spend time either on‑CPU (user time and system time) or off‑CPU (waiting for I/O, locks, paging, etc.). Distinguishing these states helps pinpoint bottlenecks.
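A simple way to see the on‑CPU / off‑CPU split from inside a program is to compare wall‑clock time with consumed CPU time. The sketch below does that for a single call; `profile_states` and the two sample workloads are hypothetical helpers, and because `time.process_time()` counts CPU time for the whole process, the estimate is only meaningful for single‑threaded code.

```python
import time


def profile_states(func, *args, **kwargs):
    """Run func and split its elapsed time into on-CPU and off-CPU parts."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = func(*args, **kwargs)
    on_cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    # Whatever was not spent computing was spent waiting (sleep, I/O, locks).
    off_cpu = max(wall - on_cpu, 0.0)
    return result, {"wall": wall, "on_cpu": on_cpu, "off_cpu": off_cpu}


def sleepy():
    """Mostly off-CPU: the thread is blocked in sleep."""
    time.sleep(0.2)
    return "done"


def busy():
    """Mostly on-CPU: pure computation."""
    return sum(i * i for i in range(200_000))


for workload in (sleepy, busy):
    _, times = profile_states(workload)
    print(workload.__name__, {k: round(v, 3) for k, v in times.items()})
```

A large off‑CPU share points toward I/O, locks, or paging rather than toward the code's own computation.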

Analysis Methods and Tools

When examining CPU load, consider the following checklist:

Overall system CPU load and per‑CPU utilization.

Degree of concurrency – single‑threaded or multi‑threaded?

Which processes are consuming CPU?

Which kernel threads are consuming CPU?

CPU usage by interrupts.

User‑space vs. kernel‑space call paths.

Types of stall cycles encountered.

Useful Linux tools include:

uptime – load averages over the last 1, 5, and 15 minutes.

vmstat – system‑wide CPU load.

top – per‑process/thread CPU usage.

pidstat – detailed per‑process/thread CPU breakdown.

ps – process state.

perf – CPU profiling, performance counters, and tracing.

perf can count and sample hardware events such as cache misses as well as software events such as page faults, while DTrace offers flexible dynamic tracing for deeper analysis.
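The load averages that uptime prints are easier to judge once they are divided by the number of CPUs. The following sketch parses the `/proc/loadavg` format and normalizes it; the helper names and the sample line are assumptions. Note that Linux load averages also include tasks in uninterruptible sleep (typically waiting on disk), so a high value is not always a CPU problem.

```python
def parse_loadavg(text):
    """Extract the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    parts = text.split()
    return tuple(float(p) for p in parts[:3])


def load_per_cpu(load_averages, cpu_count):
    """Normalize load averages by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(round(load / cpu_count, 2) for load in load_averages)


# Made-up sample; on a live system read /proc/loadavg and use os.cpu_count().
sample = "6.00 3.00 1.00 3/412 20117"
print(load_per_cpu(parse_loadavg(sample), cpu_count=4))  # (1.5, 0.75, 0.25)
```

In this sample the 1‑minute value exceeds one runnable task per CPU while the 15‑minute value does not, which suggests the load is recent and rising.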

Practical Example

Flame graphs reveal CPU call paths. During MySQL load testing, perf top showed which functions were sampled most often, while flame graphs visualized the full call hierarchy, which made the hotspots easier to identify.
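The difference between a flat per‑function view and a call‑hierarchy view can be sketched with a handful of stack samples. The code below aggregates samples two ways: by leaf function (roughly what a flat profile shows) and as "folded" stacks, the semicolon‑separated format that flame graph generators consume. The function names in the samples are invented for illustration and are not taken from the MySQL test.

```python
from collections import Counter


def leaf_counts(samples):
    """Flat view: how often each function was the one executing."""
    return Counter(stack[-1] for stack in samples)


def fold_stacks(samples):
    """Hierarchy view: one 'root;child;leaf count' line per distinct stack."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical stack samples, listed root first.
samples = [
    ["main", "handle_query", "execute", "read_row"],
    ["main", "handle_query", "execute", "read_row"],
    ["main", "handle_query", "parse"],
    ["main", "handle_query", "execute"],
]

print(leaf_counts(samples).most_common(1))  # [('read_row', 2)]
for line in fold_stacks(samples):
    print(line)
```

The flat view says `read_row` is hot; the folded stacks additionally show that it is reached through `execute`, which is the information a flame graph draws.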

Memory

Understanding Memory

Key concepts include physical memory, virtual memory, resident set, address space, OOM, page cache, page faults, swapping, allocation libraries (libc, glibc, libmalloc, mtmalloc), and the Linux SLUB allocator.

Analysis Methods and Tools

Important questions:

System‑wide physical and virtual memory usage.

Swap, OOM, and paging activity.

Kernel and filesystem cache utilization.

Where each process’s memory is allocated.

Why a process or kernel allocated memory.

Which processes are constantly swapping.

Presence of memory leaks.

Common tools:

free – memory usage, including buffers and cache.

vmstat – virtual memory stats.

top – per‑process memory usage.

ps – process state.

DTrace – allocation tracing.

While DTrace can trace allocations, most tools only provide aggregate statistics; deeper investigation may require custom tracing scripts.
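The aggregate statistics that these tools print on Linux come largely from `/proc/meminfo`. As a small sketch, the code below parses that format and summarizes usage, cache, and swap; the helper names and the sample figures are assumptions, and `MemAvailable` is only present on reasonably recent kernels, so a rough fallback is included.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {name: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def summarize(info):
    """Condense meminfo into used percentage, cache size and swap in use."""
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    # Fallback for kernels without MemAvailable: free memory plus cache.
    available = info.get("MemAvailable", info["MemFree"] + cache)
    return {
        "used_pct": 100.0 * (info["MemTotal"] - available) / info["MemTotal"],
        "cache_kb": cache,
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }


# Made-up sample; on a live system read /proc/meminfo instead.
sample = """MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          7500000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB"""
print(summarize(parse_meminfo(sample)))
```

In the sample, little memory is free but more than half is still available, because most of it is reclaimable cache.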

Practical Example

A memory‑leak case involved Lua scripts that allocated memory quickly; the driver’s periodic service reclaimed memory in large batches, causing CPU pressure. The solution was incremental reclamation combined with periodic full reclamation.
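The reclamation policy described above (small incremental passes most of the time, a full pass at intervals) can be sketched as a scheduler. This is only an analogy in Python: it uses Python's own generational collector through the standard `gc` module rather than Lua's, and the class name and the `full_every` interval are hypothetical.

```python
import gc


class ReclaimScheduler:
    """Run cheap incremental collections, with a full one every N ticks."""

    def __init__(self, full_every=10):
        if full_every < 1:
            raise ValueError("full_every must be at least 1")
        self.full_every = full_every
        self.ticks = 0

    def tick(self):
        """Call periodically (for example once per request batch)."""
        self.ticks += 1
        if self.ticks % self.full_every == 0:
            gc.collect()   # full collection across all generations
            return "full"
        gc.collect(0)      # youngest generation only: short pause
        return "incremental"


scheduler = ReclaimScheduler(full_every=3)
print([scheduler.tick() for _ in range(6)])
```

Spreading the work this way trades one large CPU spike for many small ones, while the periodic full pass still catches what the cheap passes miss.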

I/O

Logical I/O vs. Physical I/O

Disk I/O metrics (e.g., iostat’s avgqu‑sz, svctm) reflect physical I/O, but most applications interact with the filesystem layer, not raw disks. The Linux VFS abstracts devices as files; read() invokes vfs_read, which checks the page cache before issuing block‑device requests. Understanding read‑ahead, caching, and write‑back behavior is crucial for performance.

Write paths are similarly complex. The I/O observed at the disk is shaped by other applications, users, and kernel tasks, and by filesystem read‑ahead, write buffering, metadata overhead, alignment, compression, and caching.
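On Linux, one place where the logical and physical views sit side by side is `/proc/<pid>/io`: `rchar` counts bytes requested through read‑style system calls, while `read_bytes` counts bytes actually fetched from the storage layer. The sketch below compares the two; the helper names and the sample numbers are assumptions, and the result is only a rough indicator, since `rchar` also includes non‑disk reads and read‑ahead can push `read_bytes` above it.

```python
def parse_proc_io(text):
    """Parse /proc/<pid>/io text into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            counters[key.strip()] = int(value)
    return counters


def cache_absorbed_read_pct(io):
    """Share of logical read bytes that never reached the storage layer."""
    logical = io["rchar"]
    physical = io["read_bytes"]
    if logical == 0:
        return 0.0
    # Clamp at zero: read-ahead can make physical reads exceed logical ones.
    return max(0.0, 100.0 * (logical - physical) / logical)


# Made-up sample; on a live system read /proc/self/io or /proc/<pid>/io.
sample = """rchar: 1000000
wchar: 400000
syscr: 250
syscw: 100
read_bytes: 250000
write_bytes: 409600
cancelled_write_bytes: 0"""
print(cache_absorbed_read_pct(parse_proc_io(sample)))  # 75.0
```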

Filesystem Analysis and Tools

Key terms: filesystem, VFS, filesystem cache, page cache, buffer cache, directory cache, inode, inode cache.

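On Linux the buffer cache is kept inside the page cache, and together they form the filesystem cache. Dirty pages are written back by the kernel flusher threads (pdflush on older kernels), while kswapd reclaims pages when memory runs low. Filesystem latency includes time spent in the filesystem, in the kernel I/O subsystem, and waiting for the disk.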

Standard Linux tools do not report filesystem latency directly, but it can be inferred by correlating disk metrics.
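One workaround is to measure latency where the application sees it, by timing the read calls themselves. The sketch below does this for a file and reports a percentile; `read_latencies` and `percentile` are hypothetical helpers, and the temporary file used here was just written, so its reads are almost certainly served from the page cache rather than from disk.

```python
import math
import os
import tempfile
import time


def read_latencies(path, block_size=4096):
    """Time each unbuffered read() call on the file, in seconds."""
    latencies = []
    with open(path, "rb", buffering=0) as f:  # bypass Python's own buffer
        while True:
            start = time.perf_counter()
            chunk = f.read(block_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break
            latencies.append(elapsed)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Demonstration on a 64 KiB temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 65536)
    sample_path = tmp.name
try:
    samples = read_latencies(sample_path)
    print(len(samples), "reads, p99 =", percentile(samples, 99), "s")
finally:
    os.remove(sample_path)
```

Measured this way, the latency includes the filesystem, the kernel I/O path, and any disk wait, which is the combined figure the application actually experiences.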

Typical analysis questions:

Which application uses the filesystem?

Which files are accessed?

Read/write ratio and sync vs. async mode?

Cache size and utilization?

Any errors or illegal requests?

Disk Analysis and Tools

Important concepts: virtual disk, sector, I/O request, command, bandwidth, throughput, latency, service time, wait time, random vs. sequential I/O, sync vs. async, interface, RAID.

Key analysis questions include per‑disk utilization, queue length, service/wait times, responsible applications, I/O patterns, and kernel call paths.

Useful Linux tools:

iostat – per‑disk statistics.

iotop / pidstat – per‑process disk I/O.

perf / DTrace – tracing.
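The per‑disk figures reported by iostat are derived from cumulative counters in `/proc/diskstats`. As a sketch under that assumption, the code below computes utilization, average queue length, and average wait from two snapshots of one device line; the function names and the sample counters are made up, and only the first eleven statistics fields are used.

```python
def parse_diskstats_line(line):
    """Parse one /proc/diskstats line into the counters used below."""
    fields = line.split()
    nums = [int(x) for x in fields[3:14]]
    return {
        "name": fields[2],
        "reads": nums[0],         # reads completed
        "read_ms": nums[3],       # time spent reading
        "writes": nums[4],        # writes completed
        "write_ms": nums[7],      # time spent writing
        "io_ms": nums[9],         # time the device had I/O in flight
        "weighted_ms": nums[10],  # io_ms weighted by requests in flight
    }


def disk_metrics(before, after, interval_ms):
    """Derive iostat-style metrics from two snapshots interval_ms apart."""
    delta = {k: after[k] - before[k] for k in before if k != "name"}
    ios = delta["reads"] + delta["writes"]
    wait_ms = delta["read_ms"] + delta["write_ms"]
    return {
        "util_pct": 100.0 * delta["io_ms"] / interval_ms,
        "avg_queue": delta["weighted_ms"] / interval_ms,
        "await_ms": wait_ms / ios if ios else 0.0,
    }


# Made-up snapshots of the same device taken one second apart.
before = parse_diskstats_line("8 0 sda 1000 0 80000 5000 2000 0 160000 20000 0 9000 25000")
after = parse_diskstats_line("8 0 sda 1100 0 88000 5400 2300 0 184000 23600 0 9800 29000")
print(disk_metrics(before, after, interval_ms=1000))
```

In the sample, 400 requests per second at an average wait of 10 ms give an average queue length of 4, which is consistent with Little's law.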

In a MySQL test of uncached, non‑in‑place updates, tracing kernel block‑layer functions (blk_finish_plug vs. blk_queue_bio) revealed different I/O characteristics for single‑instance and multi‑instance workloads.

References:

http://www.brendangregg.com

Brendan Gregg, "Systems Performance: Enterprise and the Cloud"

Robert Love, "Linux Kernel Development"

