Mastering Server Performance: A Practical Guide to CPU, Memory, and I/O Optimization
This article provides a comprehensive guide to server performance optimization, covering the fundamentals of CPU, memory, and I/O analysis, practical methodologies, essential tools, and real‑world case studies to help operations engineers identify bottlenecks and improve system stability.
Introduction: In operations work, besides keeping the platform stable, you must also optimize server performance; good performance is the foundation of stable operation. Wang Wei (Simon) of the Tencent Interactive DBA team compiled this set of performance-optimization materials to give concrete direction for performance work.
Overview
What Is Performance?
The most intuitive metric for performance is "time"; CPU utilization represents the proportion of time the CPU spends computing, and disk utilization represents the proportion of time spent on disk operations.
When CPU utilization reaches 100%, some requests cannot be processed in time, leading to increased response latency or timeouts.
When disk utilization reaches 100%, some requests must wait for I/O, also increasing latency or causing timeouts.
In other words, if all operations complete within ideal time, there is no performance‑optimization problem. Performance analysis starts by identifying what causes response time slowdown, typically focusing on CPU and I/O because applications are usually CPU‑bound or I/O‑bound.
CPU‑bound means compute‑intensive; I/O‑bound means read/write‑intensive. Memory issues often manifest as CPU or I/O bottlenecks because memory is designed to improve kernel instruction and application read/write performance.
Insufficient memory can trigger heavy swapping, making the disk a bottleneck; page faults, memory allocation, release, copying, and address‑space mapping can cause CPU bottlenecks. Severe memory problems may affect functionality, which goes beyond pure performance.
Performance optimization is not isolated; besides response time, you must also consider functional completeness, security, and other aspects.
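The utilization-as-time idea above can be made concrete: given two snapshots of the aggregate `cpu` line from `/proc/stat`, utilization over the interval is the non-idle share of elapsed ticks. A minimal Python sketch (the helper name is ours, not from the article):

```python
def cpu_utilization(stat_before: str, stat_after: str) -> float:
    """Estimate overall CPU utilization (%) between two /proc/stat samples.

    Each sample is the aggregate 'cpu' line, e.g.
    'cpu 100 0 50 850 0 0 0 0 0 0'  (user nice system idle iowait ...).
    Utilization = non-idle ticks / total ticks over the interval.
    """
    def split(line: str):
        fields = [int(x) for x in line.split()[1:]]
        idle = fields[3] + fields[4]   # idle + iowait count as "not computing"
        return idle, sum(fields)

    idle0, total0 = split(stat_before)
    idle1, total1 = split(stat_after)
    d_total = total1 - total0
    d_idle = idle1 - idle0
    return 0.0 if d_total == 0 else 100.0 * (d_total - d_idle) / d_total
```

Sampling the line twice a second apart and feeding both snapshots to this helper reproduces the utilization figure tools like `vmstat` report.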
Fundamentals of Performance Analysis
Effective performance optimization requires solid foundational knowledge:
Operating System – Manages all resources needed by applications, such as CPU and I/O. Issues in file system type, disk type, RAID configuration, etc., are all OS‑managed.
System Programming Techniques – Determines how to use system resources, e.g., buffered I/O vs. direct I/O, synchronous vs. asynchronous, multi‑process vs. multi‑thread.
Application Layer – Database component types, engines, indexes, replication, configuration parameters, backup, high‑availability, etc., can all be sources of performance problems.
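The buffered-vs-direct distinction in the system-programming item can be illustrated with ordinary file writes: a buffered write may linger in the page cache, while a write followed by fsync forces the data to the device, trading latency for durability. A hedged Python sketch (function names are illustrative):

```python
import os

def write_buffered(path: str, data: bytes) -> None:
    # Buffered write: data may sit in the page cache until the kernel
    # flushes it; fast, but not durable at the moment the call returns.
    with open(path, "wb") as f:
        f.write(data)

def write_durable(path: str, data: bytes) -> None:
    # Write followed by fsync: forces the data down to the device,
    # the trade-off that direct/synchronous I/O styles favor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Which variant is right depends on the application: a log that must survive a crash wants the durable path; a scratch file is fine buffered.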
Performance Analysis Methodology
Problem‑analysis frameworks such as Pyramid Thinking, 5W2H, and McKinsey’s Seven‑Step method provide direction. Applying 5W2H yields questions like:
What – What is the observed phenomenon?
When – When does it occur?
Why – Why does it happen?
Where – Where does it happen?
How much – How many resources are consumed, and how much can be saved after fixing?
How to do – How to solve it?
Beyond these high‑level guides, Chapter 2 of Brendan Gregg's Systems Performance: Enterprise and the Cloud introduces concrete methods such as the tools method, the USE method, workload characterization, performance monitoring, static performance tuning, and latency analysis.
CPU
Understanding the CPU
Key concepts include processor, core, hardware thread, CPU cache, clock frequency, CPI/IPC, instruction set, utilization, user vs. kernel time, scheduler, run queue, preemption, multi‑process, multi‑thread, and word size.
For applications, we usually focus on kernel CPU scheduler behavior and performance.
Thread‑state analysis distinguishes:
on‑CPU: executing (user time + system time).
off‑CPU: waiting for the next CPU time slice, I/O, locks, or paging, with sub‑states such as runnable, anonymous paging, sleep, lock, and idle.
If a large portion of time is on‑CPU, CPU profiling quickly reveals the cause; if most time is off‑CPU, diagnosis becomes more time‑consuming.
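A thread's on‑CPU share can be read straight from /proc/<pid>/stat: utime and stime are the user‑ and kernel‑mode tick counters, and everything else the thread spends is off‑CPU. A small parsing sketch (assumes the standard Linux stat field layout):

```python
def on_cpu_jiffies(stat_line: str):
    """Extract (utime, stime) clock ticks from a /proc/<pid>/stat line.

    on-CPU time = utime + stime; time spent runnable, in I/O wait,
    on locks, or sleeping is off-CPU and does not appear here.
    """
    # comm (field 2) may contain spaces, so split after the closing ')'.
    rest = stat_line.rsplit(')', 1)[1].split()
    # rest[0] is state (field 3); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime, stime
```

Sampling these counters twice and dividing the delta by the interval (in clock ticks) gives the thread's on‑CPU fraction.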
Analysis Methods and Tools
When observing CPU performance, use load‑characteristic summarization to check:
Overall system CPU load and per‑CPU utilization.
Concurrency of CPU load (single‑threaded or multi‑threaded, thread count).
Which application and how much CPU it consumes.
Which kernel thread consumes CPU.
Interrupt CPU usage.
User‑space vs. kernel‑space call paths.
Types of stall cycles encountered.
Answering these questions is most economical with system performance tools:
uptime – load averages
vmstat – system‑wide CPU utilization averages
top – per‑process/thread CPU usage monitoring
pidstat – CPU usage breakdown per process/thread
ps – process state
perf – CPU profiling and performance counters
For call‑path and stall‑cycle analysis, perf or DTrace can be used.
Practical Case
Flame graphs help visualize CPU call paths. In a MySQL non‑in‑place update benchmark, perf top showed function call frequencies, while flame graphs revealed the hierarchical call relationships.
Memory
Understanding Memory
Key memory concepts include physical memory, virtual memory, resident set, address space, OOM, page cache, page faults, swapping, swap space, allocators (libc, glibc, libmalloc, mtmalloc), and the Linux SLUB allocator.
Analysis Methods and Tools
Brendan Gregg’s book suggests examining memory‑bus balance, NUMA node allocation, and so on, but practical analysis follows a checklist:
System‑wide physical and virtual memory usage.
Swapping, OOM events.
Kernel and filesystem cache usage.
Per‑process memory distribution.
Reasons for process memory allocation.
Reasons for kernel memory allocation.
Processes that continuously swap.
Potential memory leaks.
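The swapping item at the top of the checklist can be answered from /proc/meminfo: swap in use is simply SwapTotal minus SwapFree. A minimal parsing sketch (the function name is ours):

```python
def swap_used_kb(meminfo: str) -> int:
    """Compute swap in use (kB) from /proc/meminfo text:
    SwapTotal - SwapFree. Nonzero and growing values suggest the
    memory pressure / swapping problems described above."""
    vals = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(':')
        if rest:
            vals[key.strip()] = int(rest.split()[0])  # values are in kB
    return vals['SwapTotal'] - vals['SwapFree']
```

The same parser trivially extends to MemFree, Cached, and the other checklist items, since /proc/meminfo uses one `Key: value kB` line per metric.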
Typical tools:
Tool
Description
free
Cache size statistics
vmstat
Virtual memory statistics
top
Monitor per‑process memory usage
ps
Process state
DTrace
Allocation tracing
Only allocation tracing (e.g., DTrace) can pinpoint memory leaks; other tools provide statistical views.
Practical Case
A memory‑leak investigation revealed that a Lua script allocated memory quickly; the driver’s periodic service reclaimed memory in bulk, causing occasional CPU pressure. The solution was staged reclamation: reclaim a portion each cycle and perform full reclamation periodically.
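The staged-reclamation fix described above can be sketched generically: instead of freeing everything in one burst (which causes the CPU spike), each cycle releases only a bounded fraction of dead objects, with a periodic full pass catching the remainder. A hypothetical Python sketch (class and parameter names are ours, not the driver's actual API):

```python
class StagedReclaimer:
    """Sketch of staged reclamation: free a fraction per cycle,
    and do a full sweep every `full_every` cycles."""

    def __init__(self, fraction: float = 0.25, full_every: int = 10):
        self.dead = []              # objects queued for reclamation
        self.fraction = fraction
        self.full_every = full_every
        self.cycles = 0

    def queue(self, obj) -> None:
        self.dead.append(obj)

    def tick(self) -> int:
        """Run one reclamation cycle; return how many objects were freed."""
        self.cycles += 1
        if self.cycles % self.full_every == 0:
            n = len(self.dead)      # periodic full reclamation
        elif self.dead:
            n = max(1, int(len(self.dead) * self.fraction))
        else:
            n = 0
        del self.dead[:n]           # release a bounded batch per cycle
        return n
```

The design choice is the usual amortization trade-off: per-cycle work stays small and predictable, at the cost of dead objects living slightly longer before the full sweep.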
I/O
Logical I/O vs. Physical I/O
I/O load usually refers to disk I/O (physical I/O). Metrics from iostat such as avgqu‑sz, svctm, and await describe it.
Most read/write operations go through the filesystem (VFS) rather than raw devices. The kernel checks page cache first; if data is missing, it issues block‑device requests, which the I/O scheduler dispatches to the disk driver.
Sequential reads benefit from prefetching; random reads may cause read amplification.
Write paths have similar amplification or reduction effects due to filesystem buffering, metadata, alignment, compression, etc.
Filesystem Analysis and Tools
Key filesystem concepts: filesystem, VFS, page cache, buffer cache, directory cache, inode, inode cache.
Filesystem cache structures store virtual memory pages, improving file and directory performance. The kernel’s writeback (flusher) threads flush dirty pages to disk after a timeout, and kswapd frees pages when memory runs low.
Filesystem latency includes time spent in the filesystem, kernel I/O subsystem, and waiting for the disk device.
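Since all three components fold into what the application observes, the simplest starting point is to time the read syscall itself. A minimal sketch (the helper name is ours):

```python
import os
import time

def file_read_latency(path: str, size: int = 4096) -> float:
    """Measure end-to-end read latency (seconds) as the application sees it:
    filesystem, kernel I/O subsystem, and device wait all fold into
    this one number. A page-cache hit will be orders of magnitude
    faster than a read that must go to disk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        t0 = time.perf_counter()
        os.read(fd, size)
        return time.perf_counter() - t0
    finally:
        os.close(fd)
```

Comparing a cold read against an immediate re-read of the same file makes the page-cache contribution visible directly.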
Disk Analysis and Tools
Important disk concepts: virtual disk, sector, I/O request, command, bandwidth, throughput, latency, service time, wait time, random vs. sequential I/O, sync vs. async, interface, RAID.
Typical analysis checklist:
Per‑disk utilization.
Queue length per disk.
Average service and wait times.
Which application or user is using the disk.
Read/write patterns (random vs. sequential, sync vs. async).
Kernel call path that initiates I/O.
Read/write ratio.
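The service-time and wait-time items in the checklist can be approximated from /proc/diskstats deltas, the same way iostat derives await: total milliseconds spent on reads and writes divided by I/Os completed over the interval. A sketch assuming the standard Linux diskstats field layout:

```python
def disk_await_ms(sample0: str, sample1: str) -> float:
    """Compute average I/O completion time (await, ms) from two
    /proc/diskstats lines for the same device:
    await = delta(read_ms + write_ms) / delta(reads + writes completed)."""
    def parse(line: str):
        f = line.split()
        reads, read_ms = int(f[3]), int(f[6])      # reads completed, ms reading
        writes, write_ms = int(f[7]), int(f[10])   # writes completed, ms writing
        return reads + writes, read_ms + write_ms
    ios0, ms0 = parse(sample0)
    ios1, ms1 = parse(sample1)
    d_ios = ios1 - ios0
    return 0.0 if d_ios == 0 else (ms1 - ms0) / d_ios
```

A rising await with a flat I/O count is the classic sign of queueing at the device, which the tools below help attribute to a process.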
Common tools:
iostat – per‑disk statistics
iotop, pidstat – disk I/O per process
perf, DTrace – tracing tools
In a MySQL non‑in‑place update benchmark, tracing block‑device events showed that single‑instance runs spent roughly 30% of events in blk_finish_plug and 70% in blk_queue_bio, while multi‑instance runs showed the opposite distribution.
References
Brendan Gregg, Systems Performance: Enterprise and the Cloud (http://www.brendangregg.com)
Robert Love, Linux Kernel Development
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and will accompany you through your operations career as we grow together.