
Complete Linux Server Performance Tuning Guide: From CPU to Filesystem with Real‑World Cases

This guide walks through diagnosing and tuning CPU, memory, network, disk I/O, and filesystem on Linux servers, showing how to use tools such as mpstat, pidstat, vmstat, ss, iostat, iotop and df, and provides concrete commands, parameter recommendations, and real‑world case studies.

AI Agent Super App

May 1, 2026


CPU Optimization: Identify the Runaway Core

CPU bottlenecks usually take one of two forms: a single process saturating one core, or excessive context switching that eats into useful compute time. Collect data before killing any process.

Per‑core view:

mpstat -P ALL 1 3

The command prints each CPU's %usr, %sys, %iowait and %idle every second for three intervals, followed by an average. In the example, CPU 0 shows 58 % user time and only 20 % idle, while the other three CPUs stay around 45 % idle. The load is concentrated on CPU 0, which points to a single‑threaded workload pushing that one core toward saturation.
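
The per‑core figures mpstat prints come from the cumulative tick counters in /proc/stat. The following is a minimal Python sketch, assuming the standard Linux /proc/stat layout. It takes two snapshots and computes how busy each core was in between. It counts iowait as idle, which differs slightly from mpstat's separate %iowait column, and the function names are illustrative.

```python
import time


def parse_cpu_lines(text):
    """Return {"cpu0": (busy_ticks, total_ticks), ...} from /proc/stat text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...) and skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal. guest and guest_nice
        # are already included in user and nice, so they are left out.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait (assumption: iowait counts as idle)
        total = sum(values)
        result[fields[0]] = (total - idle, total)
    return result


def per_core_busy(before, after):
    """Percentage of elapsed ticks each core spent busy between two snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def snapshot(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_lines(f.read())


if __name__ == "__main__":
    first = snapshot()
    time.sleep(1)
    second = snapshot()
    # Busiest cores first: a single hot core stands out at the top.
    for cpu, pct in sorted(per_core_busy(first, second).items(), key=lambda kv: -kv[1]):
        print(f"{cpu}: {pct:5.1f}% busy")
```

When run as a script, it prints the cores sorted by load, so a single saturated core like CPU 0 in the example appears first.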

Process‑level view:

pidstat -u 1 2

pidstat lists CPU usage per process, here as two one‑second samples. The screenshot shows nginx at 33.5 %, mysqld at 18 % and java at 12 %.
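
pidstat derives per‑process CPU usage from the utime and stime counters in /proc/<pid>/stat. The following is a minimal sketch of that calculation, assuming the field order documented in proc(5). The demo measures its own process while it spins for one second, and the helper names are hypothetical.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second (usually 100)


def cpu_ticks(stat_line):
    """utime + stime, in clock ticks, from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split after the last ')'. The remainder starts at field 3 (state).
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck=CLK_TCK):
    """CPU usage over the interval; 100 % equals one fully used core."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def read_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


if __name__ == "__main__":
    pid = os.getpid()
    start = read_ticks(pid)
    deadline = time.monotonic() + 1.0
    while time.monotonic() < deadline:  # burn CPU so there is something to measure
        pass
    print(f"pid {pid}: {cpu_percent(start, read_ticks(pid), 1.0):.1f}% CPU")
```

As with pidstat's default output, the scale is per core, so a multi‑threaded process can exceed 100 %.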

Context‑switch impact:

vmstat 1 5

Watch the cs column (context switches per second). Normal operation typically shows a few thousand per second; tens or hundreds of thousands indicate rapid thread creation and destruction or severe lock contention. In one real case, cs spiked to 200,000 because a logging framework created a new thread for every log entry; switching to a thread pool brought cs down to 3,000 and cut response time from 200 ms to 15 ms.
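
vmstat's cs column is the per‑second change in the ctxt counter in /proc/stat. The following is a minimal sketch, assuming that layout. The classification thresholds are hypothetical values that encode the rule of thumb above, and they should be adjusted for core count and workload.

```python
import time


def read_ctxt(stat_text):
    """Cumulative number of context switches since boot, from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line in /proc/stat text")


def switches_per_second(before, after, interval_s):
    return (after - before) / interval_s


def classify(rate, normal_max=10_000, alarm_min=100_000):
    """Hypothetical bands (assumption): a few thousand is normal, hundreds of thousands is a red flag."""
    if rate <= normal_max:
        return "normal"
    if rate < alarm_min:
        return "elevated"
    return "investigate"


def read_counter(path="/proc/stat"):
    with open(path) as f:
        return read_ctxt(f.read())


if __name__ == "__main__":
    c0 = read_counter()
    time.sleep(1)
    rate = switches_per_second(c0, read_counter(), 1.0)
    print(f"{rate:,.0f} context switches/s ({classify(rate)})")
```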

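Typical tuning actions (illustrative, not prescriptive): set CPU affinity with taskset; set nginx worker_processes to auto so it starts one worker per core; keep database connection pools moderately sized; pin interrupts to chosen cores by writing a hexadecimal CPU bitmask to /proc/irq/N/smp_affinity (for example, echo 1 > /proc/irq/N/smp_affinity restricts IRQ N to CPU 0; a mask of 0 is rejected), stopping irqbalance first so it does not overwrite the setting; and consider the performance CPU governor for latency‑sensitive, high‑I/O workloads.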
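
The value written to /proc/irq/N/smp_affinity is a hexadecimal bitmask in which bit i allows delivery to CPU i. The following small Python helper converts between CPU sets and masks. It assumes that format: on machines with many CPUs, the kernel prints the mask in comma‑separated 32‑bit groups. The function names are illustrative.

```python
def cpus_to_mask(cpus):
    """{0, 2} -> '5'. Bit i set means the IRQ may be delivered to CPU i."""
    cpus = set(cpus)
    if not cpus:
        raise ValueError("an empty CPU mask is rejected by the kernel")
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_text):
    """'f0' or '00000000,000000f0' -> {4, 5, 6, 7}."""
    mask = int(mask_text.strip().replace(",", ""), 16)
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


if __name__ == "__main__":
    # Show the current mask of IRQ 0 if it is readable (N = 0 is only an example).
    try:
        with open("/proc/irq/0/smp_affinity") as f:
            print("IRQ 0 may run on CPUs", sorted(mask_to_cpus(f.read())))
    except OSError as exc:
        print("cannot read IRQ 0 affinity:", exc)
    print("mask for CPUs 2 and 3:", cpus_to_mask({2, 3}))
```

For example, writing the output of cpus_to_mask({2, 3}), which is c, to /proc/irq/N/smp_affinity as root confines IRQ N to CPUs 2 and 3.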

Memory Optimization: Available Memory Matters

Linux treats idle RAM as wasted and uses it for buffers and page cache. The key metric is therefore the available column, not free or used.

free -h

The example shows 15 GB total, 8.2 GB used and only 3.1 GB free, but 6.5 GB available. Most of the roughly 4 GB in buff/cache can be reclaimed on demand, so dropping caches with echo 3 > /proc/sys/vm/drop_caches gains nothing: it only forces later reads back to disk and causes a spike in disk I/O.
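
free reads its columns from /proc/meminfo, and its available column is the kernel's MemAvailable estimate. The following is a minimal sketch, assuming the standard /proc/meminfo format (values in kB). The 10 % alert threshold is a hypothetical default, and the buff/cache sum follows procps' definition (Buffers + Cached + SReclaimable).

```python
def parse_meminfo(text):
    """Return {field: value_in_kB} from /proc/meminfo text."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def memory_summary(info):
    total = info["MemTotal"]
    return {
        "free_pct": 100.0 * info["MemFree"] / total,
        "available_pct": 100.0 * info["MemAvailable"] / total,
        # What free(1) shows as buff/cache; much of it, but not all, is reclaimable.
        "buff_cache_kb": info.get("Buffers", 0) + info.get("Cached", 0) + info.get("SReclaimable", 0),
    }


def low_memory(info, threshold_pct=10.0):
    """Alert on MemAvailable rather than MemFree (the threshold is an assumption)."""
    return memory_summary(info)["available_pct"] < threshold_pct


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    summary = memory_summary(meminfo)
    print(f"free {summary['free_pct']:.1f}%, available {summary['available_pct']:.1f}%, "
          f"buff/cache {summary['buff_cache_kb'] / 1024**2:.1f} GiB, low={low_memory(meminfo)}")
```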

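Nonzero swap usage shows that memory was under pressure at some point; by itself it does not mean the system is swapping now. The screenshot shows 1.2 GB of swap in use. Whenever those pages are touched again they must be read back from disk, which is orders of magnitude slower than RAM, so the latency penalty depends on how often that happens.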

Monitor swap activity with vmstat columns si (swap‑in) and so (swap‑out). Persistent non‑zero values indicate frequent swapping and insufficient memory.
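
vmstat's si and so columns are derived from the pswpin and pswpout counters in /proc/vmstat, which count pages since boot; vmstat reports the same activity in memory units (KiB/s by default). The following is a minimal sketch, assuming those counter names. The 1 page/s threshold is hypothetical.

```python
import time


def read_swap_counters(vmstat_text):
    """(pswpin, pswpout): cumulative pages swapped in and out since boot."""
    counters = dict(line.split() for line in vmstat_text.splitlines() if line.strip())
    return int(counters["pswpin"]), int(counters["pswpout"])


def swap_rates(before, after, interval_s):
    """(pages in per second, pages out per second) over the interval."""
    return tuple((a - b) / interval_s for b, a in zip(before, after))


def is_swapping(rates, min_pages_per_s=1.0):
    # Assumption: 1 page/s as the cut-off. One busy sample is only a hint;
    # persistent non-zero rates across many samples signal memory shortage.
    return any(rate >= min_pages_per_s for rate in rates)


def sample(path="/proc/vmstat"):
    with open(path) as f:
        return read_swap_counters(f.read())


if __name__ == "__main__":
    before = sample()
    time.sleep(1)
    rates = swap_rates(before, sample(), 1.0)
    print(f"swap-in {rates[0]:.1f} pages/s, swap-out {rates[1]:.1f} pages/s, "
          f"swapping={is_swapping(rates)}")
```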

Typical adjustments: lower vm.swappiness to 10–20 (default 60); size Java heaps with -Xms / -Xmx so the JVM cannot exhaust RAM; on a dedicated database server, set MySQL innodb_buffer_pool_size to 60–70 % of physical memory; cap container memory with cgroups; and evaluate Transparent Huge Pages (THP) carefully, since some databases recommend disabling it.
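
The 60–70 % rule for innodb_buffer_pool_size is simple arithmetic. The sketch below assumes a dedicated MySQL host and MySQL's default innodb_buffer_pool_chunk_size of 128M. It rounds down to a multiple of chunk size × innodb_buffer_pool_instances; MySQL itself adjusts the configured size to such a multiple. The 0.65 fraction is simply the midpoint of the range above, and the function name is hypothetical.

```python
import os

MIB = 1024 * 1024
DEFAULT_CHUNK = 128 * MIB  # MySQL's default innodb_buffer_pool_chunk_size


def buffer_pool_bytes(total_ram_bytes, fraction=0.65, chunk=DEFAULT_CHUNK, instances=1):
    """fraction of RAM, rounded down to a multiple of chunk * instances."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    unit = chunk * instances
    return int(total_ram_bytes * fraction) // unit * unit


if __name__ == "__main__":
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    size = buffer_pool_bytes(ram)
    # Line suitable for the [mysqld] section of my.cnf.
    print(f"innodb_buffer_pool_size = {size // MIB}M")
```

On a 16 GiB host this yields 10624M (about 10.4 GiB). With innodb_buffer_pool_instances = 8, the rounding unit becomes 1 GiB and the result drops to 10 GiB.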

Network Optimization: Connection Count vs. Quality

Common symptoms include connection timeouts, slow responses and high TCP retransmission rates. Start with a global view, then drill down.

ss -s

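The output reports 1,523 TCP connections: 892 established, 128 in TIME_WAIT and 128 in CLOSE_WAIT. (The ss -s summary itself does not break out CLOSE_WAIT; count those with ss -tan state close-wait.) A large TIME_WAIT count indicates many short‑lived connections. On the side that closes first, each TIME_WAIT socket keeps its local port tied up for 60 s; for outbound connections these ports come from the ephemeral range, 32768–60999 by default. When TIME_WAIT sockets to the same destination reach tens of thousands, new outbound connections can fail because of port exhaustion. Enable TIME_WAIT reuse with:

sysctl -w net.ipv4.tcp_tw_reuse=1

tcp_tw_reuse lets the kernel reuse a TIME_WAIT socket for a new outgoing connection when TCP timestamps are enabled; it has no effect on incoming connections. Leave tcp_tw_recycle disabled: it breaks clients behind NAT and was removed in Linux 4.12.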
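
The port‑exhaustion risk can be estimated with back‑of‑the‑envelope arithmetic. The following sketch assumes that connections go to a single destination IP and port, that tcp_tw_reuse is off, and that this host closes first, so each port spends the full 60 s in TIME_WAIT. The function names are illustrative.

```python
TIME_WAIT_SECONDS = 60  # fixed TIME_WAIT duration in Linux


def port_count(port_range_text):
    """'32768 60999' (the format of ip_local_port_range) -> 28232."""
    low, high = (int(v) for v in port_range_text.split())
    return high - low + 1


def max_new_conns_per_sec(ports, time_wait_s=TIME_WAIT_SECONDS):
    # Assumption: one destination, no reuse. Each new outbound connection needs
    # a port not still held in TIME_WAIT, so ports / TIME_WAIT is the ceiling.
    return ports / time_wait_s


if __name__ == "__main__":
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        ports = port_count(f.read())
    print(f"{ports} ephemeral ports -> about {max_new_conns_per_sec(ports):.0f} "
          "new connections/s to one destination before ports run out")
```

With the default range, the ceiling is roughly 470 new connections per second to one backend. Connection reuse (HTTP keep‑alive, pooling) sidesteps the limit, while widening the range or enabling tcp_tw_reuse only raises it.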

A high CLOSE_WAIT count usually means the application has not called close() after the remote side closed the connection.
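
ss queries socket state over netlink, but the same information is exposed in /proc/net/tcp and /proc/net/tcp6, where the st column holds the state as a hex code. The following is a minimal sketch, assuming that format and the state numbering from the kernel's TCP state enum. It tallies sockets per state so that TIME_WAIT and CLOSE_WAIT can be tracked over time.

```python
from collections import Counter

# Hex state codes used in the "st" column (kernel TCP state enum).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Counter of state name -> sockets for one /proc/net/tcp{,6} file."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total += count_states(f.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled
    for state, n in total.most_common():
        print(f"{state:12} {n}")
```

A CLOSE_WAIT count that keeps growing between runs is the application‑side leak described above.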

Production‑tested kernel parameters (persist them in a file under /etc/sysctl.d/ and load them with sysctl --system):

net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6
net.core.netdev_max_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535

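net.core.somaxconn caps the accept queue of listening sockets (default 128, raised to 4096 in Linux 5.4). The application must also request a large backlog in listen(), because the effective queue length is the smaller of the two. tcp_max_syn_backlog enlarges the half‑open (SYN) queue, improving the handling of SYN bursts. Reducing tcp_fin_timeout from 60 s to 15 s frees orphaned FIN_WAIT2 sockets faster; it does not shorten TIME_WAIT, which Linux fixes at 60 s. Starting ip_local_port_range at 1024 can hand out ports that local services need to listen on, so reserve those with net.ipv4.ip_local_reserved_ports.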
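
Before and after applying the parameters above, it helps to confirm what the kernel is actually using: the sysctl name a.b.c maps to the file /proc/sys/a/b/c. The following is a minimal Python sketch, assuming that mapping. WANTED is a hypothetical subset of the list above. Values are compared after whitespace normalization because multi‑value settings such as ip_local_port_range are tab‑separated in /proc.

```python
from pathlib import Path

# Hypothetical subset of the parameters listed above.
WANTED = {
    "net.core.somaxconn": "65535",
    "net.ipv4.tcp_max_syn_backlog": "65535",
    "net.ipv4.tcp_fin_timeout": "15",
    "net.ipv4.ip_local_port_range": "1024 65535",
}


def sysctl_path(name, root="/proc/sys"):
    """net.ipv4.tcp_fin_timeout -> /proc/sys/net/ipv4/tcp_fin_timeout"""
    return Path(root, *name.split("."))


def drift(wanted, root="/proc/sys"):
    """Return {name: (current, wanted)} for settings that differ or do not exist."""
    diffs = {}
    for name, value in wanted.items():
        try:
            current = " ".join(sysctl_path(name, root).read_text().split())
        except FileNotFoundError:
            current = None
        if current != " ".join(value.split()):
            diffs[name] = (current, value)
    return diffs


if __name__ == "__main__":
    for name, (current, value) in drift(WANTED).items():
        print(f"{name}: running {current!r}, expected {value!r}")
```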

Disk I/O Optimization: Identify and Relieve Bottlenecks

Disk I/O latency often dominates response time for I/O‑bound workloads: a random access on a mechanical HDD takes roughly 10 ms, on an SSD roughly 0.1 ms.

iostat -dx 1 3

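Key columns:

%util – share of time the device had at least one request in flight. Above 80 % signals saturation on an HDD (the example shows nvme0n1 at 78.5 %), but NVMe and other SSDs serve many requests in parallel, so for them treat %util as a hint and confirm with latency and queue depth.

r_await / w_await – average read/write wait time in ms, including time spent queued. sda's w_await of 8.35 ms is normal for an HDD but would be high for an SSD; nvme0n1's 1.25 ms is within the normal range for an SSD under load.

aqu-sz – average number of requests in flight. On an HDD, a sustained value above 1 means requests are queuing (nvme0n1 shows 1.05; NVMe devices handle much deeper queues).

iotop -b -n1 -o

This prints one batch‑mode snapshot of only the processes currently doing I/O. The snapshot shows nginx reading and writing about 10 MiB/s and mysqld writing 15.6 MiB/s. For heavy logging, consider asynchronous logging, a lower log level, or moving logs to a dedicated disk.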
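
iostat computes its extended columns from the counters in /proc/diskstats. The following is a minimal sketch, assuming the field order documented in the kernel's iostats documentation. In it, r_await and w_await are milliseconds spent per completed read or write, %util is the share of the interval with I/O in flight, and aqu-sz is the weighted I/O time divided by the interval.

```python
import time


def parse_diskstats(text):
    """{device: first 11 counters after the device name} from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            stats[fields[2]] = tuple(int(v) for v in fields[3:14])
    return stats


def device_metrics(before, after, interval_ms):
    """Indexes used: 0 reads, 3 ms reading, 4 writes, 7 ms writing,
    9 ms with I/O in flight, 10 weighted ms (time x queue depth)."""
    d = [a - b for a, b in zip(after, before)]
    reads, writes = d[0], d[4]
    return {
        "r_await": d[3] / reads if reads else 0.0,
        "w_await": d[7] / writes if writes else 0.0,
        "aqu_sz": d[10] / interval_ms,
        "util_pct": 100.0 * d[9] / interval_ms,
    }


def snapshot(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__":
    t0, first = time.monotonic(), snapshot()
    time.sleep(1)
    second = snapshot()
    elapsed_ms = (time.monotonic() - t0) * 1000.0
    for dev, counters in second.items():
        if dev in first and counters != first[dev]:
            m = device_metrics(first[dev], counters, elapsed_ms)
            print(f"{dev}: r_await {m['r_await']:.2f} ms  w_await {m['w_await']:.2f} ms  "
                  f"aqu-sz {m['aqu_sz']:.2f}  util {m['util_pct']:.1f}%")
```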

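Typical actions: separate logs and data onto different disks; let databases use O_DIRECT (for example, MySQL's innodb_flush_method=O_DIRECT) to bypass the page cache; choose an I/O scheduler per device type (none, mq-deadline or kyber for SSD/NVMe; mq-deadline or bfq for HDD); raise vm.dirty_ratio and vm.dirty_background_ratio to batch more writes, accepting larger writeback bursts; and for write‑intensive workloads, consider RAID 10 or replacing disks with NVMe SSDs.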
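
The active scheduler for each device is the bracketed entry in /sys/block/<dev>/queue/scheduler, and queue/rotational is 1 for spinning disks. The following is a minimal sketch, assuming that sysfs layout, that reports each device's scheduler next to the choices suggested above. Changing it (echo mq-deadline > /sys/block/sda/queue/scheduler as root) does not persist across reboots without a udev rule or similar.

```python
from pathlib import Path


def active_scheduler(scheduler_text):
    """'[mq-deadline] kyber bfq none' -> 'mq-deadline' (brackets mark the active one)."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return scheduler_text.strip() or None


def suggested(rotational):
    # Encodes the rule of thumb above (an assumption, not a kernel default);
    # benchmark before changing production hosts.
    return ("mq-deadline", "bfq") if rotational else ("none", "mq-deadline", "kyber")


if __name__ == "__main__":
    for dev in sorted(Path("/sys/block").glob("*")):
        try:
            current = active_scheduler((dev / "queue" / "scheduler").read_text())
            rotational = (dev / "queue" / "rotational").read_text().strip() == "1"
        except OSError:
            continue  # some virtual devices lack these files
        kind = "HDD" if rotational else "SSD/NVMe"
        print(f"{dev.name}: {current} ({kind}; suggested: {', '.join(suggested(rotational))})")
```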

Filesystem Optimization: Choose the Right Format

Filesystem type influences baseline performance. ext4 is stable and performs adequately for general workloads; XFS excels with large files and highly concurrent writes; btrfs and ZFS add snapshots and compression at some performance cost.

df -Th

The example shows /var/log at 77 % usage; a full log partition can block services and even prevent boot.
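
The following is a minimal Python sketch of a usage alert for partitions such as /var/log. It relies on shutil.disk_usage, whose free value is the space available to unprivileged users, and computes Use% the way df does: used / (used + available), so the ext4 root reserve is excluded. df rounds up, so it may show one point more. The 80 % threshold and the path list are hypothetical.

```python
import os
import shutil


def use_percent(used, avail):
    """df-style Use%: the root reserve counts as neither used nor available."""
    return 100.0 * used / (used + avail) if used + avail else 0.0


def over_threshold(paths, threshold=80.0):
    """{path: Use%} for every path whose filesystem is at or above threshold."""
    alerts = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = use_percent(usage.used, usage.free)
        if pct >= threshold:
            alerts[path] = round(pct, 1)
    return alerts


if __name__ == "__main__":
    # Hypothetical path list; only existing directories are checked.
    paths = [p for p in ("/", "/var/log", "/data") if os.path.isdir(p)]
    for path, pct in over_threshold(paths).items():
        print(f"WARNING: {path} is {pct}% full")
```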

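ext4 reserves 5 % of its blocks for root by default; on a 500 GB partition that is 25 GB. When the partition holds only data, reduce the reserve to 1 %:

tune2fs -m 1 /dev/sdb1

The noatime mount option stops access‑time updates on reads, eliminating unnecessary metadata writes on XFS and ext4 alike. Modern kernels already default to relatime, which skips most of these updates, so the gain is smaller than it once was. Apply it to a mounted filesystem with the following command, and add it to /etc/fstab to make it persistent:

mount -o remount,noatime /data

Inode exhaustion can occur even when free space remains, because every file, directory and symlink consumes an inode. Small‑file‑heavy workloads (cache directories, session files) are prone to this. Check inode usage with:

df -i

If IUse% approaches 100 %, clean up small files. On ext4, inode density is fixed at format time (mkfs.ext4 -i bytes-per-inode; a smaller value yields more inodes) and cannot be changed later; XFS allocates inodes dynamically.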
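
Both the root reserve and inode usage can be read from statvfs. The following is a minimal sketch, assuming the POSIX statvfs fields exposed by os.statvfs. The reserve is the free space not available to unprivileged users, and IUse% is used inodes over total inodes, rounded up as df -i does.

```python
import os


def reserved_bytes(st):
    """Free space usable only by root (on ext4, the reserve set by tune2fs -m)."""
    return (st.f_bfree - st.f_bavail) * st.f_frsize


def inode_use_percent(st):
    """IUse% as df -i shows it, or None if the filesystem reports no inode count."""
    if st.f_files == 0:
        return None  # e.g. btrfs reports 0 total inodes
    used = st.f_files - st.f_ffree
    return -(-100 * used // st.f_files)  # integer ceiling division


if __name__ == "__main__":
    st = os.statvfs("/")
    print(f"root reserve: {reserved_bytes(st) / 1024**3:.1f} GiB, "
          f"inode use: {inode_use_percent(st)}%")
```

The same arithmetic explains the example above: 5 % of 500 GB is 25 GB, and tune2fs -m 1 shrinks the reserve to 5 GB, returning 20 GB to ordinary users.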

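Typical recommendations: use XFS for databases and log partitions; add noatime to the mount options in /etc/fstab; check filesystems periodically with e2fsck (ext4, on an unmounted filesystem) or xfs_scrub (XFS, online); consider tmpfs for small‑file‑intensive temporary data, remembering that it lives in RAM and is lost on reboot; and monitor inode usage with df -i.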
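
To follow the noatime recommendation, the following minimal Python sketch lists /etc/fstab entries that lack the option. It assumes the standard whitespace‑separated fstab format and skips swap and a hypothetical set of pseudo‑filesystems. It only reads the file and does not remount anything.

```python
SKIP_TYPES = {"swap", "proc", "sysfs", "devpts", "tmpfs"}  # assumed skip list


def missing_noatime(fstab_text):
    """Mount points in fstab text whose options do not include noatime."""
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or incomplete entry
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in SKIP_TYPES and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


if __name__ == "__main__":
    try:
        with open("/etc/fstab") as f:
            print("\n".join(missing_noatime(f.read())) or "all entries use noatime")
    except FileNotFoundError:
        print("/etc/fstab not found")
```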

Summary

Performance tuning proceeds from measurement to analysis to targeted adjustment. Use mpstat and pidstat for CPU, free and vmstat for memory, ss (or netstat) for network, iostat and iotop for disk, and df plus tune2fs for filesystems. Correlate data across tools to pinpoint the true bottleneck before changing any parameter.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
