
Master Linux Production Performance: A Complete Hands‑On Optimization Guide

This comprehensive guide walks you through building a monitoring baseline, tuning CPU, memory, disk I/O, and network settings, applying kernel and application optimizations, and setting up automated monitoring and alerts to dramatically improve Linux production system performance.

Ops Community

Introduction: Drawing on years of work as an operations engineer, I share a complete set of Linux performance-tuning practices that address performance at its root and help avoid incidents.

Why performance tuning matters

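In high-concurrency internet services, latency translates directly into business results: a frequently cited rule of thumb is that a 1-second delay reduces conversion by about 7%. In favorable cases, tuning kernel parameters can cut response time from 800 ms to 200 ms, with a business uplift on the order of 30%. Treat these figures as illustrative and measure your own workload.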

Step 1: Build a performance‑monitoring baseline

Core metric dimensions

Before tuning, establish a complete monitoring baseline. The “five-dimensional monitoring model” used here covers CPU, memory, disk I/O, network, and processes.

CPU dimension

# Real-time CPU usage and load (both tools expect a comma-separated PID list)
top -p "$(pgrep -d ',' your_process)"
htop -p "$(pgrep -d ',' your_process)"

# CPU details
lscpu
grep -E "processor|model name" /proc/cpuinfo

# Context‑switch monitoring
vmstat 1 5
sar -w 1 5

Memory dimension

# Memory usage details
free -h
cat /proc/meminfo
ps aux --sort=-%mem | head -10

# Detect memory leaks
valgrind --tool=memcheck --leak-check=full ./your_program

Disk I/O dimension

# Disk I/O performance
iostat -x 1 5
iotop -o
lsblk -f

# Disk space monitoring
df -h
du -sh /* | sort -hr

Network dimension

# Network traffic monitoring
iftop -i eth0
nethogs
ss -tuln

# Connection status (count TCP connections per state, skipping the two header lines)
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c

Process dimension

# Process resource consumption
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head
pstree -p
lsof -p PID

Baseline collection script

#!/bin/bash
# performance_baseline.sh – collect system baseline
echo "=== System Performance Baseline $(date) ==="
# (script continues with CPU, memory, disk, network, and load info collection)
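
The body of the collection script is elided in the source. As a rough illustration of what such a baseline could gather, here is a minimal sketch that reads from /proc and a few standard tools. The function names `section` and `collect_baseline` and the output layout are assumptions, not the original script.

```shell
#!/bin/bash
# baseline_sketch.sh - hypothetical sketch, not the original script.
# Prints one labelled section per dimension; missing tools are skipped.

section() { printf '\n--- %s ---\n' "$1"; }

collect_baseline() {
    echo "=== System Performance Baseline $(date) ==="

    section "CPU / load"
    [ -r /proc/loadavg ] && cat /proc/loadavg
    command -v nproc >/dev/null && echo "cores: $(nproc)"

    section "Memory"
    [ -r /proc/meminfo ] && grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo

    section "Disk"
    command -v df >/dev/null && df -hP

    section "Network"
    command -v ss >/dev/null && ss -s

    return 0
}
```

Running `collect_baseline > before.txt` ahead of a change and `collect_baseline > after.txt` afterwards gives two files that can be compared with `diff`.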

Step 2: CPU performance tuning

CPU affinity

# View CPU topology
lscpu -e
numactl --hardware

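# Bind process to specific cores
taskset -cp 0-3 PID
# --cpubind is a deprecated alias; --cpunodebind binds to the CPUs of NUMA node 0
numactl --cpunodebind=0 --membind=0 your_program

# Interrupt balancing
# smp_affinity takes a hex CPU bitmask (2 = CPU1); IRQ 24 is an example, look up the real number in /proc/interrupts
echo 2 > /proc/irq/24/smp_affinity
# Alternatively let irqbalance distribute IRQs; it may overwrite manual masks, so pick one approach
systemctl enable irqbalance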

Process scheduling

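# Adjust process priority (negative nice values require root)
nice -n -10 your_critical_process
renice -n -10 -p PID

# Real-time scheduling (use sparingly: a busy SCHED_FIFO task at priority 99 can starve other tasks, including kernel threads)
chrt -f -p 99 PID
chrt -r -p 50 PID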

CPU frequency management

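# Set performance governor
# A shell redirect cannot target several files at once, so write to every CPU through tee
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# or do the same with cpupower
cpupower frequency-set -g performance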

Step 3: Memory tuning

Memory parameter tuning

# Reduce swap usage
echo 1 > /proc/sys/vm/swappiness

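# Adjust dirty page ratios
echo 1 > /proc/sys/vm/dirty_background_ratio
echo 10 > /proc/sys/vm/dirty_ratio
# Value is in centiseconds: 60 means 0.6 s (the kernel default is 3000, i.e. 30 s)
echo 60 > /proc/sys/vm/dirty_expire_centisecs

# Drop caches (for benchmarking or diagnosis only: it discards the page cache, so subsequent reads hit disk)
sync
echo 1 > /proc/sys/vm/drop_caches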

# Transparent hugepage
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
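
Values written under /sys do not survive a reboot, so it helps to check what is actually active. The kernel marks the active choice with square brackets, for example `always madvise [never]`. The sketch below extracts that value; the helper names `active_choice` and `thp_status` are made up for illustration.

```shell
#!/bin/bash
# Print the active choice from a sysfs "multiple choice" line such as
# "always madvise [never]" (the active value is the bracketed one).
active_choice() {
    local line="$1"
    local re='\[([^]]+)\]'
    [[ "$line" =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

# Report the THP state if the sysfs files exist on this machine.
thp_status() {
    local f path
    for f in enabled defrag; do
        path="/sys/kernel/mm/transparent_hugepage/$f"
        [ -r "$path" ] && echo "$f: $(active_choice "$(cat "$path")")"
    done
    return 0
}
```

Calling `thp_status` after a reboot shows whether the setting was persisted by whatever mechanism your distribution uses (boot parameter, unit file, or tuning profile).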

Memory leak detection script

#!/bin/bash
# memory_leak_monitor.sh
# (script monitors a process and sends alerts when memory usage exceeds a threshold)
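
The monitoring script itself is not shown in the source. One possible shape for it, assuming the alert is just a log line and that resident set size (VmRSS) is the metric of interest, is sketched below. The function names and the 60-second default interval are assumptions.

```shell
#!/bin/bash
# memory_watch_sketch.sh - hypothetical sketch, not the original script.

# Print the resident set size (kB) recorded in a /proc/<pid>/status file.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "$1"
}

# Succeed (exit 0) when the RSS in kB exceeds the threshold given in MB.
over_threshold() {
    local rss="$1" limit_mb="$2"
    [ "$rss" -gt $(( limit_mb * 1024 )) ]
}

# Poll one process until it exits; print a line whenever it is over the limit.
watch_pid() {
    local pid="$1" limit_mb="$2" interval="${3:-60}" rss
    while [ -r "/proc/$pid/status" ]; do
        rss="$(rss_kb "/proc/$pid/status")"
        if [ -n "$rss" ] && over_threshold "$rss" "$limit_mb"; then
            echo "$(date '+%F %T') ALERT pid=$pid rss=${rss}kB limit=${limit_mb}MB"
        fi
        sleep "$interval"
    done
}
```

For example, `watch_pid 1234 2048 >> /var/log/memory_watch.log &` would log whenever process 1234 exceeds 2 GB. A fixed threshold only catches a leak late; logging every sample and looking for steady growth is usually more telling.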

Huge page configuration

# Allocate huge pages
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Mount hugepage filesystem (the mount point must exist first)
mkdir -p /mnt/hugepages
mount -t hugetlbfs nodev /mnt/hugepages
echo 'nodev /mnt/hugepages hugetlbfs defaults 0 0' >> /etc/fstab
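
The 1024 pages reserved above amount to 1024 × 2 MiB = 2 GiB. To size the reservation for a different amount of memory, the arithmetic can be sketched as follows; `hugepages_needed` and `hugepages_status` are illustrative helper names.

```shell
#!/bin/bash
# Number of huge pages needed to cover a given amount of memory.
#   $1 = memory to back with huge pages, in MiB
#   $2 = huge page size in kB (default 2048, i.e. 2 MiB pages)
hugepages_needed() {
    local mem_mib="$1" page_kb="${2:-2048}"
    # Round up so that a final partial page is still covered.
    echo $(( (mem_mib * 1024 + page_kb - 1) / page_kb ))
}

# Show what the kernel actually reserved; it may grant fewer pages than
# requested when memory is fragmented.
hugepages_status() {
    grep -E '^(HugePages_Total|HugePages_Free|Hugepagesize):' /proc/meminfo
}
```

`hugepages_needed 2048` prints 1024, matching the value used above. Comparing it with `HugePages_Total` afterwards confirms whether the full reservation succeeded.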

Step 4: Disk I/O optimization

Filesystem tuning

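# ext4 mount options (noatime already implies nodiratime)
mount -o remount,noatime /dev/sda1 /
# data=writeback cannot be switched on a remount; set it in /etc/fstab
# (rootflags= for the root filesystem) and note it can expose stale data after a crash.

# XFS mount options (logbufs/logbsize are normally applied at mount time, so
# put them in /etc/fstab and mount fresh instead of remounting)
mount -o noatime,logbufs=8,logbsize=256k /dev/sdb1 /data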

I/O scheduler

# View current scheduler
cat /sys/block/sda/queue/scheduler

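# Set scheduler (choose a name that the file above actually lists)
# Kernels 5.0 and later offer only multi-queue schedulers: none, mq-deadline, kyber, bfq
# SSD: none or mq-deadline (noop or deadline on older kernels)
echo mq-deadline > /sys/block/sda/queue/scheduler
# HDD: bfq or mq-deadline (cfq on older kernels)
echo bfq > /sys/block/sdb/queue/scheduler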
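
Because the available scheduler names differ between kernel generations, a script can pick from whatever the kernel offers instead of hard-coding one. The sketch below does that from the contents of the `rotational` and `scheduler` files. The preference order is an assumption for illustration, not a rule, and `pick_scheduler` is a hypothetical helper.

```shell
#!/bin/bash
# Pick a scheduler for a device from what the kernel offers.
#   $1 = contents of /sys/block/<dev>/queue/rotational (1 = HDD, 0 = SSD/NVMe)
#   $2 = contents of /sys/block/<dev>/queue/scheduler,
#        e.g. "[mq-deadline] kyber bfq none"
pick_scheduler() {
    local rotational="$1" offered prefs name
    # Strip the brackets that mark the currently active scheduler.
    offered="${2//\[/}"
    offered=" ${offered//\]/} "
    if [ "$rotational" = "1" ]; then
        prefs="bfq cfq mq-deadline deadline"
    else
        prefs="none noop mq-deadline deadline"
    fi
    for name in $prefs; do
        case "$offered" in
            *" $name "*) echo "$name"; return 0 ;;
        esac
    done
    return 1
}
```

A caller would read both files for a device, for example `pick_scheduler "$(cat /sys/block/sda/queue/rotational)" "$(cat /sys/block/sda/queue/scheduler)"`, and write the result back to the scheduler file.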

Disk performance testing

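# Sequential read/write test
# bs=1M count=1024 moves the same 1 GiB without needing a 1 GiB buffer.
# If /tmp is tmpfs (RAM-backed), point the file at the filesystem under test instead.
dd if=/dev/zero of=/tmp/testfile bs=1M count=1024 oflag=direct
dd if=/tmp/testfile of=/dev/null bs=1M count=1024 iflag=direct

# Random I/O test (fio required)
fio --filename=/tmp/fio_test --direct=1 --iodepth=64 --thread --rw=randrw --ioengine=libaio --bs=4k --size=1G --numjobs=1 --runtime=60 --group_reporting --name=randrw_test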

Step 5: Network performance tuning

TCP/IP parameters

# Append to /etc/sysctl.conf
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# nf_conntrack_max only exists once the nf_conntrack module is loaded
net.netfilter.nf_conntrack_max = 1048576
net.ipv4.tcp_fin_timeout = 30
net.core.netdev_max_backlog = 5000
net.core.somaxconn = 65535
# (additional parameters omitted for brevity)

# Then apply the file (shell command, run as root)
sysctl -p
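
The source does not say how the 128 MiB buffer ceilings (134217728 bytes) were chosen. One common way to sanity-check such values is the bandwidth-delay product: the bytes in flight on a link equal bandwidth times round-trip time. A minimal sketch of that arithmetic, with `bdp_bytes` as an illustrative helper:

```shell
#!/bin/bash
# Bandwidth-delay product in bytes.
#   $1 = link bandwidth in Mbit/s
#   $2 = round-trip time in milliseconds
bdp_bytes() {
    local mbit="$1" rtt_ms="$2"
    # Mbit/s -> bytes/s is *125000; ms -> s is /1000; combined: *125.
    echo $(( mbit * 125 * rtt_ms ))
}
```

For instance, `bdp_bytes 10000 100` (10 Gbit/s at 100 ms RTT) gives 125000000 bytes, just under the 128 MiB maximum above. A 1 Gbit/s link inside a data center with 1 ms RTT needs only 125000 bytes, so the large ceiling mainly matters for fast long-distance paths.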

NIC multi‑queue configuration

# View NIC queues
ethtool -l eth0

# Set combined queues
ethtool -L eth0 combined 4

# IRQ affinity (hex CPU bitmask: 1 = CPU0, 2 = CPU1; IRQ numbers 24 and 25 are examples, check /proc/interrupts)
echo 1 > /proc/irq/24/smp_affinity
echo 2 > /proc/irq/25/smp_affinity
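
The values written to `smp_affinity` are hexadecimal bitmasks in which bit N selects CPU N. For anything beyond one or two CPUs the mask is easier to compute than to write by hand. The helper below is a small sketch (`cpu_mask` is a made-up name) and is limited to CPUs 0 through 62 because it relies on 64-bit shell arithmetic.

```shell
#!/bin/bash
# Build the hexadecimal CPU bitmask that smp_affinity expects.
# Arguments are CPU numbers, e.g. "cpu_mask 0 2" sets bit 0 and bit 2.
cpu_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}
```

`cpu_mask 0 1 2 3` prints `f`, which pins an IRQ to the first four CPUs. Where available, `/proc/irq/<n>/smp_affinity_list` accepts a plain CPU list such as `0-3` and avoids the mask entirely.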

Network monitoring script

#!/bin/bash
# network_monitor.sh - log cumulative interface counters every 10 seconds
INTERFACE="eth0"
LOG_FILE="/var/log/network_performance.log"
STATS="/sys/class/net/$INTERFACE/statistics"
while true; do
    # Counters are totals since the interface came up; subtract consecutive samples to get rates
    RX_BYTES=$(cat "$STATS/rx_bytes")
    TX_BYTES=$(cat "$STATS/tx_bytes")
    RX_PACKETS=$(cat "$STATS/rx_packets")
    TX_PACKETS=$(cat "$STATS/tx_packets")
    echo "$(date): RX: $RX_BYTES bytes, $RX_PACKETS packets | TX: $TX_BYTES bytes, $TX_PACKETS packets" >> "$LOG_FILE"
    sleep 10
done
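
The counters under `/sys/class/net/<interface>/statistics` are cumulative totals, so throughput has to be derived from the difference between two readings. A minimal sketch of that calculation follows; `rate_bps` and `rx_rate` are hypothetical helper names.

```shell
#!/bin/bash
# Average throughput in bits per second between two counter readings.
#   $1 = earlier rx_bytes or tx_bytes value
#   $2 = later value of the same counter
#   $3 = seconds between the two readings
rate_bps() {
    local prev="$1" cur="$2" secs="$3"
    echo $(( (cur - prev) * 8 / secs ))
}

# Sample one interface twice and print its receive rate.
rx_rate() {
    local dev="$1" secs="${2:-10}" a b
    a="$(cat "/sys/class/net/$dev/statistics/rx_bytes")" || return 1
    sleep "$secs"
    b="$(cat "/sys/class/net/$dev/statistics/rx_bytes")" || return 1
    echo "$dev rx: $(rate_bps "$a" "$b" "$secs") bit/s"
}
```

`rx_rate eth0 10` blocks for ten seconds and then prints the average receive rate over that window.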

Step 6: Kernel parameter tuning

System‑wide sysctl settings

# /etc/sysctl.conf example
kernel.pid_max = 4194304
kernel.threads-max = 4194304
fs.file-max = 6553600
vm.swappiness = 1
vm.dirty_background_ratio = 1
vm.dirty_ratio = 10
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_max_syn_backlog = 65536
# (additional parameters omitted)

# Apply the file (shell command, run as root)
sysctl -p

File‑descriptor limits

# /etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576
root soft nofile 1048576
root hard nofile 1048576

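# systemd limits (in /etc/systemd/system.conf; apply with "systemctl daemon-reexec" or a reboot, then restart the affected services)
[Manager]
DefaultLimitNOFILE=1048576
DefaultLimitNPROC=1048576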
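
Settings in limits.conf are applied by PAM to login sessions, while services started by systemd take their limits from systemd configuration, so it is worth checking what a running process actually received. The kernel exposes this in `/proc/<pid>/limits`. The helper below is a sketch, and `nofile_limits` is a made-up name.

```shell
#!/bin/bash
# Print the soft and hard "Max open files" limits from a /proc/<pid>/limits file.
nofile_limits() {
    awk '/^Max open files/ {print $4, $5}' "$1"
}
```

For example, `nofile_limits /proc/self/limits` shows the limits of the current shell, and pointing it at the limits file of the nginx or mysqld process shows whether the new values reached the service.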

Step 7: Application‑layer tuning

Database optimization (MySQL/MariaDB)

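# /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
innodb_buffer_pool_size = 8G
innodb_log_file_size = 512M
# Query cache: MariaDB and MySQL 5.7 or earlier only. It was removed in MySQL 8.0,
# where these two lines prevent the server from starting.
query_cache_type = 1
query_cache_size = 256M
max_connections = 1000
slow_query_log = 1
# (additional settings omitted)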

Web server (Nginx) tuning

# /etc/nginx/nginx.conf
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 1048576;

events { worker_connections 65535; use epoll; multi_accept on; }

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    gzip on;
    open_file_cache max=200000 inactive=20s;
    client_body_buffer_size 128k;
    client_max_body_size 50m;
    include /etc/nginx/conf.d/*.conf;
}

Step 8: Troubleshooting toolbox

System analysis tools

# Install common tools
# RHEL/CentOS (some packages, such as htop and glances, need the EPEL repository)
yum install -y sysstat iotop htop glances perf strace tcpdump wireshark-cli
# Debian/Ubuntu
apt-get install -y sysstat iotop htop glances linux-tools-generic strace tcpdump tshark

One‑click health‑check script

#!/bin/bash
# system_health_check.sh
# (script prints CPU, memory, disk, network, process, and service status and raises alerts when thresholds are exceeded)
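
The health-check script is only described in the source. A minimal sketch of the threshold logic for two of its checks might look like this, assuming integer percentages and a plain OK/WARN line per metric. All function names are illustrative.

```shell
#!/bin/bash
# system_health_sketch.sh - hypothetical sketch, not the original script.

# Used memory as an integer percentage, from a meminfo-format file.
mem_used_pct() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}

# Usage percentage of the filesystem holding a path (default /).
disk_used_pct() {
    df -P "${1:-/}" | awk 'NR==2 {sub(/%/, "", $5); print $5}'
}

# Print OK or WARN for one metric; return 1 on WARN so callers can count failures.
check() {
    local name="$1" value="$2" limit="$3"
    if [ "$value" -ge "$limit" ]; then
        echo "WARN $name=${value}% (limit ${limit}%)"
        return 1
    fi
    echo "OK   $name=${value}%"
}
```

A run could then be as short as `check mem "$(mem_used_pct)" 80; check disk "$(disk_used_pct /)" 85`, using the 80% memory and 85% disk thresholds that appear elsewhere in this guide.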

Performance bottleneck finder

#!/bin/bash
# performance_bottleneck_finder.sh
# (script checks CPU idle, memory usage, and I/O wait and suggests actions)
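
The bottleneck finder is likewise only described. One way to obtain CPU idle and I/O wait without extra tools is to diff two readings of the aggregate `cpu` line in /proc/stat. The sketch below does that and maps the result to a hint. The cut-offs (10% I/O wait, 20% idle) and function names are assumptions for illustration.

```shell
#!/bin/bash
# performance_bottleneck_sketch.sh - hypothetical sketch, not the original script.

# Print "idle% iowait%" for the interval between two aggregate "cpu" lines
# from /proc/stat (fields: user nice system idle iowait irq softirq steal ...).
cpu_idle_iowait() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0; for (i = 2; i <= 9; i++) total += $i
          t[NR] = total; idle[NR] = $5; wait[NR] = $6 }
        END { d = t[2] - t[1]; if (d <= 0) exit 1
              printf "%d %d\n", (idle[2] - idle[1]) * 100 / d, (wait[2] - wait[1]) * 100 / d }'
}

# Turn the two percentages into a one-line hint.
suggest() {
    local idle="$1" iowait="$2"
    if [ "$iowait" -ge 10 ]; then echo "I/O bound: inspect iostat -x and iotop"
    elif [ "$idle" -le 20 ]; then echo "CPU bound: inspect top and perf"
    else echo "No CPU or I/O bottleneck in this sample"
    fi
}
```

To use it, capture `head -n1 /proc/stat` twice a few seconds apart, pass both lines to `cpu_idle_iowait`, and feed the two numbers it prints to `suggest`.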

Step 9: Automated monitoring and alerting

Prometheus + Grafana configuration

# prometheus.yml (excerpt)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'mysql-exporter'
    static_configs:
      - targets: ['localhost:9104']

Alert rules

# alert_rules.yml (excerpt)
groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes"
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% for more than 5 minutes"
      - alert: DiskSpaceLow
        expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100 > 85
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Disk space low"
          description: "Disk usage is above 85%"

Step 10: Best‑practice checklist

System level: keep the load average below the number of CPU cores, memory usage < 80 %, swap usage < 5 %, I/O wait < 10 %.

Application level: size the DB connection pool to the workload, keep the cache hit rate > 90 %, enable compression for static resources.

Network level: keep TCP connection counts within the configured limits, keep latency within your service targets, and monitor bandwidth.

Core principles

Monitoring first: No monitoring, no optimization.

Baseline establishment: Define before-and-after metrics.

Iterative tuning: Avoid massive one-shot changes.

Test verification: Validate each change thoroughly.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
