
Master Server Monitoring: Diagnose CPU, Memory, Disk & TCP Alerts

This guide explains how to identify and resolve common server monitoring alerts—including CPU, memory, disk space, disk I/O, and TCP connection issues—using Linux commands such as top, df, du, iotop, and netstat, and provides practical remediation steps.

MaGe Linux Operations

Server Monitoring Metrics

During routine server inspections, alerts arrive from many servers and monitoring tools (e.g., Zabbix, Prometheus + Grafana). This article focuses on resource-related alerts and outlines common ways to handle them.

CPU Alerts

Run top, press Shift+P to sort processes by CPU usage, and examine the %CPU column. By default, per-process %CPU is measured against a single core: 100% means one core is fully loaded, so a process on a 4-core server can show up to 400%.

# top
Tasks: 197 total,   1 running, 196 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.2 us,  1.3 sy,  0.0 ni, 97.3 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  8008984 total,  1046216 free,  4712336 used,  2250432 buff/cache
KiB Swap:  7208956 total,  4409068 free,  2799888 used.  2373196 avail Mem
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1456 root      20   0   10.5g 361648 242164 S   3.0  4.5 12461:08 clickhouse-server --config-file=/etc/clickhouse-+
1089 root      20   0 5755452 238580  2644 S   1.7  3.0 4330:47 java -jar V2XRealtimeServer.jar
1086 root      20   0 5822324 319628  3028 S   1.3  4.0 4161:58 java -jar V2XRawDataServer.jar
10174 root      20   0 5819584 963512  4420 S   1.3 12.0 3619:07 java -jar V2XWebSocketServer.jar
2105 mysql     20   0 3205688 907124  7584 S   0.7 11.3 1462:50 /usr/sbin/mysqld --daemonize --pid-file=/var/run+
1090 root      20   0 148952  4648   780 S   0.3  0.1 420:01.32 /usr/local/redis/bin/redis-server 0.0.0.0:7379 [+]
17013 root      20   0 162128  2344  1600 R   0.3  0.0 0:00.04 top
1    root      20   0 125516  2636  1492 S   0.0  0.0 133:31.76 /usr/lib/systemd/systemd --switched-root --syste+
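
Because top reports per-process %CPU relative to one core, it can help to convert a reading into a share of the whole machine before comparing it with an alert threshold. The sketch below is illustrative only: normalize_cpu is a hypothetical helper, not part of top, and it assumes the caller supplies the core count.

```shell
# normalize_cpu PCT CORES
# Convert a per-core %CPU value (as shown by top) into a percentage of
# total machine capacity. Hypothetical helper, for illustration only.
normalize_cpu() {
    awk -v pct="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", pct / cores }'
}

# A process at 250% on a 4-core host uses 62.5% of the machine.
normalize_cpu 250 4    # prints 62.5
```

On a live host the core count could come from nproc, for example normalize_cpu 250 "$(nproc)". For a non-interactive snapshot, top -b -n 1 prints a single batch iteration that can be saved or piped to other tools.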

Typical scenarios:

Continuous alerts for compute‑intensive applications (e.g., data cleaning, transformation).

Transient alerts below 70% CPU usage that do not affect system responsiveness.

Increasing frequency of alerts, possibly due to bugs or vulnerabilities.

Time‑bound spikes often linked to business traffic peaks.
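
To tell a transient spike from a sustained one, one option is to sample overall CPU usage from top in batch mode and compare it with the alert threshold. The sketch below assumes the procps-ng summary format shown above (a line starting with %Cpu(s) that contains an id field). The names cpu_busy and over_threshold and the threshold of 70 are illustrative, and other top versions or locales may format the line differently.

```shell
# cpu_busy: read top output on stdin and print overall CPU usage (100 - idle).
cpu_busy() {
    awk '/^%Cpu/ {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^id,?$/) {
                idle = $(i - 1)
                sub(/^.*,/, "", idle)   # handles "ni,100.0" when idle is 100
                printf "%.1f\n", 100 - idle
                exit
            }
        }
    }'
}

# over_threshold VALUE LIMIT: succeed when VALUE >= LIMIT.
over_threshold() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 >= l + 0) }'
}

# Summary line copied from the sample top output in this article.
sample='%Cpu(s):  1.2 us,  1.3 sy,  0.0 ni, 97.3 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st'
busy=$(printf '%s\n' "$sample" | cpu_busy)
if over_threshold "$busy" 70; then
    echo "CPU busy ${busy}%: above threshold"
else
    echo "CPU busy ${busy}%: below threshold"
fi
```

On a real server the input would come from top -b -n 1 | cpu_busy. The first batch iteration may not reflect the most recent interval, so sampling several times a few seconds apart is likely to give a more reliable picture than a single reading.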

Common remedies:

Adjust application configuration (thread count, concurrency) to limit resource consumption.

Patch known vulnerabilities or upgrade the component.

Balance traffic via clustering, caching, load balancing, or schedule adjustments.

Scale up CPU resources or migrate services to higher‑performance servers.

Memory Alerts

Run top and press Shift+M to sort by memory usage, focusing on the RES and %MEM columns.

# top
Tasks: 195 total,   1 running, 194 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.3 us,  1.1 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8008984 total,   969272 free,  4721960 used,  2317752 buff/cache
KiB Swap: 7208956 total, 4409068 free, 2799888 used. 2363556 avail Mem
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10174 root      20   0 5819584 963512  4420 S   1.3 12.0 3619:52 java -jar V2XWebSocketServer.jar
10166 root      20   0 5768092 921932  4252 S   0.0 11.5 364:51.16 java -jar V2XStatisticsServer.jar
2105  mysql     20   0 3205688 907124  7584 S   0.0 11.3 1463:03 /usr/sbin/mysqld --daemonize --pid-file=/var/run+
1087  root      20   0 5809328 449920  2736 S   0.0  5.6 226:25.74 java -jar V2XApiServer.jar
1456  root      20   0 10.5g 369520 242164 S   3.0  4.6 12463:01 clickhouse-server --config-file=/etc/clickhouse-+
1086  root      20   0 5822324 319628  3028 S   1.3  4.0 4162:45 java -jar V2XRawDataServer.jar
1064  root      20   0 5702928 286440  2272 S   0.3  3.6 721:06.60 java -jar msbus.jar
1089  root      20   0 5755452 238580  2644 S   1.7  3.0 4331:30 java -jar V2XRealtimeServer.jar
27891 root      20   0 1111052 25192   2324 S   0.0  0.3 4:21.71 /usr/bin/dockerd -H fd:// --containerd=/run/cont+
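
For scripted checks, the "avail Mem" figure in the top summary can also be derived from /proc/meminfo. The following is a minimal sketch, assuming a kernel that exposes MemAvailable (Linux 3.14 or later). mem_used_pct is a hypothetical helper name and the sample numbers are made up for illustration.

```shell
# mem_used_pct: read /proc/meminfo-style text on stdin and print the
# percentage of memory that is not available:
#   100 * (1 - MemAvailable / MemTotal)
mem_used_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END {
             if (total > 0) printf "%.1f\n", 100 * (1 - avail / total)
         }'
}

# Sample input with round, made-up values in kB.
printf 'MemTotal: 8000000 kB\nMemAvailable: 2000000 kB\n' | mem_used_pct   # prints 75.0
```

On a live host this would be mem_used_pct < /proc/meminfo. A non-interactive counterpart to Shift+M in top is ps -eo pid,user,rss,pmem,comm --sort=-rss | head -n 10, which assumes the procps version of ps.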

Typical remedies:

Tune application parameters to limit memory usage, cache size, queue length, etc.

Expand server memory or migrate services to machines with higher memory capacity.

Disk Space Alerts

Use df -h to view partition usage (focus on Use% and Mounted on), then du -sh to locate directories consuming the most space.

# df -h
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/klas_host--10--169--183--49-root    95G  9.6G   86G  11% /
/dev/vda2                                     1014M  217M  798M  22% /boot
/dev/vda1                                      200M  5.8M  195M   3% /boot/efi
/dev/mapper/vgdata-lvdata                      100G   56G   45G  56% /data
# du -sh /data/*
4.6M    /data/h5
40M     /data/ioc-guanai
242M    /data/jdk
54G     /data/jnpf
5.2M    /data/redis
952M    /data/soft
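
The same check can be scripted by filtering df output against a usage threshold. This is a rough sketch, assuming the six-column layout produced by df -P and mount points without spaces. df_over is a hypothetical helper and the threshold of 50 is arbitrary.

```shell
# df_over LIMIT: read df -P style output on stdin and print the mount
# points whose usage is at or above LIMIT percent.
df_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Sample rows taken from the df output in this article.
df_over 50 <<'EOF'
Filesystem                 Size  Used Avail Use% Mounted on
/dev/vda2                 1014M  217M  798M  22% /boot
/dev/mapper/vgdata-lvdata  100G   56G   45G  56% /data
EOF
```

With the sample rows this prints /data 56%. On a live host the input would be df -P | df_over 80. To rank the directories under a full mount point, du -xh --max-depth=1 /data | sort -rh | head is one option, assuming GNU du and sort.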

Common solutions:

Rotate or delete large log files (e.g., using logrotate).

For data‑disk saturation, limit data retention time, enable compression, or adjust application parameters.

If the system partition is full, move applications or logs to the data disk, or reconfigure Docker storage.

Scale disk capacity by adding or expanding dedicated data disks.
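
Before rotating or deleting anything, it helps to list the files that are actually large. The helper below is a sketch only: large_files is a hypothetical name, and the size suffixes (k, M, G) assume GNU find or a compatible implementation.

```shell
# large_files DIR MIN_SIZE
# Print regular files under DIR that are larger than MIN_SIZE, staying on
# the same filesystem (-xdev) so other mounts are not scanned.
large_files() {
    find "$1" -xdev -type f -size "+$2" -print
}

# Example (not executed here): list files over 100 MiB under /var/log.
# large_files /var/log 100M
```

The paths it prints can then be reviewed by hand or covered by a logrotate rule, rather than deleted blindly.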

Disk I/O Alerts

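Install iotop (for example with yum install iotop on CentOS) and run iotop -o as root to show only the processes that are actually performing I/O. Columns of interest: SWAPIN (share of time the process spent swapping in) and IO> (share of time the process spent waiting on I/O).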

# iotop -o
Total DISK READ :    0.00 B/s | Total DISK WRITE :     388.00 K/s
Actual DISK READ:    0.00 B/s | Actual DISK WRITE:     633.68 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  518 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.16 % [xfsaild/dm-0]
20271 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:2]
 2178 be/4 root        0.00 B/s  407.08 B/s  0.00 %  0.00 % java -jar V2XRawDataServer.jar
...
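
iotop also has a batch mode (iotop -b -o -n 1) whose output can be filtered in a script. The sketch below assumes the column layout shown above, with IO> as the tenth whitespace-separated field. io_waiters is a hypothetical helper, and other iotop builds may print additional columns.

```shell
# io_waiters MIN: read iotop batch output on stdin and print the TID,
# IO> percentage, and command of every task whose IO> value is >= MIN.
io_waiters() {
    awk -v min="$1" '$1 ~ /^[0-9]+$/ && $11 == "%" && $10 + 0 >= min + 0 {
        cmd = $12
        for (i = 13; i <= NF; i++) cmd = cmd " " $i
        print $1, $10 "%", cmd
    }'
}

# Sample rows taken from the iotop output in this article.
io_waiters 0.1 <<'EOF'
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  518 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.16 % [xfsaild/dm-0]
 2178 be/4 root        0.00 B/s  407.08 B/s  0.00 %  0.00 % java -jar V2XRawDataServer.jar
EOF
```

With the sample rows only the xfsaild kernel thread is reported. On a live host the input would be iotop -b -o -n 1 | io_waiters 10, run as root.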

Typical patterns mirror CPU alerts: continuous I/O for storage‑intensive apps, occasional spikes below 70% that are tolerable, increasing frequency indicating bugs, or time‑bound spikes during traffic peaks.

Remediation strategies:

Throttle each application's I/O load by tuning thread count, concurrency, and cache settings.

Patch vulnerable components or upgrade versions.

Balance traffic via clustering, caching, load balancing, or scheduling.

Upgrade disk performance (e.g., SSD) or migrate services to faster storage.

TCP Connection Alerts

Run netstat -antp to list TCP connections, then count them by state (see the appendix). Focus on ESTABLISHED and TIME_WAIT. A persistently high ESTABLISHED count suggests the service needs more capacity. A large number of TIME_WAIT sockets, typical of workloads with many short-lived connections, can exhaust local ports and socket resources.
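
One rough way to judge whether TIME_WAIT is becoming a problem on a host that opens many outgoing connections is to compare the TIME_WAIT count with the size of the ephemeral port range. This is a sketch under stated assumptions: the helper names are hypothetical, the range 32768 60999 is a commonly seen Linux default, and the count 14116 is made up. Ports are consumed per destination address and port, so treat the result as an indicator rather than a hard limit.

```shell
# port_range_size: read "LOW HIGH" (the format of
# /proc/sys/net/ipv4/ip_local_port_range) and print how many ports it spans.
port_range_size() {
    awk '{ print $2 - $1 + 1 }'
}

# time_wait_share COUNT RANGE: TIME_WAIT sockets as a percentage of RANGE.
time_wait_share() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 100 * n / r }'
}

range=$(echo '32768 60999' | port_range_size)
echo "ephemeral ports: $range"
echo "TIME_WAIT share: $(time_wait_share 14116 "$range")%"
```

On a live host the range would be read with port_range_size < /proc/sys/net/ipv4/ip_local_port_range, and the count could come from the netstat state summary in the appendix.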

Solutions:

For server‑side overload, deploy multiple nodes, load balancers, or split services.

For client‑side overload, use connection pooling or distribute clients across nodes.

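Mitigate TIME_WAIT by reusing connections (application-level keep-alive, long-lived connections, or connection pools) and, where appropriate, tuning kernel parameters. net.ipv4.tcp_tw_reuse=1 lets the kernel reuse TIME_WAIT sockets for new outgoing connections. net.ipv4.tcp_fin_timeout=30 shortens how long orphaned connections stay in FIN_WAIT_2; it does not change the TIME_WAIT duration itself. Do not set net.ipv4.tcp_tw_recycle: it breaks clients behind NAT and was removed in Linux 4.12, so on newer kernels sysctl -p reports an error for it.

# vim /etc/sysctl.conf
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=30
# sysctl -p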
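
Which of these parameters exist depends on the kernel version, so it may be worth checking before editing /etc/sysctl.conf. The sketch below reads the values straight from /proc/sys. sysctl_path is a hypothetical helper, and it assumes keys whose components contain no literal dots.

```shell
# sysctl_path KEY: map a sysctl key such as net.ipv4.tcp_tw_reuse to its
# file under /proc/sys by replacing dots with slashes.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Report the current value of each parameter, or note that it is missing.
for key in net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle net.ipv4.tcp_fin_timeout; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    else
        echo "$key: not available on this kernel"
    fi
done
```

A parameter reported as not available should be left out of /etc/sysctl.conf, since sysctl -p would otherwise print an error for it.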

Appendix: TCP Statistics Commands

Count connections by state:

# netstat -antp | awk -F '[ /]+' 'NR>2 {count[$6]++} END {for(state in count) print state, "\t\t", count[state] }'
LISTEN          16
CLOSE_WAIT      2
ESTABLISHED     273
FIN_WAIT2       1
TIME_WAIT       1
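
netstat belongs to the net-tools package, which some current distributions no longer install by default. Where only ss (from iproute2) is available, a similar summary can be produced as sketched below. count_states is a hypothetical helper and the sample rows are invented. Note that ss abbreviates state names (ESTAB, TIME-WAIT) instead of using netstat's ESTABLISHED and TIME_WAIT.

```shell
# count_states: read ss -ant style output on stdin and print one
# "STATE COUNT" line per TCP state, sorted by state name.
count_states() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Invented sample rows in the layout printed by ss -ant.
count_states <<'EOF'
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:22          0.0.0.0:*
ESTAB      0      0      10.0.0.5:22         10.0.0.9:51514
ESTAB      0      0      10.0.0.5:3306       10.0.0.7:40122
TIME-WAIT  0      0      10.0.0.5:8080       10.0.0.8:39010
EOF
```

On a live host the input would be ss -ant | count_states.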

Count connections per process for a given state (e.g., ESTABLISHED):

# netstat -antp | grep -i established | awk -F '[ /]+' '{count[$8]++} END {for(app in count) print app, "\t\t", count[app] }'
java          124
mysqld        109
clickhouse-ser 6
sshd:          1
redis-server   31