20 Essential Linux & Kubernetes Troubleshooting Commands Every DevOps Engineer Should Know
This guide compiles the 20 most common Linux and Kubernetes troubleshooting commands, illustrating typical outputs and step‑by‑step diagnostic reasoning for high CPU load, disk pressure, network failures, pod crashes, node issues, service outages, database errors, and application performance problems.
Linux Troubleshooting Commands
1.1 System load too high
top - 15:45:21 up 10 days, 4:35, 1 user, load average: 5.73, 4.65, 3.84
Tasks: 195 total, 1 running, 194 sleeping, 0 stopped, 0 zombie
%Cpu(s): 28.2 us, 3.3 sy, 0.0 ni, 68.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 8192.0 total, 2050.4 free, 6123.8 used, 1018.0 buff/cache
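MiB Swap: 2048.0 total, 1500.0 free, 548.0 used. 2075.2 avail Mem

Identify processes consuming excessive CPU or memory with top. A load average is only high relative to the number of CPU cores: 5.73 would saturate a 4-core host but leave headroom on a 16-core one. Terminate offending processes with kill, or investigate memory leaks if memory rather than CPU is the constraint.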
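As a rough sketch of the load-versus-cores comparison, the helper below divides a load average by a core count. The function name load_per_core is hypothetical, and the core counts in the usage note are assumptions, because the top output above does not show how many cores the host has.

```shell
# load_per_core LOAD CORES
# Prints LOAD divided by CORES, rounded to two decimals.
# (Hypothetical helper; not part of any standard tool.)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux host, pass the 1-minute load and the core count:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

With the 1-minute load shown above and an assumed 4 cores, load_per_core 5.73 4 prints 1.43, meaning more work is queued than the cores can run. Values well below 1.00 suggest the CPU is not the bottleneck.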
1.2 Disk space shortage
df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   45G  3.0G  94% /
tmpfs            16G  1.1G   15G   7% /dev/shm
/dev/sdb1        20G   18G  1.2G  94% /mnt/data

du -sh /var/log/*
1.3G    /var/log/syslog
500M    /var/log/auth.log
200M    /var/log/kern.log

Use df -h to view filesystem usage and du -sh to locate large files. Delete unnecessary files or expand storage.
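When a host has many mounts, it can help to filter the df listing down to the filesystems that are nearly full. The sketch below assumes POSIX-format input (df -P) and mount points without spaces; full_filesystems is a hypothetical name.

```shell
# full_filesystems THRESHOLD
# Reads `df -P` style output on stdin (header row included) and prints
# "mountpoint use%" for each filesystem at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, use "%"
    }'
}

# On a live host:
#   df -P | full_filesystems 90
```

Fed the listing above with a threshold of 90, it reports / and /mnt/data, each at 94%.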
1.3 Network connectivity issues
ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=55 time=13.4 ms
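...

traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  0.435 ms  0.425 ms  0.400 ms
 2  * * *
 3  10.3.1.1 (10.3.1.1)  5.089 ms  5.065 ms  5.063 ms
 4  8.8.8.8 (8.8.8.8)  12.539 ms  12.533 ms  12.510 ms

If ping fails, verify the local network configuration (interface state, IP address, default gateway). Use traceroute to detect routing problems or firewalls. A hop shown as * * * is not a fault in itself when later hops still answer, as hop 2 does here: that router is just not replying to the probes.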
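On long traces it is easy to miss where replies stop. The following sketch lists the hops for which every probe timed out; silent_hops is a hypothetical helper, and it assumes the default traceroute output layout shown above.

```shell
# silent_hops
# Reads traceroute output on stdin and prints the number of each hop
# whose probes all timed out (only "*" fields follow the hop number).
silent_hops() {
    awk '$1 ~ /^[0-9]+$/ && NF > 1 {
        silent = 1
        for (i = 2; i <= NF; i++) if ($i != "*") silent = 0
        if (silent) print $1
    }'
}

# On a live host:
#   traceroute 8.8.8.8 | silent_hops
```

For the trace above it prints 2. A single silent hop followed by answering hops is usually harmless; silence from some hop onward all the way to the destination is the pattern that points at a routing or firewall problem.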
Kubernetes Troubleshooting Commands
2.1 Pod fails to start (CrashLoopBackOff)
kubectl describe pod my-pod
Name:         my-pod
Namespace:    default
Node:         node-1/192.168.1.100
Containers:
  my-container:
    Container ID:  docker://d4f2e3a6b8db
    Image:         my-app:v1.0
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error

kubectl logs my-pod
Error: java.lang.ExceptionInInitializerError: Unable to initialize the application.

Examine container logs with kubectl logs (add --previous to read the logs of the last crashed container instance) and fix the configuration, environment variables, or missing dependencies that cause the crash.
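To find every pod in this state rather than inspecting them one at a time, the pod listing can be filtered by its STATUS column. This is a minimal sketch: crashlooping_pods is a hypothetical name, and it assumes single-namespace output from kubectl get pods --no-headers (with --all-namespaces the columns shift by one).

```shell
# crashlooping_pods
# Reads `kubectl get pods --no-headers` output on stdin and prints
# "pod-name restart-count" for pods whose STATUS is CrashLoopBackOff.
crashlooping_pods() {
    awk '$3 == "CrashLoopBackOff" { print $1, $4 }'
}

# On a live cluster:
#   kubectl get pods --no-headers | crashlooping_pods
```

Each reported pod can then be examined with kubectl describe pod and kubectl logs as shown above.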
2.2 Node reports NotReady
kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
node-1   NotReady   <none>   10d   v1.23.4
node-2   Ready      <none>   12d   v1.23.4

kubectl describe node node-1
Name: node-1
Conditions:
  Type          Status  LastHeartbeatTime                Reason                  Message
  Ready         False   Wed, 27 Nov 2024 11:15:00 -0500  KubeletNotReady         Kubelet stopped posting node status.
  DiskPressure  True    Wed, 27 Nov 2024 11:10:00 -0500  KubeletHasDiskPressure  kubelet has disk pressure

journalctl -u kubelet
Nov 27 11:10:00 node-1 kubelet[1375]: node "node-1" has disk pressure, evicting pods

NotReady is often caused by resource pressure (disk, memory) or by a kubelet that has stopped reporting. Use kubectl describe node and journalctl -u kubelet to pinpoint the issue.
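In a larger cluster, the first step is usually to narrow the node list to the ones that are not Ready. The sketch below does that from kubectl get nodes --no-headers output; not_ready_nodes is a hypothetical helper name.

```shell
# not_ready_nodes
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# name of every node whose STATUS does not start with "Ready".
# (A cordoned but healthy node shows "Ready,SchedulingDisabled" and is
# therefore not reported.)
not_ready_nodes() {
    awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}

# On a live cluster, describe each affected node in turn:
#   kubectl get nodes --no-headers | not_ready_nodes |
#       while read -r node; do kubectl describe node "$node"; done
```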
2.3 Service unreachable
kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
my-service   ClusterIP   10.96.0.1    <none>        80/TCP    12d

kubectl describe svc my-service
Name:       my-service
Namespace:  default
Labels:     app=my-app
Selector:   app=my-app
Type:       ClusterIP
IP:         10.96.0.1
Port:       80/TCP
Endpoints:  10.1.1.2:80,10.1.1.3:80

Verify the service type and port, and confirm that the listed endpoints correspond to healthy pods. An empty Endpoints field means the selector matches no ready pods, which is a common reason a service is unreachable.
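The endpoint check can be scripted so that a service with no backends stands out immediately. This is a sketch under assumptions: endpoint_count is a hypothetical name, and because kubectl abbreviates long endpoint lists, the result is a lower bound for services with many backends.

```shell
# endpoint_count
# Reads `kubectl describe svc NAME` output on stdin and prints how many
# addresses the Endpoints line lists. Prints 0 when the field is empty,
# shows <none>, or is missing altogether.
endpoint_count() {
    awk '$1 == "Endpoints:" {
        found = 1
        if ($2 == "" || $2 == "<none>") print 0
        else print split($2, parts, ",")
        exit
    }
    END { if (!found) print 0 }'
}

# On a live cluster:
#   kubectl describe svc my-service | endpoint_count
```

A result of 0 suggests comparing the service's Selector with the labels on the pods (kubectl get pods --show-labels) and checking the pods' readiness.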
Database Troubleshooting Commands
3.1 Connection failure
mysql -u root -p
Enter password:
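ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

ERROR 1045 means the server was reached but rejected the login, so check the credentials and the privileges granted to that user and host. If the client cannot connect at all (ERROR 2002 or 2003), ensure the MySQL server is running and reachable.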
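Telling an authentication failure apart from a connectivity failure is the first branch in this diagnosis, and the client's error code is enough to make it. The sketch below covers three common codes only; mysql_error_hint and its hint wording are hypothetical.

```shell
# mysql_error_hint
# Reads MySQL client error text on stdin and prints a one-line hint
# based on the first "ERROR <code>" it finds.
mysql_error_hint() {
    code=$(sed -n 's/^ERROR \([0-9][0-9]*\) .*/\1/p' | head -n 1)
    case "$code" in
        1045) echo "authentication: server reached, login rejected" ;;
        2002) echo "connectivity: local socket unavailable, check that mysqld is running" ;;
        2003) echo "connectivity: TCP connection failed, check host, port, and firewall" ;;
        "")   echo "no MySQL error code found" ;;
        *)    echo "unclassified error $code" ;;
    esac
}

# On a live host (the client writes errors to stderr):
#   mysql -u root -p -e 'SELECT 1' 2>&1 | mysql_error_hint
```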
3.2 Table corruption
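mysqlcheck -u root -p --check --auto-repair mydb
+------------+-------+----------+-------------------------------+--------+
| Table      | Op    | Msg_type | Msg_text                      | Errors |
+------------+-------+----------+-------------------------------+--------+
| mydb.table | check | Warning  | Found row with wrong checksum |        |
...

Run mysqlcheck against a named database (or with --all-databases) to detect and repair damaged tables. It performs one operation per run (check, repair, analyze, or optimize), so run --optimize as a separate step. Take a backup before repairing, and maintain regular backups.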
3.3 Performance bottlenecks
SHOW PROCESSLIST;
Id    User  Host       db    Command  Time  State         Info
1234  app   localhost  mydb  Query    10    Sending data  SELECT * FROM large_table WHERE ...
1235  app   localhost  mydb  Query    20    Sorting       SELECT * FROM another_table ORDER BY ...

Identify long-running queries (the Time column is in seconds), analyze them with EXPLAIN, and monitor CPU, memory, and I/O usage.
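On a busy server the process list can run to hundreds of rows, so filtering by elapsed time helps. The sketch below assumes the tab-separated output that the mysql client produces in batch mode; long_queries is a hypothetical helper name.

```shell
# long_queries SECONDS
# Reads tab-separated SHOW PROCESSLIST output (header row included) on
# stdin and prints "Id Time Info" for each statement in the Query state
# that has been running for at least SECONDS seconds.
long_queries() {
    awk -F '\t' -v min="$1" 'NR > 1 && $5 == "Query" && $6 + 0 >= min + 0 {
        print $1, $6, $8
    }'
}

# On a live host (assumes credentials are supplied by an option file):
#   mysql --batch -e 'SHOW FULL PROCESSLIST' | long_queries 15
```

With a 15-second threshold, only the second query in the listing above (Id 1235, 20 seconds) would be reported, making it the first candidate for EXPLAIN.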
3.4 High load
SHOW GLOBAL STATUS LIKE 'Threads_running';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| Threads_running | 250 |
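+-----------------+-------+

SHOW ENGINE INNODB STATUS\G
------------------------
LATEST DETECTED DEADLOCK
------------------------
*** (1) TRANSACTION:
TRANSACTION 12345, ACTIVE 10 sec fetching rows
...

Reduce the number of concurrent threads (for example with connection pooling), and optimize the schema and indexes so that queries finish sooner. InnoDB resolves each deadlock itself by rolling back one of the transactions; reduce their frequency by keeping transactions short, accessing rows in a consistent order, or lowering the isolation level where that is safe.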
Application Troubleshooting Commands
4.1 Application crash (Java OOM)
journalctl -u myapp.service
Nov 27 12:00:00 myserver myapp[1234]: Error: OutOfMemoryError: Java heap space
Nov 27 12:01:00 myserver myapp[1234]: Service stopped unexpectedly.

Increase the JVM heap limits (-Xmx sets the maximum heap, -Xms the initial heap) and use profiling tools such as VisualVM or MAT to detect memory leaks.
4.2 Slow startup
strace -p PID
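open("/var/lib/myapp/config.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)

Check that required configuration files exist and have correct permissions, and streamline the startup script. Note that strace -p attaches to a process that is already running; to capture startup from the first system call, launch the program under strace instead of attaching.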
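strace output is verbose, and failed file lookups are easy to lose in it. This sketch extracts the paths that failed with ENOENT; missing_files is a hypothetical name, and it simply takes the first quoted string on each ENOENT line.

```shell
# missing_files
# Reads strace output on stdin and prints, once each and sorted, the
# first quoted path on every line that reports ENOENT.
missing_files() {
    grep 'ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# On a live host (strace writes its trace to stderr; PID is the target
# process ID, as in the command above):
#   strace -f -p "$PID" 2>&1 | missing_files
```

Not every ENOENT is a problem, since programs routinely probe several candidate locations for a file. The paths worth attention are those the application has no fallback for, such as the config.yaml above.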
4.3 Performance degradation
top
PID   USER    PR  NI  VIRT  RES   SHR  S  %CPU  %MEM  TIME+     COMMAND
1234  myuser  20  0   1.2g  500m  30m  R  95.0  6.3   10:45.43  java

Profile CPU and memory usage with a profiler or APM tool (e.g., JProfiler, New Relic), and verify that downstream services are not the bottleneck.
4.4 Log overflow
du -sh /var/log/myapp.log
20G     /var/log/myapp.log

Enable log rotation (e.g., logrotate), lower log verbosity, and periodically purge old logs to free disk space.
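Before configuring rotation it helps to know which files have actually grown large. The sketch below lists files above a size threshold, largest first; large_files is a hypothetical helper, and the directory and threshold in the usage line are illustrative assumptions.

```shell
# large_files DIR SIZE_MB
# Prints "size-in-KiB path" for every regular file under DIR that is
# larger than SIZE_MB mebibytes, largest first.
large_files() {
    find "$1" -type f -size +"$(( $2 * 1024 * 1024 ))"c -exec du -k {} + | sort -rn
}

# On a live host, list logs over 1 GiB:
#   large_files /var/log 1024
```

The size is passed to find in bytes (the c suffix) because that is the only size unit POSIX guarantees.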
Full-Stack DevOps & Kubernetes
Focused on sharing DevOps, Kubernetes, Linux, Docker, Istio, microservices, Spring Cloud, Python, Go, databases, Nginx, Tomcat, cloud computing, and related technologies.
