Diagnosing Linux CPU Spikes with top, Thread Dumps, and jstack
This guide walks through real‑world Linux performance troubleshooting, showing how to use top to pinpoint high‑CPU processes, convert thread IDs, capture multiple jstack thread dumps, and interpret key top metrics such as load average, task states, and memory usage.
Background
When a service that has been running smoothly suddenly shows a CPU spike, the first step is to identify the offending process. Running top shows which PID is consuming the most CPU, which then allows deeper inspection of that process's threads.
Investigating with top
Run top -Hp <PID> to list the threads of the high-CPU process: -H switches top to per-thread display and -p restricts it to the given PID. In the example, PID 2816 showed high usage, and thread 2825 was the culprit.
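The same per-thread view can be collected non-interactively. The following is a minimal sketch, assuming a procps-style ps that accepts -L and the tid and pcpu output fields. The function names are hypothetical, and ps reports CPU averaged over the thread's lifetime, so its numbers will not match top's live readings exactly.

```python
import subprocess

def parse_thread_cpu(ps_output):
    """Parse 'TID %CPU' rows into (tid, cpu) tuples, busiest first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) == 2:
            rows.append((int(parts[0]), float(parts[1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)

def busiest_threads(pid, limit=5):
    """Run ps for one process and return its most CPU-hungry threads."""
    out = subprocess.run(
        ["ps", "-L", "-o", "tid,pcpu", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_thread_cpu(out)[:limit]
```

Calling busiest_threads(2816) on the affected host would return the thread IDs to look up in the thread dump.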
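To correlate thread IDs with Java thread dumps, convert the decimal thread ID shown by top to hexadecimal (e.g., using Python's hex() function). The thread dump identifies each native thread by a hexadecimal nid value, so the converted ID is what you search for.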
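As a small illustration of the conversion, using the thread ID from the example above (the helper name is hypothetical, and the nid=0x... form is assumed to be how the dump labels native thread IDs):

```python
def tid_to_nid(tid):
    """Return the hexadecimal form of a decimal thread ID."""
    return hex(tid)

print(tid_to_nid(2825))  # 0xb09
```

Searching the dump for nid=0xb09 then locates the stack of thread 2825.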
Thread Dump Analysis
Capture several jstack dumps for the same PID because thread states can change rapidly. The dumps reveal threads holding locks and those waiting, helping pinpoint why a lock is not released.
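One way to automate this step is sketched below. It assumes jstack is on the PATH and that thread entries in the dump are separated by blank lines; the function names, dump count, and interval are illustrative choices, not values from the original investigation.

```python
import subprocess
import time

def capture_dumps(pid, count=3, interval_s=2.0):
    """Run jstack several times, pausing between runs, and return the dumps."""
    dumps = []
    for i in range(count):
        result = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        )
        dumps.append(result.stdout)
        if i < count - 1:
            time.sleep(interval_s)
    return dumps

def thread_block(dump, nid):
    """Return the first blank-line-delimited block containing nid=<nid>, or ''."""
    marker = f"nid={nid} "  # trailing space avoids matching longer IDs
    for block in dump.split("\n\n"):
        if marker in block:
            return block.strip()
    return ""
```

Comparing thread_block(dump, "0xb09") across the captured dumps shows whether the hot thread stays in the same frames or moves on.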
Deep Dive into top
The top command provides a wealth of information:
First line: current system time and uptime. Check uptime first, because a recent or frequent reboot can mask a recurring problem.
Number of logged-in users (check with who or last).
Load averages (1-, 5-, 15-minute): compare them against the CPU core count to assess load severity. A load that stays above the core count generally means tasks are waiting for CPU or I/O.
Second line: total tasks and their states, including the number of zombie processes. Watch the zombie count.
Third line: CPU usage breakdown.
Key CPU columns:
US/SY: CPU time spent in user space vs. kernel (system) space.
NI: CPU time spent on user processes running with an adjusted nice priority (should be low).
ID: idle CPU.
WA: time spent waiting for I/O, which can spike under heavy logging.
HI/SI: time spent servicing hardware vs. software interrupts.
ST: time stolen from this virtual machine by the hypervisor in virtualized environments.
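The comparison of load average against core count can be expressed as a quick check. This is a minimal sketch using Python's standard os module; the helper name is hypothetical, and what counts as a worrying ratio is a judgment call for each system.

```python
import os

def load_per_core(load_avg, cores):
    """Divide each load average by the core count, rounded to two decimals."""
    return tuple(round(value / cores, 2) for value in load_avg)

if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages.
    print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
```

A value near or above 1.0 per core that persists across the 5- and 15-minute figures suggests sustained pressure rather than a brief burst.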
Memory and Cache Details
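Top's fourth and fifth rows show physical memory and swap usage, including buffers (kernel memory used for block-device I/O) and cache (file contents the kernel keeps in the page cache to speed up later reads). Sustained, heavy swap usage usually indicates insufficient RAM.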
Process list columns explained: PID, USER, PR, VIRT, RES, SHR. Note that RES is the resident memory currently held in RAM, including pages shared with other processes (SHR). The memory private to a process can therefore be estimated as RES - SHR.
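The RES - SHR estimate can be reproduced from the kernel's own counters. The sketch below assumes a Linux /proc filesystem, where /proc/<PID>/statm lists total, resident, and shared sizes in pages as its first three fields; the function names are hypothetical and the result is an approximation, not an exact accounting.

```python
import os

def memory_from_statm(statm_text, page_size):
    """Convert the resident and shared page counts from statm into bytes."""
    fields = statm_text.split()
    resident = int(fields[1]) * page_size  # second field: resident pages
    shared = int(fields[2]) * page_size    # third field: shared pages
    return {
        "res": resident,
        "shr": shared,
        "private_estimate": resident - shared,
    }

def process_memory(pid):
    """Read /proc/<pid>/statm and return the estimate for that process."""
    with open(f"/proc/{pid}/statm") as handle:
        return memory_from_statm(handle.read(), os.sysconf("SC_PAGE_SIZE"))
```

For example, process_memory(2816) would report the figures for the process examined earlier.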
Conclusion
By combining top for real‑time metrics, converting thread IDs, and analyzing multiple jstack dumps, engineers can quickly isolate the root cause of CPU spikes, such as lock contention or runaway threads, and take corrective actions.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.