How to Diagnose CPU Spikes on Linux: A Real‑World Top and Thread Dump Walkthrough

This article walks through a practical Linux performance investigation, showing how to use the top command to pinpoint high‑CPU processes, examine thread details, convert thread IDs, analyze thread dumps for lock contention, and interpret key top output fields for effective troubleshooting.

ITPUB

Jun 5, 2018

Simulated Production Troubleshooting

When a service suddenly experiences a CPU spike, the first step is to log into the server and run top to identify the offending process. In the example, top reveals that PID 2816 is consuming a large amount of CPU.

To drill down into the threads of that process, run top -Hp 2816 (-H lists individual threads, -p restricts the view to the given PID). The output shows that thread 2825 has the highest CPU usage.

Java thread dumps record each thread's native ID in hexadecimal, in the nid field. Convert the decimal thread ID reported by top to hex (2825 becomes 0xb09), for instance with Python's built-in hex(), then search the dump for nid=0xb09 to find the matching thread.
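The conversion can be sketched in a few lines of Python. The helper name tid_to_nid is hypothetical, chosen for this illustration; it only formats the decimal ID the way the nid field appears in a dump.

```python
def tid_to_nid(tid: int) -> str:
    """Format a decimal thread ID from top as the nid token used in a thread dump."""
    # The '#x' format spec produces lowercase hex with a 0x prefix.
    return f"nid={tid:#x}"


if __name__ == "__main__":
    # Thread 2825 is the busy thread from the top -Hp output above.
    print(tid_to_nid(2825))
```

Searching the dump for the printed token (for example with grep) should land on the stack trace of the busy thread.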

Take several thread dumps a few seconds apart (e.g., using jstack <pid>), because thread states change over time and a single dump can be misleading. If the dumps repeatedly show one thread holding a lock while another waits for it, that points to a lock-contention problem that should be investigated in the application code.
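As a rough illustration of that comparison, the sketch below scans the text of one dump and pairs each contended lock with the thread that holds it and the threads waiting for it. The sample dump, thread names, and addresses are made up for this example, and the parsing assumes the "- locked <...>" and "- waiting to lock <...>" lines that jstack prints for object monitors; it is not a complete dump parser.

```python
import re
from collections import defaultdict

# Hypothetical excerpt shaped like jstack output; names and addresses are invented.
SAMPLE_DUMP = '''\
"worker-1" #12 prio=5 os_prio=0 tid=0x00007f3a1c0e1000 nid=0xb09 runnable [0x00007f3a0a5f4000]
   java.lang.Thread.State: RUNNABLE
\tat com.example.Job.run(Job.java:42)
\t- locked <0x000000076ab0c1a8> (a java.lang.Object)

"worker-2" #13 prio=5 os_prio=0 tid=0x00007f3a1c0e3000 nid=0xb0a waiting for monitor entry [0x00007f3a0a4f3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
\tat com.example.Job.run(Job.java:40)
\t- waiting to lock <0x000000076ab0c1a8> (a java.lang.Object)
'''

HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-f]+)')
HOLDS = re.compile(r'-\s+locked <(0x[0-9a-f]+)>')
WAITS = re.compile(r'-\s+waiting to lock <(0x[0-9a-f]+)>')


def lock_contention(dump: str) -> dict:
    """Map each awaited lock address to (holder, waiters).

    Threads are (name, nid) tuples; holder is None if no thread in the
    dump reports holding that lock.
    """
    holders = {}
    waiters = defaultdict(list)
    current = None
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            current = (header.group("name"), header.group("nid"))
            continue
        if current is None:
            continue  # skip any preamble before the first thread header
        held = HOLDS.search(line)
        awaited = WAITS.search(line)
        if held:
            holders[held.group(1)] = current
        elif awaited:
            waiters[awaited.group(1)].append(current)
    return {lock: (holders.get(lock), threads) for lock, threads in waiters.items()}


if __name__ == "__main__":
    for lock, (holder, blocked) in lock_contention(SAMPLE_DUMP).items():
        print(lock, "held by", holder, "blocking", blocked)
```

Running the same function over each dump in a series and checking whether the same holder keeps appearing is one way to separate a persistent contention problem from a momentary one.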

Detailed Top Command Breakdown

The rest of this walkthrough explains each part of the top display.

First line: shows the current system time and the uptime of the machine. Check the uptime first: an unexpectedly short value means the machine rebooted recently, and frequent reboots can mask underlying issues.

It also shows the number of logged-in users (the who command lists them) and three load averages (1-minute, 5-minute, and 15-minute). Compare the load values with the number of CPU cores: on a 4-core machine, a load that stays above 4 means more work is queued than the cores can handle.
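The load-versus-cores comparison can be sketched as follows. The function names and the threshold of 1.0 per core are assumptions made for this illustration, following the rule of thumb above; os.getloadavg() and os.cpu_count() are standard-library calls available on Linux.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    return load_avg / cores


def is_heavily_loaded(load_avg: float, cores: int, threshold: float = 1.0) -> bool:
    """Apply the rule of thumb: load above the core count means heavy load."""
    return load_per_core(load_avg, cores) > threshold


def current_status() -> str:
    """Read the live 1/5/15-minute load averages and compare them to the core count."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    parts = []
    for label, value in zip(("1m", "5m", "15m"), os.getloadavg()):
        flag = "HIGH" if is_heavily_loaded(value, cores) else "ok"
        parts.append(f"{label}={value:.2f} ({flag})")
    return f"{cores} cores: " + ", ".join(parts)


if __name__ == "__main__":
    print(current_status())
```

A high 1-minute value next to a low 15-minute value suggests a recent burst, while three high values suggest sustained overload.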

Second line: displays the total number of tasks broken down by state. Pay attention to the zombie count: zombies are processes that have exited but have not yet been reaped by their parent.

Third line: provides CPU-related metrics.

US/SY: percentage of CPU time spent running user-space code vs. kernel (system) code.

NI: percentage of CPU time spent on user processes running with a modified nice value; a high NI indicates priority-adjusted workloads.

ID: idle CPU percentage.

WA: percentage of time the CPU spends waiting for I/O to complete, which can spike under heavy logging.

HI and SI: percentages of CPU time spent handling hardware and software interrupts, respectively.

ST: time stolen by the hypervisor in virtualized environments.
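To make the fields concrete, here is a minimal sketch that parses one CPU summary line into a dictionary. The sample line is made up, and the layout ("5.9 us," with a dot as the decimal separator) is an assumption: older top versions and some locales format this line differently, so the regular expression may need adjusting.

```python
import re

# Hypothetical CPU summary line in the layout printed by recent procps top.
SAMPLE_LINE = "%Cpu(s):  5.9 us,  2.0 sy,  0.0 ni, 91.5 id,  0.3 wa,  0.0 hi,  0.3 si,  0.0 st"

FIELD = re.compile(r"(\d+(?:\.\d+)?)\s+(us|sy|ni|id|wa|hi|si|st)\b")


def parse_cpu_line(line: str) -> dict:
    """Return a mapping such as {'us': 5.9, 'sy': 2.0, ...} from a top CPU line."""
    return {name: float(value) for value, name in FIELD.findall(line)}


if __name__ == "__main__":
    cpu = parse_cpu_line(SAMPLE_LINE)
    busy = 100.0 - cpu["id"]
    print(f"busy={busy:.1f}% user={cpu['us']}% system={cpu['sy']}% iowait={cpu['wa']}%")
```

Capturing this line from batch-mode runs (top -b -n 1) makes it possible to log the breakdown over time and see whether a spike is dominated by user code, kernel work, or I/O wait.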

The fourth and fifth lines report physical memory and swap usage, and introduce the concepts of buffer and cache. A buffer holds data in transit between subsystems with mismatched speeds, such as memory and disk, while a cache keeps a copy of data that has already been read (file contents, for example) so that later accesses are faster. Heavy swap usage indicates insufficient RAM.

Column explanations (as shown in the top header): PID (process ID), USER (owner), PR (scheduling priority), VIRT (virtual memory), RES (resident memory), SHR (shared memory). RES is the physical memory the process currently occupies, including pages shared with other processes, so the private memory footprint is roughly RES minus SHR.
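The RES minus SHR estimate can also be computed outside top. The sketch below reads /proc/<pid>/statm, whose second and third fields are the resident and shared page counts; treating them as the counterparts of top's RES and SHR columns is an assumption of this example, and the function names are hypothetical.

```python
import os


def private_resident_kib(statm_line: str, page_size: int = 4096) -> int:
    """Estimate private resident memory in KiB from a /proc/<pid>/statm line.

    statm fields are page counts: size resident shared text lib data dt.
    """
    fields = statm_line.split()
    resident_pages = int(fields[1])
    shared_pages = int(fields[2])
    return (resident_pages - shared_pages) * page_size // 1024


def read_private_resident_kib(pid: int) -> int:
    """Read statm for a live process and apply the same estimate (Linux only)."""
    with open(f"/proc/{pid}/statm") as handle:
        line = handle.read()
    return private_resident_kib(line, os.sysconf("SC_PAGE_SIZE"))


if __name__ == "__main__":
    # Made-up statm line: 30000 resident pages, 5000 of them shared.
    print(private_resident_kib("250000 30000 5000 10 0 40000 0"), "KiB")
```

This is only an approximation of private memory; for a more precise breakdown, the per-mapping figures in /proc/<pid>/smaps are the usual next step.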

Understanding these fields enables engineers to quickly locate resource‑intensive processes, diagnose lock contention via thread dumps, and take corrective actions such as code optimization or configuration tuning.
