
Why Did My CPU Hit 100%? Uncovering Hidden Linux Processes & Nice Value

An Alibaba tech support engineer investigates a server whose CPU appeared fully utilized, discovers the misleading 'ni' metric, reproduces the issue with high‑nice loops, uncovers hidden mining processes via ld.so.preload, and explains how to detect and mitigate such stealthy CPU hogs.

Alibaba Cloud Developer

Case Overview

An Alibaba technical support engineer was asked to explain why a customer's server showed 100% CPU usage according to top. The top display lists eight CPU metrics: us, sy, ni, id, wa, hi, si, and st. The sum of all eight should be 100%.

The ni metric (CPU time spent running user-space processes with a positive nice value) was the dominant value, while id (idle) and wa (waiting for I/O) were both zero, which means the CPU was fully busy.
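The eight values shown by top come from the counters on the cpu line of /proc/stat. The following is a minimal sketch of that calculation, assuming the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal). The function names are illustrative, and the guest columns that follow the first eight are ignored.

```python
# Labels in the order the counters appear on the "cpu" line of /proc/stat.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight counters of a /proc/stat cpu line as a dict."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    values = [int(v) for v in parts[1:9]]
    # Very old kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time per category between two snapshots."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: second[name] - first[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

To use it on a live system, read the first line of /proc/stat twice, a second or so apart, and pass both lines to cpu_percentages. A large ni share together with id and wa at zero matches the pattern described above.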

Understanding the ni Metric

In Linux, every process has a nice value, ranging from -20 to 19, that influences its scheduling priority: the higher the nice value, the lower the priority. The ni column reports the share of CPU time spent in user space by processes whose nice value is greater than 0.
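As a small illustration of how nice values behave, the sketch below reads the nice value of the current process and raises it using Python's standard os module. The helper names are made up for this example; only os.getpriority and os.nice are real library calls.

```python
import os


def current_nice():
    """Nice value of the calling process (-20 is highest priority, 19 lowest)."""
    return os.getpriority(os.PRIO_PROCESS, 0)


def counts_toward_ni(nice_value):
    """User-space CPU time is reported under ni only for a positive nice value."""
    return nice_value > 0


def make_self_nicer(increment):
    """Raise our own nice value; the kernel clamps the result at 19."""
    return os.nice(increment)
```

Raising the nice value needs no special privilege, while lowering it again normally does, which is one reason a positive nice value is a cheap way for a workload to yield to other processes.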

Reproducing the Symptom

The engineer wrote a simple infinite loop program and examined its compiled assembly:

00000000004004ed <main>:
  4004ed:  55                    push   %rbp
  4004ee:  48 89 e5              mov    %rsp,%rbp
  4004f1:  c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)    # counter = 0
  4004f8:  83 45 fc 01           addl   $0x1,-0x4(%rbp)    # counter += 1
  4004fc:  eb fa                 jmp    4004f8 <main+0xb>  # loop forever
  4004fe:  66 90                 xchg   %ax,%ax            # padding (2-byte nop)

Running this program at a high nice value (for example, 19) on a 16-core server shows the process accounting for a large share of the ni value, while the total CPU usage reported by top remains 100%.
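The reproduction can be approximated without compiling anything. The sketch below is an assumption-laden stand-in for the engineer's test, not the original program: it starts a chosen number of time-limited busy loops as child Python processes and sets each one to nice 19 before it begins. Since a single loop can keep only one core busy, starting one per core is the assumed way to saturate the machine. All names are illustrative.

```python
import os
import subprocess
import sys

# Source of the child process: count upward until the deadline passes.
BUSY_LOOP = (
    "import time\n"
    "deadline = time.monotonic() + {seconds!r}\n"
    "counter = 0\n"
    "while time.monotonic() < deadline:\n"
    "    counter += 1\n"
)


def _set_niceness(niceness):
    # Runs in the child between fork and exec, so only the child is affected.
    os.setpriority(os.PRIO_PROCESS, 0, niceness)


def spawn_niced_loops(count, seconds, niceness=19):
    """Start `count` busy loops that each run for `seconds` at `niceness`."""
    script = BUSY_LOOP.format(seconds=float(seconds))
    return [
        subprocess.Popen(
            [sys.executable, "-c", script],
            preexec_fn=lambda: _set_niceness(niceness),
        )
        for _ in range(count)
    ]


def niceness_of(pid):
    """Nice value of another process, as the kernel reports it."""
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Calling spawn_niced_loops(os.cpu_count(), 60) and watching top for a minute should show the load under ni rather than us, with id close to zero.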

The per-core breakdown confirms that every core is saturated, even though the per-process CPU figures add up to less than the theoretical maximum of 1600% (16 cores at 100% each).
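The arithmetic behind that comparison is simple enough to write down. In this sketch the function names and the sample figures are hypothetical, chosen only to show how a gap between total busy time and the visible processes would be computed.

```python
def cpu_capacity_percent(cores):
    """Upper bound for summed per-process %CPU: 100% per core."""
    return 100.0 * cores


def unaccounted_percent(cores, busy_fraction, visible_process_percents):
    """CPU that is busy but not attributed to any listed process."""
    busy = cpu_capacity_percent(cores) * busy_fraction
    return busy - sum(visible_process_percents)


# Hypothetical figures: 16 fully busy cores, three small visible processes.
gap = unaccounted_percent(16, 1.0, [3.0, 1.5, 0.5])
```

A gap close to the full capacity suggests that whatever is using the CPU does not appear in the process list at all.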

Client Disagreement and Further Investigation

The client insisted that the problem had been real right up to a reboot and demanded proof of what had consumed the CPU. After the server rebooted, the symptom vanished and could no longer be observed live, but the client remained unsatisfied.

Discovering Hidden Mining Processes

Using system logs, the engineer pinpointed the start of the CPU spike: April 29 at 06:40. Two configuration files created a minute earlier referenced the suspicious libraries libxmr-stak-c.a and libxmr-stak-backend.a, which are components of xmr-stak, a Monero mining program.
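One way to look for files that appeared around a known point in time is to walk a directory tree and compare timestamps against a window. The sketch below is an illustration under assumptions, not the engineer's actual procedure: the function name and the choice of directory are hypothetical, and the article does not state the year, so the caller supplies full datetime values.

```python
import os


def files_changed_between(root, start, end):
    """Paths under `root` whose mtime or ctime falls within [start, end].

    `start` and `end` are datetime objects; naive values are read as local time.
    """
    lo, hi = start.timestamp(), end.timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if lo <= info.st_mtime <= hi or lo <= info.st_ctime <= hi:
                hits.append(path)
    return sorted(hits)
```

Modification times can be forged by an intruder, so an empty result does not prove that nothing changed. The change time (ctime) is harder to set arbitrarily, which is why the sketch checks both.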

Using ld.so.preload to Hide Processes

The engineer found that at 06:39 a library had been added to /etc/ld.so.preload. The dynamic linker loads every library listed in that file into each dynamically linked program before any other shared library, which lets the library intercept functions such as readdir and filter entries out of /proc listings. As a result, top and ps displayed sanitized process lists that omitted the mining processes.
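Because this kind of hiding works by filtering directory listings, one possible cross-check is to compare what a listing of /proc returns with what can be opened directly by PID. The sketch below assumes the hook filters only directory reads and leaves direct opens alone; a hook that also intercepts open or stat would defeat it. All names are illustrative. Thread IDs can be opened under /proc without ever being listed there, so the sketch counts only entries whose Tgid equals the PID.

```python
import os


def _is_process_leader(proc_root, pid):
    """True if <proc_root>/<pid>/status opens and describes a process, not a thread."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as fh:
            for line in fh:
                if line.startswith("Tgid:"):
                    return int(line.split()[1]) == pid
    except (OSError, ValueError, IndexError):
        return False
    return False


def find_hidden_pids(candidates, proc_root="/proc", list_dir=os.listdir):
    """PIDs that can be opened directly but are missing from the listing."""
    def listed():
        return {int(name) for name in list_dir(proc_root) if name.isdigit()}

    before = listed()
    probed = {pid for pid in candidates if _is_process_leader(proc_root, pid)}
    after = listed()
    # List twice so that processes starting during the probe are not flagged.
    return probed - (before | after)
```

On a real system the candidates would be range(1, pid_max + 1), with pid_max read from /proc/sys/kernel/pid_max. Short-lived processes can still produce an occasional false positive, so any hit deserves a second look rather than immediate trust.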

Evidence and Mitigation

Reading /proc/<pid>/maps of a bash process showed the injected libjdk library mapped into its address space, as it would be in every dynamically linked process started after the preload entry was added. Strings extracted from the library revealed hooks for directory reading, confirming the stealth technique.
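The same check can be sketched in code: read the entries of the preload file, then see which of them appear in a process's maps. This is an illustration under assumptions, not the engineer's tooling. The function names are hypothetical, and the parser follows my understanding that glibc separates entries by whitespace or colons and treats # as the start of a comment.

```python
def read_preload_entries(path="/etc/ld.so.preload"):
    """Library paths listed in the preload file; empty list if it is absent."""
    try:
        with open(path) as fh:
            text = fh.read()
    except FileNotFoundError:
        return []
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        entries.extend(line.replace(":", " ").split())
    return entries


def mapped_files(maps_text):
    """Set of file paths that appear in the text of /proc/<pid>/maps."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return paths


def preloaded_libraries_in(maps_text, entries):
    """Preload entries that are actually mapped into the given process."""
    return sorted(set(entries) & mapped_files(maps_text))
```

On a healthy system /etc/ld.so.preload is usually absent or empty, so any entry is worth explaining. A tool that is itself dynamically linked loads the injected library too, so its view of the file system may be filtered; results are more trustworthy when taken from a statically linked binary or from rescue media.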

Takeaways

The case illustrates how a symptom that can no longer be reproduced, combined with a demanding client, complicates troubleshooting. Systematic analysis (examining CPU metrics, reproducing the symptom with a controlled workload, and inspecting preload hooks) can still uncover sophisticated hidden workloads.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
