
How to Uncover Hidden Kernel Bugs: A Step‑by‑Step Linux Debugging Guide

This article walks through a real-world investigation of a hung Linux process: detecting the frozen process, then using top, ps, and the /proc filesystem to find the system call in progress, identify the kernel function it is blocked in, and pinpoint the root cause.

Liangxu Linux

Problem detection

A server monitoring alarm reported that a process was stuck. Traditional debugging with gdb and inspection of log files did not reveal any clues.

Determine whether the process is in user or kernel mode

Running top for the PID showed a CPU usage of 0%. A CPU-bound user-space loop would consume CPU, so the zero usage suggested the process was not running at all but sleeping inside a kernel call.
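The same check can be made without top by sampling the accumulated CPU time in /proc/<PID>/stat, where fields 14 and 15 (utime and stime) count the clock ticks spent in user and kernel mode. The following is a minimal sketch, assuming a Linux /proc filesystem; cpu_ticks and is_idle are hypothetical helper names, not standard tools.

```shell
#!/bin/sh
# Hypothetical helpers, assuming a Linux /proc filesystem.

# cpu_ticks PID: print utime + stime (in clock ticks) for the process.
cpu_ticks() {
    # Strip everything up to the last ") " so that a command name
    # containing spaces cannot shift the field positions. After the
    # strip, field 1 is the state and utime/stime are fields 12 and 13.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# is_idle PID [SECONDS]: succeed if no CPU time was charged to the
# process during the sampling interval (default 1 second).
is_idle() {
    before=$(cpu_ticks "$1")
    sleep "${2:-1}"
    after=$(cpu_ticks "$1")
    [ "$after" -eq "$before" ]
}
```

Calling is_idle "$PID" 5 succeeds when the tick count did not move over five seconds. That is consistent with a blocked process, but it does not by itself say where the process is blocked.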

Use ps to inspect state and waiting channel

Running ps -o pid,stat,wchan,cmd -p $PID showed a state of D and a WCHAN value of rpc_wa. The process was in uninterruptible sleep, and the waiting channel (the kernel function in which it sleeps) pointed at an RPC wait.

ps output showing CPU 0% and state D

Read the full waiting channel from /proc/<PID>/wchan

Because the WCHAN column in the ps output is truncated, the full value can be obtained with:

    cat /proc/$PID/wchan

The file contained rpc_wait, confirming that the process was blocked inside the kernel’s RPC subsystem.

Identify the system call being executed

The kernel records the current system call in /proc/<PID>/syscall. Reading it yields a line such as:

    262 0x7f... 0x0 0x0 0x0 0x0 0x0

The first number (262) is the syscall number, and the values after it are the call's arguments. Syscall numbers are architecture-specific; on x86-64 Linux the mapping is defined in /usr/include/asm/unistd_64.h. Searching that header shows:

    #define __NR_newfstatat 262

Thus the process was executing the newfstatat system call (the kernel entry point behind the fstatat() library function), which retrieves file metadata.

Inspect the kernel call stack

The per-process kernel stack is exposed as /proc/<PID>/stack. Displaying it with:

    cat /proc/$PID/stack

produces the kernel call trace for the process. The stack frames contain a series of NFS-related functions (e.g., nfs_file_fstat, nfs_getattr) that ultimately call rpc_wait. This confirms that the newfstatat request targeted a file on an NFS mount and that the process was blocked waiting for the RPC response.

Interpretation of process state

The D state reported by ps means uninterruptible sleep: a kernel-mode wait that signals cannot interrupt. While in this state the process consumes no CPU and cannot be killed until the underlying I/O completes or the kernel aborts the wait.

Conclusion

The investigation demonstrates how Linux’s /proc pseudo-filesystem provides detailed runtime information: process state, waiting channel, current system call, and kernel stack. By combining top, ps, and simple cat reads of /proc files, one can locate the exact kernel function and system call responsible for a hang without resorting to heavyweight debuggers.
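
The /proc reads used in this investigation can be collected into one small script. The following is a sketch under stated assumptions, not a finished tool: hang_report and syscall_name are hypothetical helper names, and the default header path is simply the one named in the article, which some distributions install elsewhere.

```shell
#!/bin/sh
# Hypothetical helpers that bundle the /proc reads used above.

# hang_report PID: dump the waiting channel, current system call, and
# kernel stack of a process. Reading stack normally requires root, and
# syscall may be denied for processes you do not own.
hang_report() {
    for f in wchan syscall stack; do
        printf '== /proc/%s/%s ==\n' "$1" "$f"
        cat "/proc/$1/$f" 2>/dev/null || printf '(not readable)'
        printf '\n'
    done
}

# syscall_name NR [HEADER]: print the name of the __NR_* macro that is
# defined as NR in HEADER (path assumed from the article by default).
syscall_name() {
    awk -v nr="$1" '
        $1 == "#define" && $2 ~ /^__NR_/ && $3 == nr {
            sub(/^__NR_/, "", $2)
            print $2
        }
    ' "${2:-/usr/include/asm/unistd_64.h}"
}
```

Running hang_report "$PID" as root captures all three files in one pass, which is useful for saving the evidence before the hang clears. syscall_name 262 prints newfstatat when the header contains the #define line quoted earlier.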

