How to Uncover Hidden Kernel Bugs: A Step‑by‑Step Linux Debugging Guide
This article walks through a real‑world Linux kernel bug investigation, showing how to detect a frozen process, use tools like ps, strace, pstack, and the /proc filesystem to trace system calls, identify the blocking kernel function, and pinpoint the root cause.
Problem detection
A server monitoring alarm reported that a process was stuck. Traditional debugging with gdb and inspection of log files did not reveal any clues.
Determine whether the process is in user or kernel mode
Running top for the PID showed 0% CPU usage. A CPU-bound user-space loop would consume CPU, so the zero usage suggested the process was sleeping in the kernel.
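The same check can be scripted without top. The following is a minimal sketch, not part of the original investigation: it assumes the field layout documented in proc(5), where utime and stime are fields 14 and 15 of /proc/<PID>/stat. The function names are hypothetical.

```shell
# Print utime + stime (in clock ticks) from one line of /proc/<PID>/stat.
# Assumption: fields follow proc(5), so utime and stime are the 12th and
# 13th fields once the leading "pid (comm) " prefix is removed.
cpu_ticks_from_stat_line() {
    local line="$1"
    # Strip everything up to the last ") ", because comm may contain spaces.
    local rest="${line##*) }"
    # Split the remaining fields into positional parameters.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Convenience wrapper (hypothetical): read the live stat file for a PID.
cpu_ticks_for_pid() {
    cpu_ticks_from_stat_line "$(cat "/proc/$1/stat")"
}
```

Calling cpu_ticks_for_pid "$PID" twice, a few seconds apart, and getting the same number both times is consistent with the 0% reading from top: the process accumulated no CPU time in the interval.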
Use ps to inspect state and waiting channel
The ps -o pid,stat,wchan,cmd -p $PID output displayed a state of D and a WCHAN value of rpc_wa, indicating an uninterruptible sleep while waiting on an RPC call.
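For scripting around this step, the state letter can be decoded with a small helper. This is a sketch only: the function names are hypothetical, and the descriptions follow the process state codes listed in ps(1).

```shell
# Map the first character of a ps STAT value to its meaning, per ps(1).
# Trailing modifier characters such as "+", "s", or "l" are ignored.
describe_state() {
    case "$1" in
        D*) echo "uninterruptible sleep (usually I/O)" ;;
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        T*) echo "stopped by job control signal" ;;
        t*) echo "stopped by debugger during tracing" ;;
        Z*) echo "defunct (zombie)" ;;
        I*) echo "idle kernel thread" ;;
        *)  echo "unknown state: $1" ;;
    esac
}

# Succeeds (exit status 0) only when the STAT value starts with D.
is_uninterruptible() {
    case "$1" in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

A typical use would be describe_state "$(ps -o stat= -p "$PID")", which for the stuck process in this article would report an uninterruptible sleep.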
Read the full waiting channel from /proc/<PID>/wchan
Because ps truncates the WCHAN column, the full value can be obtained with:
    cat /proc/$PID/wchan
The file contained rpc_wait, confirming that the process was blocked inside the kernel’s RPC subsystem.
Identify the system call being executed
The kernel records the current system call in /proc/<PID>/syscall. Reading it yields a line such as:
    262 0x7f... 0x0 0x0 0x0 0x0 0x0
The first number (262) is the syscall ID. On a 64-bit x86 Linux system the mapping is defined in /usr/include/asm/unistd_64.h. Searching that header shows:
    #define __NR_newfstatat 262
Thus the process was executing the newfstatat system call (also known as fstatat), which retrieves file metadata.
Inspect the kernel call stack
The per-process kernel stack is exposed as /proc/<PID>/stack. Displaying it with:
    cat /proc/$PID/stack
produces a trace whose frames contain a series of NFS-related functions (e.g., nfs_file_fstat, nfs_getattr) that ultimately call rpc_wait. This confirms that the newfstatat request was targeting a file on an NFS mount and that the process was blocked waiting for the RPC response.
Interpretation of process state
The D state reported by ps means uninterruptible sleep (a kernel-mode wait that cannot be terminated by signals). While in this state the process does not consume CPU and cannot be killed until the underlying I/O completes or the kernel aborts the wait.
Conclusion
The investigation demonstrates how Linux’s /proc pseudo-filesystem provides detailed runtime information: process state, waiting channel, current system call, and kernel stack. By combining top, ps, and simple cat reads of /proc files, one can locate the exact kernel function and system call responsible for a hang without resorting to heavyweight debuggers.
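The /proc reads used in this investigation can be gathered into a few helper functions. The sketch below is illustrative rather than the original author's tooling: all function names are hypothetical, the header lookup assumes plain "#define __NR_<name> <number>" lines as in the x86-64 header mentioned above, and reading /proc/<PID>/stack usually requires root.

```shell
# Extract the syscall number (first field) from a /proc/<PID>/syscall line.
# Note: per proc(5) the file holds "running" when the task is not blocked.
syscall_number_from_line() {
    echo "${1%% *}"
}

# Look up a syscall name in a unistd header.
# Usage: syscall_name_from_header <header-path> <number>
# Assumption: entries look like "#define __NR_newfstatat 262".
syscall_name_from_header() {
    awk -v nr="$2" '
        $1 == "#define" && $2 ~ /^__NR_/ && $3 == nr {
            sub(/^__NR_/, "", $2)   # strip the prefix, keep the bare name
            print $2
            exit
        }' "$1"
}

# Print the three /proc files used in this article for one PID.
hang_snapshot() {
    local pid="$1" f
    for f in wchan syscall stack; do
        printf '== /proc/%s/%s ==\n' "$pid" "$f"
        cat "/proc/$pid/$f" 2>/dev/null || echo "(unreadable; may need root)"
        echo
    done
}
```

For example, hang_snapshot "$PID" prints the waiting channel, the syscall line, and the kernel stack in one pass, and syscall_name_from_header /usr/include/asm/unistd_64.h 262 resolves the number from the syscall line to a name.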
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
