Understanding Linux’s Hung Task Mechanism: Detecting D‑State Processes
The article explains how the Linux kernel identifies processes stuck in the uninterruptible D state, describes the hung‑task detection code, shows the watchdog thread and related functions, and provides a Raspberry Pi example that triggers and logs a hung‑task warning.
Linux processes can be in several states such as TASK_RUNNING, EXIT_DEAD, and TASK_INTERRUPTIBLE. One special state is TASK_UNINTERRUPTIBLE (the D state), in which a process does not respond to signals and can only be woken explicitly, for example by wake_up(). A task enters this state, for example, when it blocks waiting for a mutex or for I/O to complete. Because a process may remain in D state indefinitely if the I/O device fails or a deadlock occurs, the kernel provides a “hung‑task” mechanism to detect and report such cases.
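From user space, the D state is visible as the third field of /proc/<pid>/stat. The following is a minimal sketch, not taken from the article, of how that field could be extracted; the function names stat_state() and is_d_state() are hypothetical. Because the command name in parentheses may itself contain spaces or parentheses, the sketch searches for the last ')' rather than splitting on spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the state character from one line of /proc/<pid>/stat,
 * or '\0' if the line does not look like "pid (comm) state ...".
 * Hypothetical helper for illustration only.
 */
char stat_state(const char *line)
{
    const char *close;

    assert(line != NULL);

    /* comm may contain ')' itself, so take the last one on the line */
    close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';

    return close[2];
}

/* Non-zero when the stat line describes a task in uninterruptible sleep. */
int is_d_state(const char *line)
{
    return stat_state(line) == 'D';
}
```

A caller would read the single line of /proc/<pid>/stat into a buffer and pass it to is_d_state(); ps reports the same letter in its STAT column.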
Hung‑Task Mechanism Overview
The mechanism is implemented in kernel/hung_task.c (examined here for Linux 4.1.15). During kernel initialization, hung_task_init() registers a panic notifier and starts a kernel thread named khungtaskd that runs the watchdog() function.
    static int __init hung_task_init(void) {
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
        return 0;
    }
    subsys_initcall(hung_task_init);
The panic notifier sets did_panic when the kernel panics, allowing the hung‑task code to stop further checks.
Watchdog Thread
The watchdog thread runs in an infinite loop, sleeping for sysctl_hung_task_timeout_secs (default 120 s) and then calling check_hung_uninterruptible_tasks(). It also respects a reset flag that can skip a monitoring round.
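The watchdog loop converts the timeout with a helper, timeout_jiffies(), whose definition the article does not show. The sketch below is a user-space model of how that helper is understood to behave in this kernel generation: a value of 0 maps to the longest possible sleep, which is why writing 0 to hung_task_timeout_secs effectively disables the detector. SIM_HZ = 100 is an assumption (the real tick rate depends on the kernel configuration), and the sim_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define SIM_HZ 100UL                                   /* assumed tick rate */
#define SIM_MAX_SCHEDULE_TIMEOUT ((unsigned long)LONG_MAX)

/*
 * Model of the seconds-to-jiffies conversion used by the watchdog.
 * A timeout of 0 means "sleep as long as possible", i.e. never scan.
 */
unsigned long sim_timeout_jiffies(unsigned long timeout_secs)
{
    /* overflow guard for the model only; not part of the kernel helper */
    assert(timeout_secs <= ULONG_MAX / SIM_HZ);

    return timeout_secs ? timeout_secs * SIM_HZ : SIM_MAX_SCHEDULE_TIMEOUT;
}
```

With the default of 120 seconds and the assumed tick rate, the thread sleeps for 12000 jiffies between scans. The watchdog function itself follows.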
    static int watchdog(void *dummy) {
        set_user_nice(current, 0);
        for (;;) {
            unsigned long timeout = sysctl_hung_task_timeout_secs;
            while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
                timeout = sysctl_hung_task_timeout_secs;
            if (atomic_xchg(&reset_hung_task, 0))
                continue;
            check_hung_uninterruptible_tasks(timeout);
        }
        return 0;
    }
The reset flag is set by reset_hung_task_detector(), which simply does atomic_set(&reset_hung_task, 1) and is exported for other kernel code.
    void reset_hung_task_detector(void) {
        atomic_set(&reset_hung_task, 1);
    }
    EXPORT_SYMBOL_GPL(reset_hung_task_detector);
Scanning Uninterruptible Tasks
check_hung_uninterruptible_tasks(timeout) iterates over all processes and threads under RCU protection. It stops after visiting sysctl_hung_task_check_count tasks, and after every HUNG_TASK_BATCHING tasks it briefly drops the RCU read lock through rcu_lock_break() so that a long scan does not hold the lock continuously. For each task whose state equals TASK_UNINTERRUPTIBLE, it calls check_hung_task().
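The interaction of the two limits is easy to misread, so here is a small user-space model of the loop's counters. It is a sketch under stated assumptions: the names simulate_scan() and struct scan_result are hypothetical, the task list is reduced to a plain count, and every lock break is assumed to succeed (in the kernel, a failed rcu_lock_break() ends the scan early).

```c
#include <assert.h>

struct scan_result {
    int visited;      /* tasks that reached the state check */
    int lock_breaks;  /* times the RCU read lock would be dropped */
};

/* Model the max_count / batch_count bookkeeping of the scan loop. */
struct scan_result simulate_scan(int ntasks, int max_count, int batching)
{
    struct scan_result r = { 0, 0 };
    int batch_count = batching;
    int i;

    assert(batching > 0);

    for (i = 0; i < ntasks; i++) {
        if (!max_count--)          /* overall budget exhausted */
            break;
        if (!--batch_count) {      /* end of a batch: pause point */
            batch_count = batching;
            r.lock_breaks++;
        }
        r.visited++;
    }
    return r;
}
```

For example, with 5000 tasks, a check count of 2048, and a batch size of 1024, the model visits 2048 tasks and pauses twice. The kernel function itself follows.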
    static void check_hung_uninterruptible_tasks(unsigned long timeout) {
        int max_count = sysctl_hung_task_check_count;
        int batch_count = HUNG_TASK_BATCHING;
        struct task_struct *g, *t;
        if (test_taint(TAINT_DIE) || did_panic)
            return;
        rcu_read_lock();
        for_each_process_thread(g, t) {
            if (!max_count--)
                goto unlock;
            if (!--batch_count) {
                batch_count = HUNG_TASK_BATCHING;
                if (!rcu_lock_break(g, t))
                    goto unlock;
            }
            if (t->state == TASK_UNINTERRUPTIBLE)
                check_hung_task(t, timeout);
        }
    unlock:
        rcu_read_unlock();
    }
Per‑Task Check
check_hung_task() first computes the total number of voluntary and involuntary context switches (t->nvcsw + t->nivcsw). It skips frozen tasks and tasks that have never been scheduled. If the count differs from the value saved at the previous check, the task has run in the meantime, so the new count is saved and the function returns. If the count is unchanged, the task has not been scheduled for a full timeout interval and is considered hung. The function then emits a warning (as long as sysctl_hung_task_warnings is nonzero; a positive value is decremented with each report) and triggers a panic if sysctl_hung_task_panic is set.
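The switch-count comparison can be modeled outside the kernel. The sketch below is illustrative only: struct sim_task and sim_check_hung() are hypothetical stand-ins that keep just the fields the decision depends on, and the caller is assumed to invoke the function only for tasks already found in D state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_task {
    unsigned long nvcsw;             /* voluntary context switches */
    unsigned long nivcsw;            /* involuntary context switches */
    unsigned long last_switch_count; /* value saved by the previous scan */
    bool frozen;                     /* stands in for the freezer flags */
};

/* Return true when this scan would report the task as hung. */
bool sim_check_hung(struct sim_task *t)
{
    unsigned long switch_count;

    assert(t != NULL);
    switch_count = t->nvcsw + t->nivcsw;

    if (t->frozen)          /* frozen tasks are ignored */
        return false;
    if (!switch_count)      /* never scheduled yet */
        return false;
    if (switch_count != t->last_switch_count) {
        /* the task ran since the last scan: remember and move on */
        t->last_switch_count = switch_count;
        return false;
    }
    return true;            /* no context switch between two scans */
}
```

One consequence the model makes visible: a task that has just blocked is only recorded on the first scan that sees it, and is reported on the next scan if its count has not moved. The kernel function itself follows.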
    static void check_hung_task(struct task_struct *t, unsigned long timeout) {
        unsigned long switch_count = t->nvcsw + t->nivcsw;
        if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
            return;
        if (unlikely(!switch_count))
            return;
        if (switch_count != t->last_switch_count) {
            t->last_switch_count = switch_count;
            return;
        }
        trace_sched_process_hang(t);
        if (!sysctl_hung_task_warnings)
            return;
        if (sysctl_hung_task_warnings > 0)
            sysctl_hung_task_warnings--;
        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
               t->comm, t->pid, timeout);
        pr_err("      %s %s %.*s\n",
               print_tainted(), init_utsname()->release,
               (int)strcspn(init_utsname()->version, " "),
               init_utsname()->version);
        pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
        sched_show_task(t);
        debug_show_held_locks(t);
        touch_nmi_watchdog();
        if (sysctl_hung_task_panic) {
            trigger_all_cpu_backtrace();
            panic("hung_task: blocked tasks");
        }
    }
Example Demonstration
The article builds a simple kernel module on a Raspberry Pi (Linux 4.1.15) that creates a mutex and locks it twice in the module’s init function, deliberately causing a deadlock. The module is inserted with insmod dlock.ko, after which ps shows the process in state D (disk sleep) and /proc/521/status confirms the D state.
    #include <linux/module.h>
    #include <linux/mutex.h>
    #include <linux/init.h>
    DEFINE_MUTEX(dlock);
    static int __init dlock_init(void) {
        mutex_lock(&dlock);
        mutex_lock(&dlock); // deadlock, puts task in D state
        return 0;
    }
    static void __exit dlock_exit(void) { }
    module_init(dlock_init);
    module_exit(dlock_exit);
    MODULE_LICENSE("GPL");
After two minutes the kernel prints repeated warnings such as:
INFO: task insmod:521 blocked for more than 120 seconds.
Tainted: G O 4.1.15 #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
These logs include the task name, PID, timeout, kernel taint information, stack trace, and register dump. If sysctl_hung_task_panic is enabled, the kernel panics after printing the warning instead of merely logging.
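When collecting these warnings from dmesg or a serial console, it can help to pull the task name, PID, and timeout out of the first line automatically. The following is a minimal sketch, not part of the article; parse_hung_line() is a hypothetical helper. It locates the PID by the last ':' before " blocked for more than ", since kernel thread names such as kworker/0:1 contain colons themselves, and it tolerates a leading timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "INFO: task <comm>:<pid> blocked for more than <n> seconds."
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_hung_line(const char *line, char *comm, size_t comm_len,
                    int *pid, long *secs)
{
    static const char prefix[] = "INFO: task ";
    static const char marker[] = " blocked for more than ";
    const char *start, *mark, *colon = NULL, *p;
    size_t n;

    assert(line && comm && pid && secs);

    start = strstr(line, prefix);   /* skips any dmesg timestamp */
    if (!start)
        return -1;
    start += strlen(prefix);

    mark = strstr(start, marker);
    if (!mark)
        return -1;

    for (p = start; p < mark; p++)  /* last ':' separates comm and pid */
        if (*p == ':')
            colon = p;
    if (!colon)
        return -1;

    n = (size_t)(colon - start);
    if (n == 0 || n >= comm_len)
        return -1;
    memcpy(comm, start, n);
    comm[n] = '\0';

    if (sscanf(colon + 1, "%d", pid) != 1)
        return -1;
    if (sscanf(mark + strlen(marker), "%ld", secs) != 1)
        return -1;
    return 0;
}
```

Applied to the first line of the log above, the helper yields the name insmod, PID 521, and a timeout of 120 seconds.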
Conclusion
Hung‑task detection is a valuable tool for kernel developers, especially when debugging driver‑level deadlocks that manifest as D‑state processes. By capturing the detailed warning output, developers can quickly locate the offending code and resolve the issue.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
