Understanding Linux Deadlocks: Spinlocks, Semaphores, and Built‑in Detection Mechanisms
The article explains what deadlocks are, compares spinlocks and semaphores, describes typical deadlock scenarios in the Linux kernel, and details the built‑in detection mechanisms such as hung‑task (D‑state) and soft‑lockup (R‑state) along with the NMI watchdog for long interrupt disabling.
A deadlock occurs when two or more processes (or threads) wait for each other’s resources, creating a circular wait that cannot progress without external intervention.
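Deadlocks arise when execution contexts (processes, threads, or interrupt handlers) compete for shared resources. Multiple threads are not strictly required: a single thread can deadlock against itself by re-acquiring a non-recursive lock it already holds, or by being interrupted by a handler that needs the same lock.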
Spinlock
Recursive use: acquiring a spinlock twice in the same thread without releasing it leads to deadlock.
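Blocking after acquiring a spinlock: calling functions that may sleep while holding the lock, such as copy_from_user(), copy_to_user(), or kmalloc() with GFP_KERNEL.
Interrupt handling without disabling interrupts: if an interrupt arrives on the same CPU while a spinlock is held, and the handler needs that lock, the handler spins forever because the interrupted holder can never run to release it.
Resources shared between interrupt context and process context: the process-context side must take the lock with interrupts disabled (for example spin_lock_irqsave()), otherwise it hits the previous case.
Resources shared between bottom halves and process context: the same problem occurs if a bottom half preempts the lock holder, so the process-context side must disable bottom halves while holding the lock (for example spin_lock_bh()).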
Three Spinlock Configurations
Single‑CPU, non‑preemptible kernel: spin_lock() and spin_unlock() compile to no‑ops (the interrupt‑ and bottom‑half‑disabling variants still do that part of their job). Process‑context code cannot deadlock on the lock itself because nothing else runs concurrently.
Single‑CPU, preemptible kernel: taking a spinlock reduces to disabling preemption. If a thread blocks or calls the scheduler while holding the lock, other tasks can run and enter the same critical section; the protection is lost, and the kernel typically reports this as a "scheduling while atomic" bug rather than silently continuing.
Multi‑CPU, preemptible kernel (SMP): the lock truly serializes access across CPUs. Deadlock can arise if a thread blocks while holding the spinlock (other CPUs then spin until it runs again, which may be never), or if an interrupt on the holder's own CPU tries to acquire the same lock.
Semaphore
Recursive use: acquiring a semaphore twice without releasing it causes the thread to sleep indefinitely.
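Blocking after acquiring a semaphore: sleeping while holding a semaphore is permitted, but every thread waiting for that semaphore sleeps as well. If the holder is waiting for something that one of those waiters must provide, the result is deadlock.
Interrupt context acquiring a semaphore: acquiring a semaphore sleeps when it is unavailable, and interrupt context cannot sleep, so an interrupt handler must never block on one.
Bottom half acquiring a semaphore: softirqs and tasklets also run in atomic context and must not sleep, so the same restriction applies to them. Only deferred work that runs in process context, such as a work queue, may safely acquire a semaphore.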
Two processes each holding a semaphore the other needs (circular wait) results in classic deadlock.
D‑State Deadlock Detection
Linux provides the hung_task mechanism to detect tasks stuck in TASK_UNINTERRUPTIBLE (D‑state) for longer than the configurable timeout (default 120 seconds). A kernel thread khungtaskd runs every timeout interval, scans the task list, and prints stack traces for tasks that have not switched.
    static int __init hung_task_init(void)
    {
        /* Register for panic notifications, then start the scanner thread. */
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
        return 0;
    }

    static int watchdog(void *dummy)
    {
        set_user_nice(current, 0);
        for (;;) {
            unsigned long timeout = sysctl_hung_task_timeout_secs;

            /* Sleep for one full timeout; re-read the sysctl if woken early. */
            while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
                timeout = sysctl_hung_task_timeout_secs;
            check_hung_uninterruptible_tasks(timeout);
        }
        return 0;
    }

    static void check_hung_uninterruptible_tasks(unsigned long timeout)
    {
        int max_count = sysctl_hung_task_check_count;
        int batch_count = HUNG_TASK_BATCHING;
        struct task_struct *g, *t;

        rcu_read_lock();
        do_each_thread(g, t) {
            /* Stop once the configured number of tasks has been examined. */
            if (!max_count--)
                goto unlock;
            /* Periodically drop the RCU read lock so the scan does not hold it too long. */
            if (!--batch_count) {
                batch_count = HUNG_TASK_BATCHING;
                rcu_lock_break(g, t);
            }
            /* Only tasks in D-state are candidates. */
            if (t->state == TASK_UNINTERRUPTIBLE)
                check_hung_task(t, timeout);
        } while_each_thread(g, t);
    unlock:
        rcu_read_unlock();
    }

    static void check_hung_task(struct task_struct *t, unsigned long timeout)
    {
        /* Total of voluntary and involuntary context switches. */
        unsigned long switch_count = t->nvcsw + t->nivcsw;
        /* additional analysis omitted for brevity */
    }

R‑State Deadlock Detection
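Tasks that stay in TASK_RUNNING for an extended period (default 60 seconds) without yielding the CPU indicate an R‑state deadlock. Linux detects this with the softlockup mechanism, implemented in softlockup.c. It creates one SCHED_FIFO watchdog thread per CPU, whose job is to refresh that CPU's timestamp, and a timer hook, softlockup_tick, that checks whether the timestamp has been refreshed. If a task monopolizes the CPU, even the high‑priority watchdog thread cannot run, so the timestamp goes stale and the tick handler can report the lockup.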
    static int __init spawn_softlockup_task(void)
    {
        void *cpu = (void *)(long)smp_processor_id();
        int err;

        if (nosoftlockup)
            return 0;
        /* Create and start the watchdog thread for the boot CPU by hand. */
        err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
        if (err == NOTIFY_BAD) {
            BUG();
            return 1;
        }
        cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
        /* CPUs brought up later are handled through the notifier. */
        register_cpu_notifier(&cpu_nfb);
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        return 0;
    }

    static int cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
    {
        int hotcpu = (unsigned long)hcpu;
        struct task_struct *p;

        switch (action) {
        case CPU_UP_PREPARE:
        case CPU_UP_PREPARE_FROZEN:
            /* Create the per-CPU watchdog thread and bind it to that CPU. */
            BUG_ON(per_cpu(softlockup_watchdog, hotcpu));
            p = kthread_create(watchdog, hcpu, "watchdog/%d", hotcpu);
            if (IS_ERR(p)) {
                printk(KERN_ERR "watchdog for %i failed\n", hotcpu);
                return NOTIFY_BAD;
            }
            per_cpu(softlockup_touch_ts, hotcpu) = 0;
            per_cpu(softlockup_watchdog, hotcpu) = p;
            kthread_bind(p, hotcpu);
            break;
        case CPU_ONLINE:
        case CPU_ONLINE_FROZEN:
            /* The CPU is online: let its watchdog thread start running. */
            wake_up_process(per_cpu(softlockup_watchdog, hotcpu));
            break;
        }
        return NOTIFY_OK;
    }

    static int watchdog(void *__bind_cpu)
    {
        /* Highest real-time priority, so only a hogged CPU can starve this thread. */
        struct sched_param param = {.sched_priority = MAX_RT_PRIO-1};

        sched_setscheduler(current, SCHED_FIFO, &param);
        __touch_softlockup_watchdog();
        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
            /* Refresh this CPU's timestamp, then sleep until woken again. */
            __touch_softlockup_watchdog();
            schedule();
            if (kthread_should_stop())
                break;
            set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return 0;
    }

    void softlockup_tick(void)
    {
        int this_cpu = smp_processor_id();
        unsigned long touch_ts = per_cpu(softlockup_touch_ts, this_cpu);
        unsigned long print_ts;
        struct pt_regs *regs = get_irq_regs();
        unsigned long now;

        /* Nothing to do if the watchdog thread is missing or detection is disabled. */
        if (!per_cpu(softlockup_watchdog, this_cpu) || softlockup_thresh <= 0)
            return;
        /* further logic omitted for brevity */
    }

Long Interrupt‑Disable Detection (NMI Watchdog)
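The NMI watchdog covers the case the two mechanisms above cannot: a CPU stuck with interrupts disabled, where neither the timer tick nor any thread can run. It relies on a hardware-driven check that normal code cannot mask and that expects to see regular activity, such as timer interrupts still being serviced on that CPU. If a CPU keeps interrupts disabled for too long, that refresh never arrives, and the watchdog reports the lockup and, depending on configuration, panics or resets the system. This gives a simple way to detect prolonged interrupt‑off periods.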
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.