Master Linux Kernel Deadlock Detection to Truly Understand Synchronization
This article explains the four necessary conditions for Linux deadlocks, demonstrates each with concrete pthread examples, reviews kernel lock types, introduces detection tools such as Lockdep, gdb, pstack and ftrace, and walks through a real‑world cluster case study with step‑by‑step analysis and remediation.
Understanding Linux Deadlocks
What is a deadlock?
A deadlock occurs when two or more processes or threads wait indefinitely for resources held by each other, so none can proceed without external intervention.
Necessary conditions
Four conditions must hold simultaneously for a deadlock to arise:
Mutual exclusion : a resource can be used by only one thread at a time. The following code shows a pthread_mutex_t guaranteeing exclusive access to a critical section.
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

/* Worker: holds the mutex for one second so the exclusion is observable. */
void* thread_function(void* arg) {
    pthread_mutex_lock(&mutex);
    printf("Thread enters critical section\n");
    sleep(1);
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_function, NULL);
    pthread_mutex_lock(&mutex);   /* blocks while the worker holds the mutex */
    printf("Main thread enters critical section\n");
    pthread_mutex_unlock(&mutex);
    pthread_join(thread, NULL);
    return 0;
}

Hold‑and‑wait : a thread holds one lock while requesting another. The following example uses two mutexes to demonstrate the condition.
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;

void* thread_function(void* arg) {
    pthread_mutex_lock(&mutex1);
    printf("Thread holds mutex1\n");
    sleep(1);
    pthread_mutex_lock(&mutex2);   /* requests mutex2 while still holding mutex1 */
    printf("Thread acquired mutex2\n");
    pthread_mutex_unlock(&mutex2);
    pthread_mutex_unlock(&mutex1);
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_function, NULL);
    pthread_join(thread, NULL);
    return 0;
}

No preemption : a held resource cannot be forcibly taken away; only the owning thread can release it.
Circular wait : threads form a cycle of waiting. The following three‑thread example creates such a cycle.
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutexC = PTHREAD_MUTEX_INITIALIZER;

/* Thread 1: holds A, then waits for B. */
void* thread1(void* arg) {
    pthread_mutex_lock(&mutexA);
    printf("Thread 1 holds mutexA\n");
    sleep(1);
    pthread_mutex_lock(&mutexB);
    printf("Thread 1 acquired mutexB\n");
    pthread_mutex_unlock(&mutexB);
    pthread_mutex_unlock(&mutexA);
    return NULL;
}

/* Thread 2: holds B, then waits for C. */
void* thread2(void* arg) {
    pthread_mutex_lock(&mutexB);
    printf("Thread 2 holds mutexB\n");
    sleep(1);
    pthread_mutex_lock(&mutexC);
    printf("Thread 2 acquired mutexC\n");
    pthread_mutex_unlock(&mutexC);
    pthread_mutex_unlock(&mutexB);
    return NULL;
}

/* Thread 3: holds C, then waits for A, which closes the cycle A -> B -> C -> A. */
void* thread3(void* arg) {
    pthread_mutex_lock(&mutexC);
    printf("Thread 3 holds mutexC\n");
    sleep(1);
    pthread_mutex_lock(&mutexA);
    printf("Thread 3 acquired mutexA\n");
    pthread_mutex_unlock(&mutexA);
    pthread_mutex_unlock(&mutexC);
    return NULL;
}

int main() {
    pthread_t t1, t2, t3;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t1, NULL);   /* expected to hang here once the cycle forms */
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    return 0;
}

When all four conditions are present, a deadlock can stall processes, cause service outages, and generate abnormal system load.
Kernel deadlock detection mechanisms
Lock types
The Linux kernel provides several lock primitives:
Spinlock : busy‑wait for very short critical sections; unsuitable for long waits because it consumes CPU cycles.
Mutex : puts the thread to sleep while waiting, appropriate for longer critical sections.
Read‑write lock : allows concurrent reads but exclusive writes, useful for read‑heavy workloads.
Key detection tools
Lockdep : tracks lock acquisition order at runtime, builds a lock‑class dependency graph, and warns when a new edge would create a cycle. Enable it by setting CONFIG_PROVE_LOCKING=y in the kernel configuration; CONFIG_DEBUG_LOCKDEP=y is optional and adds self‑consistency checks of the lockdep engine itself, at extra runtime cost.
gdb : attach to a running process (e.g., gdb -p 1234) and inspect thread stacks with info threads and bt to see which locks each thread is waiting for.
pstack : quickly prints stack traces of all threads in a process (pstack 5678).
Ftrace : dynamic kernel tracer. Enable the function‑graph tracer and start tracing:
echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
Then examine /sys/kernel/debug/tracing/trace for lock acquisition sequences.
Banker’s algorithm
The Banker’s algorithm models resource allocation with four matrices/vectors:
Available : currently free instances of each resource.
Max : maximum demand of each process.
Allocation : resources currently allocated to each process.
Need : remaining resources each process requires (Max − Allocation).
Safety check: simulate granting resources to a process whose Need ≤ Work (initially Available). If all processes can finish, the system is in a safe state. When a request arrives, the algorithm verifies that granting it keeps the system safe; otherwise the request is denied.
Practical deadlock diagnosis process
General workflow
Typical symptoms include long‑running processes, high CPU usage, and processes stuck in D (uninterruptible sleep) state. The workflow is:
Examine system logs for error messages.
Use top / htop to spot abnormal CPU usage.
Identify D‑state processes with ps (e.g., ps -eo state,pid,wchan:32,comm | awk '$1=="D"'; listing state first keeps the match correct even when a command name contains spaces).
Inspect held resources with lslocks or lsof.
Apply kernel‑level tracing (Lockdep, ftrace) to capture lock dependency graphs.
Tool‑driven case study
In a distributed file‑storage cluster, top showed kworker/0:1H and jbd2/sda1-8 consuming ~99 % CPU while in D state. ps confirmed their PIDs. lslocks revealed a file lock held by the network service, and lsof showed both processes accessing /dev/sda1 metadata files. Enabling ftrace on EXT4 journal functions captured the exact sequence:
# tracer: function_graph
...
kworker/0:1H-1234 [000] d... ext4_journal_get_write_access: (entry)
kworker/0:1H-1234 [000] d... ext4_journal_start: (entry)
jbd2/sda1-8-2345 [001] d... ext4_journal_start: (entry)
jbd2/sda1-8-2345 [001] d... ext4_journal_get_write_access: (entry)
...
The trace shows the kworker acquiring the data‑block lock then attempting the journal lock, while jbd2 already holds the journal lock and waits for the data‑block lock, forming a circular wait.
Common deadlock scenarios
Typical sources include driver code mixing interrupt‑context and process‑context lock usage, file‑system operations that hold journal locks while accessing data blocks, and memory‑management paths that lock allocation structures.
End‑to‑End case: Resolving a cluster‑wide deadlock
Background
A CentOS 7 cluster (kernel 3.10.0‑1160…) experienced EXT4 I/O errors, journal aborts, and CPU saturation.
Investigation
top identified kworker/0:1H and jbd2/sda1-8 in D state. ps and lslocks confirmed they were contending for the same filesystem metadata. Ftrace revealed a circular wait: the kworker held the data‑block lock and tried to acquire the journal lock, while jbd2 held the journal lock and waited for the data‑block lock.
Fixes
Reorder lock acquisition so all code obtains the journal lock before the data‑block lock, breaking the circular‑wait condition.
Add a lock‑timeout using pthread_mutex_timedlock to avoid indefinite blocking.
#include <errno.h>    /* ETIMEDOUT */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strerror() */
#include <time.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

int main() {
    struct timespec timeout;
    /* pthread_mutex_timedlock expects an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &timeout);
    timeout.tv_sec += 10;   /* 10-second timeout */
    int ret = pthread_mutex_timedlock(&mutex, &timeout);
    if (ret == 0) {
        printf("Lock acquired\n");
        pthread_mutex_unlock(&mutex);
    } else if (ret == ETIMEDOUT) {
        printf("Lock timeout, aborting\n");
    } else {
        /* pthread functions return the error code instead of setting errno. */
        fprintf(stderr, "pthread_mutex_timedlock: %s\n", strerror(ret));
        exit(EXIT_FAILURE);
    }
    return 0;
}

Reduce lock hold time by separating independent file operations and adding caching to lower contention.
Extensive load testing after applying these changes showed normal response times, stable CPU usage, and no further deadlock warnings.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
