
Master Linux Kernel Deadlock Detection to Truly Understand Synchronization

This article explains the four necessary conditions for Linux deadlocks, demonstrates each with concrete pthread examples, reviews kernel lock types, introduces detection tools such as Lockdep, gdb, pstack and ftrace, and walks through a real‑world cluster case study with step‑by‑step analysis and remediation.

Deepin Linux

Understanding Linux Deadlocks

What is a deadlock?

A deadlock occurs when two or more processes or threads wait indefinitely for resources held by each other, so none can proceed without external intervention.

Necessary conditions

Four conditions must hold simultaneously for a deadlock to arise:

Mutual exclusion : a resource can be used by only one thread at a time. The following code shows a pthread_mutex_t guaranteeing exclusive access to a critical section.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* thread_function(void* arg) {
    pthread_mutex_lock(&mutex);
    printf("Thread enters critical section\n");
    sleep(1);   /* hold the lock so the other thread has to wait */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_function, NULL);
    pthread_mutex_lock(&mutex);   /* blocks while the other thread owns the mutex */
    printf("Main thread enters critical section\n");
    pthread_mutex_unlock(&mutex);
    pthread_join(thread, NULL);
    return 0;
}

Hold‑and‑wait : a thread holds one lock while requesting another. The following example uses two mutexes: the thread keeps mutex1 locked while it asks for mutex2.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;

void* thread_function(void* arg) {
    pthread_mutex_lock(&mutex1);
    printf("Thread holds mutex1\n");
    sleep(1);
    pthread_mutex_lock(&mutex2);
    printf("Thread acquired mutex2\n");
    pthread_mutex_unlock(&mutex2);
    pthread_mutex_unlock(&mutex1);
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_function, NULL);
    pthread_join(thread, NULL);
    return 0;
}

No preemption : a held resource cannot be forcibly taken away; only the owning thread can release it.

Circular wait : threads form a cycle of waiting. The following three‑thread example creates such a cycle.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutexC = PTHREAD_MUTEX_INITIALIZER;

void* thread1(void* arg) {
    pthread_mutex_lock(&mutexA);
    printf("Thread 1 holds mutexA\n");
    sleep(1);
    pthread_mutex_lock(&mutexB);
    printf("Thread 1 acquired mutexB\n");
    pthread_mutex_unlock(&mutexB);
    pthread_mutex_unlock(&mutexA);
    return NULL;
}

void* thread2(void* arg) {
    pthread_mutex_lock(&mutexB);
    printf("Thread 2 holds mutexB\n");
    sleep(1);
    pthread_mutex_lock(&mutexC);
    printf("Thread 2 acquired mutexC\n");
    pthread_mutex_unlock(&mutexC);
    pthread_mutex_unlock(&mutexB);
    return NULL;
}

void* thread3(void* arg) {
    pthread_mutex_lock(&mutexC);
    printf("Thread 3 holds mutexC\n");
    sleep(1);
    pthread_mutex_lock(&mutexA);
    printf("Thread 3 acquired mutexA\n");
    pthread_mutex_unlock(&mutexA);
    pthread_mutex_unlock(&mutexC);
    return NULL;
}

int main() {
    pthread_t t1, t2, t3;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    return 0;
}

When all four conditions are present, a deadlock can stall processes, cause service outages, and generate abnormal system load.

Kernel deadlock detection mechanisms

Lock types

The Linux kernel provides several lock primitives:

Spinlock : busy‑wait for very short critical sections; unsuitable for long waits because it consumes CPU cycles.

Mutex : puts the thread to sleep while waiting, appropriate for longer critical sections.

Read‑write lock : allows concurrent reads but exclusive writes, useful for read‑heavy workloads.

Key detection tools

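Lockdep : tracks lock acquisition order at runtime, builds a lock‑class dependency graph, and warns in the kernel log when a new edge would create a cycle, even if the deadlock itself never triggers. Enable it by setting CONFIG_PROVE_LOCKING=y in the kernel configuration; CONFIG_DEBUG_LOCKDEP=y additionally turns on Lockdep's own internal consistency checks.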

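gdb : attach to a running process (e.g., gdb -p 1234) and inspect thread stacks with info threads and bt, or with thread apply all bt to dump every thread at once. Threads blocked in a lock call show it at the top of their stack, which reveals which lock each thread is waiting for.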

pstack : quickly prints stack traces of all threads in a process (pstack 5678).

Ftrace : dynamic kernel tracer. Enable the function‑graph tracer and start tracing:

echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on

Then examine /sys/kernel/debug/tracing/trace for lock acquisition sequences.

Banker’s algorithm

The Banker’s algorithm models resource allocation with four matrices/vectors:

Available : currently free instances of each resource.

Max : maximum demand of each process.

Allocation : resources currently allocated to each process.

Need : remaining resources each process requires (Max − Allocation).

Safety check: simulate granting resources to a process whose Need ≤ Work (initially Available). If all processes can finish, the system is in a safe state. When a request arrives, the algorithm verifies that granting it keeps the system safe; otherwise the request is denied.

Practical deadlock diagnosis process

General workflow

Typical symptoms include long‑running processes, high CPU usage, and processes stuck in D (uninterruptible sleep) state. The workflow is:

Examine system logs for error messages.

Use top / htop to spot abnormal CPU usage.

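Identify D‑state processes with ps, printing the state column first so that command names containing spaces cannot shift the field being tested (e.g., ps -eo state,pid,wchan:32,comm | awk '$1=="D"').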

Inspect held resources with lslocks or lsof.

Apply kernel‑level tracing (Lockdep, ftrace) to capture lock dependency graphs.

Tool‑driven case study

In a distributed file‑storage cluster, top showed kworker/0:1H and jbd2/sda1-8 consuming ~99 % CPU while in D state. ps confirmed their PIDs. lslocks revealed a file lock held by the network service, and lsof showed both processes accessing /dev/sda1 metadata files. Enabling ftrace on EXT4 journal functions captured the exact sequence:

# tracer: function_graph
...
 kworker/0:1H-1234 [000] d... ext4_journal_get_write_access: (entry)
 kworker/0:1H-1234 [000] d... ext4_journal_start: (entry)
 jbd2/sda1-8-2345 [001] d... ext4_journal_start: (entry)
 jbd2/sda1-8-2345 [001] d... ext4_journal_get_write_access: (entry)
...

The trace shows the kworker acquiring the data‑block lock then attempting the journal lock, while jbd2 already holds the journal lock and waits for the data‑block lock, forming a circular wait.

Common deadlock scenarios

Typical sources include driver code mixing interrupt‑context and process‑context lock usage, file‑system operations that hold journal locks while accessing data blocks, and memory‑management paths that lock allocation structures.

End‑to‑End case: Resolving a cluster‑wide deadlock

Background

A CentOS 7 cluster (kernel 3.10.0‑1160…) experienced EXT4 I/O errors, journal aborts, and CPU saturation.

Investigation

top identified kworker/0:1H and jbd2/sda1-8 in D state. ps and lslocks confirmed they were contending for the same filesystem metadata. Ftrace revealed a circular wait: the kworker held the data‑block lock and tried to acquire the journal lock, while jbd2 held the journal lock and waited for the data‑block lock.

Fixes

Reorder lock acquisition so all code obtains the journal lock before the data‑block lock, breaking the circular‑wait condition.

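Add a lock timeout so that a blocked thread gives up instead of waiting indefinitely. In user‑space components this can be done with pthread_mutex_timedlock, which takes an absolute deadline: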

#include <errno.h>    /* ETIMEDOUT */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strerror() */
#include <time.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

int main() {
    struct timespec timeout;
    /* The deadline is an absolute CLOCK_REALTIME time, not a duration */
    clock_gettime(CLOCK_REALTIME, &timeout);
    timeout.tv_sec += 10; // 10‑second timeout
    int ret = pthread_mutex_timedlock(&mutex, &timeout);
    if (ret == 0) {
        printf("Lock acquired\n");
        pthread_mutex_unlock(&mutex);
    } else if (ret == ETIMEDOUT) {
        printf("Lock timeout, aborting\n");
    } else {
        /* pthread functions return the error code; they do not set errno */
        fprintf(stderr, "pthread_mutex_timedlock: %s\n", strerror(ret));
        exit(EXIT_FAILURE);
    }
    return 0;
}

Reduce lock hold time by separating independent file operations and adding caching to lower contention.

Extensive load testing after applying these changes showed normal response times, stable CPU usage, and no further deadlock warnings.
