
Understanding the Thundering Herd Problem in Linux and Nginx

This article explains the thundering herd problem, in which many processes are woken for a single event; describes the Linux kernel's fix for accept() and its partial solutions for epoll; and details how Nginx avoids the issue with its own inter‑process accept mutex.

Didi Tech

Preface

This article records the author's study of the thundering herd problem and of how Nginx's core modules handle it, drawing on collected online material and practical experience.

Conclusion

Both multi‑process and multi‑thread models suffer from the thundering herd effect; this article analyses it using a multi‑process scenario.

Since Linux 2.6, the thundering herd caused by the accept() system call itself has been solved (provided no select/poll/epoll is layered on top). The kernel has partially solved the epoll herd (for an epoll instance created before fork()), but the problem remains when each process creates its own epoll instance after fork(). Nginx sidesteps the issue with its own mutex implementation.

What is the thundering herd?

The thundering herd occurs when many processes or threads block waiting for the same event. When the event occurs, all waiting entities are awakened, but only one can acquire the resource and handle the event; the others go back to sleep, having wasted CPU cycles for nothing.

What does the herd consume?

Excessive context switches and scheduling in the Linux kernel, leading to high CPU overhead.

Locking overhead to ensure only one process/thread accesses the resource.

Linux solution – accept()

Before Linux 2.6, all processes waiting to accept() on the same socket sat on one wait queue, and a new connection woke every one of them. Since Linux 2.6, the waiting task's wait‑queue entry is marked with the WQ_FLAG_EXCLUSIVE flag, so the kernel wakes only one process per connection.

Key kernel code (simplified):

struct sock *
inet_csk_accept(struct sock *sk, int flags, int *err)
{
    ...
    error = inet_csk_wait_for_connect(sk, timeo);
    ...
}

static int
inet_csk_wait_for_connect(struct sock *sk, long timeo)
{
    ...
    for (;;) {
        // only one process will be woken:
        // non‑exclusive entries are added to the head of the wait queue,
        // exclusive entries are added after all non‑exclusive ones
        prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
        // sleep until a connection arrives or the timeout expires
        if (reqsk_queue_empty(&icsk->icsk_accept_queue))
            timeo = schedule_timeout(timeo);
        ...
    }
    ...
}

void
prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
    unsigned long flags;
    // set the exclusive flag – only one process will be woken
    wait->flags |= WQ_FLAG_EXCLUSIVE;
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))
        __add_wait_queue_tail(q, wait);
    set_current_state(state);
    spin_unlock_irqrestore(&q->lock, flags);
}

Linux solution – epoll

When using I/O multiplexing (select, poll, epoll, kqueue), the herd effect appears in two cases:

epoll created before fork – all processes share the same epoll red‑black tree, and the kernel can avoid waking all processes.

epoll created after fork – each process has its own epoll structure, so the herd re‑appears.

Because the kernel cannot know which process should handle a readable event on a shared epoll instance, it may wake more processes than necessary. The usual remedy is to let each worker maintain its own epoll instance for the read/write events of the connections it owns.

Nginx solution – lock design

Nginx implements its own inter‑process lock. When atomic operations are available, the lock is stored in shared memory; otherwise a file lock is used.

typedef struct {
#if (NGX_HAVE_ATOMIC_OPS)
    ngx_atomic_t *lock;
#else
    ngx_fd_t fd;
    u_char *name;
#endif
} ngx_shmtx_t;

Creating an atomic lock (when supported):

ngx_int_t
ngx_shmtx_create(ngx_shmtx_t *mtx, void *addr, u_char *name)
{
    mtx->lock = addr;
    return NGX_OK;
}

Atomic compare‑and‑set macro (using OSAtomic on supported platforms):

#define ngx_atomic_cmp_set(lock, old, new) \
    OSAtomicCompareAndSwap32Barrier(old, new, (int32_t *) lock)

When the OS does not provide a suitable primitive (for example on x86 with GCC), Nginx implements the compare‑and‑swap itself with inline assembly:

static ngx_inline ngx_atomic_uint_t
ngx_atomic_cmp_set(ngx_atomic_t *lock, ngx_atomic_uint_t old, ngx_atomic_uint_t set)
{
    u_char res;
    __asm__ volatile (
        NGX_SMP_LOCK          // expands to "lock; " on SMP builds
        "cmpxchgl %3, %1; "
        "sete %0; "
        : "=a" (res)
        : "m" (*lock), "a" (old), "r" (set)
        : "cc", "memory");
    return res;
}

Unlocking is simply:

#define ngx_shmtx_unlock(mtx) \
    (void) ngx_atomic_cmp_set((mtx)->lock, ngx_pid, 0)

Nginx accept‑mutex variables

if (ccf->master && ccf->worker_processes > 1 && ecf->accept_mutex) {
    ngx_use_accept_mutex = 1;
    ngx_accept_mutex_held = 0;
    ngx_accept_mutex_delay = ecf->accept_mutex_delay;
} else {
    ngx_use_accept_mutex = 0;
}

Key variables:

ngx_use_accept_mutex – indicates whether Nginx should use an accept mutex.

ngx_accept_mutex_held – whether the current process holds the lock.

ngx_accept_mutex_delay – retry interval when lock acquisition fails.

When a worker holds the lock it sets the NGX_POST_EVENTS flag, so events collected by the event loop are queued ("posted") instead of being handled immediately: accept events are processed first, the mutex is then released, and only afterwards are ordinary read/write events handled. This keeps the lock held for as short a time as possible.

if (ngx_use_accept_mutex) {
    if (ngx_accept_disabled > 0) {
        ngx_accept_disabled--;
    } else {
        if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
            return;
        }
        if (ngx_accept_mutex_held) {
            flags |= NGX_POST_EVENTS;
        } else {
            if (timer == NGX_TIMER_INFINITE || timer > ngx_accept_mutex_delay) {
                timer = ngx_accept_mutex_delay;
            }
        }
    }
}

The function that tries to acquire the accept mutex:

ngx_int_t
ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
{
    // try to get the lock
    if (ngx_shmtx_trylock(&ngx_accept_mutex)) {
        // already holding the lock with listening events enabled
        if (ngx_accept_mutex_held && ngx_accept_events == 0
            && !(ngx_event_flags & NGX_USE_RTSIG_EVENT))
        {
            return NGX_OK;
        }
        // enable listening sockets
        if (ngx_enable_accept_events(cycle) == NGX_ERROR) {
            ngx_shmtx_unlock(&ngx_accept_mutex);
            return NGX_ERROR;
        }
        ngx_accept_events = 0;
        ngx_accept_mutex_held = 1;
        return NGX_OK;
    }
    // lock not obtained – disable listening sockets
    if (ngx_accept_mutex_held) {
        if (ngx_disable_accept_events(cycle) == NGX_ERROR) {
            return NGX_ERROR;
        }
        ngx_accept_mutex_held = 0;
    }
    return NGX_OK;
}

By acquiring the mutex before calling accept(), only the worker that holds the lock keeps the listening socket in its epoll set and wakes up to handle new connections; the other workers remove the listening socket from their epoll sets, preventing unnecessary wake‑ups. Combined with ngx_accept_disabled, this also spreads connections more evenly across workers.

Tags: concurrency, locking, Nginx, Linux kernel, epoll, thundering herd
Written by Didi Tech, the official Didi technology account.