
Understanding the Thundering Herd Problem in Linux and Nginx

This article explains the thundering herd problem, in which many processes are woken for a single event; describes the Linux kernel's fix for accept() and its partial solutions for epoll; and details how Nginx avoids the issue with its own inter‑process accept mutex.

Didi Tech

Preface

This article records the author's study of the thundering herd problem and of how Nginx's core modules handle it, drawing on collected online material and practical experience.

Conclusion

Both multi‑process and multi‑thread models suffer from the thundering herd effect; this article analyses it using a multi‑process scenario.

Since Linux 2.6, the thundering herd caused by the accept() system call itself has been solved (provided no select/poll/epoll is layered on top). The kernel has partially solved the epoll herd (for an epoll instance created before fork()), but the problem remains when each process creates its own epoll instance after fork(). Nginx sidesteps the issue with its own mutex implementation.

What is the thundering herd?

The thundering herd occurs when many processes or threads block waiting for the same event. When the event occurs, all waiting entities are awakened, but only one can acquire the resource and handle the event; the others go back to sleep, having wasted CPU cycles for nothing.

What does the herd consume?

Excessive context switches and scheduling in the Linux kernel, leading to high CPU overhead.

Locking overhead to ensure only one process/thread accesses the resource.

Linux solution – accept()

Before Linux 2.6, all processes waiting to accept() on the same socket sat on one wait queue, and a new connection woke every one of them. Since Linux 2.6, the waiting task's wait‑queue entry is marked with the WQ_FLAG_EXCLUSIVE flag, so the kernel wakes only one process per connection.

Key kernel code (simplified):

struct sock *
inet_csk_accept(struct sock *sk, int flags, int *err)
{
    ...
    error = inet_csk_wait_for_connect(sk, timeo);
    ...
}

static int
inet_csk_wait_for_connect(struct sock *sk, long timeo)
{
    ...
    for (;;) {
        // only one process will be woken:
        // non‑exclusive entries are added to the head of the wait queue,
        // exclusive entries are added after all non‑exclusive ones
        prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
        // sleep until a connection arrives or the timeout expires
        if (reqsk_queue_empty(&icsk->icsk_accept_queue))
            timeo = schedule_timeout(timeo);
        ...
    }
    ...
}

void
prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
    unsigned long flags;
    // set the exclusive flag – only one process will be woken
    wait->flags |= WQ_FLAG_EXCLUSIVE;
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))
        __add_wait_queue_tail(q, wait);
    set_current_state(state);
    spin_unlock_irqrestore(&q->lock, flags);
}

Linux solution – epoll

When using I/O multiplexing (select, poll, epoll, kqueue), the herd effect appears in two cases:

epoll created before fork – all processes share the same epoll red‑black tree, and the kernel can avoid waking all processes.

epoll created after fork – each process has its own epoll structure, so the herd re‑appears.

Because the kernel cannot know which process should handle a readable event on a shared epoll instance, it may wake more processes than necessary. The usual remedy is to let each worker maintain its own epoll instance for the read/write events of the connections it owns.

Nginx solution – lock design

Nginx implements its own inter‑process lock. When atomic operations are available, the lock is stored in shared memory; otherwise a file lock is used.

typedef struct {
#if (NGX_HAVE_ATOMIC_OPS)
    ngx_atomic_t *lock;
#else
    ngx_fd_t fd;
    u_char *name;
#endif
} ngx_shmtx_t;

Creating an atomic lock (when supported):

ngx_int_t
ngx_shmtx_create(ngx_shmtx_t *mtx, void *addr, u_char *name)
{
    mtx->lock = addr;
    return NGX_OK;
}

Atomic compare‑and‑set macro (using OSAtomic on supported platforms):

#define ngx_atomic_cmp_set(lock, old, new) \
    OSAtomicCompareAndSwap32Barrier(old, new, (int32_t *) lock)

When the OS does not provide a suitable primitive (for example on x86 with GCC), Nginx implements the compare‑and‑swap itself with inline assembly:

static ngx_inline ngx_atomic_uint_t
ngx_atomic_cmp_set(ngx_atomic_t *lock, ngx_atomic_uint_t old, ngx_atomic_uint_t set)
{
    u_char res;
    __asm__ volatile (
        NGX_SMP_LOCK          // expands to "lock; " on SMP builds
        "cmpxchgl %3, %1; "
        "sete %0; "
        : "=a" (res)
        : "m" (*lock), "a" (old), "r" (set)
        : "cc", "memory");
    return res;
}

Unlocking is simply:

#define ngx_shmtx_unlock(mtx) \
    (void) ngx_atomic_cmp_set((mtx)->lock, ngx_pid, 0)

Nginx accept‑mutex variables

if (ccf->master && ccf->worker_processes > 1 && ecf->accept_mutex) {
    ngx_use_accept_mutex = 1;
    ngx_accept_mutex_held = 0;
    ngx_accept_mutex_delay = ecf->accept_mutex_delay;
} else {
    ngx_use_accept_mutex = 0;
}

Key variables:

ngx_use_accept_mutex – indicates whether Nginx should use an accept mutex.

ngx_accept_mutex_held – whether the current process holds the lock.

ngx_accept_mutex_delay – retry interval when lock acquisition fails.

When a worker holds the lock it sets the NGX_POST_EVENTS flag, so events collected by the event loop are queued ("posted") instead of being handled immediately: accept events are processed first, the mutex is then released, and only afterwards are ordinary read/write events handled. This keeps the lock held for as short a time as possible.

if (ngx_use_accept_mutex) {
    if (ngx_accept_disabled > 0) {
        ngx_accept_disabled--;
    } else {
        if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
            return;
        }
        if (ngx_accept_mutex_held) {
            flags |= NGX_POST_EVENTS;
        } else {
            if (timer == NGX_TIMER_INFINITE || timer > ngx_accept_mutex_delay) {
                timer = ngx_accept_mutex_delay;
            }
        }
    }
}

The function that tries to acquire the accept mutex:

ngx_int_t
ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
{
    // try to get the lock
    if (ngx_shmtx_trylock(&ngx_accept_mutex)) {
        // already holding the lock with listening events enabled
        if (ngx_accept_mutex_held && ngx_accept_events == 0
            && !(ngx_event_flags & NGX_USE_RTSIG_EVENT))
        {
            return NGX_OK;
        }
        // enable listening sockets
        if (ngx_enable_accept_events(cycle) == NGX_ERROR) {
            ngx_shmtx_unlock(&ngx_accept_mutex);
            return NGX_ERROR;
        }
        ngx_accept_events = 0;
        ngx_accept_mutex_held = 1;
        return NGX_OK;
    }
    // lock not obtained – disable listening sockets
    if (ngx_accept_mutex_held) {
        if (ngx_disable_accept_events(cycle) == NGX_ERROR) {
            return NGX_ERROR;
        }
        ngx_accept_mutex_held = 0;
    }
    return NGX_OK;
}

By acquiring the mutex before calling accept(), only the worker that holds the lock keeps the listening socket in its epoll set and wakes up to handle new connections; the other workers remove the listening socket from their epoll sets, preventing unnecessary wake‑ups. Combined with ngx_accept_disabled, this also spreads connections more evenly across workers.

Tags: concurrency, locking, Nginx, Linux kernel, epoll, thundering herd
Written by Didi Tech, the official Didi technology account.