Why Nginx Doesn’t Fall Victim to the Thundering Herd Problem

This article explains the thundering herd phenomenon in multi‑process servers, walks through Nginx’s master‑worker architecture and its use of epoll, and compares three practical mitigation techniques—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—complete with code excerpts and configuration examples.

Liangxu Linux

What Is the Thundering Herd Effect?

The thundering herd effect describes a situation where many processes or threads are blocked waiting for the same event. When the event occurs, all of them are awakened, but only one can acquire the resource and handle the event; the others find nothing to do and go back to sleep, wasting CPU cycles.

A simple analogy: a thunderclap wakes everyone in the house, yet only one person needs to get up and bring in the laundry.

Root Cause and Problem

High‑concurrency servers often spawn multiple processes or threads to listen for incoming requests. When a request arrives, every listener is awakened, but only one can actually accept and process it. The repeated wake‑sleep‑wake cycle leads to unnecessary context switches and performance loss.

Nginx Architecture Overview

Nginx follows a master‑worker model. The master process handles configuration loading, listening socket creation, and signal handling, while a pool of worker processes performs the actual request processing.

The master only creates and binds the listening sockets; the workers inherit them across fork() and call accept() on them directly.
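The division of labor can be illustrated with a small standalone program. This is a minimal sketch rather than Nginx code: the names worker_loop, master_worker_demo, and MAX_WORKERS are hypothetical, the "master" doubles as a test client for brevity, and a Linux or other POSIX system with loopback networking is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16   /* hypothetical limit for this sketch */

/* Worker: accept one connection on the inherited socket and send our pid. */
static void worker_loop(int lfd)
{
    int cfd = accept(lfd, NULL, NULL);
    if (cfd >= 0) {
        pid_t me = getpid();
        if (write(cfd, &me, sizeof(me)) != (ssize_t) sizeof(me)) _exit(1);
        close(cfd);
    }
    _exit(0);
}

/* Master: bind and listen, fork the workers, then connect once as a client.
 * Returns 1 if the connection was served by one of the forked workers,
 * 0 if not, and -1 on setup failure. */
int master_worker_demo(int nworkers)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    pid_t pids[MAX_WORKERS], served = 0;
    int i, ok = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS) return -1;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) < 0
        || listen(lfd, 16) < 0
        || getsockname(lfd, (struct sockaddr *) &addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    /* Workers inherit lfd across fork(); the master never calls accept(). */
    for (i = 0; i < nworkers; i++) {
        pids[i] = fork();
        if (pids[i] == 0) worker_loop(lfd);   /* does not return */
    }

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0
        && connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == 0
        && read(cfd, &served, sizeof(served)) == (ssize_t) sizeof(served)) {
        for (i = 0; i < nworkers; i++) {
            if (pids[i] == served) ok = 1;
        }
    }

    /* Stop the workers that are still blocked in accept() and reap all of them. */
    for (i = 0; i < nworkers; i++) {
        if (pids[i] > 0) kill(pids[i], SIGTERM);
    }
    while (wait(NULL) > 0) { }
    if (cfd >= 0) close(cfd);
    close(lfd);
    return ok;
}
```

Calling master_worker_demo(3) from a main() forks three workers and reports whether the single test connection was answered by one of them. The real master also reloads configuration and handles signals, which this sketch leaves out.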

How Nginx Uses epoll

Each worker creates its own epoll instance to monitor the shared listening socket. The relevant source files are ngx_epoll_module.c for the epoll implementation and ngx_event_accept.c for accept handling.

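/* Abridged from ngx_epoll_module.c; "ep" is a file-scope static int, initially -1 */
static ngx_int_t
ngx_epoll_init(ngx_cycle_t *cycle, ngx_msec_t timer)
{
    ngx_epoll_conf_t  *epcf;

    epcf = ngx_event_get_conf(cycle->conf_ctx, ngx_epoll_module);
    if (ep == -1) {
        /* each worker creates its own epoll instance; the size is only a hint */
        ep = epoll_create(cycle->connection_n / 2);
        /* error handling and event-list setup omitted */
    }
    return NGX_OK;
}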

Thus each worker has its own epoll object, but they all watch the same listening socket.
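The consequence of several epoll instances watching one socket can be observed outside Nginx. The following is a minimal sketch under stated assumptions, not Nginx code: the helpers open_listener, worker, and count_woken_workers are hypothetical names, the workers deliberately never call accept(), and Linux with loopback networking is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a TCP listener on 127.0.0.1 with a kernel-chosen port. */
static int open_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *) addr, sizeof(*addr)) < 0
        || listen(fd, 16) < 0
        || getsockname(fd, (struct sockaddr *) addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Worker: private epoll instance, shared listening socket.
 * Writes one byte to report_fd if epoll_wait() reports the socket readable. */
static void worker(int lfd, int report_fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    struct epoll_event out;
    int ep = epoll_create1(0);

    if (ep >= 0 && epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev) == 0
        && epoll_wait(ep, &out, 1, 2000) == 1) {
        if (write(report_fd, "w", 1) != 1) _exit(1);
    }
    _exit(0);
}

/* Fork nworkers workers, open ONE connection, return how many workers woke. */
int count_woken_workers(int nworkers)
{
    struct sockaddr_in addr;
    int pipefd[2], woken = 0, i;
    char c;
    int lfd = open_listener(&addr);

    if (lfd < 0 || pipe(pipefd) < 0) return -1;

    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            close(pipefd[0]);
            worker(lfd, pipefd[1]);   /* does not return */
        }
    }
    close(pipefd[1]);

    poll(NULL, 0, 100);   /* give the workers time to block in epoll_wait() */

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0) {
        /* The connection completes in the kernel backlog without accept(). */
        (void) connect(cfd, (struct sockaddr *) &addr, sizeof(addr));
    }

    /* read() returns 0 once every worker has exited and closed the pipe. */
    while (read(pipefd[0], &c, 1) == 1) woken++;
    while (wait(NULL) > 0) { }

    if (cfd >= 0) close(cfd);
    close(pipefd[0]);
    close(lfd);
    return woken;
}
```

With plain level-triggered EPOLLIN, every worker's epoll instance reports the pending connection, so count_woken_workers(4) should report four wake-ups for a single connection. In a real server only one of those workers would win the subsequent accept() call.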

Key Issue

When a new connection arrives, which worker should handle it? Waking all workers would recreate the thundering herd problem.

Mitigation Strategies

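accept_mutex (application‑level): The workers compete for a lock held in shared memory. Only the worker that acquires it keeps the listening socket in its epoll set and calls accept(); the others skip accepting for that iteration and handle only their existing connections. This is simple, but the lock can become a bottleneck under very high load, and the directive has been off by default since Nginx 1.11.3.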

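// Simplified from ngx_process_events_and_timers() in src/event/ngx_event.c;
// ngx_trylock_accept_mutex() itself is defined in src/event/ngx_event_accept.c
if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
    return;
}
if (ngx_accept_mutex_held) {
    // this worker holds the lock, so it handles accept events in this iteration
}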

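EPOLLEXCLUSIVE (kernel‑level, Linux 4.5+): Adding the EPOLLEXCLUSIVE flag when the shared descriptor is registered with epoll_ctl (the flag is accepted only with EPOLL_CTL_ADD) tells the kernel to wake one or more, rather than all, of the epoll instances waiting on that descriptor. In practice this is usually a single waiter, which greatly reduces the wake‑up storm.

The flag was introduced in Linux 4.5 to lower the probability of a thundering herd when multiple processes share the same file descriptor; it reduces spurious wake‑ups rather than strictly guaranteeing a single one.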
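Registering a descriptor with the flag takes only a few lines. This is a minimal sketch assuming Linux 4.5 or later and a C library that defines EPOLLEXCLUSIVE; add_exclusive and mod_exclusive_errno are hypothetical helper names, not Nginx functions.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd in epoll instance ep with exclusive wake-up semantics.
 * Returns 0 on success, -1 with errno set on failure. */
int add_exclusive(int ep, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Try to pass the flag to EPOLL_CTL_MOD, which the kernel rejects.
 * Returns the errno reported by the kernel, or 0 if the call succeeded. */
int mod_exclusive_errno(int ep, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    if (epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev) == 0) return 0;
    return errno;
}
```

Each worker would call add_exclusive once on its own epoll instance for the shared listening socket. The second helper exists only to show the restriction documented in the epoll_ctl manual page: the flag cannot be added or changed later with EPOLL_CTL_MOD.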

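SO_REUSEPORT (kernel‑level): Since Nginx 1.9.1, the reuseport parameter of the listen directive can be enabled. Nginx then opens a separate listening socket for each worker with the SO_REUSEPORT socket option (Linux 3.9+), and the kernel load‑balances incoming connections across the sockets bound to the same port, so each connection is delivered to exactly one worker's socket.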

http {
    server {
        listen 80 reuseport;
        server_name localhost;
        # ...
    }
}

Benchmarks reported for this option show a noticeable latency reduction, but the kernel chooses the target socket by hashing the connection's addresses and ports and does not consider worker load; a busy worker may still receive a new connection, which can increase latency for that request.

Conclusion

The article walks through the definition of the thundering herd effect, explains Nginx’s master‑worker design and its epoll‑based event loop, and evaluates three mitigation techniques—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—highlighting their trade‑offs. While Nginx’s default model already handles most workloads efficiently, understanding these mechanisms helps engineers fine‑tune high‑traffic deployments.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
