
Understanding the Thundering Herd Problem and Its Solutions in Nginx

This article explains the thundering herd phenomenon in multi‑process servers, describes Nginx's master‑worker architecture and its use of epoll, and evaluates three mitigation techniques—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—along with relevant source code examples.

The "thundering herd" effect occurs when many processes or threads are simultaneously awakened by a single event, but only one can actually handle the event while the others go back to sleep, leading to wasted CPU cycles.

In simple terms, it is like a thunderclap waking an entire household when only one person needs to get up and bring in the laundry; similarly, a single request wakes many workers, but only one actually processes it.

Root cause: to increase request‑handling capacity, services often spawn multiple processes or threads that all wait on the same event. When the event arrives, all of them are awakened even though only one can handle it, causing unnecessary context switches and wasted CPU.

Nginx Architecture

Nginx follows a master‑worker model: the master process handles configuration, signal processing, and listening socket creation, while worker processes handle the actual client requests.

Requests bypass the master and are processed directly by workers, raising the question of which worker should accept a new connection.

Nginx Uses epoll

Each worker creates its own epoll instance to monitor the shared listening socket. The epoll object is created in the worker process, not the master.

Master’s work (code example)

ngx_open_listening_sockets(ngx_cycle_t *cycle)
{
    ...
    for (i = 0; i < cycle->listening.nelts; i++) {
        ...
        if (bind(s, ls[i].sockaddr, ls[i].socklen) == -1) {
            ...  /* log the error and close the socket */
        }

        if (listen(s, ls[i].backlog) == -1) {
            ...
        }
    }
}

The master binds and listens on the ports configured in nginx.conf.

Worker’s work (code examples)

ngx_spawn_process(ngx_cycle_t *cycle, ngx_spawn_proc_pt proc, void *data,
    char *name, ngx_int_t respawn)
{
    ...
    pid = fork();
    ...
}

Fork creates a copy of the parent's task_struct, giving each worker its own process context.

ngx_epoll_init(ngx_cycle_t *cycle, ngx_msec_t timer)
{
    ngx_epoll_conf_t  *epcf;

    epcf = ngx_event_get_conf(cycle->conf_ctx, ngx_epoll_module);

    if (ep == -1) {  /* ep: the epoll module's static epoll descriptor */
        ep = epoll_create(cycle->connection_n / 2);
    }
    ...
}

Each worker’s epoll instance monitors the same listening socket, but only one worker should accept a connection to avoid the thundering herd.

Solutions to the Thundering Herd

Three main approaches are used in Nginx:

accept_mutex – an application‑level lock that ensures only one worker accepts a connection at a time.

EPOLLEXCLUSIVE – a kernel‑level flag (available since Linux 4.5) that wakes only one waiting process for an event.

SO_REUSEPORT – a kernel feature allowing multiple processes to bind the same port; the kernel load‑balances connections, waking a single worker per request.

accept_mutex

The mutex serializes accept calls; the worker that acquires the lock handles the request, while others go back to sleep. This method is simple and fair but can become a bottleneck under very high load.

EPOLLEXCLUSIVE

Introduced in Linux 4.5, this flag reduces the probability of waking all workers by ensuring that only one process blocked in epoll_wait is awakened for a given event. It does not guarantee that the awakened worker is idle, so occasional contention may still occur.

SO_REUSEPORT

Since Nginx 1.9.1, the reuseport directive gives each worker its own listening socket bound to the same port. The kernel load‑balances incoming connections across these sockets, so each new connection wakes only a single worker.

http {
    server {
        listen 80 reuseport;
        server_name localhost;
        # ...
    }
}

While effective, this method distributes connections without regard to worker load, so a busy worker may still be handed a connection it cannot process promptly, delaying that request behind the worker's existing work.

Conclusion

The article walks through the definition of the thundering herd problem, explains Nginx’s master‑worker architecture and its use of epoll, and evaluates three mitigation strategies—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—highlighting their trade‑offs and practical impact on high‑concurrency backend services.

Written by Selected Java Interview Questions