How Nginx Tackles the Thundering Herd Problem with epoll and Advanced Locks

This article explains the thundering herd phenomenon in multi-process servers, examines Nginx's master-worker architecture, and details three mitigation techniques (accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT), showing how each uses epoll and kernel features to reduce unnecessary wake-ups.

Liangxu Linux

The thundering herd effect occurs when many processes or threads block on the same event; when the event fires, all are awakened but only one can handle it, causing wasted CPU cycles and context switches.

In simple terms, it’s like a thunderstorm waking up many people, yet only one goes out to collect the laundry.

Root Cause

Servers often spawn multiple workers to increase concurrency. When a new connection arrives, the kernel wakes every worker that is waiting on the same listening socket, but only one can accept the connection, leading to repeated wake‑sleep cycles.
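
The effect can be observed with a small experiment. The following is a minimal sketch, not Nginx code: the function name herd_demo, the loopback address, and the timeouts are all assumptions made for illustration. It forks several workers that wait on one listening socket through their own epoll instances, sends a single connection, and counts how many workers returned from epoll_wait versus how many actually accepted.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of workers that woke up for one
 * incoming connection and stores the number of successful accepts in
 * *accepted. Returns -1 if the setup fails. */
int herd_demo(int nworkers, int *accepted)
{
    struct sockaddr_in addr = {0};
    socklen_t len = sizeof(addr);
    int report[2], woken = 0;
    char c;

    *accepted = 0;
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    /* Non-blocking listener, so losing workers get EAGAIN from accept(). */
    int lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (lfd == -1
        || bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == -1
        || listen(lfd, 16) == -1
        || getsockname(lfd, (struct sockaddr *) &addr, &len) == -1
        || pipe(report) == -1)
        return -1;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0) {                 /* worker: inherits lfd */
            struct epoll_event ev = { .events = EPOLLIN };
            int epfd = epoll_create1(0);
            if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev) == -1)
                _exit(1);
            if (epoll_wait(epfd, &ev, 1, 2000) == 1) {
                /* 'a' = woke and accepted, 'w' = woke for nothing */
                c = accept(lfd, NULL, NULL) >= 0 ? 'a' : 'w';
                if (write(report[1], &c, 1) != 1)
                    _exit(1);
            }
            _exit(0);
        }
    }
    close(report[1]);

    poll(NULL, 0, 100);                 /* give the workers time to block */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    connect(cfd, (struct sockaddr *) &addr, sizeof(addr));

    /* read() returns 0 once every worker has exited and closed the pipe. */
    while (read(report[0], &c, 1) == 1) {
        woken++;
        if (c == 'a')
            (*accepted)++;
    }
    while (wait(NULL) > 0) { }
    close(cfd);
    close(report[0]);
    close(lfd);
    return woken;
}
```

Calling herd_demo(4, &accepted) sends one connection to four workers. The number of wake-ups varies with kernel version and scheduling, but only one worker can accept the connection; every additional wake-up is wasted work.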

Nginx Architecture

Nginx uses a master-worker model: the master process reads the configuration, handles signals, and opens the listening sockets, while the worker processes accept and process client requests.

Figure: Nginx master-worker diagram

How Nginx Uses epoll

Each worker creates its own epoll instance to monitor the shared listening socket. The master opens the socket (see ngx_open_listening_sockets) and then forks workers, which inherit the socket descriptor.

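ngx_int_t
ngx_open_listening_sockets(ngx_cycle_t *cycle)
{
    ...
    for (i = 0; i < cycle->listening.nelts; i++) {
        ...
        /* bind the socket to the configured address */
        if (bind(s, ls[i].sockaddr, ls[i].socklen) == -1) {
            ...
        }
        ...
        /* bind() and listen() are sequential steps, not nested ones */
        if (listen(s, ls[i].backlog) == -1) {
            ...
        }
        ...
    }
    ...
}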

Workers also create their epoll objects (see ngx_epoll_init) and register the listening socket with epoll_ctl. Because every worker's epoll instance watches the same socket, a single incoming connection makes that socket readable in all of them, and without further measures the kernel wakes every waiting worker.

Mitigation Strategies

Nginx provides three main ways to avoid the thundering herd:

accept_mutex – an application‑level lock that ensures only one worker calls accept() at a time.

EPOLLEXCLUSIVE – a kernel flag (available since Linux 4.5) that, when several epoll instances watch the same file descriptor, wakes one or more of them instead of all of them.

SO_REUSEPORT – lets each worker have its own listening socket bound to the same port; the kernel load-balances incoming connections so that each new connection is delivered to exactly one of those sockets.

accept_mutex

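The mutex is a lock shared by all workers. Before waiting for events, each worker tries to acquire it without blocking; only the worker that holds the lock watches the listening socket and calls accept(), while the others keep serving the connections they already have. This method is fair but adds lock-contention overhead. Since Nginx 1.11.3 the accept_mutex directive is off by default.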

EPOLLEXCLUSIVE

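EPOLLEXCLUSIVE is an epoll flag added in Linux 4.5. When several epoll instances register the same file descriptor with this flag, an event wakes one or more of them rather than all of them. This sharply reduces spurious wake-ups, although the kernel does not promise that exactly one waiter is woken. Nginx has used the flag on supporting kernels since version 1.11.3.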

It does not guarantee that the awakened worker is idle, so occasional wake‑ups of busy workers can still happen.

SO_REUSEPORT

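Since Nginx 1.9.1, the reuseport parameter of the listen directive gives each worker its own listening socket on the same address and port, using the SO_REUSEPORT socket option (available since Linux 3.9). The kernel performs load-balancing at the socket layer, so each new connection is queued to exactly one worker and the other workers are not woken.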

http {
    server {
        listen 80 reuseport;
        server_name localhost;
        # ...
    }
}

Reported benchmarks show significant performance gains. However, the kernel chooses the target socket without knowing how busy each worker is, so a worker that is stuck on a slow request may still be handed new connections. Those connections then wait in its queue, potentially causing latency spikes.

Conclusion

This article defined the thundering herd problem, described Nginx's master-worker design, and covered three practical solutions: accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT. Each has its own trade-offs: accept_mutex adds lock contention, EPOLLEXCLUSIVE needs Linux 4.5 or later and may wake a busy worker, and SO_REUSEPORT cannot tell whether a worker is busy. Choosing the mechanism that fits the kernel version and workload keeps Nginx's event handling efficient.

