Why Nginx Doesn’t Fall Victim to the Thundering Herd Problem
This article explains the thundering herd phenomenon in multi‑process servers, walks through Nginx’s master‑worker architecture and its use of epoll, and compares three practical mitigation techniques—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—complete with code excerpts and configuration examples.
What Is the Thundering Herd Effect?
The thundering herd describes a situation where many processes or threads are blocked waiting for the same event. When the event occurs, all of them are awakened, but only one can acquire the resource and handle the event; the others go back to sleep, and the CPU cycles spent waking them are wasted.
A simple analogy: a clap of thunder wakes the whole household, yet only one person needs to get up and bring in the laundry.
Root Cause and Problem
High‑concurrency servers often spawn multiple processes or threads to listen for incoming requests. When a request arrives, every listener is awakened, but only one can actually accept and process it. The repeated wake‑sleep‑wake cycle leads to unnecessary context switches and performance loss.
Nginx Architecture Overview
Nginx follows a master‑worker model. The master process handles configuration loading, listening socket creation, and signal handling, while a pool of worker processes performs the actual request processing.
The master creates and binds the listening sockets but never accepts on them; the workers inherit those sockets and accept connections directly.
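The division of labour can be sketched in plain C. This is a minimal illustration, not Nginx code: the function serve_with_workers and the MAX_WORKERS cap are hypothetical names, and the parent process doubles as the test client purely to keep the demo self-contained.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16   /* arbitrary cap for this demo */

/* Hypothetical demo: the "master" opens one listening socket and forks
 * nworkers children that inherit it. Each worker accept()s on the inherited
 * descriptor and replies with one byte. The parent then makes nconns client
 * connections and returns how many received a reply, or -1 on setup failure. */
static int serve_with_workers(int nworkers, int nconns)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    pid_t pids[MAX_WORKERS];
    int lfd, i, spawned = 0, served = 0;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS && nconns >= 0);

    /* Master: create, bind and listen before forking. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 128) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        if (lfd >= 0)
            close(lfd);
        return -1;
    }

    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {                    /* worker: accept on the inherited fd */
            for (;;) {
                int c = accept(lfd, NULL, NULL);
                if (c < 0)
                    _exit(1);
                ssize_t n = write(c, "k", 1);
                (void)n;
                close(c);
            }
        }
        pids[spawned++] = pid;
    }

    /* Demo client: every connection should be answered by some worker. */
    for (i = 0; spawned > 0 && i < nconns; i++) {
        char b;
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c >= 0 &&
            connect(c, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            read(c, &b, 1) == 1)
            served++;
        if (c >= 0)
            close(c);
    }

    for (i = 0; i < spawned; i++) {        /* stop and reap the workers */
        kill(pids[i], SIGTERM);
        waitpid(pids[i], NULL, 0);
    }
    close(lfd);
    return spawned > 0 ? served : -1;
}
```

Calling serve_with_workers(3, 5) forks three workers and should report that all five connections were served, without the parent ever calling accept() itself.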
How Nginx Uses epoll
Each worker creates its own epoll instance to monitor the shared listening socket. The relevant source files are ngx_epoll_module.c for the epoll implementation and ngx_event_accept.c for accept handling.
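    static ngx_int_t
    ngx_epoll_init(ngx_cycle_t *cycle, ngx_msec_t timer)
    {
        ngx_epoll_conf_t  *epcf;

        epcf = ngx_event_get_conf(cycle->conf_ctx, ngx_epoll_module);

        /* ep is a file-scope variable, so every worker process has its own copy */
        if (ep == -1) {
            ep = epoll_create(cycle->connection_n / 2);
        }
        /* ... error handling and event-list setup omitted ... */
        return NGX_OK;
    }

Thus each worker has its own epoll object, but they all watch the same listening socket.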
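The effect of several epoll objects watching one socket can be reproduced in a single process. The sketch below is an assumption-laden stand-in: two epoll instances play the role of two workers, and the helper names watch_listener and instances_notified are hypothetical. It uses epoll_create1() rather than the epoll_create() call shown in the Nginx excerpt.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a private epoll instance and register the
 * listening socket for read readiness. Returns the epoll fd, or -1. */
static int watch_listener(int lfd)
{
    struct epoll_event ev;
    int ep;

    assert(lfd >= 0);
    ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = lfd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev) < 0) {
        close(ep);
        return -1;
    }
    return ep;
}

/* Demo: two epoll instances watch one listener. After a single client
 * connection, count how many instances report the listener as ready. */
static int instances_notified(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct epoll_event out;
    int lfd, cfd, ep[2], i, notified = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    ep[0] = watch_listener(lfd);
    ep[1] = watch_listener(lfd);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (ep[0] < 0 || ep[1] < 0 || cfd < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* Nobody has accepted yet, so the listener stays readable for both. */
    for (i = 0; i < 2; i++)
        if (epoll_wait(ep[i], &out, 1, 1000) == 1 && out.data.fd == lfd)
            notified++;

    close(cfd);
    close(ep[0]);
    close(ep[1]);
    close(lfd);
    return notified;
}
```

Both instances report the single pending connection, which is exactly the situation that wakes every worker when each one sits in its own epoll_wait() call.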
Key Issue
When a new connection arrives, which worker should handle it? Waking all workers would recreate the thundering herd problem.
Mitigation Strategies
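accept_mutex (application-level): Workers compete for a lock held in shared memory; the master does not take part. In each event-loop iteration a worker tries the lock without blocking. The worker that acquires it monitors the listening socket and performs accept(), while the others skip accepting for that iteration and only serve their existing connections. This is simple, but the lock can become a bottleneck under very high load. Since Nginx 1.11.3 the directive is off by default.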
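The try-lock idea can be sketched independently of Nginx. The version below assumes a single lock word in anonymous shared memory and C11 atomics; the names accept_lock_t, accept_lock_create, accept_trylock and accept_unlock are hypothetical, and Nginx's own lock implementation is more elaborate than this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical lock word kept in shared memory:
 * 0 means free, any other value is the owner's id (for example its pid). */
typedef struct {
    _Atomic long owner;
} accept_lock_t;

/* Allocate the lock in memory that stays shared across fork(). */
static accept_lock_t *accept_lock_create(void)
{
    void *p = mmap(NULL, sizeof(accept_lock_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    accept_lock_t *l = p;
    atomic_init(&l->owner, 0);
    return l;
}

/* Non-blocking attempt: returns 1 if `self` now owns the lock,
 * 0 if another owner already holds it. */
static int accept_trylock(accept_lock_t *l, long self)
{
    long expected = 0;

    assert(l != NULL && self != 0);
    return atomic_compare_exchange_strong(&l->owner, &expected, self);
}

/* Release the lock, but only if `self` is the current owner. */
static void accept_unlock(accept_lock_t *l, long self)
{
    long expected = self;

    assert(l != NULL);
    atomic_compare_exchange_strong(&l->owner, &expected, 0);
}
```

In a worker loop the pattern would be: call accept_trylock() with the worker's pid, accept new connections only if it returns 1, and call accept_unlock() once the batch of events has been handled. A worker that loses the race simply carries on with the connections it already has.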
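    // Simplified; ngx_trylock_accept_mutex() is defined in src/event/ngx_event_accept.c
    if (ngx_trylock_accept_mutex(cycle) == NGX_OK && ngx_accept_mutex_held) {
        // this worker holds the lock, so the listening socket is in its
        // epoll set and it may accept new connections
    }

EPOLLEXCLUSIVE (kernel-level, Linux 4.5+): Adding the EPOLLEXCLUSIVE flag when the listening socket is registered with epoll_ctl() tells the kernel to wake one or more of the epoll instances waiting on that descriptor instead of all of them, which greatly reduces the wake-up storm. The flag was introduced in Linux 4.5 for exactly this case, where multiple processes share the same file descriptor. It lowers the probability of a thundering herd rather than eliminating it.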
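The difference can be observed with a small experiment. The sketch below is hypothetical (count_wakeups is not an Nginx function) and assumes Linux 4.5 or later: it forks several workers that each block in epoll_wait() on the same listening socket, makes one connection that nobody accepts, and counts how many workers were woken. The fixed delays are a crude way to let the workers block first, so the numbers are indicative only.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for old headers; Linux UAPI value */
#endif

/* Hypothetical experiment: returns how many of nworkers children were woken
 * by a single incoming connection, or -1 on setup failure. */
static int count_wakeups(int nworkers, int use_exclusive)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, ready[2], woke[2], i, spawned, count = 0;
    char c;

    assert(nworkers > 0);

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        pipe(ready) < 0 || pipe(woke) < 0)
        return -1;

    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {                               /* worker */
            struct epoll_event ev, out;
            int ep = epoll_create1(0);                /* private epoll instance */

            memset(&ev, 0, sizeof(ev));
            ev.events = EPOLLIN | (use_exclusive ? EPOLLEXCLUSIVE : 0);
            ev.data.fd = lfd;
            if (ep < 0 || epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev) < 0)
                _exit(1);
            if (write(ready[1], "r", 1) != 1)         /* registered */
                _exit(1);
            if (epoll_wait(ep, &out, 1, 500) == 1 &&  /* woken before timeout? */
                write(woke[1], "w", 1) != 1)
                _exit(1);
            _exit(0);
        }
    }
    spawned = i;

    close(ready[1]);
    close(woke[1]);
    for (i = 0; i < spawned; i++)                     /* wait for registrations */
        if (read(ready[0], &c, 1) != 1)
            break;
    usleep(100 * 1000);                               /* let workers block */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0)
        (void)connect(cfd, (struct sockaddr *)&addr, sizeof(addr));

    while (read(woke[0], &c, 1) == 1)                 /* EOF once all workers exit */
        count++;
    while (wait(NULL) > 0)
        ;
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    close(ready[0]);
    close(woke[0]);
    return count;
}
```

Comparing count_wakeups(4, 0) with count_wakeups(4, 1) on the same machine shows the effect: without the flag every registered worker is expected to wake, while with it the kernel only promises that at least one will.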
SO_REUSEPORT (kernel-level): Since Nginx 1.9.1, the reuseport parameter of the listen directive sets the SO_REUSEPORT socket option. Each worker then gets its own listening socket bound to the same port, and the kernel distributes incoming connections across those sockets, so each connection is delivered to exactly one worker.
    http {
        server {
            listen 80 reuseport;
            server_name localhost;
            # ...
        }
    }

Benchmarks show noticeable latency reduction, but the approach does not consider worker load; a busy worker may still receive a new connection, potentially increasing latency for that request.
Conclusion
The article walks through the definition of the thundering herd effect, explains Nginx’s master‑worker design and its epoll‑based event loop, and evaluates three mitigation techniques—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—highlighting their trade‑offs. While Nginx’s default model already handles most workloads efficiently, understanding these mechanisms helps engineers fine‑tune high‑traffic deployments.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
