How Nginx Solves the Thundering Herd Problem with epoll and Advanced Techniques
This article explains the thundering herd effect, walks through Nginx's master‑worker architecture and its use of epoll, and compares three practical solutions—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—to eliminate wasted wake‑ups in high‑concurrency servers.
What is the thundering herd effect?
The thundering herd effect occurs when multiple processes or threads are blocked waiting for the same event. When the event fires, all of them are woken, but only one can acquire the resource; the rest go back to sleep, having wasted CPU cycles on the wake-up.
In simple terms, it is like a thunderclap that wakes everyone in the house when only one person needs to get up and bring in the laundry.
Causes & Issues
When a server spawns many workers to listen for requests, a single incoming request can wake all workers, but only one can actually accept and handle it, leading to repeated wake‑sleep cycles and costly context switches.
Nginx Architecture
Nginx separates processes into a master and multiple workers (a classic master‑worker strategy). The master handles configuration, signal processing, and listening socket creation, while workers handle the actual request processing.
Requests bypass the master and are directly handled by workers, raising the question of which worker should accept a given request.
Nginx Uses epoll
Each worker creates its own epoll instance to monitor the shared listening socket, allowing efficient event‑driven I/O.
Master’s Work
    ngx_int_t
    ngx_open_listening_sockets(ngx_cycle_t *cycle)
    {
        ...
        for (i = 0; i < cycle->listening.nelts; i++) {
            ...
            /* bind the socket to the configured address */
            if (bind(s, ls[i].sockaddr, ls[i].socklen) == -1) {
                ...
            }
            ...
            /* mark it as a listening socket */
            if (listen(s, ls[i].backlog) == -1) {
                ...
            }
        }
        ...
    }

The master binds the configured ports and then forks the worker processes. Because fork() duplicates the master's file descriptor table, every worker inherits the same listening sockets.
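The inheritance step can be illustrated outside Nginx with a small sketch. It is not Nginx code: `open_listener` and `worker_inherits_listener` are hypothetical helpers, the loopback address and the backlog of 128 are arbitrary choices, and the child only checks that the descriptor it inherited is a listening socket.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an Nginx function): bind a TCP socket to
 * 127.0.0.1 on a kernel-chosen port and start listening.
 * Returns the descriptor, or -1 on error. */
static int open_listener(void)
{
    struct sockaddr_in addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* 0 = let the kernel pick a port */

    if (bind(s, (struct sockaddr *) &addr, sizeof(addr)) == -1
        || listen(s, 128) == -1)
    {
        close(s);
        return -1;
    }

    return s;
}

/* Hypothetical helper: fork one "worker" and ask the kernel, from inside
 * the child, whether the inherited descriptor is a listening socket.
 * Returns 1 if it is, 0 if it is not, -1 if fork/wait failed. */
static int worker_inherits_listener(int listen_fd)
{
    int status;
    pid_t pid;

    assert(listen_fd >= 0);

    pid = fork();
    if (pid == -1) {
        return -1;
    }

    if (pid == 0) {
        /* child: the descriptor number is the same as in the parent */
        int listening = 0;
        socklen_t len = sizeof(listening);

        if (getsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTCONN,
                       &listening, &len) == -1)
        {
            _exit(2);
        }
        _exit(listening ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }

    return WEXITSTATUS(status) == 0;
}
```

Calling `worker_inherits_listener(open_listener())` in the parent should report that the child sees a listening socket, without the child ever calling bind() or listen() itself.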
Worker’s Work
    static ngx_int_t
    ngx_epoll_init(ngx_cycle_t *cycle, ngx_msec_t timer)
    {
        ngx_epoll_conf_t  *epcf;

        epcf = ngx_event_get_conf(cycle->conf_ctx, ngx_epoll_module);

        if (ep == -1) {
            /* the size argument is only a hint; Linux ignores it since 2.6.8 */
            ep = epoll_create(cycle->connection_n / 2);
        }
        ...
    }

Each worker creates its own epoll instance, but the listening socket registered with it is shared among all workers.
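As a rough illustration of the per-worker setup, the sketch below creates a private epoll instance and registers an inherited descriptor with it. The function names `worker_epoll_init` and `worker_wait_for_connection` are hypothetical, and the code uses `epoll_create1()` rather than the `epoll_create()` call shown in the Nginx excerpt.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: what a worker might do once after fork().
 * Creates a private epoll instance and registers the inherited listening
 * socket for readability. Returns the epoll descriptor, or -1 on error. */
static int worker_epoll_init(int listen_fd)
{
    struct epoll_event ev;
    int ep;

    assert(listen_fd >= 0);

    ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep == -1) {
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;               /* readable = a connection is pending */
    ev.data.fd = listen_fd;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev) == -1) {
        close(ep);
        return -1;
    }

    return ep;
}

/* Hypothetical helper: wait up to timeout_ms for the listening socket to
 * become readable. Returns 1 if it did, 0 on timeout, -1 on error. */
static int worker_wait_for_connection(int ep, int listen_fd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(ep, &ev, 1, timeout_ms);

    if (n <= 0) {
        return n;
    }

    return (ev.data.fd == listen_fd && (ev.events & EPOLLIN)) ? 1 : 0;
}
```

If every worker runs this against the same inherited socket, each epoll instance holds its own registration for that socket, which is why a single incoming connection can make several workers return from `epoll_wait()` at once.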
Key Problem
When a request arrives, all workers could be awakened, but only one should actually accept it; otherwise, unnecessary wake‑ups degrade performance.
Solutions
accept_mutex (application‑level lock)
EPOLLEXCLUSIVE (kernel‑level flag)
SO_REUSEPORT (kernel‑level socket option)
accept_mutex
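Workers compete for a mutex. Only the worker that acquires the lock adds the listening socket to its epoll set and accepts new connections; the others do not watch the socket and therefore are not woken by it. This method is simple and fair, but it can introduce latency, because a worker that fails to take the lock waits up to accept_mutex_delay before trying again.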
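The idea can be sketched with a lock word in shared memory and a try-lock built on a compare-and-swap. This is an assumption-laden illustration, not Nginx's implementation: `accept_lock_t`, `accept_lock_create`, `accept_lock_try`, `accept_lock_release`, and `worker_update_accept_interest` are all hypothetical names, and the worker ids are arbitrary non-zero integers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                    /* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/mman.h>

/* Hypothetical lock word living in memory shared by all workers. */
typedef struct {
    atomic_int owner;                  /* 0 = free, otherwise the holder's id */
} accept_lock_t;

/* Create the lock before fork() so every worker maps the same page. */
static accept_lock_t *accept_lock_create(void)
{
    accept_lock_t *l = mmap(NULL, sizeof(*l), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (l == MAP_FAILED) {
        return NULL;
    }
    atomic_init(&l->owner, 0);
    return l;
}

/* Non-blocking attempt; returns 1 if worker_id now owns the lock. */
static int accept_lock_try(accept_lock_t *l, int worker_id)
{
    int expected = 0;

    assert(worker_id != 0);
    return atomic_compare_exchange_strong(&l->owner, &expected, worker_id);
}

/* Release the lock; a no-op if worker_id is not the current owner. */
static void accept_lock_release(accept_lock_t *l, int worker_id)
{
    int expected = worker_id;

    atomic_compare_exchange_strong(&l->owner, &expected, 0);
}

/* One loop iteration, called while NOT holding the lock: watch the
 * listening socket only if the lock was won. *watching records whether
 * listen_fd is currently in this worker's epoll set.
 * Returns 1 if this worker holds the lock, 0 if not, -1 on error. */
static int worker_update_accept_interest(accept_lock_t *l, int worker_id,
                                         int ep, int listen_fd, int *watching)
{
    struct epoll_event ev;

    if (accept_lock_try(l, worker_id)) {
        if (!*watching) {
            memset(&ev, 0, sizeof(ev));
            ev.events = EPOLLIN;
            ev.data.fd = listen_fd;
            if (epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev) == -1) {
                accept_lock_release(l, worker_id);
                return -1;
            }
            *watching = 1;
        }
        return 1;
    }

    /* lost the race: stop watching so this worker is not woken */
    if (*watching) {
        if (epoll_ctl(ep, EPOLL_CTL_DEL, listen_fd, NULL) == -1) {
            return -1;
        }
        *watching = 0;
    }
    return 0;
}
```

In this sketch a worker would call `worker_update_accept_interest()` at the top of each event-loop iteration and `accept_lock_release()` after handling any accept events, so that at most one epoll set contains the listening socket at a time.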
EPOLLEXCLUSIVE
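EPOLLEXCLUSIVE, added in Linux 4.5, is a flag passed to epoll_ctl() when a file descriptor is registered. When several epoll instances watch the same file descriptor with this flag set, an event wakes one or more of them instead of all of them.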
It lowers the probability of multiple workers being awakened, but does not eliminate it entirely because the socket remains shared.
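A minimal sketch of how a worker might register a shared descriptor with this flag is shown below. `worker_epoll_init_exclusive` is a hypothetical name, and the code assumes a Linux 4.5+ kernel with headers that define `EPOLLEXCLUSIVE`.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create a private epoll instance and register the
 * shared listening socket in exclusive wake-up mode.
 * Returns the epoll descriptor, or -1 on error. */
static int worker_epoll_init_exclusive(int listen_fd)
{
    struct epoll_event ev;
    int ep;

    assert(listen_fd >= 0);

    ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep == -1) {
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    /* EPOLLEXCLUSIVE is only valid with EPOLL_CTL_ADD */
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = listen_fd;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev) == -1) {
        close(ep);
        return -1;
    }

    return ep;
}
```

The only difference from a plain registration is the extra flag in `ev.events`; the decision about which waiting epoll instances to wake is then left to the kernel.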
SO_REUSEPORT
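Since Nginx 1.9.1, the reuseport parameter of the listen directive makes Nginx create a separate listening socket for each worker, all bound to the same port through the SO_REUSEPORT socket option. The kernel load-balances incoming connections across these sockets and wakes only the worker that owns the chosen one.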
    http {
        server {
            listen 80 reuseport;
            server_name localhost;
            # ...
        }
    }

Benchmarks show significant performance gains. The approach still has a weakness: the kernel assigns connections without knowing how busy each worker is, so a new connection can land on a worker that is still processing a previous request and must wait in that worker's queue.
Summary
The article introduces the thundering herd effect, explains Nginx’s master‑worker model and its epoll‑based event handling, and evaluates three mitigation strategies—accept_mutex, EPOLLEXCLUSIVE, and SO_REUSEPORT—highlighting their trade‑offs in real‑world high‑concurrency scenarios.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
