Understanding and Simulating the Thundering Herd Problem in Linux Network Servers
This article explains the thundering herd phenomenon in Linux network programming, demonstrates how to reproduce it with multi‑process accept and epoll examples, analyzes why modern kernels often suppress it, and discusses mitigation techniques such as mutex locking in Nginx.
1. Introduction
The author, with nearly four years of Linux network development experience, explores the thundering herd problem, a performance issue that occurs when multiple processes or threads are awakened simultaneously for a single event, leading to wasted CPU cycles.
2. What is the thundering herd?
When several processes block on the same listening socket (e.g., each one calls accept after the parent has called listen), the kernel may wake all of them when a connection arrives. Only one of them can take the connection; the others find nothing to do and go back to sleep, so their wake‑ups are pure overhead.
3. Simulating the problem with forked processes
A simple multi‑process server is created: the parent binds a listening socket, then forks four workers that each call accept in a loop. The source code is shown below.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <sys/wait.h>
#include <string.h>
#include <errno.h>
#define IP "127.0.0.1"
#define PORT 8888
#define WORKER 4
int worker(int listenfd, int i)
{
    while (1) {
        printf("I am worker %d, begin to accept connection.\n", i);
        struct sockaddr_in client_addr;
        socklen_t client_addrlen = sizeof(client_addr);
        /* Blocks until the kernel hands this worker a connection. */
        int connfd = accept(listenfd, (struct sockaddr *)&client_addr, &client_addrlen);
        if (connfd != -1) {
            printf("worker %d accept a connection success.\t", i);
            printf("ip :%s\t", inet_ntoa(client_addr.sin_addr));
            /* sin_port is in network byte order; convert it before printing. */
            printf("port: %d\n", ntohs(client_addr.sin_port));
            /* Close the accepted socket so the loop does not leak descriptors. */
            close(connfd);
        } else {
            /* connfd is -1 here, so there is nothing to close. */
            printf("worker %d accept a connection failed,error:%s\n", i, strerror(errno));
        }
    }
    return 0;
}
int main()
{
    int i = 0;
    struct sockaddr_in address;
    bzero(&address, sizeof(address));
    address.sin_family = AF_INET;
    inet_pton(AF_INET, IP, &address.sin_addr);
    address.sin_port = htons(PORT);
    int listenfd = socket(PF_INET, SOCK_STREAM, 0);
    assert(listenfd >= 0);
    int ret = bind(listenfd, (struct sockaddr *)&address, sizeof(address));
    assert(ret != -1);
    ret = listen(listenfd, 5);
    assert(ret != -1);
    /* Every child inherits listenfd and blocks in accept on the same socket. */
    for (i = 0; i < WORKER; i++) {
        printf("Create worker %d\n", i+1);
        pid_t pid = fork();
        if (pid == 0) {
            worker(listenfd, i);
        }
        if (pid < 0) {
            printf("fork error\n");
        }
    }
    int status;
    wait(&status);
    return 0;
}
Running the program and connecting with telnet 127.0.0.1 8888 shows that only one worker (e.g., worker2) accepts the connection while the others stay blocked in accept, indicating that the thundering herd did not occur in this test.
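The telnet step can also be scripted. The following is a minimal client sketch, assuming the server above is listening on 127.0.0.1:8888; the helper names connect_once and connect_many are hypothetical and not part of the original program.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: open one TCP connection and return its fd, or -1. */
static int connect_once(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Open `count` connections one after another and close each immediately.
 * Returns how many of them were established. */
static int connect_many(const char *ip, unsigned short port, int count)
{
    int ok = 0;
    for (int i = 0; i < count; i++) {
        int fd = connect_once(ip, port);
        if (fd != -1) {
            ok++;
            close(fd);
        }
    }
    return ok;
}
```

Calling connect_many("127.0.0.1", 8888, 10) from a small main while the server runs makes it easier to see how connections are spread across workers than typing telnet commands by hand.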
4. Epoll‑based implementation
Many production servers wait for connections with select, poll or epoll rather than a blocking accept, and in that setup the thundering herd can still appear. An epoll version is provided below, where each worker calls epoll_wait and then accept.
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#define IP "127.0.0.1"
#define PORT 8888
#define PROCESS_NUM 4
#define MAXEVENTS 64
static int create_and_bind() { /* ... */ }
static int make_socket_non_blocking(int sfd) { /* ... */ }
void worker(int sfd, int efd, struct epoll_event *events, int k) {
    while (1) {
        int n, i;
        n = epoll_wait(efd, events, MAXEVENTS, -1);
        printf("worker %d return from epoll_wait!\n", k);
        for (i = 0; i < n; i++) {
            if ((events[i].events & EPOLLERR) || (events[i].events & EPOLLHUP) || (!(events[i].events & EPOLLIN))) {
                fprintf(stderr, "epoll error\n");
                close(events[i].data.fd);
                continue;
            } else if (sfd == events[i].data.fd) {
                struct sockaddr in_addr;
                socklen_t in_len = sizeof in_addr;
                /* sfd is non-blocking, so a worker that lost the race gets -1 here. */
                int infd = accept(sfd, &in_addr, &in_len);
                if (infd == -1) {
                    printf("worker %d accept failed!\n", k);
                    break;
                }
                printf("worker %d accept succeeded!\n", k);
                close(infd);
            }
        }
    }
}
int main(int argc, char *argv[]) {
    int sfd = create_and_bind();
    make_socket_non_blocking(sfd);
    listen(sfd, SOMAXCONN);
    /* The epoll instance is created before fork, so all workers share it. */
    int efd = epoll_create(MAXEVENTS);
    struct epoll_event event;
    event.data.fd = sfd;
    event.events = EPOLLIN;
    epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &event);
    struct epoll_event *events = calloc(MAXEVENTS, sizeof event);
    for (int k = 0; k < PROCESS_NUM; k++) {
        printf("Create worker %d\n", k+1);
        pid_t pid = fork();
        if (pid == 0) {
            worker(sfd, efd, events, k);
        }
    }
    int status;
    wait(&status);
    free(events);
    close(sfd);
    return EXIT_SUCCESS;
}
Testing this version also shows only one process handling the connection, which suggests that the kernel's wake‑up optimization suppresses the classic thundering herd in many cases.
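The bodies of create_and_bind and make_socket_non_blocking are elided in the listing above. The sketch below shows one plausible shape for them; it is not the author's code. The _sketch suffix, the explicit ip and port parameters, and the use of SO_REUSEADDR are all assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the elided create_and_bind(): creates a TCP
 * socket and binds it to ip:port. Returns the socket fd, or -1 on error. */
static int create_and_bind_sketch(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    int sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd == -1)
        return -1;

    /* Assumption: allow quick restarts while old sockets sit in TIME_WAIT. */
    int on = 1;
    setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    if (bind(sfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(sfd);
        return -1;
    }
    return sfd;
}

/* Hypothetical stand-in for the elided make_socket_non_blocking():
 * adds O_NONBLOCK to the descriptor's flags. Returns 0, or -1 on error. */
static int make_socket_non_blocking_sketch(int sfd)
{
    int flags = fcntl(sfd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (fcntl(sfd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

The non-blocking flag matters for the epoll test: a worker that wakes up after another worker has already taken the connection gets an immediate error from accept instead of blocking inside it.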
5. Why the phenomenon may not appear on modern kernels
Since Linux 2.6, the kernel's wake‑up behavior for accept has changed: when a connection arrives, only the first process or thread in the listening socket's wait queue is awakened instead of every waiter. A purely blocking accept loop therefore no longer suffers from the thundering herd, though epoll‑based servers can still exhibit it in certain scenarios.
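For the epoll case, kernels from 4.5 onward provide the EPOLLEXCLUSIVE flag, which asks the kernel to limit how many epoll instances attached to the same file are woken for one event. The article's test code does not use it. The sketch below assumes a different layout from the listing above: each worker creates its own epoll instance after fork and registers the shared listening socket in it. The helper name worker_epoll_exclusive is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Hypothetical helper, meant to be called in each worker after fork():
 * creates a private epoll instance and registers fd with EPOLLEXCLUSIVE.
 * Requires Linux 4.5+ and headers that define EPOLLEXCLUSIVE.
 * Returns the epoll fd, or -1 on error. */
static int worker_epoll_exclusive(int fd)
{
    int efd = epoll_create1(0);
    if (efd == -1)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;

    /* EPOLLEXCLUSIVE is only accepted together with EPOLL_CTL_ADD. */
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(efd);
        return -1;
    }
    return efd;
}
```

The flag reduces spurious wake‑ups but does not promise that exactly one worker wakes, so the listening socket should stay non-blocking and each worker should still treat a failed accept as normal.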
6. Mitigation strategies
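Nginx addresses the problem with a global accept mutex shared by all workers: before calling epoll_wait, each worker tries to acquire the lock, and only the worker that holds it waits for events on the listening socket. A worker that fails to get the lock does not compete for new connections in that round. Nginx also applies a simple load‑balancing rule: a worker that has already used about 7/8 of its connection capacity stops competing for the lock for a while, so new connections go to less loaded workers.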
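The idea can be sketched in a few functions. This is a simplified illustration under stated assumptions, not Nginx's implementation: the lock is a single atomic word in memory mapped before fork, worker IDs are assumed to be non-zero, and all names (accept_mutex_t, accept_disabled, should_accept) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Lock word placed in memory shared by all workers; 0 means "free". */
typedef struct {
    atomic_int owner;
} accept_mutex_t;

/* Must be called in the parent before fork() so children share the mapping. */
static accept_mutex_t *accept_mutex_create(void)
{
    accept_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;
    atomic_init(&m->owner, 0);
    return m;
}

/* Non-blocking: returns 1 if worker_id (non-zero) now holds the lock, else 0. */
static int accept_mutex_trylock(accept_mutex_t *m, int worker_id)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->owner, &expected, worker_id);
}

/* Releases the lock only if worker_id is the current holder. */
static void accept_mutex_unlock(accept_mutex_t *m, int worker_id)
{
    int expected = worker_id;
    atomic_compare_exchange_strong(&m->owner, &expected, 0);
}

/* Positive once fewer than 1/8 of the connection slots are free,
 * i.e. the worker has used more than 7/8 of its capacity. */
static int accept_disabled(int total_slots, int free_slots)
{
    return total_slots / 8 - free_slots;
}

/* Decision taken once per event-loop iteration: should this worker wait
 * on the listening socket in the upcoming epoll_wait()? */
static int should_accept(accept_mutex_t *m, int worker_id,
                         int total_slots, int free_slots)
{
    if (accept_disabled(total_slots, free_slots) > 0)
        return 0;                       /* too busy: skip the lock entirely */
    return accept_mutex_trylock(m, worker_id);
}
```

In a worker loop, should_accept would run before each epoll_wait. A worker that gets 1 adds the listening socket to its epoll set, accepts the pending connections, and then calls accept_mutex_unlock; a worker that gets 0 only services the connections it already owns.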
MaGe Linux Operations
