Why epoll Beats select: A Deep Dive into Linux I/O Multiplexing
This article explains the advantages of epoll over select for I/O multiplexing in Linux, covering its event-driven design, edge-triggered vs level-triggered modes, core APIs, practical code examples, and performance considerations for high‑concurrency network servers.
epoll and select
Compared with select, the biggest advantage of epoll is that its efficiency does not degrade as the number of monitored file descriptors grows. On every call, select makes the kernel scan every fd in the set, so its cost rises linearly with the number of fds, whereas epoll_wait returns only the fds that are actually ready.
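In linux/posix_types.h there is a declaration #define __FD_SETSIZE 1024 (glibc's FD_SETSIZE has the same value). Because an fd_set is a fixed-size bitmap, select can only handle fds numbered below 1024, so it can monitor at most 1024 fds simultaneously.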
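A minimal sketch of the guard a select-based program needs, assuming glibc's fixed-size fd_set. Calling FD_SET on a descriptor numbered FD_SETSIZE or higher writes past the end of the bitmap, so such fds must be rejected up front. The helper names fits_in_fd_set and safe_fd_set are hypothetical.

```cpp
#include <sys/select.h>  // fd_set, FD_SETSIZE, FD_ZERO, FD_SET, FD_ISSET

// Hypothetical helper: true if fd can safely be stored in an fd_set.
// fd_set is a bitmap with FD_SETSIZE bits (1024 on glibc), so any fd
// numbered FD_SETSIZE or higher cannot be passed to select at all.
bool fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

// Hypothetical helper: adds fd to set only when it fits, returning false
// instead of overflowing the bitmap.
bool safe_fd_set(int fd, fd_set *set) {
    if (!fits_in_fd_set(fd)) return false;
    FD_SET(fd, set);
    return true;
}
```

A server that may open more than about a thousand descriptors therefore has to move to poll or epoll; raising the process fd limit does not make larger fds usable with select.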
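I/O multiplexing with select
I/O multiplexing allows one process to monitor many sockets at once without dedicating a blocked thread to each. select blocks until at least one fd in its read, write, or exception sets is ready, or until the timeout expires. However, it limits the number of fds, and its overhead grows linearly with the number of fds, because the sets must be rebuilt, copied into the kernel, and scanned on every call.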
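The per-call cost is visible in code: because select overwrites its fd_set arguments, the set has to be rebuilt before every call, and the ready fds have to be found by testing each one again. A minimal sketch, assuming a caller-supplied list of fds (all below FD_SETSIZE) to watch for readability; the function name select_readable is hypothetical.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <vector>

// Hypothetical helper: waits up to timeout_ms for any fd in fds to become
// readable and returns the ready ones. Returns an empty vector on timeout
// or error. Every fd must be < FD_SETSIZE.
std::vector<int> select_readable(const std::vector<int> &fds, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);               // select modifies the set, so rebuild it every call
    int maxfd = -1;
    for (int fd : fds) {             // O(n): register every fd again
        FD_SET(fd, &readfds);
        if (fd > maxfd) maxfd = fd;
    }
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    // The kernel checks every descriptor from 0 to maxfd.
    int n = select(maxfd + 1, &readfds, nullptr, nullptr, &tv);
    if (n <= 0) return ready;
    for (int fd : fds) {             // O(n) again: find which fds are ready
        if (FD_ISSET(fd, &readfds)) ready.push_back(fd);
    }
    return ready;
}
```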
epoll
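epoll's ET mode should be used with non-blocking sockets: the application has to keep reading or writing until EAGAIN, and on a blocking socket that final call would block the whole event loop. LT mode works with both blocking and non-blocking sockets. Note that select, poll, and epoll are all synchronous I/O: they only report readiness, and the application still performs the read or write itself.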
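epoll event flags:
EPOLLIN: readable (including when the peer closes the connection normally)
EPOLLOUT: writable
EPOLLPRI: urgent (out-of-band) data is available to read
EPOLLERR: error condition (always reported, even if not requested)
EPOLLHUP: hang-up (always reported, even if not requested)
EPOLLET and EPOLLONESHOT are input flags rather than events: EPOLLET selects edge-triggered mode, and EPOLLONESHOT disables the fd after one reported event until it is re-armed with EPOLL_CTL_MOD.
epoll API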
The epoll interface consists of three functions:
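int epoll_create(int size); Creates an epoll instance and returns a file descriptor that refers to it. Since Linux 2.6.8 the size argument is ignored, but it must still be greater than zero; epoll_create1(int flags), which accepts EPOLL_CLOEXEC, is the modern alternative.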
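int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); Adds (EPOLL_CTL_ADD), modifies (EPOLL_CTL_MOD), or removes (EPOLL_CTL_DEL) a file descriptor in the epoll instance's interest list. event specifies which events to watch and the user data (typically the fd) to return with them.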
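int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); Waits for events and stores up to maxevents ready entries in events. timeout is in milliseconds (-1 blocks indefinitely, 0 returns immediately). Returns the number of ready fds, 0 on timeout, or -1 on error (for example EINTR when interrupted by a signal).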
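The three calls fit together as follows. This is a minimal sketch rather than production code: it uses epoll_create1 instead of epoll_create, watches a single fd in level-triggered mode, and the helper name wait_readable_once is hypothetical.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: registers fd for EPOLLIN, waits up to timeout_ms,
// and returns 1 if fd was reported readable, 0 on timeout, -1 on error.
int wait_readable_once(int fd, int timeout_ms) {
    int epfd = epoll_create1(EPOLL_CLOEXEC);   // epoll_create(1) would also work
    if (epfd < 0) return -1;

    struct epoll_event ev = {};
    ev.events = EPOLLIN;                       // level-triggered (the default)
    ev.data.fd = fd;                           // handed back by epoll_wait
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ready[8];
    int n = epoll_wait(epfd, ready, 8, timeout_ms);
    int result = -1;
    if (n == 0) {
        result = 0;                            // timeout
    } else if (n > 0) {
        result = (ready[0].data.fd == fd && (ready[0].events & EPOLLIN)) ? 1 : 0;
    }
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, nullptr);  // optional: closing epfd drops it too
    close(epfd);
    return result;
}
```

A real server creates the epoll instance once and keeps calling epoll_wait in a loop; creating it per call here only keeps the sketch self-contained.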
Typical usage steps:
Call epoll_create to obtain an epoll handle.
Register the listening socket with epoll_ctl(..., EPOLL_CTL_ADD, listenfd, &ev) where ev.events = EPOLLIN | EPOLLET.
In the event loop, call epoll_wait to obtain ready events, then handle accept, read, or write accordingly.
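In the event loop, the events field of each returned epoll_event is a bit mask, and several bits can be set at once (for example EPOLLIN and EPOLLOUT on a socket that is both readable and writable). A small decoder is handy when logging what the loop receives; describe_events is a hypothetical debugging helper, not part of the epoll API.

```cpp
#include <sys/epoll.h>
#include <cstdint>
#include <string>

// Hypothetical debugging helper: turns an epoll event mask into text such
// as "EPOLLIN|EPOLLOUT". Bits not listed below are ignored.
std::string describe_events(uint32_t mask) {
    struct Flag { uint32_t bit; const char *name; };
    static const Flag flags[] = {
        {EPOLLIN, "EPOLLIN"},   {EPOLLOUT, "EPOLLOUT"}, {EPOLLPRI, "EPOLLPRI"},
        {EPOLLERR, "EPOLLERR"}, {EPOLLHUP, "EPOLLHUP"},
    };
    std::string out;
    for (const Flag &f : flags) {
        if (mask & f.bit) {
            if (!out.empty()) out += '|';
            out += f.name;
        }
    }
    return out.empty() ? "none" : out;
}
```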
Example server code (simplified)
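#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cerrno>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#define MAXLINE 255
#define LISTENQ 20
#define MAXEVENTS 64
// Put a socket into non-blocking mode; required for the ET drain loops below.
void setnonblocking(int sock) {
    int opts = fcntl(sock, F_GETFL);
    if (opts < 0) { perror("fcntl(F_GETFL)"); exit(1); }
    opts |= O_NONBLOCK;
    if (fcntl(sock, F_SETFL, opts) < 0) { perror("fcntl(F_SETFL)"); exit(1); }
}
// Echo n bytes back, handling partial writes. Simplification: if the socket
// send buffer is full (EAGAIN), the rest is dropped; a real server would keep
// it in a per-connection buffer and wait for EPOLLOUT.
// Returns false on a hard error so the caller can close the connection.
bool echo_back(int fd, const char *buf, ssize_t n) {
    while (n > 0) {
        // MSG_NOSIGNAL: get EPIPE instead of SIGPIPE if the peer has gone away
        ssize_t w = send(fd, buf, n, MSG_NOSIGNAL);
        if (w < 0) {
            if (errno == EINTR) continue;
            return errno == EAGAIN || errno == EWOULDBLOCK;
        }
        buf += w;
        n -= w;
    }
    return true;
}
int main(int argc, char *argv[]) {
    if (argc != 2) { fprintf(stderr, "Usage: %s port\n", argv[0]); return 1; }
    int port = atoi(argv[1]);
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0) { perror("socket"); return 1; }
    int on = 1;
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));  // allow quick restarts
    struct sockaddr_in serveraddr;
    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serveraddr.sin_port = htons(port);
    if (bind(listenfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)) < 0) { perror("bind"); return 1; }
    if (listen(listenfd, LISTENQ) < 0) { perror("listen"); return 1; }
    setnonblocking(listenfd);
    int epfd = epoll_create(256);  // size is ignored since Linux 2.6.8 but must be > 0
    if (epfd < 0) { perror("epoll_create"); return 1; }
    struct epoll_event ev, events[MAXEVENTS];
    ev.data.fd = listenfd;
    ev.events = EPOLLIN | EPOLLET;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev) < 0) { perror("epoll_ctl"); return 1; }
    while (true) {
        int nfds = epoll_wait(epfd, events, MAXEVENTS, -1);
        if (nfds < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: wait again
            perror("epoll_wait");
            break;
        }
        for (int i = 0; i < nfds; ++i) {
            int fd = events[i].data.fd;
            if (fd == listenfd) {
                // ET: accept all pending connections, or some stay queued with no new event
                while (true) {
                    struct sockaddr_in clientaddr;
                    socklen_t addrlen = sizeof(clientaddr);
                    int connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &addrlen);
                    if (connfd < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // queue drained
                        if (errno == EINTR) continue;
                        perror("accept");
                        break;
                    }
                    setnonblocking(connfd);
                    ev.data.fd = connfd;
                    ev.events = EPOLLIN | EPOLLET;
                    if (epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &ev) < 0) { perror("epoll_ctl"); close(connfd); }
                }
            } else if (events[i].events & (EPOLLIN | EPOLLERR | EPOLLHUP)) {
                // ET: read until EAGAIN, or leftover data is never reported again.
                // On EPOLLERR/EPOLLHUP, read() reports the error or EOF and we close.
                char buf[MAXLINE];
                while (true) {
                    ssize_t n = read(fd, buf, sizeof(buf));
                    if (n < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // drained
                        if (errno == EINTR) continue;
                        close(fd);  // hard error; close() also drops fd from the epoll set (no dup'ed copies exist here)
                        break;
                    } else if (n == 0) {
                        close(fd);  // peer closed the connection
                        break;
                    } else if (!echo_back(fd, buf, n)) {  // process data (echo back)
                        close(fd);
                        break;
                    }
                }
            }
        }
    }
    close(epfd);
    close(listenfd);
    return 0;
}
Edge-triggered (ET) vs Level-triggered (LT)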
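LT (the default) reports an fd on every epoll_wait call for as long as it stays readable or writable, so data left unprocessed is reported again. ET reports an fd only when its state changes, for example when new data arrives; the application must read or write until EAGAIN, otherwise the remaining data may never trigger another notification.
ET can reduce the number of epoll_wait wakeups, because an fd whose readiness has not changed is not reported again on every call; the trade-off is that the application must drain each fd completely. Idle fds generate no events in either mode, since only ready fds are returned.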
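The difference is easy to observe with a pipe. This sketch writes two bytes, reads only one after the first notification, and then checks whether a second epoll_wait reports the fd again. Level-triggered mode does, because data is still pending; edge-triggered mode does not, because no new data arrived. The helper name reported_twice is hypothetical.

```cpp
#include <cstdint>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical demo: returns true if epoll reports a pipe's read end a
// second time after only part of the pending data was consumed.
// Pass EPOLLIN for level-triggered or EPOLLIN | EPOLLET for edge-triggered.
bool reported_twice(uint32_t events) {
    int p[2];
    if (pipe(p) < 0) return false;
    int epfd = epoll_create1(0);
    if (epfd < 0) { close(p[0]); close(p[1]); return false; }

    struct epoll_event ev = {};
    ev.events = events;
    ev.data.fd = p[0];
    bool result = false;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "ab", 2) == 2) {              // two bytes now pending
        struct epoll_event out[1];
        char c;
        // First wait: both modes report the newly arrived data.
        if (epoll_wait(epfd, out, 1, 100) == 1 && read(p[0], &c, 1) == 1) {
            // Only "a" was consumed; "b" is still unread.
            result = epoll_wait(epfd, out, 1, 0) == 1;
        }
    }
    close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```

This is exactly why the server example drains each socket in a loop: under ET, the unread "b" would sit in the buffer until the peer happened to send more.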
Reading and writing with non‑blocking sockets
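When read or write on a non-blocking socket returns -1 with errno set to EAGAIN or EWOULDBLOCK, the operation would have blocked. This is not an error; the correct handling is to stop and retry after epoll reports the fd ready again. On a blocking socket, the same errors mean that a receive or send timeout set with SO_RCVTIMEO or SO_SNDTIMEO expired.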
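A minimal sketch of the resulting read loop, assuming the fd is already non-blocking; the enum ReadStatus and the function read_until_eagain are hypothetical names. It separates the three outcomes the server example also handles: drained (EAGAIN), end of file, and a real error.

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

enum class ReadStatus { Drained, Eof, Error };  // hypothetical result type

// Reads everything currently available on a non-blocking fd into out.
// Drained: EAGAIN/EWOULDBLOCK, so wait for the next epoll notification.
// Eof:     the peer closed its end.   Error: any other failure.
ReadStatus read_until_eagain(int fd, std::string &out) {
    char buf[256];
    while (true) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            out.append(buf, static_cast<size_t>(n));
        } else if (n == 0) {
            return ReadStatus::Eof;
        } else if (errno == EINTR) {
            continue;                        // interrupted by a signal: retry
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return ReadStatus::Drained;      // nothing left for now
        } else {
            return ReadStatus::Error;
        }
    }
}
```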
Accept handling
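Listening sockets should be set to non-blocking mode. In ET mode, several connections can arrive behind a single notification, so accept must be called in a loop until it returns -1 with EAGAIN; otherwise pending connections can be left in the accept queue with no further event to report them.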
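That loop can be factored into a standalone helper. This sketch (the name accept_all is hypothetical) uses the Linux-specific accept4 with SOCK_NONBLOCK, so each new connection is non-blocking without a separate fcntl call; the server example above achieves the same with accept plus setnonblocking.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <vector>

// Hypothetical helper: accepts every connection queued on a non-blocking
// listening socket and appends the new (non-blocking) fds to conns.
// Returns false only on an unexpected accept error.
bool accept_all(int listenfd, std::vector<int> &conns) {
    while (true) {
        int fd = accept4(listenfd, nullptr, nullptr, SOCK_NONBLOCK | SOCK_CLOEXEC);
        if (fd >= 0) {
            conns.push_back(fd);             // caller registers it with epoll_ctl
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) return true;  // queue empty
        if (errno == EINTR || errno == ECONNABORTED) continue;     // transient: try again
        return false;
    }
}
```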
Scalability
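epoll can handle millions of connections because it keeps registered fds in a red-black tree, so adding, modifying, or removing an fd with epoll_ctl costs O(log n), and kernel wakeup callbacks move fds onto a ready list as they become ready, so epoll_wait never scans all registered fds. Memory consumption per idle TCP connection is about 3.3 KB (the exact figure depends on kernel version and socket buffer settings); since 1,000,000 × 3.3 KB ≈ 3.3 GB, a server with 4 GB of RAM can hold roughly 1 million idle connections.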
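The data-structure argument can be illustrated with a user-space toy model. This is not how the kernel code is written; it only mirrors the idea, using std::map (typically implemented as a red-black tree) for the interest list and a std::deque as the ready list, with a hypothetical notify() standing in for the kernel's wakeup callback. The class name ToyEpoll is likewise hypothetical. The point is that wait() touches only ready entries, never the whole interest list.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// Toy model of epoll's bookkeeping (illustration only, not kernel code).
// Edge-triggered style: an fd is queued again only on a new notify().
class ToyEpoll {
public:
    // Like EPOLL_CTL_ADD: O(log n) insertion; fails if fd is already present.
    bool add(int fd, uint32_t events) {
        return interest_.emplace(fd, Item{events, false}).second;
    }
    // Like EPOLL_CTL_DEL: O(log n) lookup, plus purging a queued entry.
    bool remove(int fd) {
        auto it = interest_.find(fd);
        if (it == interest_.end()) return false;
        if (it->second.queued) {
            ready_.erase(std::find(ready_.begin(), ready_.end(), fd));
        }
        interest_.erase(it);
        return true;
    }
    // Stand-in for the wakeup callback: queue fd once if it asked for the event.
    void notify(int fd, uint32_t happened) {
        auto it = interest_.find(fd);
        if (it == interest_.end() || !(it->second.events & happened)) return;
        if (!it->second.queued) {
            it->second.queued = true;
            ready_.push_back(fd);
        }
    }
    // Like epoll_wait: cost depends on how many fds are ready,
    // not on how many are registered.
    std::vector<int> wait(size_t maxevents) {
        std::vector<int> out;
        while (!ready_.empty() && out.size() < maxevents) {
            int fd = ready_.front();
            ready_.pop_front();
            interest_[fd].queued = false;    // remove() guarantees fd is still registered
            out.push_back(fd);
        }
        return out;
    }

private:
    struct Item { uint32_t events; bool queued; };
    std::map<int, Item> interest_;   // stands in for the red-black tree
    std::deque<int> ready_;          // stands in for the ready list
};
```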
Common pitfalls and tuning
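Too many TIME_WAIT sockets on the client side can exhaust local ports. Enable net.ipv4.tcp_tw_reuse so TIME_WAIT sockets can be reused for new outbound connections, and consider lowering net.ipv4.tcp_fin_timeout (which limits how long orphaned connections stay in FIN_WAIT_2). Avoid tcp_tw_recycle: it breaks clients behind NAT and was removed in Linux 4.12. On servers facing high connection rates, increase net.ipv4.tcp_max_syn_backlog.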
Increase net.ipv4.ip_local_port_range for clients needing many outbound connections.
Increase fs.file-max (the system-wide limit) and ulimit -n (the per-process limit, RLIMIT_NOFILE) so the server can hold enough file descriptors.
Enable net.ipv4.tcp_syncookies to mitigate SYN‑flood attacks.
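ulimit -n is the shell's view of RLIMIT_NOFILE, and a server can also raise its own soft limit at startup. A minimal sketch using getrlimit/setrlimit; the function name raise_nofile_limit is hypothetical. An unprivileged process can only raise the soft limit up to the hard limit; raising the hard limit itself requires CAP_SYS_RESOURCE and is capped by fs.nr_open.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raises this process's soft RLIMIT_NOFILE toward
// `wanted`, clamped to the hard limit. Never lowers the limit. Returns the
// resulting soft limit, or 0 if it could not be read or changed.
rlim_t raise_nofile_limit(rlim_t wanted) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max) {
        wanted = rl.rlim_max;               // cannot exceed the hard limit unprivileged
    }
    if (wanted <= rl.rlim_cur) return rl.rlim_cur;  // already high enough
    rl.rlim_cur = wanted;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    return rl.rlim_cur;
}
```

Calling this early in main, before the listening socket is created, keeps accept from failing with EMFILE once the default soft limit (often 1024) is reached.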
Comparison of select, poll, epoll
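select and poll are O(n) per call: the entire fd set is copied from user space into the kernel and scanned every time. select is additionally limited to FD_SETSIZE (typically 1024). poll removes that hard limit but keeps the linear scanning and copying overhead. epoll registers each fd once with epoll_ctl, so the interest list is not copied again on every call, and epoll_wait does work proportional to the number of ready fds, copying only those events to user space (it does not use mmap'd shared memory). Its only fd limits are the process and system fd limits and fs.epoll.max_user_watches.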
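A poll-based readiness check shows what poll changes and what it does not: the array of struct pollfd has no FD_SETSIZE cap, and results come back in revents instead of overwriting the input, but the whole array is still passed in and scanned on every call. A minimal sketch with the hypothetical helper name poll_readable:

```cpp
#include <poll.h>
#include <vector>

// Hypothetical helper: returns the fds in `fds` that become readable within
// timeout_ms. No FD_SETSIZE limit, but the full array is still copied into
// the kernel and scanned on every call.
std::vector<int> poll_readable(const std::vector<int> &fds, int timeout_ms) {
    std::vector<struct pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) pfds.push_back({fd, POLLIN, 0});  // events in, revents out

    std::vector<int> ready;
    int n = poll(pfds.data(), pfds.size(), timeout_ms);
    if (n <= 0) return ready;               // timeout or error
    for (const struct pollfd &p : pfds) {   // O(n) scan to find the ready ones
        if (p.revents & POLLIN) ready.push_back(p.fd);
    }
    return ready;
}
```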
In summary, for high‑concurrency network servers on Linux, epoll (especially in ET mode) offers superior scalability and performance compared to select and poll.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
