
I/O Multiplexing in Linux: Detailed Explanation of select, poll, and epoll

Linux treats all I/O devices as files, enabling a single thread to monitor many descriptors via I/O multiplexing. While select and poll rely on linear scans and suffer size limits, epoll uses an event-driven design backed by a red-black tree and a ready list, with an optional edge-triggered mode, offering scalable, high-performance handling of thousands of concurrent connections.

Tencent Technical Engineering

In Linux, all I/O devices are abstracted as files ("Everything is a file"). This design allows unified handling of I/O through file descriptors. Before diving into the implementations of I/O multiplexing mechanisms such as select, poll, and epoll, this article briefly reviews two concepts: the file abstraction and the idea of I/O multiplexing.

What is I/O Multiplexing?

Multiplex: multiple network connections (sockets).

Reuse: a single thread monitors the readiness of many file descriptors, enabling efficient handling of many I/O events without creating a thread per connection.

There are three main multiplexing techniques: select, poll, and epoll. epoll is the newest and most performant.

The Five I/O Models

[1] Blocking I/O
[2] Non-blocking I/O
[3] Signal-driven I/O
[4] Asynchronous I/O
[5] I/O multiplexing

Blocking I/O Model

Both the data‑waiting stage and the data‑copy stage block the thread. The thread stays idle while the kernel prepares data and copies it to user space, leading to high CPU idle time and large system overhead when many connections are present.

Characteristics: the thread is suspended while blocked, responses are timely, and the implementation is simple, making it suitable for low-concurrency scenarios; however, system overhead is high and scalability is poor.

Non‑Blocking I/O Model

The call returns immediately. If no data is available, the kernel returns EWOULDBLOCK or EAGAIN. If data is ready, it is copied to user space.

Characteristics: no thread suspension, polling consumes CPU, slightly higher implementation difficulty, poorer real‑time performance, suitable for low‑concurrency applications.

Signal‑Driven I/O

The process registers a signal handler (e.g., SIGIO ) and returns immediately. When data becomes ready, the kernel sends a signal, and the handler performs the I/O.

Characteristics: non‑blocking, callback‑style notification, high implementation difficulty, complex signal handling, limited to low‑concurrency, high‑real‑time‑requirement scenarios.

Asynchronous I/O

The process initiates an I/O operation and returns immediately. The kernel completes the operation and notifies the process (Proactor pattern). It requires kernel support (available since Linux 2.5).

Characteristics: fully non‑blocking, high performance, complex implementation, ideal for high‑concurrency, high‑performance network services.

I/O Multiplexing Models

Linux’s default I/O is cached I/O, which copies data between kernel and user space multiple times, causing significant CPU and memory overhead. I/O can be divided into two stages: data preparation and kernel‑to‑user copy.

I/O Multiplexing: select, poll, epoll

All three system calls allow a single process to monitor many descriptors. They are synchronous I/O because the actual read/write still blocks after the event is reported.

select

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

Key parameters:

nfds: the highest descriptor value in any set, plus 1.

readfds: set of descriptors to test for readability.

writefds: set of descriptors to test for writability.

exceptfds: set of descriptors to test for exceptional conditions.

timeout: NULL (wait forever), zero (poll), or a specific interval.

Return values: >0 (number of ready descriptors), 0 (timeout), -1 (error).

Helper macros:

// Remove fd from set
void FD_CLR(int fd, fd_set *set);
// Test if fd is in set
int FD_ISSET(int fd, fd_set *set);
// Add fd to set
void FD_SET(int fd, fd_set *set);
// Clear all fds
void FD_ZERO(fd_set *set);

poll

struct pollfd {
    int   fd;       // file descriptor
    short events;   // events to monitor
    short revents;  // events returned by the kernel
};
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Compared with select:

Takes a caller-supplied pollfd array, removing the fixed 1024-descriptor limit of fd_set.

However, it still copies the entire array between user and kernel space on every call, and the kernel still scans it linearly, so performance degrades as the number of descriptors grows.

epoll

epoll was introduced to overcome the scalability limits of select and poll . It uses an event‑driven design with a red‑black tree and a ready‑list, so the kernel only tracks active descriptors.

epoll_create: creates an epoll instance.

int epoll_create(int size); // size is ignored on modern kernels but must be > 0

epoll_ctl: registers, modifies, or deletes a descriptor.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); // op: EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL

epoll_wait: waits for events; returns the number of ready descriptors.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Typical server example (simplified):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
const int MAX_EVENT_NUMBER = 10000;

int setnonblocking(int fd) {
    int old_option = fcntl(fd, F_GETFL);
    int new_option = old_option | O_NONBLOCK;
    fcntl(fd, F_SETFL, new_option);
    return old_option;
}

int main() {
    int listenfd = socket(PF_INET, SOCK_STREAM, 0);
    // bind, listen ...
    int epfd = epoll_create(5);
    struct epoll_event event, events[MAX_EVENT_NUMBER];
    setnonblocking(listenfd);  // ET mode requires non-blocking descriptors
    event.data.fd = listenfd;
    event.events = EPOLLIN | EPOLLET | EPOLLRDHUP;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &event);
    while (1) {
        int n = epoll_wait(epfd, events, MAX_EVENT_NUMBER, -1);
        for (int i = 0; i < n; ++i) {
            int sockfd = events[i].data.fd;
            if (sockfd == listenfd) {
                // accept new connection, add to epoll
            } else if (events[i].events & EPOLLIN) {
                // read data, then modify to EPOLLOUT
            } else if (events[i].events & EPOLLOUT) {
                // write response, then modify back to EPOLLIN
            }
        }
    }
    return 0;
}

Edge‑Triggered vs. Level‑Triggered

Level‑Triggered (LT): as long as data is available (or buffer space exists), epoll_wait keeps reporting the descriptor as ready. This is the default mode and is simple to use.

Edge‑Triggered (ET): the kernel reports an event only when the state changes (e.g., the buffer transitions from empty to non‑empty). The application must read or write until EAGAIN to avoid missing further notifications. ET is more efficient for high‑performance servers because it reduces the number of wake‑ups.

Why epoll outperforms select/poll

Event‑driven: only active descriptors are placed on a ready list, eliminating full scans of the interest set.

No per‑call copying of the entire descriptor set between user and kernel space: descriptors are registered once via epoll_ctl, and the interest list lives in kernel memory.

Supports a virtually unlimited number of descriptors (bounded only by system resource limits), unlike the 1024‑fd cap of select.

The red‑black tree gives O(log N) insertion and deletion, while select and poll require an O(N) traversal on every call.

Conclusion

All three mechanisms (select, poll, and epoll) provide I/O multiplexing, but they differ in scalability and performance. select and poll suffer from linear scanning and copying overhead, while epoll offers edge‑triggered notification, efficient data structures, and no hard limit on the number of monitored descriptors, making it the preferred choice for high‑concurrency, high‑performance network services.
