
Deep Dive into Linux epoll: Design, Implementation, and Performance

epoll is a high‑performance Linux I/O multiplexing mechanism that improves on select/poll with an event‑driven design built on a red‑black tree and a ready list. It supports edge‑ and level‑triggered modes, copies only ready events to user space (mainline epoll does not share memory with user space via mmap, despite a common claim), and scales well for high‑concurrency network applications.

Deepin Linux

epoll is a Linux kernel I/O multiplexing facility that achieves high performance by abandoning the linear scanning of select/poll in favor of an event‑driven architecture. It keeps registered file descriptors in a red‑black tree (O(log N) insert, lookup, and delete) and links ready descriptors into a doubly linked ready list, so epoll_wait only has to walk descriptors that have reported activity rather than every monitored socket.
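The "only active descriptors" behavior can be observed from user space. The following is a minimal sketch, not taken from the original article: the helper name count_ready_of_many and the constant NPIPES are hypothetical, and pipes stand in for sockets. It registers several descriptors, makes one readable, and checks what epoll_wait hands back.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 8   /* illustrative count; not from the article */

/* Registers NPIPES pipe read ends, makes exactly one of them readable,
 * and returns how many events epoll_wait reports (-1 on error).
 * On success *ready_index receives the index stored in data.u32. */
int count_ready_of_many(int which, unsigned *ready_index)
{
    int pipes[NPIPES][2];
    struct epoll_event out[NPIPES];
    int created = 0, n = -1, epfd;

    assert(which >= 0 && which < NPIPES);
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (; created < NPIPES; created++) {
        if (pipe(pipes[created]) < 0)
            goto cleanup;
        struct epoll_event ev = { .events = EPOLLIN };
        ev.data.u32 = (unsigned)created;      /* tag each fd with its index */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipes[created][0], &ev) < 0) {
            close(pipes[created][0]);
            close(pipes[created][1]);
            goto cleanup;
        }
    }
    if (write(pipes[which][1], "x", 1) != 1)  /* only this pipe becomes ready */
        goto cleanup;

    n = epoll_wait(epfd, out, NPIPES, 0);     /* zero timeout: do not block */
    if (n > 0)
        *ready_index = out[0].data.u32;
cleanup:
    for (int i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(epfd);
    return n;
}
```

Although eight descriptors are registered, the returned array holds a single entry, so the caller never scans the idle ones.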

The core data structures are struct eventpoll (the epoll instance), struct epitem (one per monitored fd), and auxiliary structures for wait‑queue handling. When an fd is added with epoll_ctl, epoll invokes the file's poll method, which calls back into ep_ptable_queue_proc; that function creates a wait‑queue entry (wait_queue_t, renamed wait_queue_entry_t in newer kernels) with ep_poll_callback as its wake‑up function and hooks it onto the file's wait queue. Later, when the fd becomes ready, the wake‑up runs ep_poll_callback, which links the corresponding epitem into the ready list and wakes any thread blocked in epoll_wait.
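The callback flow can be illustrated with a toy user‑space model. This is emphatically not kernel code: the toy_ types and functions below are hypothetical stand‑ins invented for illustration, with no locking, no red‑black tree, and a singly linked queue in place of the kernel's list. It only shows the idea that a wake‑up callback links an item into the ready list at most once, and that the wait path drains that list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct epitem / struct eventpoll (hypothetical). */
struct toy_epitem {
    int fd;
    bool on_ready_list;
    struct toy_epitem *next_ready;
};

struct toy_eventpoll {
    struct toy_epitem *ready_head, *ready_tail;
    int wakeups;                      /* stands in for waking sleepers */
};

/* Plays the role of ep_poll_callback: queue the item if not yet queued. */
void toy_poll_callback(struct toy_eventpoll *ep, struct toy_epitem *epi)
{
    if (!epi->on_ready_list) {
        epi->next_ready = NULL;
        if (ep->ready_tail)
            ep->ready_tail->next_ready = epi;
        else
            ep->ready_head = epi;
        ep->ready_tail = epi;
        epi->on_ready_list = true;
    }
    ep->wakeups++;
}

/* Plays the role of the harvesting step in epoll_wait: pop up to max items. */
int toy_harvest(struct toy_eventpoll *ep, int *fds, int max)
{
    int n = 0;

    assert(fds != NULL && max >= 0);
    while (ep->ready_head && n < max) {
        struct toy_epitem *epi = ep->ready_head;
        ep->ready_head = epi->next_ready;
        if (!ep->ready_head)
            ep->ready_tail = NULL;
        epi->on_ready_list = false;
        epi->next_ready = NULL;
        fds[n++] = epi->fd;
    }
    return n;
}
```

In this model, two callbacks for the same item before a harvest produce one queued entry, which mirrors why a busy descriptor appears once per epoll_wait result rather than once per wake‑up.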

Three user‑space APIs are provided:

epoll_create/epoll_create1 – allocates an eventpoll object and returns a file descriptor.

epoll_ctl – adds ( EPOLL_CTL_ADD ), modifies ( EPOLL_CTL_MOD ), or deletes ( EPOLL_CTL_DEL ) monitored fds, inserting or removing epitem nodes in the red‑black tree and registering the poll callbacks.

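epoll_wait – blocks until at least one monitored fd is ready, the timeout expires, or a signal interrupts the call; it then copies the ready events to user space via __put_user and re‑queues level‑triggered items so that they are checked again on the next call.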
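The three epoll_ctl operations listed above can be exercised on a single descriptor. The sketch below is an illustration under stated assumptions: the function name ctl_lifecycle and the use of a pipe are hypothetical choices, not from the article. It uses the data.u32 field as a tag to show that EPOLL_CTL_MOD replaces the stored event, and a zero timeout so epoll_wait never blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Walks one readable pipe through ADD, MOD and DEL. After each step it
 * polls with a zero timeout: counts[] gets the number of events reported,
 * tags[] gets the data.u32 value seen after ADD and after MOD.
 * Returns 0 on success, -1 on error. */
int ctl_lifecycle(int counts[3], unsigned tags[2])
{
    int p[2], rc = -1, epfd;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out = { 0 };

    assert(counts != NULL && tags != NULL);
    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        return -1;
    if (pipe(p) < 0) {
        close(epfd);
        return -1;
    }
    if (write(p[1], "x", 1) != 1)     /* make the read end readable */
        goto done;

    ev.data.u32 = 1;                  /* ADD: start monitoring, tag = 1 */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto done;
    counts[0] = epoll_wait(epfd, &out, 1, 0);
    tags[0] = out.data.u32;

    ev.data.u32 = 2;                  /* MOD: same events, new tag */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) < 0)
        goto done;
    counts[1] = epoll_wait(epfd, &out, 1, 0);
    tags[1] = out.data.u32;

    /* DEL: stop monitoring; the event argument is ignored here */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) < 0)
        goto done;
    counts[2] = epoll_wait(epfd, &out, 1, 0);
    rc = 0;
done:
    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```

The byte is never read, so in the default level‑triggered mode the descriptor keeps being reported until it is deleted from the set.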

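epoll supports two trigger modes. In level‑triggered (LT) mode, the default, a ready fd is reported by every epoll_wait call for as long as the condition holds, for example while unread data remains. In edge‑triggered (ET) mode, requested with EPOLLET, an fd is reported only when new activity arrives, which cuts repeated notifications but obliges the application to use non‑blocking fds and to read or write until EAGAIN. The EPOLLONESHOT flag goes further: after one event is delivered the fd is disabled until it is re‑armed with EPOLL_CTL_MOD.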
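The difference between the modes shows up when the application deliberately does not read the pending data. The following sketch is illustrative only: observe_trigger_mode is a hypothetical helper, and a UNIX socket pair stands in for a network connection. It waits twice without reading, sends one more byte, and waits a third time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Registers one end of a socket pair with EPOLLIN | extra_flags and records
 * the epoll_wait result (zero timeout) at three points:
 *   counts[0] after the first byte arrives,
 *   counts[1] on an immediate second wait, nothing read in between,
 *   counts[2] after a second byte arrives, still nothing read.
 * Returns 0 on success, -1 on error. */
int observe_trigger_mode(uint32_t extra_flags, int counts[3])
{
    int sv[2], rc = -1, epfd;
    struct epoll_event ev = { .events = EPOLLIN | extra_flags };
    struct epoll_event out;

    assert(counts != NULL);
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(epfd);
        return -1;
    }
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto done;

    if (write(sv[1], "a", 1) != 1)
        goto done;
    counts[0] = epoll_wait(epfd, &out, 1, 0);
    counts[1] = epoll_wait(epfd, &out, 1, 0);

    if (write(sv[1], "b", 1) != 1)    /* new activity on the same fd */
        goto done;
    counts[2] = epoll_wait(epfd, &out, 1, 0);
    rc = 0;
done:
    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return rc;
}
```

Called with 0 (LT), the fd is reported at all three points. With EPOLLET it is reported at the first and third points only, since the second wait sees no new edge. With EPOLLONESHOT it is reported once and then stays silent until re‑armed.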

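Compared with select/poll, epoll retrieves events at a cost proportional to the number of ready descriptors rather than the number monitored (often summarized as O(1) event retrieval), is not bound by select's FD_SETSIZE limit (typically 1024), and avoids passing the entire fd set to the kernel on each call. The article's benchmarks report epoll handling 10 000 concurrent connections in roughly 1.2 seconds (≈8 333 req/s), while select and poll take 5.6 s and 4.8 s respectively.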

Typical usage in high‑concurrency servers (e.g., Nginx, Redis) follows one pattern: create an epoll instance, add the listening socket, loop on epoll_wait, accept new connections, add them to the epoll set, and process readable/writable events. The minimal C fragment below shows the key API calls for the first steps of this flow.

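#include <sys/epoll.h>

#define MAX_EVENTS 64                        /* illustrative batch size */

/* listenfd is assumed to be a socket that is already bound and listening */
int epfd = epoll_create1(0);                 /* returns -1 on error */
struct epoll_event ev = {.events = EPOLLIN, .data.fd = listenfd};
struct epoll_event events[MAX_EVENTS];       /* filled in by epoll_wait */
epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);
int n = epoll_wait(epfd, events, MAX_EVENTS, -1);  /* n = number of ready fds */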
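The remaining steps of the pattern (accepting connections, registering them, and handling readable events) might look like the following level‑triggered echo loop. This is a sketch under stated assumptions rather than how Nginx or Redis implement it: echo_loop, set_nonblocking, and the bounded rounds parameter are hypothetical names introduced so the loop can terminate, and listenfd is assumed to be already registered in epfd with its fd stored in data.fd.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64   /* illustrative batch size */

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Runs at most `rounds` iterations of the event loop and returns the number
 * of bytes echoed back, or -1 if epoll_wait fails. */
long echo_loop(int epfd, int listenfd, int rounds, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    long echoed = 0;

    assert(rounds >= 0);
    for (int r = 0; r < rounds; r++) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listenfd) {
                /* New connection: accept it and add it to the epoll set. */
                int conn = accept(listenfd, NULL, NULL);
                if (conn < 0)
                    continue;
                struct epoll_event ev = {.events = EPOLLIN, .data.fd = conn};
                if (set_nonblocking(conn) < 0 ||
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev) < 0)
                    close(conn);
            } else {
                char buf[4096];
                ssize_t got = read(fd, buf, sizeof buf);
                if (got < 0 && (errno == EAGAIN || errno == EINTR))
                    continue;         /* nothing to read right now */
                if (got <= 0) {
                    close(fd);        /* EOF or error; close removes it from the set */
                } else if (write(fd, buf, (size_t)got) == got) {
                    echoed += got;    /* a real server would buffer short writes */
                }
            }
        }
    }
    return echoed;
}
```

A production loop would run until shutdown instead of counting rounds, and would register EPOLLOUT to flush data that a non‑blocking write could not send in one call.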
Tags: Linux, high concurrency, I/O multiplexing, Network Programming, event-driven, epoll
Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
