Mastering epoll: Boost Linux Server Performance with Edge-Triggered I/O
This article explains the epoll interface, its underlying data structures, the three-step usage pattern, the differences between level‑triggered and edge‑triggered modes, the reactor model, and provides a complete C demo, helping developers efficiently handle millions of concurrent TCP connections on Linux.
Introduction
epoll is a Linux kernel interface designed to efficiently handle a massive number of file descriptors, improving CPU utilization for high‑concurrency server programs where only a small fraction of connections are active at any moment.
Main Content
Consider a scenario with one million simultaneous TCP connections, of which only dozens or hundreds are active at any moment. select and poll require the application to pass the full descriptor set to the kernel on every call, and the kernel then scans every entry. The resulting memory copies and wasted CPU cycles limit these interfaces to a few thousand connections in practice. epoll avoids this by keeping the set of monitored sockets inside the kernel, in an epoll object that is itself referenced by a file descriptor, and by splitting the operation into three calls:
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Call epoll_create to allocate an epoll object.
Use epoll_ctl to add or remove sockets from the epoll object.
Call epoll_wait to retrieve only the sockets with pending events.
Only one epoll object is created at program start; sockets are added or removed as needed, so epoll_wait can return ready events without scanning all connections.
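The three steps above can be sketched as follows. This is a minimal illustration rather than the article's demo: it uses a pipe instead of a TCP socket so that it runs without a network peer, it calls epoll_create1(0) (the newer variant of epoll_create), and the function name three_step_demo is hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not from the article: returns the number of ready
 * events reported for a pipe that has one byte waiting, or -1 on error. */
int three_step_demo(void)
{
    int pipefd[2];
    int epfd = -1;
    int n;
    int result = -1;
    struct epoll_event ev = {0};
    struct epoll_event ready[8];

    if (pipe(pipefd) == -1)
        return -1;

    /* Step 1: create the epoll object. */
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto out;

    /* Step 2: register the read end of the pipe for readability. */
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
        goto out;

    /* Make the descriptor ready by writing one byte to the other end. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out;

    /* Step 3: retrieve only the descriptors with pending events. */
    n = epoll_wait(epfd, ready, 8, 1000 /* ms */);
    if (n == 1 && ready[0].data.fd == pipefd[0])
        result = n;

out:
    if (epfd != -1)
        close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

In a real server the registered descriptor would be a listening or connected socket, and step 3 would sit inside a loop.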
1. Detailed epoll principle
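When epoll_create is invoked, the kernel creates an eventpoll structure containing a red‑black tree, which stores all registered events, and a doubly‑linked list (rdllist) that holds ready events. epoll_wait simply checks whether rdllist is non‑empty. If it is, the ready events are copied into the user‑space buffer supplied by the caller (an ordinary kernel‑to‑user copy, not shared memory); if it is empty, the caller sleeps until an event arrives or the timeout expires.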
Each registered event is represented by an epitem structure, which links the socket to the red‑black tree and the ready list.
When an event occurs, the kernel’s ep_poll_callback places the corresponding epitem into rdllist. epoll_wait then returns these events with minimal overhead, allowing the handling of millions of concurrent connections.
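The effect of the ready list can be observed from user space. The sketch below registers 64 pipes (an arbitrary number chosen for illustration), makes only a few of them readable, and checks how many events epoll_wait hands back. The names NPIPES and count_ready are hypothetical, and the code only observes behavior through the public API; it does not touch the kernel structures described above.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 64 /* illustrative size of the registered set */

/* Hypothetical helper: registers NPIPES pipes, writes one byte into the
 * first n_active of them, and returns how many events epoll_wait reports.
 * Returns -1 on error. */
int count_ready(int n_active)
{
    int fds[NPIPES][2];
    struct epoll_event ready[NPIPES];
    int opened = 0;
    int result = -1;
    int i;
    int epfd = epoll_create1(0);

    if (epfd == -1)
        return -1;
    if (n_active < 0)
        n_active = 0;
    if (n_active > NPIPES)
        n_active = NPIPES;

    /* Every read end goes into the interest set (the red-black tree). */
    for (opened = 0; opened < NPIPES; opened++) {
        struct epoll_event ev = {0};
        if (pipe(fds[opened]) == -1)
            goto out;
        ev.events = EPOLLIN;
        ev.data.fd = fds[opened][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[opened][0], &ev) == -1) {
            opened++; /* this pipe is open and must be closed below */
            goto out;
        }
    }

    /* Only these descriptors become ready. */
    for (i = 0; i < n_active; i++) {
        if (write(fds[i][1], "x", 1) != 1)
            goto out;
    }

    /* Timeout 0: report whatever is ready right now, without sleeping. */
    result = epoll_wait(epfd, ready, NPIPES, 0);

out:
    for (i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return result;
}
```

The returned count tracks the number of active descriptors, not the number of registered ones, which is the property that lets epoll scale when most connections are idle.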
2. Two epoll trigger modes
epoll supports Level‑Triggered (LT) and Edge‑Triggered (ET) modes. LT is the default and needs no flag: epoll_wait keeps reporting a descriptor as long as data remains readable. ET is selected with the EPOLLET flag and notifies only when a new event arrives; the application must read until read returns EAGAIN, otherwise the remaining data is not reported again until more data arrives.
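A read loop that runs until EAGAIN might look like the sketch below. The helper names set_nonblocking and drain_fd are hypothetical, the 4096-byte buffer is an arbitrary choice, and the data is simply counted and discarded; a real handler would parse or forward it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read from a non-blocking descriptor until the
 * kernel buffer is empty. Returns the number of bytes consumed, or -1 on
 * a real error. *eof is set to 1 if the peer closed the connection. */
long drain_fd(int fd, int *eof)
{
    char buf[4096];
    long total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;      /* got data, keep reading */
            continue;
        }
        if (n == 0) {
            *eof = 1;        /* peer closed its end */
            return total;
        }
        if (errno == EINTR)
            continue;        /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;    /* buffer drained, wait for the next edge */
        return -1;           /* genuine error */
    }
}
```

The descriptor must be non-blocking; on a blocking descriptor the final read would stall instead of returning EAGAIN.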
LT: triggers on every readable condition; may return repeatedly for the same data.
ET: triggers only on state changes; requires non‑blocking I/O and complete reads/writes.
ET reduces the number of wake‑ups but can be harder to program correctly.
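The difference in wake-ups can be made visible with a small experiment. The sketch below writes one byte into a pipe, never reads it, and counts how many consecutive epoll_wait calls report the descriptor. The function name count_notifications and its parameters are hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: pass extra_flags = 0 for level-triggered or
 * EPOLLET for edge-triggered. Returns how many of `polls` consecutive
 * epoll_wait calls reported the descriptor while its data stayed unread,
 * or -1 on error. */
int count_notifications(uint32_t extra_flags, int polls)
{
    int pipefd[2];
    int epfd = -1;
    int hits = 0;
    int result = -1;
    int i;
    struct epoll_event ev = {0};
    struct epoll_event ready;

    if (pipe(pipefd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto out;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
        goto out;

    /* One byte arrives and is deliberately left in the pipe. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out;

    for (i = 0; i < polls; i++) {
        int n = epoll_wait(epfd, &ready, 1, 0); /* timeout 0: do not sleep */
        if (n == -1)
            goto out;
        hits += n;
    }
    result = hits;

out:
    if (epfd != -1)
        close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

With extra_flags set to 0 every poll reports the unread byte, while with EPOLLET only the first poll does, because no new data arrives afterwards.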
3. epoll reactor model
The classic epoll flow is:
epoll_create(); // create red‑black tree
epoll_ctl(); // add fd to tree
epoll_wait(); // wait for events
The reactor model expands this to handle accept, read, and write callbacks in a loop, moving file descriptors between read and write monitoring as needed.
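One common way to build such a reactor is to store a pointer to a per-descriptor handler in the data.ptr field of struct epoll_event and to dispatch through it when epoll_wait returns. The sketch below assumes that design; struct handler, reactor_add, reactor_set_events, reactor_poll_once, and on_readable are hypothetical names and are not taken from the article's demo.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor record: the fd, its callback, and user data. */
struct handler {
    int fd;
    void (*callback)(struct handler *h, uint32_t events);
    void *ctx;
};

/* Register a handler; epoll hands the pointer back with each event. */
int reactor_add(int epfd, struct handler *h, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events;
    ev.data.ptr = h;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Switch a registered handler between read and write monitoring,
 * for example from EPOLLIN to EPOLLOUT once a reply is ready to send. */
int reactor_set_events(int epfd, struct handler *h, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events;
    ev.data.ptr = h;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, h->fd, &ev);
}

/* One loop iteration: wait, then invoke the callback of each ready fd.
 * Returns the number of callbacks invoked, or -1 on error. */
int reactor_poll_once(int epfd, int timeout_ms)
{
    struct epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    int i;

    for (i = 0; i < n; i++) {
        struct handler *h = events[i].data.ptr;
        h->callback(h, events[i].events);
    }
    return n;
}

/* Example callback: counts the bytes read into a long pointed to by ctx. */
void on_readable(struct handler *h, uint32_t events)
{
    char buf[256];

    if (events & EPOLLIN) {
        ssize_t n = read(h->fd, buf, sizeof buf);
        if (n > 0)
            *(long *)h->ctx += n;
    }
}
```

A server would register the listening socket with an accept callback, create a new handler for each accepted connection, and call reactor_poll_once in a loop.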
Demo program
#include <stdio.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#define MAX_EVENTS 1024
#define BUFLEN 4096
#define SERV_PORT 6666
/* ... (full source code as in the original article) ... */
Final recommendations
Deeply understand the read/write differences between LT and ET, handle errors gracefully, consider multithreaded load balancing, and use flags like EPOLLONESHOT and EPOLLEXCLUSIVE to avoid the thundering‑herd problem.
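As an illustration of one of these flags, the sketch below shows how EPOLLONESHOT behaves: after a single notification the descriptor stays registered but disabled until it is re-armed with EPOLL_CTL_MOD, so a second thread cannot be woken for the same connection in the meantime. The function name oneshot_demo and the results array are hypothetical, and a pipe again stands in for a socket.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: fills results[] with the return values of three
 * epoll_wait calls (first wait, wait while disabled, wait after re-arm).
 * Returns 0 on success, -1 on error. */
int oneshot_demo(int results[3])
{
    int pipefd[2];
    int epfd = -1;
    int rc = -1;
    struct epoll_event ev = {0};
    struct epoll_event ready;

    if (pipe(pipefd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto out;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
        goto out;
    if (write(pipefd[1], "x", 1) != 1)
        goto out;

    /* First wait: the event is delivered once. */
    results[0] = epoll_wait(epfd, &ready, 1, 1000);

    /* Second wait: data is still unread, but the fd is now disabled. */
    results[1] = epoll_wait(epfd, &ready, 1, 0);

    /* Re-arm after handling; the still-unread byte is reported again. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) == -1)
        goto out;
    results[2] = epoll_wait(epfd, &ready, 1, 1000);
    rc = 0;

out:
    if (epfd != -1)
        close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

In a multithreaded server the worker that handled the event would typically issue the EPOLL_CTL_MOD call once it has finished with the connection.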
Be aware of epoll’s limitations: timer granularity (~5 ms), potential overhead when connections are few but highly active, inability to batch epoll_ctl calls, and occasional premature wake‑ups.
MaGe Linux Operations