How Redis Implements Efficient I/O Multiplexing with select, epoll, and kqueue
This article explains Redis's clean and elegant I/O multiplexing implementation, covering blocking I/O limitations, the reactor pattern, abstraction of select/epoll/kqueue into a unified API, key source functions, and platform‑specific module selection to achieve high‑performance single‑threaded networking.
Introduction
Redis's source code is well‑structured for study, and its I/O multiplexing implementation is particularly clean and elegant. This article provides a concise overview of that implementation.
I/O Models
Redis runs in a single thread, processing commands sequentially. Blocking I/O (using read or write) would stall the entire server when a file descriptor is not ready, preventing service to other clients. Therefore, I/O multiplexing is required to monitor many descriptors simultaneously.
I/O Multiplexing
The core of the multiplexing model is the select system call, which can monitor multiple file descriptors for readability, writability, or errors. Other multiplexing interfaces such as epoll, kqueue, and evport offer better performance and scalability.
Reactor Design Pattern
Redis adopts the reactor pattern for its file‑event handler. Each network connection corresponds to a file descriptor (FD). The event handler uses the I/O multiplexing module to monitor multiple FDs and invokes callbacks when accept, read, write, or close events occur.
I/O Multiplexing Module
The module abstracts select, epoll, evport, and kqueue behind a uniform API, exposing functions such as:
static int aeApiCreate(aeEventLoop *eventLoop) static int aeApiResize(aeEventLoop *eventLoop, int setsize) static void aeApiFree(aeEventLoop *eventLoop) static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int mask) static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp)Each sub‑module stores its own context in an aeApiState structure, which is attached to eventLoop->apidata and remains invisible to upper layers.
// select implementation
typedef struct aeApiState {
fd_set rfds, wfds;
fd_set _rfds, _wfds;
} aeApiState;
// epoll implementation
typedef struct aeApiState {
int epfd;
struct epoll_event *events;
} aeApiState;Wrapping the select Function
The typical usage of select involves initializing an fd_set, adding descriptors with FD_SET, calling select, and then checking which descriptors are ready.
int fd = /* file descriptor */;
fd_set rfds;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
for (;;) {
select(fd+1, &rfds, NULL, NULL, NULL);
if (FD_ISSET(fd, &rfds)) {
/* fd becomes readable */
}
}Redis's wrapper follows the same flow. In aeApiCreate it allocates the state and clears the sets:
static int aeApiCreate(aeEventLoop *eventLoop) {
aeApiState *state = zmalloc(sizeof(aeApiState));
if (!state) return -1;
FD_ZERO(&state->rfds);
FD_ZERO(&state->wfds);
eventLoop->apidata = state;
return 0;
}Adding or removing events updates the appropriate fd_set with FD_SET or FD_CLR:
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
aeApiState *state = eventLoop->apidata;
if (mask & AE_READABLE) FD_SET(fd, &state->rfds);
if (mask & AE_WRITABLE) FD_SET(fd, &state->wfds);
return 0;
}The poll function copies the sets, calls select, and translates the results into the fired array:
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
aeApiState *state = eventLoop->apidata;
memcpy(&state->_rfds, &state->rfds, sizeof(fd_set));
memcpy(&state->_wfds, &state->wfds, sizeof(fd_set));
int retval = select(eventLoop->maxfd+1, &state->_rfds, &state->_wfds, NULL, tvp);
int numevents = 0;
if (retval > 0) {
for (int j = 0; j <= eventLoop->maxfd; j++) {
int mask = 0;
aeFileEvent *fe = &eventLoop->events[j];
if (fe->mask == AE_NONE) continue;
if (fe->mask & AE_READABLE && FD_ISSET(j, &state->_rfds)) mask |= AE_READABLE;
if (fe->mask & AE_WRITABLE && FD_ISSET(j, &state->_wfds)) mask |= AE_WRITABLE;
eventLoop->fired[numevents].fd = j;
eventLoop->fired[numevents].mask = mask;
numevents++;
}
}
return numevents;
}Wrapping the epoll Function
When epoll is available, Redis creates an epoll instance and stores the file descriptor in the state:
static int aeApiCreate(aeEventLoop *eventLoop) {
aeApiState *state = zmalloc(sizeof(aeApiState));
if (!state) return -1;
state->events = zmalloc(sizeof(struct epoll_event) * eventLoop->setsize);
if (!state->events) { zfree(state); return -1; }
state->epfd = epoll_create(1024);
if (state->epfd == -1) { zfree(state->events); zfree(state); return -1; }
eventLoop->apidata = state;
return 0;
}Adding an event uses epoll_ctl with either EPOLL_CTL_ADD or EPOLL_CTL_MOD depending on whether the fd was already monitored:
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
aeApiState *state = eventLoop->apidata;
struct epoll_event ee = {0};
int op = eventLoop->events[fd].mask == AE_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
ee.events = 0;
mask |= eventLoop->events[fd].mask; // merge old events
if (mask & AE_READABLE) ee.events |= EPOLLIN;
if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
ee.data.fd = fd;
if (epoll_ctl(state->epfd, op, fd, &ee) == -1) return -1;
return 0;
}The poll function calls epoll_wait, then translates each returned epoll_event into the fired array:
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
aeApiState *state = eventLoop->apidata;
int retval = epoll_wait(state->epfd, state->events, eventLoop->setsize,
tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
int numevents = 0;
if (retval > 0) {
numevents = retval;
for (int j = 0; j < numevents; j++) {
int mask = 0;
struct epoll_event *e = state->events + j;
if (e->events & EPOLLIN) mask |= AE_READABLE;
if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
if (e->events & EPOLLERR) mask |= AE_WRITABLE;
if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
eventLoop->fired[j].fd = e->data.fd;
eventLoop->fired[j].mask = mask;
}
}
return numevents;
}Sub‑module Selection
Redis chooses the best available multiplexing implementation at compile time using macros:
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
#ifdef HAVE_EPOLL
#include "ae_epoll.c"
#else
#ifdef HAVE_KQUEUE
#include "ae_kqueue.c"
#else
#include "ae_select.c"
#endif
#endif
#endifIf none of the high‑performance interfaces are present, select serves as the fallback, though it has higher time complexity and a limit of 1024 descriptors.
Conclusion
Redis's I/O multiplexing module is concise and portable. By abstracting platform‑specific mechanisms behind a uniform API and selecting the optimal implementation at compile time, Redis can serve tens of thousands of file descriptors in a single‑process architecture while keeping the codebase simple and reducing the risk of bugs.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Open Source Tech Hub
Sharing cutting-edge internet technologies and practical AI resources.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
