How Redis Implements Efficient I/O Multiplexing with select, epoll, and kqueue

This article explains Redis's clean and elegant I/O multiplexing implementation, covering blocking I/O limitations, the reactor pattern, abstraction of select/epoll/kqueue into a unified API, key source functions, and platform‑specific module selection to achieve high‑performance single‑threaded networking.

Open Source Tech Hub
Open Source Tech Hub
Open Source Tech Hub
How Redis Implements Efficient I/O Multiplexing with select, epoll, and kqueue

Introduction

Redis's source code is well‑structured for study, and its I/O multiplexing implementation is particularly clean and elegant. This article provides a concise overview of that implementation.

I/O Models

Redis runs in a single thread, processing commands sequentially. Blocking I/O (using read or write) would stall the entire server when a file descriptor is not ready, preventing service to other clients. Therefore, I/O multiplexing is required to monitor many descriptors simultaneously.

I/O Multiplexing

The core of the multiplexing model is the select system call, which can monitor multiple file descriptors for readability, writability, or errors. Other multiplexing interfaces such as epoll, kqueue, and evport offer better performance and scalability.

I/O Multiplexing Modes
I/O Multiplexing Modes

Reactor Design Pattern

Redis adopts the reactor pattern for its file‑event handler. Each network connection corresponds to a file descriptor (FD). The event handler uses the I/O multiplexing module to monitor multiple FDs and invokes callbacks when accept, read, write, or close events occur.

Redis Reactor Pattern
Redis Reactor Pattern

I/O Multiplexing Module

The module abstracts select, epoll, evport, and kqueue behind a uniform API, exposing functions such as:

static int aeApiCreate(aeEventLoop *eventLoop)
static int aeApiResize(aeEventLoop *eventLoop, int setsize)
static void aeApiFree(aeEventLoop *eventLoop)
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask)
static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int mask)
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp)

Each sub‑module stores its own context in an aeApiState structure, which is attached to eventLoop->apidata and remains invisible to upper layers.

// select implementation
typedef struct aeApiState {
    fd_set rfds, wfds;
    fd_set _rfds, _wfds;
} aeApiState;

// epoll implementation
typedef struct aeApiState {
    int epfd;
    struct epoll_event *events;
} aeApiState;

Wrapping the select Function

The typical usage of select involves initializing an fd_set, adding descriptors with FD_SET, calling select, and then checking which descriptors are ready.

int fd = /* file descriptor */;
fd_set rfds;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
for (;;) {
    select(fd+1, &rfds, NULL, NULL, NULL);
    if (FD_ISSET(fd, &rfds)) {
        /* fd becomes readable */
    }
}

Redis's wrapper follows the same flow. In aeApiCreate it allocates the state and clears the sets:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    if (!state) return -1;
    FD_ZERO(&state->rfds);
    FD_ZERO(&state->wfds);
    eventLoop->apidata = state;
    return 0;
}

Adding or removing events updates the appropriate fd_set with FD_SET or FD_CLR:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    if (mask & AE_READABLE) FD_SET(fd, &state->rfds);
    if (mask & AE_WRITABLE) FD_SET(fd, &state->wfds);
    return 0;
}

The poll function copies the sets, calls select, and translates the results into the fired array:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    memcpy(&state->_rfds, &state->rfds, sizeof(fd_set));
    memcpy(&state->_wfds, &state->wfds, sizeof(fd_set));
    int retval = select(eventLoop->maxfd+1, &state->_rfds, &state->_wfds, NULL, tvp);
    int numevents = 0;
    if (retval > 0) {
        for (int j = 0; j <= eventLoop->maxfd; j++) {
            int mask = 0;
            aeFileEvent *fe = &eventLoop->events[j];
            if (fe->mask == AE_NONE) continue;
            if (fe->mask & AE_READABLE && FD_ISSET(j, &state->_rfds)) mask |= AE_READABLE;
            if (fe->mask & AE_WRITABLE && FD_ISSET(j, &state->_wfds)) mask |= AE_WRITABLE;
            eventLoop->fired[numevents].fd = j;
            eventLoop->fired[numevents].mask = mask;
            numevents++;
        }
    }
    return numevents;
}

Wrapping the epoll Function

When epoll is available, Redis creates an epoll instance and stores the file descriptor in the state:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event) * eventLoop->setsize);
    if (!state->events) { zfree(state); return -1; }
    state->epfd = epoll_create(1024);
    if (state->epfd == -1) { zfree(state->events); zfree(state); return -1; }
    eventLoop->apidata = state;
    return 0;
}

Adding an event uses epoll_ctl with either EPOLL_CTL_ADD or EPOLL_CTL_MOD depending on whether the fd was already monitored:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0};
    int op = eventLoop->events[fd].mask == AE_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    ee.events = 0;
    mask |= eventLoop->events[fd].mask; // merge old events
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd, op, fd, &ee) == -1) return -1;
    return 0;
}

The poll function calls epoll_wait, then translates each returned epoll_event into the fired array:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval = epoll_wait(state->epfd, state->events, eventLoop->setsize,
        tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    int numevents = 0;
    if (retval > 0) {
        numevents = retval;
        for (int j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events + j;
            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

Sub‑module Selection

Redis chooses the best available multiplexing implementation at compile time using macros:

#ifdef HAVE_EVPORT
    #include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
        #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
            #include "ae_kqueue.c"
        #else
            #include "ae_select.c"
        #endif
    #endif
#endif

If none of the high‑performance interfaces are present, select serves as the fallback, though it has higher time complexity and a limit of 1024 descriptors.

Redis I/O Function Selection
Redis I/O Function Selection

Conclusion

Redis's I/O multiplexing module is concise and portable. By abstracting platform‑specific mechanisms behind a uniform API and selecting the optimal implementation at compile time, Redis can serve tens of thousands of file descriptors in a single‑process architecture while keeping the codebase simple and reducing the risk of bugs.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

redisI/O Multiplexingepollselectevent loop
Open Source Tech Hub
Written by

Open Source Tech Hub

Sharing cutting-edge internet technologies and practical AI resources.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.