
Understanding io_uring: Linux Asynchronous I/O Framework and Its Implementation

This article provides a comprehensive overview of Linux's io_uring: its design goals, shared-memory mechanism, submission and completion queues, core system calls, performance advantages over traditional I/O models, and typical use cases, ending with a complete example of a network server built with io_uring.

Deepin Linux

Overview of io_uring

io_uring is a high‑performance asynchronous I/O framework introduced in Linux 5.1 that aims to overcome the inefficiencies of traditional synchronous I/O, epoll, and POSIX AIO by reducing system‑call overhead and memory copies.

Design Motivation

Traditional I/O suffers from large system‑call costs and thread blocking while waiting for I/O completion. io_uring, created by Jens Axboe, solves three main problems: excessive system‑call overhead, large data‑copy overhead, and an unfriendly API that requires multiple calls for a single operation.

Shared‑Memory Mechanism

io_uring establishes a shared memory region between user space and kernel space using mmap. This region contains a Submission Queue (SQ) and a Completion Queue (CQ), allowing both sides to communicate without frequent system calls.

Submission and Completion Queues

The SQ is a ring buffer in which the application places Submission Queue Entries (SQEs) describing I/O operations (file descriptor, buffer, opcode, and so on). The kernel consumes SQEs, performs the I/O, and posts results to the CQ as Completion Queue Entries (CQEs). Two indices, head and tail, coordinate producer/consumer progress on each queue.
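As a minimal sketch of this head/tail protocol, consider a single-producer/single-consumer ring with a power-of-two capacity, where entries are published by advancing the tail and consumed by advancing the head. This is an illustration only: the real io_uring rings live in the shared mapping and additionally use memory barriers to order the index updates.

```c
#include <stdint.h>

#define RING_SIZE 8               /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

struct ring {
    uint32_t head;                /* consumer position, only ever incremented */
    uint32_t tail;                /* producer position, only ever incremented */
    int entries[RING_SIZE];
};

/* Producer side: write the entry first, then publish it by bumping tail. */
static int ring_push(struct ring *r, int v) {
    if (r->tail - r->head == RING_SIZE)
        return -1;                        /* ring is full */
    r->entries[r->tail & RING_MASK] = v;  /* fill the slot */
    r->tail++;                            /* publish */
    return 0;
}

/* Consumer side: read the entry, then release the slot by bumping head. */
static int ring_pop(struct ring *r, int *v) {
    if (r->tail == r->head)
        return -1;                        /* ring is empty */
    *v = r->entries[r->head & RING_MASK];
    r->head++;
    return 0;
}
```

Because both indices grow monotonically and slots are addressed with `index & mask`, the ring wraps naturally and "full" versus "empty" can be told apart without a separate count, which is the same trick the kernel rings use.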

Core System Calls

io_uring_setup: creates the io_uring context, returns a file descriptor, and provides the offsets needed to map the shared memory.

io_uring_enter: submits pending SQEs to the kernel and optionally waits for completions.

io_uring_register: registers buffers, file descriptors, or other resources ahead of time to reduce per-operation copy overhead.

Initialization Process

Calling io_uring_setup allocates the SQ ring, the CQ ring, and the SQE array; the application then maps them into its address space with mmap. The kernel initializes the ring structures (head, tail, size, flags) and returns capability information.
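To make the mapping step concrete, the sizes of the three regions can be derived from the offsets that io_uring_setup fills into struct io_uring_params. The helper functions below are illustrative, not part of any API; in a real program their results would be passed to mmap together with the fixed offsets IORING_OFF_SQ_RING, IORING_OFF_CQ_RING, and IORING_OFF_SQES from linux/io_uring.h.

```c
#include <stddef.h>
#include <linux/io_uring.h>

/* The SQ ring mapping ends at the index array: one __u32 slot per entry. */
static size_t sq_ring_bytes(const struct io_uring_params *p) {
    return p->sq_off.array + p->sq_entries * sizeof(__u32);
}

/* The CQ ring mapping ends after the CQE array itself. */
static size_t cq_ring_bytes(const struct io_uring_params *p) {
    return p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
}

/* The SQE array is a separate mapping of fixed-size 64-byte entries. */
static size_t sqe_bytes(const struct io_uring_params *p) {
    return p->sq_entries * sizeof(struct io_uring_sqe);
}
```

Applications built on liburing never do this by hand; io_uring_queue_init performs the setup call and all three mappings internally, which is why the raw offsets rarely appear in application code.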

I/O Submission and Completion Flow

1. Fill an io_uring_sqe with the operation details (opcode, fd, buffer address, length, user_data).

2. Advance the SQ tail pointer to publish the SQE.

3. Call io_uring_enter to notify the kernel.

4. The kernel processes the request and writes an io_uring_cqe to the CQ.

5. The application reads CQEs (via io_uring_wait_cqe, io_uring_peek_batch_cqe, or by polling the ring fd with epoll) and uses the stored user_data to match each completion to its request.

6. Advance the CQ head pointer, e.g. with io_uring_cq_advance.
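The user_data matching in step 5 deserves a closer look: because user_data is an opaque 64-bit value, a small per-request struct can be packed into it directly instead of allocating tracking state on the heap. This is exactly what the echo server later in this article does with its conn_info struct; the helper names below are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Per-request context, as in the echo server example: which socket the
   request belongs to and which kind of operation it was. Two ints fit
   exactly into the 64-bit user_data field. */
struct conn_info {
    int fd;
    int event;
};

/* Pack the context into a value suitable for sqe->user_data. */
static uint64_t pack_user_data(struct conn_info ci) {
    uint64_t u = 0;
    _Static_assert(sizeof(struct conn_info) <= sizeof(uint64_t),
                   "conn_info must fit in user_data");
    memcpy(&u, &ci, sizeof(ci));
    return u;
}

/* Recover the context from cqe->user_data on completion. */
static struct conn_info unpack_user_data(uint64_t u) {
    struct conn_info ci;
    memcpy(&ci, &u, sizeof(ci));
    return ci;
}
```

The limitation is obvious but worth stating: this only works while the context fits in 8 bytes. Larger per-request state is usually handled by storing a pointer in user_data instead.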

Performance Advantages

By batching submissions, eliminating per‑operation system calls, and enabling zero‑copy transfers through shared buffers, io_uring dramatically reduces latency and CPU usage compared with synchronous I/O or legacy AIO. It also supports out‑of‑order completion, allowing the kernel to schedule the most efficient operations first.
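A back-of-the-envelope sketch makes the batching claim concrete: a blocking design pays one system call per operation, while io_uring pays one io_uring_enter per submitted batch (and with SQPOLL mode even that call can disappear). The numbers and function names here are purely illustrative.

```c
/* One read()/write() system call per operation. */
static unsigned syscalls_blocking(unsigned ops) {
    return ops;
}

/* One io_uring_enter per full batch of SQEs (rounding up). */
static unsigned syscalls_uring(unsigned ops, unsigned batch) {
    return (ops + batch - 1) / batch;
}
```

For 1000 operations submitted in batches of 64, that is 1000 system calls versus 16, before counting any savings from registered buffers or files.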

Typical Use Cases

High‑concurrency network servers (HTTP, application servers) that need to handle thousands of connections.

Database engines that perform massive file reads/writes.

Low‑latency systems such as high‑frequency trading platforms.

Real‑time game servers requiring fast network I/O.

Large‑file backup, restore, and distributed storage systems.

Code Example: Echo Server with io_uring

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <liburing.h>
#define EVENT_ACCEPT 0
#define EVENT_READ   1
#define EVENT_WRITE  2

struct conn_info {
  int fd;
  int event;
};

int init_server(unsigned short port) {
  int sockfd = socket(AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in serveraddr;
  memset(&serveraddr, 0, sizeof(struct sockaddr_in));
  serveraddr.sin_family = AF_INET;
  serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
  serveraddr.sin_port = htons(port);
  if (-1 == bind(sockfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr))) {
    perror("bind");
    return -1;
  }
  listen(sockfd, 10);
  return sockfd;
}

#define ENTRIES_LENGTH 1024
#define BUFFER_LENGTH  1024

int set_event_recv(struct io_uring *ring, int sockfd, void *buf, size_t len, int flags) {
  struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
  struct conn_info accept_info = {.fd = sockfd, .event = EVENT_READ};
  io_uring_prep_recv(sqe, sockfd, buf, len, flags);
  /* conn_info is 8 bytes, so it packs directly into the 64-bit user_data */
  memcpy(&sqe->user_data, &accept_info, sizeof(struct conn_info));
  return 0;
}

int set_event_send(struct io_uring *ring, int sockfd, void *buf, size_t len, int flags) {
  struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
  struct conn_info accept_info = {.fd = sockfd, .event = EVENT_WRITE};
  io_uring_prep_send(sqe, sockfd, buf, len, flags);
  memcpy(&sqe->user_data, &accept_info, sizeof(struct conn_info));
  return 0;
}

int set_event_accept(struct io_uring *ring, int sockfd, struct sockaddr *addr, socklen_t *addrlen, int flags) {
  struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
  struct conn_info accept_info = {.fd = sockfd, .event = EVENT_ACCEPT};
  io_uring_prep_accept(sqe, sockfd, (struct sockaddr *)addr, addrlen, flags);
  memcpy(&sqe->user_data, &accept_info, sizeof(struct conn_info));
  return 0;
}

int main(int argc, char *argv[]) {
  unsigned short port = 9999;
  int sockfd = init_server(port);
  if (sockfd < 0) return -1;
  struct io_uring_params params;
  memset(&params, 0, sizeof(params));
  struct io_uring ring;
  io_uring_queue_init_params(ENTRIES_LENGTH, &ring, &params);

  struct sockaddr_in clientaddr;
  socklen_t len = sizeof(clientaddr);
  set_event_accept(&ring, sockfd, (struct sockaddr *)&clientaddr, &len, 0);

  char buffer[BUFFER_LENGTH] = {0}; /* demo shortcut: one buffer shared by all connections */
  while (1) {
    io_uring_submit(&ring);
    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    struct io_uring_cqe *cqes[128];
    int nready = io_uring_peek_batch_cqe(&ring, cqes, 128);
    for (int i = 0; i < nready; ++i) {
      struct io_uring_cqe *entry = cqes[i];
      struct conn_info result;
      memcpy(&result, &entry->user_data, sizeof(struct conn_info));
      if (result.event == EVENT_ACCEPT) {
        set_event_accept(&ring, sockfd, (struct sockaddr *)&clientaddr, &len, 0);
        int connfd = entry->res; /* accepted fd, or a negative errno on failure */
        if (connfd >= 0) {
          set_event_recv(&ring, connfd, buffer, BUFFER_LENGTH, 0);
        }
      } else if (result.event == EVENT_READ) {
        int ret = entry->res;
        if (ret == 0) {
          close(result.fd);
        } else if (ret > 0) {
          set_event_send(&ring, result.fd, buffer, ret, 0);
        }
      } else if (result.event == EVENT_WRITE) {
        set_event_recv(&ring, result.fd, buffer, BUFFER_LENGTH, 0);
      }
    }
    io_uring_cq_advance(&ring, nready);
  }
  return 0;
}

The example demonstrates how to initialize a TCP listening socket, set up an io_uring instance, post accept, recv, and send requests, and process completions in a single event loop, yielding a scalable echo server with minimal system-call overhead. It uses the liburing helper library, so link against it when building, e.g. with gcc and -luring.
