Understanding Linux I/O Models: From Blocking to Zero‑Copy and epoll
This article explains the lifecycle of data in I/O operations, compares blocking, non‑blocking, and asynchronous models, introduces zero‑copy techniques, and details how select, poll, and epoll enable efficient multiplexed I/O handling in server applications.
1. Fundamentals
Before introducing I/O models, we explain the lifecycle of data during an I/O wait. A process requests data, which is first loaded into a kernel buffer and then copied to the process's application buffer before it can be accessed.
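Why can't data be loaded directly into the app buffer? Kernel-bypass techniques such as RDMA do exist, but in most cases data must pass through a kernel buffer first. That way the kernel, not the process, keeps control of device access and memory protection, which is safer and more stable.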
Are the copy operations identical? No. Modern storage devices use DMA to transfer data between memory and the device without CPU involvement, while copying between kernel and app buffers is CPU‑driven.
How is data sent over a TCP connection? The TCP/IP stack uses a send buffer and a receive buffer (socket buffers). Data is copied from the app buffer to the send buffer, then DMA transfers it to the NIC; incoming data follows the reverse path.
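The two CPU copies can be made concrete with a minimal C sketch of the conventional read()/write() path a server might use to send a file. It assumes Linux and POSIX file descriptors, and the names copy_via_app_buffer() and write_all() are hypothetical helpers chosen for this example. Every read() copies from the kernel buffer into a 4 KiB app buffer, and every write() copies that data on into the destination (for a TCP socket, its send buffer).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR.
   Each write() copies from the app buffer into the kernel
   (for a socket, into its send buffer). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Conventional transfer: each byte is copied kernel buffer -> app buffer
   by read(), then app buffer -> destination by write().
   Returns the number of bytes copied, or -1 on error. */
long copy_via_app_buffer(int in_fd, int out_fd)
{
    char buf[4096];                 /* the "app buffer" */
    long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;           /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```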
Is copying from the kernel buffer to the app buffer always required? No. When the process only forwards the data without reading or modifying it, zero-copy techniques such as sendfile() can move the data from the kernel buffer to the socket send buffer without it ever passing through the app buffer.
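A minimal sketch of the zero-copy path using Linux sendfile(2), assuming file_fd is a regular file and out_fd is a socket; send_file_zero_copy() is a hypothetical helper name. Unlike a read()/write() loop, no app buffer appears anywhere: the kernel moves the data from the file's kernel buffer toward the socket itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole of file_fd (a regular file) to out_fd (typically a
   connected socket) with sendfile(2). The bytes go from the file's kernel
   buffer to the socket inside the kernel; no user-space buffer is used.
   Returns the number of bytes sent, or -1 on error. */
long send_file_zero_copy(int file_fd, int out_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;
    off_t offset = 0;                /* sendfile() advances this itself */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, file_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                   /* file shrank while sending */
    }
    return (long)offset;
}
```

sendfile() may transfer fewer bytes than requested, which is why the call sits in a loop that resumes from the updated offset.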
Below is a typical data flow for an httpd process handling a file request.
2. I/O Models
An I/O model describes the process state and data handling during I/O wait. The two phases are “data preparation” (copy to kernel buffer) and “data copy” (copy to app buffer).
We use an httpd handling a local file as an example, ignoring implementation details of the web server.
2.1 Blocking I/O Model
In blocking mode the httpd thread blocks through both phases: it sleeps while the data is prepared in the kernel buffer and stays blocked while the data is copied to the app buffer. Putting the thread to sleep and waking it up again costs two context switches.
During the kernel‑buffer preparation DMA is used, so the CPU can perform other work.
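Copying from the kernel buffer to the app buffer is done by the CPU, and the thread stays blocked until the copy finishes; because the thread has nothing else to do in the meantime, the copy is not interleaved with application work.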
This is the simplest I/O model.
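The blocking model can be sketched in C as follows, assuming Linux and a pipe standing in for the httpd's data source. blocking_read() and demo_blocking_pipe() are hypothetical names. A forked child writes after roughly 100 ms, and the parent sleeps inside a single read() until then, so that one call covers both the preparation phase and the copy phase.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Blocking read: the call returns only after the kernel has data for fd
   (phase 1) and has copied it into buf (phase 2). Retries on EINTR. */
ssize_t blocking_read(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);     /* thread sleeps here while no data is available */
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Demo: a child writes after ~100 ms; the parent blocks in read() until then.
   Returns the number of bytes the parent read, or -1. */
ssize_t demo_blocking_pipe(char *buf, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: delay, then produce data */
        struct timespec delay = {0, 100000000L};
        close(p[0]);
        nanosleep(&delay, NULL);
        ssize_t w = write(p[1], "ready", 5);
        _exit(w == 5 ? 0 : 1);
    }
    close(p[1]);
    ssize_t n = blocking_read(p[0], buf, len);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```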
2.2 Non-Blocking I/O Model
When a descriptor is set to non-blocking, a read() issued before the data is ready fails immediately with errno set to EWOULDBLOCK (also reported as EAGAIN). The process must keep polling until the kernel buffer is ready; after that it is blocked only for the copy to the app buffer.
When we set a socket to be nonblocking, we are telling the kernel: "when an I/O operation that I request cannot be completed without putting the process to sleep, do not put the process to sleep, but return an error instead." During the polling phase the process calls read() repeatedly; each call returns that error until the data is in the kernel buffer, at which point read() succeeds and copies the data to the app buffer.
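A minimal sketch of the non-blocking model, assuming Linux: set_nonblocking() turns on O_NONBLOCK with fcntl(), and nonblocking_read_poll() (a hypothetical helper) retries read() until it stops failing with EAGAIN/EWOULDBLOCK. The 1 ms pause between attempts and the max_tries bound are assumptions added so the sketch cannot spin forever; a pure polling loop would retry immediately.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Put fd into non-blocking mode by adding O_NONBLOCK to its status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Polling loop: each failed read() returns at once with EAGAIN/EWOULDBLOCK;
   the first successful read() performs the kernel-to-app copy.
   Returns bytes read (0 on EOF), or -1 if still not ready or on error. */
ssize_t nonblocking_read_poll(int fd, void *buf, size_t len, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                               /* data (or EOF) */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                              /* real error */
        struct timespec gap = {0, 1000000L};        /* assumed 1 ms between polls */
        nanosleep(&gap, NULL);
    }
    errno = EAGAIN;
    return -1;                                      /* gave up: still not ready */
}
```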
2.3 I/O Multiplexing Model
Sometimes also called I/O reuse, this model lets one process monitor many descriptors with select, poll, or epoll. Rather than blocking in read() on a single descriptor, the process blocks in one of these calls until at least one descriptor is ready (readable, writable, or in an exceptional condition). select and poll work similarly, but select can only handle descriptors numbered below FD_SETSIZE (usually 1024), while poll has no such fixed limit.
When a descriptor becomes ready, the kernel wakes the process, which then performs the actual read/write.
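As an illustration of multiplexing, the sketch below uses poll() to wait on several descriptors at once and then reads only from the ready ones, assuming Linux. read_ready() is a hypothetical helper, and the cap of 16 descriptors is an arbitrary assumption that keeps the array fixed-size.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Block in poll() (not in read()) until any of the n descriptors is
   readable or timeout_ms expires, then read from each ready one.
   Returns the number of descriptors read from, 0 on timeout, -1 on error. */
int read_ready(const int *fds, int n, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfds[16];
    if (n > 16)
        return -1;                  /* fixed cap keeps the sketch short */
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* interested in readability */
        pfds[i].revents = 0;
    }
    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready <= 0)
        return ready;               /* 0 = timeout, -1 = error */
    int served = 0;
    for (int i = 0; i < n; i++) {
        if ((pfds[i].revents & POLLIN) &&
            read(pfds[i].fd, buf, len) > 0)   /* will not block: fd is readable */
            served++;
    }
    return served;
}
```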
2.4 Signal-Driven I/O Model
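After installing a SIGIO handler (e.g., via sigaction) and enabling signal-driven I/O on the descriptor (setting its owner with fcntl(F_SETOWN) and turning on O_ASYNC), the kernel sends SIGIO when data is ready. The process then calls read(), which still blocks while the data is copied from the kernel buffer to the app buffer.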
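A sketch of signal-driven I/O on Linux, assuming a pipe as the data source: enable_sigio() installs a SIGIO handler with sigaction(), makes the calling process the descriptor's owner with F_SETOWN, and sets O_ASYNC. The handler only sets a flag; the actual read(), with its blocking copy, happens in normal program flow. The function names and the bounded wait_for_sigio() helper are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t sigio_seen = 0;

/* SIGIO handler: only record that data is ready; the read() happens
   in normal program flow, not inside the handler. */
static void on_sigio(int signo)
{
    (void)signo;
    sigio_seen = 1;
}

/* Enable signal-driven I/O on fd: install the handler, make this process
   the owner that receives SIGIO, and switch on O_ASYNC. */
int enable_sigio(int fd)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}

/* Sleep in 1 ms steps until SIGIO has arrived or max_ms has elapsed.
   Returns 1 if the signal was seen, 0 otherwise. */
int wait_for_sigio(int max_ms)
{
    struct timespec step = {0, 1000000L};
    for (int i = 0; i < max_ms && !sigio_seen; i++)
        nanosleep(&step, NULL);
    return sigio_seen;
}
```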
2.5 Asynchronous I/O Model
The process issues an asynchronous call such as aio_read(), which returns immediately. The data is then prepared and copied to the app buffer without the process taking part, and the process is notified on completion, for example with a signal.
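A minimal POSIX AIO sketch, assuming Linux with glibc: async_read() (a hypothetical name) queues the request with aio_read() and gets control back at once, then uses aio_error() and aio_suspend() to wait for completion before collecting the result with aio_return(). A signal notification could be requested through aio_sigevent instead; this sketch uses SIGEV_NONE to stay short. glibc implements POSIX AIO with user-space threads, and glibc versions before 2.34 need linking with -lrt.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read up to len bytes at offset from fd with POSIX AIO. aio_read() only
   queues the request; the thread is not blocked while the data is prepared
   and copied into buf. Returns the byte count from aio_return(), or -1. */
ssize_t async_read(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal: completion is polled below */

    if (aio_read(&cb) < 0)
        return -1;

    /* ... the process could do unrelated work here ... */

    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);             /* sleep until the request completes */
    return aio_return(&cb);
}
```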
Although asynchronous I/O keeps the process from blocking in either phase, the copy from kernel buffer to app buffer still consumes CPU and can become a bottleneck under high concurrency.
2.6 Distinguishing Synchronous and Asynchronous I/O
Blocking, non-blocking, I/O multiplexing, and signal-driven I/O are all synchronous, because in each of them the process is blocked in read() while the data is copied from the kernel buffer to the app buffer. Only true asynchronous I/O (e.g., the aio_* functions) completes both phases without blocking the process.
3. select(), poll() and epoll
These functions monitor file‑descriptor states for readability, writability, or errors. They are typically used inside an event loop.
3.1 select & poll
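select() builds its descriptor sets with the FD_ZERO / FD_SET macros and can block indefinitely, block until a timeout expires, or return immediately (zero timeout). poll() is similar, but it takes an array of struct pollfd instead of fixed-size sets and so has no 1024-descriptor limit.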
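When descriptors are ready, select() returns their count and overwrites the sets so that only the ready descriptors remain marked; the process then checks each descriptor with FD_ISSET. Because the sets are modified in place, they must be rebuilt (or copied from a master set) before every call.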
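fd_set allset, rset;                    /* sockfd, maxfd, outfd, buf declared elsewhere */
FD_ZERO(&allset);
FD_SET(sockfd, &allset);
for (;;) {
    rset = allset;                      /* select() overwrites its sets, so pass a copy */
    if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
        break;                          /* error handling (e.g. EINTR) omitted */
    if (FD_ISSET(sockfd, &rset)) {
        ssize_t n = read(sockfd, buf, sizeof buf);  /* readable: read() will not block */
        if (n <= 0) {                   /* EOF or error */
            FD_CLR(sockfd, &allset);    /* stop watching this descriptor */
            close(sockfd);
            break;                      /* only one descriptor in this sketch */
        }
        writen(outfd, buf, n);          /* writen(): write-all helper defined elsewhere */
    }
}
3.2 epoll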
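epoll_create() (or epoll_create1()) creates an epoll instance, and descriptors are added, modified, or removed at runtime with epoll_ctl(), so each descriptor is registered once rather than passed in on every call. Ready descriptors are placed on the instance's ready list, and epoll_wait() returns them without scanning the entire set.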
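A minimal epoll sketch for Linux: epoll_setup() registers each descriptor once with epoll_ctl(EPOLL_CTL_ADD), and epoll_serve() performs one pass of an event loop, reading from whatever epoll_wait() reports as ready. Both function names, the batch size MAX_EVENTS, and the level-triggered EPOLLIN choice are assumptions for this example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16   /* events fetched per epoll_wait() call (arbitrary) */

/* Create an epoll instance and register each descriptor once for
   readability. Returns the epoll descriptor, or -1 on error. */
int epoll_setup(const int *fds, int n)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN;            /* level-triggered readability */
        ev.data.fd = fds[i];            /* handed back to us by epoll_wait() */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* One pass of an event loop: epoll_wait() returns only descriptors on the
   ready list, so there is no scan over every registered descriptor.
   Reads from each ready descriptor; returns how many were served, or -1. */
int epoll_serve(int epfd, char *buf, size_t len, int timeout_ms)
{
    struct epoll_event ready[MAX_EVENTS];
    int nready = epoll_wait(epfd, ready, MAX_EVENTS, timeout_ms);
    if (nready < 0)
        return -1;
    int served = 0;
    for (int i = 0; i < nready; i++) {
        if ((ready[i].events & EPOLLIN) &&
            read(ready[i].data.fd, buf, len) > 0)
            served++;
    }
    return served;
}
```

Because the interest list lives in the kernel, a descriptor can later be dropped with epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) without rebuilding anything, whereas select() needs its sets rebuilt for every call.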
Source: http://www.cnblogs.com/f-ck-need-u/p/7624733.html
ITFLY8 Architecture Home