Understanding User vs Kernel Space and IO Models: Blocking, Non‑Blocking, epoll Explained
This article explains the distinction between user space and kernel space, describes blocking and non‑blocking I/O, and compares select, poll, and epoll multiplexing techniques, including signal‑driven and asynchronous I/O models with code examples.
User Space and Kernel Space
For a process, the virtual address space is divided into kernel space (shared by all processes) and user space (private to the process). A user process cannot access kernel space directly; it must request kernel services through system calls.
User-space code runs with limited privileges (Ring 3) and must go through kernel interfaces; kernel-space code runs privileged instructions (Ring 0) and can access all system resources. Because the two spaces are isolated, read and write operations require copying data between user and kernel buffers.
Blocking I/O (Synchronous I/O)
The request blocks until data is returned. In the first stage, the user process attempts to read data that has not arrived, so the kernel waits and the process is blocked. In the second stage, after data is copied to the kernel buffer and then to the user buffer, the process remains blocked during the copy.
Non‑Blocking I/O (Synchronous I/O)
The call returns immediately regardless of data availability; if no data, the process receives an error and retries after a delay. The first stage is non‑blocking, the second stage (data copy) is still blocking. This model can cause busy‑waiting and high CPU usage.
I/O Multiplexing (Synchronous I/O)
A file descriptor (FD) is a small non-negative integer that identifies an open file, device, or socket. I/O multiplexing lets a single thread monitor many FDs at once and be notified when any of them becomes ready, avoiding idle waiting on each one individually.
Three mechanisms provide this: select, poll, and epoll.
select
<code>typedef long int __fd_mask;

typedef struct {
    __fd_mask fds_bits[__FD_SETSIZE / __NFDBITS];
} fd_set;

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
</code>
select uses a fixed-size fd_set bitmap (at most 1024 descriptors by default) and returns the number of ready descriptors. The sets are modified in place, so they must be rebuilt and copied into the kernel on every call.
poll
<code>#define POLLIN   // readable
#define POLLOUT  // writable
#define POLLERR  // error
#define POLLNVAL // invalid fd

struct pollfd {
    int fd;            // descriptor to monitor
    short int events;  // events to watch for
    short int revents; // events that occurred
};

int poll(struct pollfd *fds, nfds_t nfds, int timeout);
</code>
poll takes a dynamically sized array of pollfd structures, removing the 1024-descriptor limit, though the kernel still scans the whole array on every call.
epoll
<code>struct eventpoll {
    struct rb_root rbr;      // red-black tree of monitored FDs
    struct list_head rdlist; // list of ready FDs
};

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
</code>
epoll keeps monitored FDs in a kernel red-black tree, so each descriptor is registered once via epoll_ctl rather than re-submitted on every wait. epoll_wait returns only the ready descriptors, supports both edge-triggered (ET) and level-triggered (LT) notification, and scales to very large numbers of descriptors without per-call copying of the full set.
Signal‑Driven I/O (Synchronous I/O)
By registering for SIGIO, the kernel sends a signal when an FD becomes ready. The process can perform other work while waiting; upon receiving SIGIO it reads the data.
Asynchronous I/O
The application issues an aio_read request and provides a completion notification (for example, a callback). The kernel copies the data into user space when it is ready and then notifies the application, so the process never blocks in either stage.
Summary
Blocking, non-blocking, multiplexed, and signal-driven I/O are all synchronous models: the user process still performs the final copy from the kernel buffer itself. Only asynchronous I/O hands both stages, waiting for data and copying it, to the kernel. For servers handling many concurrent connections on Linux, epoll is the standard choice.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.