
From Zero to One: Dissecting the Underlying Principles of Linux File I/O

This article walks through the complete Linux file I/O workflow: opening, reading, writing, and closing files, how the kernel executes the underlying system calls, and how five common I/O models differ. Along the way it explains buffers, caches, blocking versus non‑blocking modes, and the trade‑offs that affect performance.

Deepin Linux

1. Understanding Linux File I/O

1.1 What is File I/O

File I/O (File Input/Output) is the set of system calls that the operating system provides for opening, reading, writing, and closing files. These calls let a process exchange data with files and devices.

1.2 Linux File System

Linux uses a hierarchical directory tree starting at the root (/). Important directories include /bin, /usr, /etc, and /home. Each file has an inode that stores metadata (type, permissions, timestamps, block pointers) but not the file name or data.
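You can inspect the metadata listed above with the stat system call. The following is a minimal sketch. print_inode_info is a hypothetical helper name, and it prints only a few fields of struct stat. No file name appears among them, because names live in directory entries rather than in the inode.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: print selected inode metadata for `path`.
 * Returns 0 on success, -1 on failure (errno set by stat). */
int print_inode_info(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    printf("inode:       %lu\n", (unsigned long)st.st_ino);
    printf("type:        %s\n", S_ISREG(st.st_mode) ? "regular file"
                               : S_ISDIR(st.st_mode) ? "directory" : "other");
    printf("permissions: %o\n", (unsigned)(st.st_mode & 07777));
    printf("size:        %lld bytes\n", (long long)st.st_size);
    printf("hard links:  %lu\n", (unsigned long)st.st_nlink);
    printf("modified:    %s", ctime(&st.st_mtime));  /* ctime() appends '\n' */
    return 0;
}
```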

1.3 File Descriptors (FD)

A file descriptor is a non‑negative integer assigned by the kernel to represent an opened file. Standard descriptors are 0 (stdin), 1 (stdout), and 2 (stderr). System calls such as read, write, and close operate on FDs.
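Because descriptors are just small integers, you can observe the kernel's allocation rule directly. The sketch below uses hypothetical function names. open(2) returns the lowest-numbered descriptor not already in use. In a single-threaded program that still has 0, 1, and 2 open, the first open usually yields 3, and closing it makes that same number available again. The standard descriptors are ordinary FDs that write accepts directly.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open `path` twice, closing in between, and report whether the kernel
 * handed back the same descriptor number both times. POSIX requires open()
 * to return the lowest-numbered descriptor not currently in use.
 * Returns 1 if reused, 0 if not, -1 on error. */
int fd_is_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first == -1) { perror("open"); return -1; }
    printf("first open  -> fd %d\n", first);   /* usually 3 */
    close(first);

    int second = open(path, O_RDONLY);
    if (second == -1) { perror("open"); return -1; }
    printf("second open -> fd %d\n", second);
    close(second);
    return first == second;
}

/* Hypothetical helper: write to standard output through FD 1 directly. */
void say_hello_via_fd1(void) {
    const char msg[] = "hello via fd 1\n";
    if (write(STDOUT_FILENO, msg, sizeof(msg) - 1) == -1)
        perror("write");
}
```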

2. Full File I/O Process

2.1 Open File – Establish Connection

The open system call creates a file descriptor. Its prototypes are:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
```

pathname specifies the file path. flags selects exactly one access mode (O_RDONLY, O_WRONLY, or O_RDWR), optionally OR'd with options such as O_CREAT, O_TRUNC, and O_APPEND. When O_CREAT is used, mode sets the permissions of a newly created file (e.g., 0644), further masked by the process umask. On success, open returns a non‑negative FD; on failure, it returns -1 and sets errno.
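Flags are combined with bitwise OR. As an illustration, the hypothetical helper create_exclusive below adds O_EXCL to O_CREAT. With that combination, open fails with EEXIST instead of reusing a file that already exists, which is a common way to avoid clobbering existing data.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create `path` only if it does not already exist.
 * Returns a writable FD, or -1 with errno set (EEXIST if the file exists). */
int create_exclusive(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1 && errno == EEXIST) {
        int saved = errno;
        fprintf(stderr, "%s already exists; not overwriting\n", path);
        errno = saved;   /* keep EEXIST visible to the caller */
    }
    return fd;
}
```

The caller closes the returned descriptor with close() from <unistd.h> once it is done writing.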

2.2 Read/Write – Data Transfer

Reading and writing use the read and write system calls:

```c
#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
```

read returns the number of bytes actually read, which may be fewer than count; 0 indicates end of file and -1 indicates an error. write returns the number of bytes written, which can also be fewer than count, or -1 on error.
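Both calls may transfer fewer bytes than requested, for example on pipes and sockets, or when a signal interrupts the call. Robust code therefore loops until the whole request is satisfied. write_all and read_full below are hypothetical names for this common pattern. The sketch assumes that a call interrupted with EINTR should simply be retried.

```c
#include <errno.h>
#include <unistd.h>

/* Write all `count` bytes, retrying after short writes and EINTR.
 * Returns count on success, -1 on error (errno set by write). */
ssize_t write_all(int fd, const void *buf, size_t count) {
    const char *p = buf;
    size_t left = count;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)count;
}

/* Read until `count` bytes are read or EOF is reached.
 * Returns bytes read (less than count only at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count) {
    char *p = buf;
    size_t got = 0;
    while (got < count) {
        ssize_t n = read(fd, p + got, count - got);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break;                  /* EOF */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```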

Data moves through user‑space buffers, the page cache, and the disk. If the data is already in the page cache (cache hit), the kernel copies it directly to the user buffer; otherwise it reads from disk into the cache first. Writes first modify the page cache (dirty pages) and are flushed to disk later.

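By default, file I/O is blocking: the calling process sleeps until the operation completes. To switch a descriptor to non‑blocking mode, add O_NONBLOCK to its status flags: read the current flags with fcntl(fd, F_GETFL), then call fcntl(fd, F_SETFL, flags | O_NONBLOCK). A read or write that would otherwise block then fails immediately with errno set to EAGAIN. The flag matters for pipes, FIFOs, sockets, and terminals. It has no effect on regular files, where reads and writes are never reported as "would block" even when they must wait for the disk.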
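The following is a minimal sketch of non‑blocking behavior. It uses a pipe because O_NONBLOCK does not change how regular files behave. demo_eagain is a hypothetical name. It reads from an empty pipe whose write end is still open, a read that would sleep forever in blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try one read on an empty pipe in non-blocking mode.
 * Returns 1 if read() failed with EAGAIN/EWOULDBLOCK as expected, else 0. */
int demo_eagain(void) {
    int p[2];
    if (pipe(p) == -1) { perror("pipe"); return 0; }

    /* Preserve the existing status flags and add O_NONBLOCK. */
    int flags = fcntl(p[0], F_GETFL);
    if (flags == -1 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("fcntl");
        close(p[0]);
        close(p[1]);
        return 0;
    }

    char c;
    ssize_t n = read(p[0], &c, 1);   /* nothing written yet: would block */
    int would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    if (would_block)
        printf("read returned -1 with EAGAIN; no data yet\n");

    close(p[0]);
    close(p[1]);
    return would_block;
}
```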

2.3 Close File – Release Resources

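Closing a file is done with the close system call:

```c
#include <unistd.h>

int close(int fd);
```

On success it returns 0; on failure it returns -1 and sets errno. The kernel frees the descriptor and drops its reference to the open file. The file structure itself is released once no descriptor refers to it any longer. close does not flush dirty pages to disk, because write‑back happens later. An application that needs durability must call fsync before closing (see Section 3.3). On Linux the descriptor is released even when close reports an error, so close should not be retried.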
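The following is a minimal sketch of checking the result of close. close_checked is a hypothetical helper. The sketch also shows that the descriptor is gone after the first call: closing the same number again fails with EBADF, assuming no other thread reused the number in between.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: close `fd` once and report any error.
 * Per Linux close(2), the FD is released even when close fails,
 * so the caller must not retry. Returns 0 or -1 like close(). */
int close_checked(int fd) {
    if (close(fd) == -1) {
        int saved = errno;
        perror("close");
        errno = saved;   /* perror may not preserve errno */
        return -1;
    }
    return 0;
}
```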

3. Buffers and Caching Mechanisms

3.1 User‑Space Buffer

Applications allocate buffers in user space to batch small writes, reducing the number of system calls and context switches. Too small a buffer causes frequent calls; too large a buffer wastes memory.
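To make the trade‑off concrete, here is a minimal sketch of a user‑space write buffer. The struct wbuf type, its functions, and the 4096‑byte capacity are hypothetical choices for illustration. In production code, stdio (fwrite, tuned with setvbuf) provides the same batching.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define BUF_CAP 4096   /* assumed capacity; tune for the workload */

/* Hypothetical user-space write buffer: small appends are collected in
 * memory and handed to write(2) in one call when the buffer fills. */
struct wbuf {
    int fd;
    size_t len;
    char data[BUF_CAP];
    unsigned long syscalls;   /* count of write() calls, for illustration */
};

/* Write out everything currently buffered. Returns 0 or -1. */
int wbuf_flush(struct wbuf *b) {
    size_t off = 0;
    while (off < b->len) {
        ssize_t n = write(b->fd, b->data + off, b->len - off);
        b->syscalls++;
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        off += (size_t)n;
    }
    b->len = 0;
    return 0;
}

/* Append `len` bytes, flushing whenever the buffer is full. Returns 0 or -1. */
int wbuf_put(struct wbuf *b, const void *src, size_t len) {
    const char *p = src;
    while (len > 0) {
        if (b->len == BUF_CAP && wbuf_flush(b) == -1)
            return -1;
        size_t room = BUF_CAP - b->len;
        size_t chunk = len < room ? len : room;
        memcpy(b->data + b->len, p, chunk);
        b->len += chunk;
        p += chunk;
        len -= chunk;
    }
    return 0;
}
```

Appending 1,000 ten‑byte records through this buffer issues three write calls instead of 1,000: two when the buffer fills and one for the final wbuf_flush. As with fflush, callers must flush before closing the descriptor.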

3.2 Kernel Buffer and Page Cache

The kernel buffer resides in kernel space and temporarily holds data transferred between devices and processes. The page cache stores file data in memory, enabling fast reads and write‑back of dirty pages. The kernel uses an LRU algorithm with active/inactive lists to manage cache eviction.
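On Linux, you can observe page cache residency from user space with mmap and mincore. The sketch below rests on several assumptions. resident_pages is a hypothetical name. The result is only a snapshot, because the kernel may evict pages at any time. Recent kernels may also under‑report residency for files the caller is not allowed to write, so the sketch is best tried on files you own.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how many pages of `path` are resident in the page cache, using
 * mmap(2) + mincore(2). Stores the file's page count in *total_out.
 * Returns the resident count, or -1 on error or for an empty file. */
long resident_pages(const char *path, long *total_out) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return -1; }
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) { close(fd); return -1; }

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;

    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping stays valid */
    if (addr == MAP_FAILED) { perror("mmap"); return -1; }

    unsigned char *vec = malloc(npages);
    if (vec == NULL) { munmap(addr, len); return -1; }
    long resident = -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;      /* low bit set = page is in memory */
    } else {
        perror("mincore");
    }
    free(vec);
    munmap(addr, len);
    if (total_out) *total_out = (long)npages;
    return resident;
}
```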

3.3 Impact of Caching

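Cache hits run at close to memory speed, while cache misses require disk I/O, which is orders of magnitude slower. Because writes land in the page cache first, data that write has already accepted can still be lost if the system crashes before write‑back. fsync forces a file's data and metadata to stable storage. fdatasync does the same but skips metadata that is not needed to read the data back, such as the modification time.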
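The following is a minimal sketch of a durable write. It assumes the goal is for the data to survive a crash once the function returns. write_durably is a hypothetical name.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of `path` with `text` and make
 * the data durable before returning. Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *text) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return -1; }

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    if (n != (ssize_t)len) {
        if (n == -1) perror("write");
        else fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);
        close(fd);
        return -1;
    }
    /* write() only dirtied page-cache pages; fsync() blocks until the data
     * and the inode metadata reach stable storage. fdatasync(fd) would skip
     * metadata not needed to read the data back (e.g. mtime). */
    if (fsync(fd) == -1) { perror("fsync"); close(fd); return -1; }
    return close(fd);
}
```

For a newly created file, the fsync man page notes that this alone does not guarantee the directory entry has reached disk. Making the name durable needs an additional fsync on a descriptor for the containing directory.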

4. System Call Execution Path

4.1 Open System Call

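The process places the arguments in registers and executes a trap instruction into the kernel (syscall on x86‑64; int 0x80 on legacy 32‑bit x86). The kernel then validates the arguments and resolves the path to an inode. It allocates a file object and installs it in the lowest free slot of the process's file descriptor table. That slot number is returned as the FD.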

4.2 Read/Write System Calls

After validating the descriptor, the kernel locates the file object and its current offset. It then copies data between the page cache and the user buffer, reading from disk first on a cache miss. Finally it advances the offset by the number of bytes transferred and returns that count.
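You can observe this offset bookkeeping directly. In the sketch below (offset_demo is a hypothetical name), read advances the offset, and lseek(fd, 0, SEEK_CUR) queries the offset without moving it. pread reads at an explicit position and leaves the offset untouched.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Show that read() advances the file offset while pread() does not.
 * Returns the offset observed after both calls, or -1 on error. */
off_t offset_demo(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return -1; }

    char buf[4];
    ssize_t n = read(fd, buf, sizeof buf);         /* offset moves by n */
    off_t after_read = lseek(fd, 0, SEEK_CUR);     /* query, don't move */
    printf("read %zd bytes, offset now %lld\n", n, (long long)after_read);

    n = pread(fd, buf, sizeof buf, 0);             /* explicit offset 0 */
    off_t after_pread = lseek(fd, 0, SEEK_CUR);
    printf("pread %zd bytes, offset still %lld\n", n, (long long)after_pread);

    close(fd);
    return after_pread;
}
```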

4.3 Close System Call

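The kernel removes the descriptor from the process's file descriptor table and drops one reference to the file object. Because dup and fork can make several descriptors share one file object, the object and its resources are released only when the last reference disappears.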
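The following short sketch uses dup to show this reference counting. read_after_closing_original is a hypothetical helper. After close(fd) the duplicate still works, because close dropped only one reference to the shared file object.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Duplicate a descriptor, close the original, and read through the copy.
 * Both FDs refer to one open file description (shared offset and status
 * flags); closing one only drops a reference to it.
 * Returns bytes read through the duplicate, or -1 on error. */
ssize_t read_after_closing_original(const char *path, char *buf, size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return -1; }
    int copy = dup(fd);
    if (copy == -1) { perror("dup"); close(fd); return -1; }

    close(fd);                          /* the file stays open via `copy` */
    ssize_t n = read(copy, buf, len);
    if (n == -1) perror("read");
    close(copy);                        /* last reference: file is released */
    return n;
}
```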

5. Common File I/O Models

5.1 Blocking I/O

The process blocks during both phases: while waiting for data to become ready and while the data is copied into the user buffer. The model is simple to program but scales poorly under high concurrency, because each outstanding request ties up a thread.
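A pipe makes this blocking behavior easy to observe. In the minimal sketch below (blocking_read_demo is a hypothetical name), the parent process calls read on an empty pipe. It sleeps inside the kernel until a forked child writes to the pipe about one second later.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent blocks in read() on an empty pipe until a child process
 * writes one second later. Returns bytes received, or -1 on error. */
ssize_t blocking_read_demo(void) {
    int p[2];
    if (pipe(p) == -1) { perror("pipe"); return -1; }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); close(p[0]); close(p[1]); return -1; }
    if (pid == 0) {                    /* child: the writer */
        close(p[0]);
        sleep(1);
        ssize_t w = write(p[1], "ready", 5);
        _exit(w == 5 ? 0 : 1);
    }

    close(p[1]);                       /* parent keeps only the read end */
    char buf[16];
    ssize_t n = read(p[0], buf, sizeof buf);   /* sleeps until data arrives */
    printf("parent woke up with %zd bytes\n", n);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```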

5.2 Non‑Blocking I/O

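Calls fail immediately with EAGAIN when no data is ready, so the application can do other work and retry later. Retrying in a tight loop wastes CPU, which is why non‑blocking descriptors are usually combined with the multiplexing techniques described next.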

5.3 I/O Multiplexing

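Techniques such as select, poll, and epoll let a single thread monitor many descriptors, and each handles scale differently:

- select can only watch descriptor numbers below FD_SETSIZE (1024 on glibc).
- poll takes an array of struct pollfd and has no such limit.
- epoll (Linux‑specific) keeps the interest set inside the kernel, so each epoll_wait call returns only the ready descriptors. It supports both level‑triggered and edge‑triggered notification.

These interfaces are meant for pipes, sockets, and terminals: select and poll always report regular files as ready, and epoll_ctl rejects them with EPERM.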
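The following is a minimal sketch of multiplexing with poll, which takes an array of descriptors instead of select's fixed‑size bitmaps. wait_for_either is a hypothetical helper. It is meant for pipes, sockets, or terminals, because regular files always report as readable.

```c
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Wait up to timeout_ms for either descriptor to become readable.
 * Returns the index (0 or 1) of the first readable FD, -2 on timeout,
 * or -1 on error. */
int wait_for_either(int fd_a, int fd_b, int timeout_ms) {
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    int ready = poll(fds, 2, timeout_ms);   /* one call watches both FDs */
    if (ready == -1) { perror("poll"); return -1; }
    if (ready == 0) return -2;               /* timed out */
    for (int i = 0; i < 2; i++)
        if (fds[i].revents & POLLIN)
            return i;
    return -1;                               /* e.g. only POLLERR/POLLHUP */
}
```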

5.4 Asynchronous I/O

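With POSIX AIO (declared in <aio.h>), a request is issued with aio_read, which returns immediately. Completion is reported through the mechanism chosen in aio_sigevent: a signal, or a new thread running a callback. Alternatively, the application can poll aio_error. Either way, the thread is free to do other work. glibc implements POSIX AIO with user‑space threads; Linux's kernel‑native asynchronous interface is exposed through libaio (io_setup, io_submit, io_getevents).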

5.5 Zero‑Copy Techniques

System calls like sendfile transfer data directly between kernel buffers (e.g., from a file to a socket) without copying to user space, reducing CPU usage and memory bandwidth.
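The sketch below uses sendfile for a file‑to‑file copy. copy_with_sendfile is a hypothetical name. It assumes Linux 2.6.33 or later, where the output descriptor may be any file rather than only a socket. It also assumes the source is a regular file whose size does not change during the copy.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy `src` to `dst` with sendfile(2): the data moves from the source's
 * page cache to the destination inside the kernel, never passing through
 * a user-space buffer. Returns bytes copied, or -1 on error. */
off_t copy_with_sendfile(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in == -1) { perror("open src"); return -1; }
    struct stat st;
    if (fstat(in, &st) == -1) { perror("fstat"); close(in); return -1; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) { perror("open dst"); close(in); return -1; }

    off_t copied = 0;
    while (copied < st.st_size) {
        /* NULL offset: use and advance in's own file offset. */
        ssize_t n = sendfile(out, in, NULL, (size_t)(st.st_size - copied));
        if (n == -1) { perror("sendfile"); copied = -1; break; }
        if (n == 0) break;                /* source shrank: stop early */
        copied += n;
    }
    close(in);
    close(out);
    return copied;
}
```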

6. Sample Code Illustrations

Below are representative snippets that demonstrate the concepts described above.

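Basic open, write, read, and close:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Create or truncate test.txt; 0644 is further masked by the umask. */
    int fd = open("test.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(1); }
    const char *msg = "Hello, Linux File I/O!";
    ssize_t written = write(fd, msg, strlen(msg));
    if (written == -1) { perror("write"); close(fd); exit(1); }
    /* Rewind the file offset so the read starts at byte 0. */
    if (lseek(fd, 0, SEEK_SET) == -1) { perror("lseek"); close(fd); exit(1); }
    char buffer[1024];
    ssize_t read_bytes = read(fd, buffer, sizeof(buffer) - 1);
    if (read_bytes == -1) { perror("read"); close(fd); exit(1); }
    buffer[read_bytes] = '\0';   /* read() does not NUL-terminate */
    printf("Read: %s\n", buffer);
    close(fd);
    return 0;
}
```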
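Switching an existing descriptor to non‑blocking mode while preserving its other status flags:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Add O_NONBLOCK to fd's status flags, keeping the flags already set. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) { perror("fcntl F_GETFL"); return -1; }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) { perror("fcntl F_SETFL"); return -1; }
    return 0;
}
```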
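Waiting for readiness with epoll. The watched descriptor is the read end of a pipe, since epoll_ctl rejects regular files with EPERM:

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
    int pipefd[2];
    if (pipe(pipefd) == -1) { perror("pipe"); return 1; }
    int fd = pipefd[0];
    if (write(pipefd[1], "ping", 4) != 4) { perror("write"); return 1; }

    int epfd = epoll_create1(0);
    if (epfd == -1) { perror("epoll_create1"); return 1; }
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) { perror("epoll_ctl"); return 1; }

    struct epoll_event events[10];
    int nfds = epoll_wait(epfd, events, 10, -1);   /* -1: wait indefinitely */
    for (int i = 0; i < nfds; ++i) {
        if (events[i].data.fd == fd) {
            char buf[1024];
            ssize_t n = read(fd, buf, sizeof(buf) - 1);   /* room for '\0' */
            if (n > 0) { buf[n] = '\0'; printf("Read %zd bytes: %s\n", n, buf); }
        }
    }
    close(pipefd[0]);
    close(pipefd[1]);
    close(epfd);
    return 0;
}
```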
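Asynchronous read with POSIX AIO (on glibc older than 2.34, link with -lrt):

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

int main(void) {
    static char buffer[BUFFER_SIZE + 1];   /* +1 leaves room for '\0' */
    int fd = open("test.txt", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buffer;
    cb.aio_nbytes = BUFFER_SIZE;
    cb.aio_offset = 0;
    if (aio_read(&cb) == -1) { perror("aio_read"); close(fd); return 1; }

    while (aio_error(&cb) == EINPROGRESS) { /* do other work here */ }
    ssize_t ret = aio_return(&cb);
    if (ret > 0) { buffer[ret] = '\0'; printf("Async read: %s\n", buffer); }
    close(fd);
    return 0;
}
```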

The article concludes that mastering these low‑level mechanisms equips developers to diagnose I/O bottlenecks, choose the appropriate I/O model, and tune buffer sizes for optimal performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
