Understanding the Linux I/O Stack, Call Chain, and Performance Characteristics
This article explains the layered design of Linux file I/O: how data moves from application buffers through libc, the page cache, and kernel I/O queues to disk; the synchronization primitives involved; consistency pitfalls; and performance factors such as scheduling algorithms and hardware characteristics.
Linux I/O is the foundation of file storage. This guide summarizes its core concepts, starting with the layered architecture, which gives the stack a clear structure and decouples each layer's responsibilities.
The I/O stack consists of multiple layers: the application buffer, the libc (standard I/O) buffer, the page cache, the kernel I/O scheduler, and finally the device driver that transfers data to the disk cache. Each layer adds a copy step, which improves modularity but also introduces latency.
Typical usage with fwrite creates an application buffer, copies data to the libc buffer, and then flushes it to the page cache. The following example demonstrates this flow:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 4096

void foo(const char *src, FILE *fp) {
    char *buf = malloc(MAX_SIZE);     /* application buffer */
    strncpy(buf, src, MAX_SIZE);      /* fill it from the source string */
    fwrite(buf, MAX_SIZE, 1, fp);     /* copy into the libc (stdio) buffer */
    fclose(fp);                       /* flush the libc buffer to the page cache */
    free(buf);
}
```

Calling fclose flushes only the libc buffer into the page cache; to guarantee that the data reaches the disk, the kernel buffers must also be flushed with sync or fsync. Likewise, fflush only moves data from the libc buffer into the page cache, whereas fsync forces the page cache out to the physical medium.
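Putting the chain together, a minimal sketch of a write that is durable across a crash might look like this (the helper name write_durably and the error handling are my own additions, not from the original article):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* fsync */

/* Push data through every buffering layer down to the physical medium.
 * Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const char *data, size_t len) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Copy 1: application buffer -> libc (stdio) buffer. */
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    /* Copy 2: libc buffer -> kernel page cache. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    /* Copy 3: page cache -> disk; blocks until the device reports completion. */
    if (fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp);
}
```

Each step only guarantees delivery to the next layer down, which is why all three are needed for true durability.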
Direct writes can bypass the page cache by opening a file with the O_DIRECT flag, and raw device writes (e.g., using dd) bypass the filesystem entirely.
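A hedged sketch of such a direct write follows; the 4096-byte alignment is an assumption about the device's logical block size, and the helper name direct_write is mine:

```c
#define _GNU_SOURCE   /* exposes O_DIRECT on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write one 4 KiB block while bypassing the page cache. O_DIRECT
 * requires the buffer, length, and file offset to be aligned,
 * typically to the device's logical block size (assumed 4096 here). */
int direct_write(const char *path) {
    void *buf = NULL;
    if (posix_memalign(&buf, 4096, 4096) != 0)   /* aligned allocation */
        return -1;
    memset(buf, 'x', 4096);

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { free(buf); return -1; }

    ssize_t n = write(fd, buf, 4096);   /* goes straight to the device queue */
    close(fd);
    free(buf);
    return n == 4096 ? 0 : -1;
}
```

Note that some filesystems (tmpfs, for example) reject O_DIRECT, so production code should be prepared to fall back to buffered I/O.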
The I/O call chain shows that write performs a system call that copies data from the application buffer directly into the page cache, triggering a user-to-kernel mode switch. The kernel's pdflush threads (replaced by per-device flusher threads in modern kernels) later move dirty pages from the page cache to the I/O scheduler queue, where algorithms such as noop, deadline, or cfq decide when to issue the actual disk operations.
On SSDs, the noop scheduler is often preferred because there is no mechanical seek time, whereas traditional HDDs benefit from elevator‑style scheduling that reduces head movement.
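The active scheduler for a device is exposed through sysfs, with the current choice shown in brackets (e.g., "noop [deadline] cfq"). A small sketch that reads it; the device name "sda" is an assumption, and newer kernels list multiqueue schedulers such as mq-deadline instead:

```c
#include <stdio.h>

/* Print the available and active I/O schedulers for a block device
 * by reading /sys/block/<dev>/queue/scheduler. */
void print_scheduler(const char *dev) {
    char path[128], line[256];
    snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);   /* device absent, or sysfs not mounted */
        return;
    }
    if (fgets(line, sizeof line, fp) != NULL)
        printf("%s: %s", dev, line);
    fclose(fp);
}
```

Writing a scheduler name to the same sysfs file (as root) switches the policy at runtime.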
Consistency and safety considerations include data loss scenarios: data in the application or libc buffers is lost if the process exits; data in the page cache survives a process exit but can be lost if the kernel crashes or the machine powers off before the kernel flushes it to disk. Using O_SYNC or fsync mitigates these risks.
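For the O_SYNC route, a minimal sketch (the helper name write_osync is my own): with this flag, every write blocks until the data has reached the storage device, so a crash immediately after write() returns cannot lose it.

```c
#include <fcntl.h>
#include <unistd.h>

/* With O_SYNC, write() does not return until the data (and the
 * metadata needed to retrieve it) is on the storage device, as if
 * each write were followed by fsync(). */
int write_osync(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);   /* blocks until durable */
    close(fd);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

The trade-off is throughput: every write pays the full device latency, so O_SYNC suits logs and journals rather than bulk data paths.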
When multiple file descriptors write to the same file, each descriptor maintains its own file offset, leading to overwrites unless the O_APPEND flag is used, which forces each write to append at the current end of the file.
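The overwrite-versus-append behavior can be demonstrated directly; this sketch (helper name two_writer_size is mine) writes the same file through two descriptors and returns the resulting size:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Two independent descriptors on the same file. Without O_APPEND both
 * start at offset 0, so the second write overwrites the first; with
 * O_APPEND the kernel atomically repositions each write to end-of-file.
 * Returns the resulting file size, or -1 on error. */
long two_writer_size(const char *path, int use_append) {
    int extra = use_append ? O_APPEND : 0;
    int fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra, 0644);
    int fd2 = open(path, O_WRONLY | extra);
    if (fd1 < 0 || fd2 < 0)
        return -1;
    write(fd1, "AAAA", 4);   /* fd1's private offset: 0 -> 4 */
    write(fd2, "BBBB", 4);   /* fd2's offset is independent of fd1's */
    close(fd1);
    close(fd2);
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}
```

Without O_APPEND the file ends up 4 bytes long ("BBBB" clobbers "AAAA"); with O_APPEND both records survive and the file is 8 bytes.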
Performance bottlenecks stem mainly from disk seek time (≈10 ms per seek) and rotational speed (e.g., 15,000 rpm yields 250 rotations per second, i.e., 4 ms per revolution). Typical sequential write speeds are 0–30 MiB/s for HDDs and up to 400 MiB/s for SSDs.
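These figures imply a simple latency budget for random I/O, sketched below as a back-of-the-envelope model (the half-revolution average wait is the standard simplifying assumption):

```c
/* Estimate random IOPS for an HDD from average seek time and spindle
 * speed. On average the head waits half a revolution for the target
 * sector to come around after the seek completes. */
double hdd_random_iops(double seek_ms, double rpm) {
    double rotation_ms = 60000.0 / rpm;          /* one full revolution, in ms */
    double rot_latency_ms = rotation_ms / 2.0;   /* average rotational wait */
    return 1000.0 / (seek_ms + rot_latency_ms);  /* random I/Os per second */
}
```

With the article's figures (10 ms seek, 15,000 rpm): 60000/15000 = 4 ms per revolution, so 2 ms average rotational wait, giving roughly 1000/12 ≈ 83 random I/Os per second. That is why random workloads on HDDs are so much slower than sequential ones, and why SSDs, which pay neither cost, change the picture.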
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.