
How Traditional System Call I/O Works and How to Optimize It

Traditional Linux I/O relies on the read() and write() system calls, which involve multiple CPU and DMA copies and context switches. Modern optimizations such as zero-copy, multiplexing, and page cache techniques reduce this copying overhead, and understanding buffered I/O, mmap, and direct I/O reveals how each affects performance.

Traditional System Call I/O

In Linux, the classic way to access data is through the read() and write() system calls: read() pulls data from a file into a buffer, and write() sends buffered data out, for example to a network socket.

The traditional I/O flow involves two CPU copies and two DMA copies, totaling four copies and four context switches.

CPU copy: Data is moved by the CPU itself, occupying CPU resources for the duration of the transfer.

DMA copy: The CPU instructs the DMA controller to move the data and is free to do other work until the transfer completes.

Context switch: The CPU switches from user space to kernel space to execute the system call, and back to user space after the call returns.
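
As a concrete illustration, here is a minimal sketch of the traditional path in C. It assumes file_fd is an open file and sock_fd a connected socket; both names and the copy_file_to_socket() helper are illustrative, not from the original article. Every loop iteration pays the four copies and four context switches described above.

#include <unistd.h>

/* Traditional copy path: read() triggers a DMA copy into the kernel read
 * buffer and a CPU copy into tmp_buf; write() triggers a CPU copy into the
 * kernel socket buffer and a DMA copy out to the NIC. */
ssize_t copy_file_to_socket(int file_fd, int sock_fd)
{
    char tmp_buf[4096];
    ssize_t n;
    ssize_t total = 0;

    while ((n = read(file_fd, tmp_buf, sizeof(tmp_buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {                       /* handle short writes */
            ssize_t w = write(sock_fd, tmp_buf + off, n - off);
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}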

Read Operation

If the requested data is already in the kernel's page cache, it is copied to the user buffer directly. Otherwise, the data is first loaded from disk into the kernel's read buffer, then copied to the user's buffer.

read(file_fd, tmp_buf, len);

The traditional read triggers two context switches, one DMA copy, and one CPU copy.

User process calls read(), causing a switch from user space to kernel space.

The DMA controller copies data from the disk (or other device) into the kernel's read buffer in main memory.

CPU copies data from the read buffer to the user buffer.

Context switches back to user space and the call returns.

Write Operation

When a program calls write(), data is first copied from the user buffer to the kernel's socket buffer, then from there to the NIC for transmission.

The traditional write triggers two context switches, one CPU copy, and one DMA copy.

User process calls write(), switching to kernel space.

CPU copies data from the user buffer to the kernel's socket buffer.

The DMA controller transfers data from the socket buffer to the NIC.

Context switches back to user space and the call returns.

Figures: the network I/O path and the disk I/O path.

High‑Performance I/O Optimizations

Zero-copy techniques (see the sendfile() sketch after this list).

Multiplexing techniques.

PageCache techniques.
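
Of the three, zero-copy attacks the copy overhead most directly. The sketch below uses sendfile(2) to let the kernel move file data to a socket without the user-space round trip; file_fd, sock_fd, and the send_whole_file() helper are illustrative names, not part of the original article.

#include <sys/sendfile.h>
#include <sys/stat.h>

/* Zero-copy: sendfile() copies data inside the kernel, so the user buffer
 * and both CPU copies of the traditional read()/write() pair disappear. */
ssize_t send_whole_file(int sock_fd, int file_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;

    off_t offset = 0;                           /* advanced by sendfile() */
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, st.st_size - offset);
        if (n < 0)
            return -1;
    }
    return offset;
}

The sections below focus on the PageCache and how the different I/O mechanisms interact with it.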

PageCache is the OS cache for files, reducing disk I/O by keeping file data in memory pages. It allows sequential reads and writes to approach memory speed.

PageCache read strategy: When a process reads a file, the kernel first checks whether the data is already in the cache.

If present in the cache, the data is returned from memory and the disk is bypassed.

If absent, the kernel schedules a disk I/O, reads the requested data (plus a few adjacent pages, as read-ahead) into the cache, and then returns it.

PageCache write strategy: Data written via write() first goes into the cache and is marked dirty. A flusher thread later writes the dirty pages back to disk when any of the following occurs (see the fsync() sketch after this list):

Available memory falls below a threshold.

Dirty pages have stayed dirty longer than a time threshold.

The process invokes sync() or fsync().
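
The last trigger can be forced explicitly in application code. A minimal sketch, using a hypothetical write_durably() helper: write() only dirties the page cache and returns, while fsync() blocks until the dirty pages and the file's metadata have reached the device.

#include <fcntl.h>
#include <unistd.h>

/* write() lands in the page cache and returns quickly;
 * fsync() forces the dirty pages out to disk before we report success. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}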

Storage Device I/O Stack

The Linux I/O stack consists of three layers:

Filesystem layer: The kernel copies user data into the filesystem cache and eventually syncs it to the lower layers.

Block layer: Manages the block device I/O queues, merging and reordering requests.

Device layer: Interacts with the hardware via DMA.

These layers relate to mechanisms such as Buffered I/O, mmap, and Direct I/O.

Buffered I/O vs. mmap vs. Direct I/O

Buffered I/O reads a file by loading data into PageCache, then copying it to the user buffer—two copies total. mmap maps PageCache directly into user space, eliminating the second copy.
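
A minimal mmap sketch, assuming an already-open read-only descriptor file_fd (the name and the sum_bytes() helper are illustrative): the file's page-cache pages are mapped into the process's address space and accessed in place, so no second copy into a separate user buffer is made.

#include <sys/mman.h>
#include <sys/stat.h>

/* Map the file and touch every byte: page faults pull pages into the
 * page cache, but the data is never copied into a user buffer. */
long sum_bytes(int file_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0 || st.st_size == 0)
        return -1;

    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];

    munmap(p, st.st_size);
    return sum;
}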

Direct I/O bypasses the PageCache, moving data between user space and the block layer via DMA. This eliminates one copy on the write path and can improve write throughput; the first read of a file can also be faster, but repeated reads lose the benefit of caching.

mmap requires page-aligned mapping offsets, while Direct I/O requires the user buffer, the file offset, and the I/O size to be aligned to the underlying storage block size.
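
A minimal Direct I/O sketch, assuming a 4096-byte alignment requirement (the real value depends on the filesystem and device) and a hypothetical read_direct() helper; posix_memalign() supplies the aligned buffer that O_DIRECT demands.

#define _GNU_SOURCE                 /* exposes O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* O_DIRECT bypasses the page cache: the buffer address, file offset, and
 * I/O size must all be aligned to the device's block size. */
ssize_t read_direct(const char *path, size_t len)
{
    void *buf;
    if (posix_memalign(&buf, 4096, len) != 0)   /* len should be a multiple of 4096 */
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    ssize_t n = read(fd, buf, len);             /* DMA straight into buf */
    close(fd);
    free(buf);
    return n;
}

Not every filesystem supports O_DIRECT; when it is unsupported, the open() call typically fails with EINVAL.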

I/O Buffering

Before reaching the disk, user data flows through several layers of buffering: stdio buffers in user space, the kernel's buffer and page caches, and buffers on the device itself.

Understanding these layers helps developers choose the appropriate I/O strategy for performance and resource efficiency.
