Why Zero‑Copy Is Critical for High‑Performance I/O on Linux

The article explains how Direct Memory Access (DMA) eliminates CPU‑bound data copies, compares traditional I/O with zero‑copy techniques such as mmap and sendfile, and shows how reducing system calls and context switches can double file‑transfer throughput while highlighting the limits of kernel cache for large files.

Linux Tech Enthusiast

Direct Memory Access (DMA) moves data between devices and memory without CPU involvement. Without it, the CPU would have to move every byte itself. Even with DMA, traditional read/write I/O still costs four data copies (two of them performed by the CPU) and four user‑kernel context switches per transfer.

When DMA is used, the I/O flow is:

The user process calls read, causing a transition to kernel mode.

The OS forwards the request to the DMA controller, allowing the CPU to run other tasks.

DMA transfers the request to the disk.

The disk reads data into its controller buffer and signals DMA when the buffer is full.

DMA copies the data from the disk controller buffer to the kernel buffer without using the CPU.

DMA notifies the CPU; the CPU copies the data from the kernel buffer to user space and returns from the system call.

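Sending a file with a read/write pair therefore takes two system calls, which means four user‑kernel context switches, and four data copies: disk to kernel buffer (DMA), kernel buffer to user buffer (CPU), user buffer to socket buffer (CPU), and socket buffer to NIC (DMA).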
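The traditional path can be sketched in a few lines of Python, whose os.read and socket send calls map directly onto the read and write/send system calls. This is a minimal illustration rather than a reference implementation; the function name copy_read_write and the 64 KiB chunk size are assumptions made for the example.

```python
import os
import socket


def copy_read_write(path, sock, chunk_size=64 * 1024):
    """Send a file over a connected socket with the classic read + write loop.

    Every iteration issues two system calls, and the payload crosses the
    user/kernel boundary twice.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # read(): kernel buffer -> user buffer (CPU copy)
            data = os.read(fd, chunk_size)
            if not data:  # empty bytes object means end of file
                break
            # send(): user buffer -> socket buffer (CPU copy)
            sock.sendall(data)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

The two CPU copies in the loop body are exactly what the zero‑copy techniques try to remove.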

Optimization goal: reduce the number of context switches and data copies.

Two common zero‑copy methods are mmap + write and sendfile.

mmap approach: the process maps a file into its address space, so the kernel buffer is shared directly with user space. The steps are:

The process calls mmap; DMA copies disk data into the kernel buffer (the page cache) as the mapped pages are first accessed.

The process and kernel share the buffer.

The process calls write, and the kernel copies the buffer to the socket buffer (CPU moves the data).

The socket buffer is finally sent to the NIC.

This eliminates one copy compared with read / write (the kernel‑to‑user copy), but it still incurs two system calls, four context switches, and three copies in total, including a CPU copy from the kernel buffer to the socket buffer.
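As a rough sketch of the mmap + write pattern, the Python version below maps the file read‑only and passes the mapping straight to the socket, so no separate user buffer is filled first. The function name send_with_mmap is hypothetical, and the example assumes a non‑empty file and a connected stream socket.

```python
import mmap
import os
import socket


def send_with_mmap(path, sock):
    """Send a non-empty file by mapping it and handing the mapping to send()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Map the whole file read-only. Pages are brought into the page cache
        # on first access. Note: mmap raises ValueError for an empty file.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
            # The memoryview exposes the mapped pages without copying them
            # into a Python bytes object.
            with memoryview(mapped) as view:
                # send(): mapped kernel pages -> socket buffer (CPU copy)
                sock.sendall(view)
                return len(view)
    finally:
        os.close(fd)
```

The kernel still copies the mapped pages into the socket buffer inside the send call, which is the remaining CPU copy of this approach.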

sendfile approach: a single system call replaces the read / write pair. In Linux 2.1 it reduces the operation to two context switches and three copies (disk → page cache by DMA, page cache → socket buffer by the CPU, socket buffer → NIC by DMA). Starting with kernel 2.4, on NICs that support scatter‑gather DMA, the process is further streamlined:

DMA copies data from disk directly to the kernel buffer.

Only the buffer descriptor and length are passed to the socket; the NIC’s SG‑DMA moves the data from the kernel buffer to the NIC buffer, eliminating the kernel‑to‑socket copy.

The result is two data copies, both performed by DMA, and two context switches. The CPU no longer moves the payload at all, which can roughly double file‑transfer throughput compared with read / write.
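A minimal sketch of the sendfile path on Linux, using Python's os.sendfile wrapper around the system call. The function name send_with_sendfile is an assumption for illustration; whether the kernel‑to‑socket copy is actually avoided depends on the kernel and NIC, not on this code.

```python
import os
import socket


def send_with_sendfile(path, sock):
    """Send a whole file over a connected socket using sendfile(2)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        remaining = os.fstat(fd).st_size
        offset = 0
        while remaining > 0:
            # One system call per iteration; the payload never enters
            # a user-space buffer.
            sent = os.sendfile(sock.fileno(), fd, offset, remaining)
            if sent == 0:  # file was truncated while sending
                break
            offset += sent
            remaining -= sent
        return offset
    finally:
        os.close(fd)
```

The loop is needed because a single call may transfer fewer bytes than requested. Python's socket.sendfile method wraps the same mechanism with a fallback for platforms that lack it.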

Kernel cache role: the kernel cache (page cache) keeps recently accessed data in memory, provides read‑ahead (pre‑fetch), and merges I/O requests to reduce disk seeks. However, the cache occupies memory; large files can fill it, evicting hot small files so that they no longer benefit, and can even degrade overall performance.
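One way an application can cooperate with the cache, shown here only as an illustrative sketch, is to pass hints through posix_fadvise: a sequential hint encourages read‑ahead, and a "don't need" hint afterwards invites the kernel to drop the file's pages. These are hints, not guarantees, and the function name stream_large_file is hypothetical.

```python
import os


def stream_large_file(path, chunk_size=1 << 20):
    """Read a file sequentially while hinting the page cache (Unix only)."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0 with length 0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            total += len(data)  # a real program would process `data` here
        # Suggest that the cached pages of this file can be dropped so they
        # do not crowd out hot small files.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```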

For large‑file transfers, the article recommends using asynchronous I/O combined with direct I/O, which bypasses the kernel cache entirely. Asynchronous I/O avoids blocking the process while the read is in flight, and direct I/O has DMA move data from disk straight into the user buffer, so large files neither pollute the cache nor pay for the extra kernel‑to‑user copy.
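The sketch below illustrates the direct I/O half with O_DIRECT and a page‑aligned buffer. Python's standard library does not expose kernel asynchronous I/O, so a thread pool stands in for it here; that substitution, the 4096‑byte alignment, and the names read_direct and read_direct_async are all assumptions of this example. Filesystems that reject O_DIRECT fall back to an ordinary buffered read.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def _read_all(fd, buf):
    """Read the whole file through `buf` at block-aligned offsets."""
    size = os.fstat(fd).st_size
    chunks, offset = [], 0
    while offset < size:
        n = os.preadv(fd, [buf], offset)
        if n == 0:
            break
        chunks.append(buf[:n])
        offset += n
    return b"".join(chunks)


def read_direct(path, chunk_size=256 * BLOCK):
    """Read a file with O_DIRECT when possible, else with buffered I/O."""
    # Anonymous mmap memory is page-aligned, which O_DIRECT requires.
    buf = mmap.mmap(-1, chunk_size)
    try:
        direct = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
        for flags in (direct, os.O_RDONLY):
            try:
                fd = os.open(path, flags)
            except OSError:
                continue  # e.g. the filesystem rejects O_DIRECT
            try:
                return _read_all(fd, buf)
            except OSError:
                continue  # direct read refused; retry buffered
            finally:
                os.close(fd)
        raise OSError("could not read " + path)
    finally:
        buf.close()


def read_direct_async(executor, path):
    """Submit the read to a thread pool and return a Future immediately."""
    return executor.submit(read_direct, path)
```

A caller might create a ThreadPoolExecutor, call read_direct_async(pool, path), continue with other work, and collect the bytes later with the future's result() method.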

Comparison summary:

Traditional I/O: 2 system calls, 4 context switches, 4 copies (2 by CPU).

Zero‑copy with mmap + write: 2 system calls, 4 context switches, 3 copies (1 by CPU, the kernel‑to‑socket copy).

Zero‑copy with sendfile (kernel ≥ 2.4): 1 system call, 2 context switches, 2 copies, no CPU data movement.

One limitation of zero‑copy with sendfile is that user space cannot process the data (for example, compress it) before sending, because the data never reaches a user buffer.

Overall, zero‑copy techniques based on DMA and kernel cache dramatically improve I/O performance, but for very large files, asynchronous I/O plus direct I/O is the preferred solution.
