How Zero‑Copy and PageCache Supercharge File Transfer Performance
This article explains why a naïve 32 KB‑chunk file transfer incurs excessive context switches and memory copies, and how zero‑copy, PageCache, asynchronous I/O, and direct I/O techniques dramatically reduce overhead and boost throughput for large‑scale data transfers.
When a server needs to send a file to a client, a straightforward implementation reads the file into a small user‑space buffer (e.g., 32 KB) and issues a separate read and write system call for each chunk. For a 320 MB file this means 10 000 iterations and 20 000 system calls; since every system call enters and leaves the kernel, that costs roughly 40 000 user‑kernel context switches, and about four times the original data volume (roughly 1.28 GB) is copied in memory.
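A minimal sketch of this naive loop (file_fd and sock_fd are assumed to be already‑open descriptors; error handling is trimmed for brevity):

```c
#include <unistd.h>

/* Naive transfer: read()/write() in 32 KB chunks.
 * Assumes file_fd and sock_fd are already-open descriptors. */
#define CHUNK_SIZE (32 * 1024)

ssize_t naive_transfer(int file_fd, int sock_fd)
{
    char buf[CHUNK_SIZE];
    ssize_t n, total = 0;

    /* Each iteration costs two system calls (four context switches)
     * and two CPU copies through the user-space buffer. */
    while ((n = read(file_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {                  /* write() may be partial */
            ssize_t w = write(sock_fd, buf + off, n - off);
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```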
Why the naïve approach is inefficient
Each 32 KB chunk triggers two system calls (read and write), and each call causes one user‑to‑kernel and one kernel‑to‑user transition — four switches per chunk. Although a single switch costs only tens of nanoseconds to a few microseconds, the cumulative cost becomes significant under high concurrency. The chunk is also copied four times along the way: a DMA copy from disk into the kernel’s PageCache, a CPU copy from PageCache into the user buffer, a CPU copy from the user buffer into the kernel’s socket buffer, and a DMA copy from the socket buffer to the network card. These repeated copies waste CPU cycles and increase latency.
Zero‑copy: merging operations inside the kernel
Zero‑copy eliminates the user‑space buffer by passing the file descriptor and the TCP socket directly to a kernel routine — on Linux, the sendfile() system call. The kernel moves data from the PageCache to the socket buffer, so each operation needs only one system call (two context switches instead of four) and three memory copies instead of four. If the network card supports SG‑DMA (scatter‑gather DMA), even the CPU copy into the socket buffer can be omitted, leaving only the two DMA copies.
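A hedged sketch of the same transfer using sendfile(2), again assuming already‑open descriptors; the loop handles partial sends:

```c
#include <sys/sendfile.h>
#include <sys/stat.h>

/* Zero-copy transfer: one sendfile() call replaces each read()/write()
 * pair, so the data never crosses into user space. Assumes file_fd is
 * a regular file and sock_fd is a connected TCP socket. */
ssize_t zero_copy_transfer(int file_fd, int sock_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;

    off_t offset = 0;
    while (offset < st.st_size) {
        /* The kernel copies PageCache -> socket buffer (or, with
         * SG-DMA, just hands the NIC descriptors into PageCache). */
        ssize_t n = sendfile(sock_fd, file_fd, &offset,
                             st.st_size - offset);
        if (n <= 0)
            return -1;                /* sendfile() advances offset */
    }
    return offset;
}
```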
For the same 320 MB transfer with a 1.4 MB socket buffer, zero‑copy needs one sendfile() call per buffer‑sized chunk — about 230 calls, or only a few hundred context switches — and with SG‑DMA the data is copied just twice, about 640 MB in total. In practice this more than doubles throughput while lowering CPU usage.
PageCache: the OS’s disk‑to‑memory cache
PageCache stores recently accessed disk blocks in RAM, using LRU eviction and read‑ahead prefetching to accelerate subsequent reads. While it improves read latency for most workloads, large files can monopolize the cache, evicting hot small files and causing unnecessary copies.
In high‑concurrency scenarios where large files dominate, it is better to bypass PageCache and use direct I/O for those files.
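Short of bypassing the cache entirely, an application can also steer PageCache behavior with posix_fadvise(2). The sketch below shows two such hints; they are advisory only, and the kernel is free to ignore them:

```c
#include <fcntl.h>

/* Hint the kernel about access patterns (advisory; may be ignored). */
void tune_pagecache(int fd)
{
    /* Sequential scan ahead: encourages more aggressive read-ahead
     * (offset 0, length 0 means "the whole file"). */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

void drop_after_send(int fd)
{
    /* Done with this large file: ask the kernel to release its clean
     * cached pages so it does not crowd out small, hot files. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```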
Asynchronous I/O and Direct I/O
Asynchronous I/O splits a read into a request phase (a non‑blocking submission) and a completion‑notification phase, allowing the process to perform other work while the kernel fetches the data. However, asynchronous I/O does not go through PageCache — on Linux, native AIO is only truly asynchronous when paired with direct I/O — so it cannot benefit from cache‑based optimizations.
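A minimal sketch of the two phases using the Linux‑native AIO interface via libaio (link with -laio); the buffer must satisfy the O_DIRECT alignment rules discussed next:

```c
#include <libaio.h>

/* Phase 1: submit the read and return immediately.
 * Phase 2: reap the completion later with io_getevents().
 * buf must meet O_DIRECT alignment (see the next sketch). */
int async_read(int fd, void *buf, size_t len, long long offset)
{
    io_context_t ctx = 0;
    if (io_queue_init(1, &ctx) < 0)       /* set up a kernel AIO context */
        return -1;

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, len, offset);
    if (io_submit(ctx, 1, cbs) != 1)      /* phase 1: non-blocking request */
        goto fail;

    /* ... the process is free to do other work here ... */

    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev, NULL) != 1)  /* phase 2: completion */
        goto fail;

    io_queue_release(ctx);
    return (long)ev.res >= 0 ? 0 : -1;
fail:
    io_queue_release(ctx);
    return -1;
}
```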
Direct I/O explicitly bypasses PageCache, sending data straight from disk to user buffers. It is useful when the application already implements its own caching (e.g., databases) or when transferring very large files that would otherwise pollute the cache.
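A minimal direct‑I/O sketch. O_DIRECT imposes alignment rules: the user buffer, file offset, and transfer length generally must be multiples of the device’s logical block size; 4096 bytes is assumed here:

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096             /* assumed logical block size */

/* Read one aligned block straight from disk, bypassing PageCache.
 * On success, *out receives a buffer the caller must free(). */
ssize_t direct_read_block(const char *path, void **out)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {  /* aligned buffer */
        close(fd);
        return -1;
    }

    ssize_t n = pread(fd, buf, ALIGN, 0);  /* offset and length aligned */
    close(fd);
    *out = buf;
    return n;
}
```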
Combining async I/O with direct I/O lets large files be transferred without blocking and without cache interference, while small files can still profit from zero‑copy.
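This split is essentially what nginx offers through its directio directive: files above a configured size are read with direct I/O (optionally asynchronously, with aio on), while smaller files keep using sendfile. A hypothetical dispatcher along those lines — transfer_async_direct() stands in for a routine combining the two sketches above, and the 1 GB threshold is an arbitrary illustration, not a tuned value:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* zero_copy_transfer() is the sendfile sketch above;
 * transfer_async_direct() is a hypothetical routine combining the
 * AIO and O_DIRECT sketches. */
ssize_t zero_copy_transfer(int file_fd, int sock_fd);
int transfer_async_direct(int file_fd, int sock_fd);

#define BIG_FILE_BYTES (1024L * 1024 * 1024)   /* assumed 1 GB cutoff */

int send_file_smart(int file_fd, int sock_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;

    if (st.st_size >= BIG_FILE_BYTES)
        /* Large file: async + direct I/O, no cache pollution. */
        return transfer_async_direct(file_fd, sock_fd);

    /* Small or hot file: zero-copy via sendfile() and PageCache. */
    return zero_copy_transfer(file_fd, sock_fd) < 0 ? -1 : 0;
}
```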