Mastering the Zero‑Copy Trio: sendfile, mmap, and splice

This article provides a comprehensive, step‑by‑step analysis of Linux zero‑copy mechanisms—sendfile, mmap, and splice—detailing their internal workflows, performance trade‑offs, code examples, and practical selection guidelines for high‑throughput backend development.

Deepin Linux

1. Introduction to Zero‑Copy

Zero‑Copy reduces redundant data copies between user space and kernel space, improving throughput and lowering CPU usage. In Linux, the three main implementations are sendfile, mmap, and splice, which are frequent interview topics for backend and kernel developers.

1.1 Definition of Zero‑Copy

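Zero‑Copy does not eliminate all copying. It removes the CPU copies between kernel and user space: a traditional read()/write() transfer costs four copies (two by DMA, two by the CPU) and two system calls, whereas sendfile with SG‑DMA needs only the two DMA copies and a single system call.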

1.2 Traditional I/O Problems

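Consider a server that sends a file to a client with read() followed by write(). The data is copied four times: by DMA from disk into the page cache, by the CPU into the user buffer, by the CPU again into the socket buffer, and by DMA to the network card. The two system calls also cause four user‑kernel context switches. Under high concurrency, these copies and switches become a bottleneck.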
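For reference, the traditional path can be sketched as a read()/write() loop through a user-space buffer. Each iteration makes two system calls, and every byte is copied twice by the CPU (kernel to user in read(), user to kernel in write()); the DMA transfers from disk and to the NIC come on top of that. The helper name copy_read_write and the 64 KiB buffer size are illustrative assumptions, not part of any API.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd the traditional
 * way, through a user-space buffer. Returns bytes copied, or -1 on error. */
static ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[64 * 1024];                 /* user-space staging buffer */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);   /* CPU copy: kernel -> user */
        if (n == 0)
            return total;                           /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;                           /* interrupted: retry */
            return -1;
        }
        for (ssize_t done = 0; done < n; ) {        /* handle short writes */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done)); /* CPU copy: user -> kernel */
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        total += n;
    }
}
```

Every zero-copy technique below aims to remove the two CPU copies in this loop, and ideally one of its two system calls.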

1.3 Advantages of Zero‑Copy

Higher system throughput

Lower response latency

Reduced CPU usage

2. sendfile

2.1 Working Principle

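In Linux 2.1, sendfile copies the data from disk into a kernel buffer (the page cache) by DMA, then the CPU copies it into the socket buffer, and a final DMA transfer moves it to the network card: three copies and two context switches, none of them through user space. In Linux 2.4, on network cards that support scatter‑gather DMA (SG‑DMA), only a descriptor (memory address and length) is appended to the socket buffer, and the card gathers the data directly from the page cache. This removes the CPU copy and leaves just the two DMA copies, achieving true zero‑copy.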

2.2 Prototype

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

2.3 Parameters

out_fd : destination descriptor (socket or any fd since 2.6.33)

in_fd : source descriptor; must support mmap‑like operations (in practice, a regular file), so it cannot be a socket

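offset : pointer to the start position; the kernel updates it after the call and leaves in_fd's own file offset unchanged (NULL reads from, and advances, the current file offset)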

count : number of bytes to transfer

2.4 Usage Example

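#include <sys/sendfile.h>
// client_fd: connected socket; file_fd: regular file opened with O_RDONLY
off_t file_offset = 0;          // advanced by the kernel after each call
ssize_t sent = sendfile(client_fd, file_fd, &file_offset, file_size);
// sent may be smaller than file_size (or -1 on error): call again until done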

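The data never enters user space, and the transfer costs a single system call (two context switches). Inside the kernel there is at most one CPU copy (page cache to socket buffer), and none when the network card supports SG‑DMA.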
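Because sendfile may transfer fewer bytes than requested, servers usually call it in a loop. The sketch below is one way to do that; the name sendfile_all and the poll()-based wait for a non-blocking socket are illustrative choices, not part of the sendfile API. out_fd can be a socket or, since Linux 2.6.33, any file.

```c
#include <errno.h>
#include <poll.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Hypothetical helper: send up to `count` bytes of in_fd, starting at *offset,
 * to out_fd. Loops over partial transfers, retries on EINTR, and waits for
 * writability when a non-blocking out_fd reports EAGAIN.
 * Returns bytes sent (less than count only at end of file), or -1 on error. */
static ssize_t sendfile_all(int out_fd, int in_fd, off_t *offset, size_t count)
{
    size_t sent = 0;
    while (sent < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count - sent);
        if (n > 0) {
            sent += (size_t)n;     /* the kernel has already advanced *offset */
        } else if (n == 0) {
            break;                 /* end of input file reached */
        } else if (errno == EINTR) {
            continue;              /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* non-blocking socket buffer is full: wait until writable */
            struct pollfd p = { .fd = out_fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
        } else {
            return -1;             /* real error; errno describes it */
        }
    }
    return (ssize_t)sent;
}
```

Passing a non-NULL offset leaves in_fd's own file offset untouched, so several connections can serve the same open file concurrently, each tracking its own position.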

2.5 Considerations

in_fd must support mmap‑like operations (in practice, a regular file); out_fd is usually a socket.

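With a non‑blocking socket, sendfile may send fewer bytes than requested or fail with EAGAIN; wait until the socket is writable (poll/epoll) and retry with the updated offset.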

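Large transfers can block for a long time on a blocking socket, and even a non‑blocking socket does not prevent blocking on disk reads for pages that are not yet in the page cache. Send in bounded chunks and consider a thread pool or asynchronous I/O.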

This signature is Linux‑specific: FreeBSD and macOS provide a sendfile with different signatures, and Windows offers TransmitFile instead.

3. mmap

3.1 Working Principle

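mmap maps a file into the process's virtual address space. At mmap() time, the kernel only reserves a virtual address range and records it in a vm_area_struct; no data is read yet. Pages are loaded lazily: the first access to each page triggers a page fault, and the kernel maps the corresponding page‑cache page directly into the process, so no copy into a separate user buffer is needed. With MAP_SHARED, writes go to the shared page‑cache pages and are eventually written back to the file (msync forces this); with MAP_PRIVATE, the first write to a page triggers copy‑on‑write, and changes are neither written to the file nor visible to other processes.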
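The difference between MAP_SHARED and MAP_PRIVATE can be observed by writing through each kind of mapping and then reading the file back with pread(). A minimal sketch, assuming the file is opened read-write and is at least len bytes long; write_through_mapping is a hypothetical helper.

```c
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first len bytes of fd through a mapping
 * of the given kind (MAP_SHARED or MAP_PRIVATE). Returns 0 on success. */
static int write_through_mapping(int fd, const char *text, size_t len, int map_kind)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, map_kind, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    /* The first write to the page faults it in; with MAP_PRIVATE the kernel
     * then gives this process its own copy of the page (copy-on-write). */
    memcpy(p, text, len);
    int rc = 0;
    if (map_kind == MAP_SHARED)
        rc = msync(p, len, MS_SYNC);   /* force dirty pages out to storage now */
    munmap(p, len);
    return rc;
}
```

Writing through the MAP_PRIVATE mapping leaves the file unchanged, while a MAP_SHARED write is visible to read() right away because both go through the same page cache; msync() only forces the dirty pages out to storage.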

3.2 Prototype

#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

3.3 Parameters

addr : preferred start address (NULL lets kernel choose)

length : size (rounded up to page size)

prot : PROT_READ, PROT_WRITE, PROT_EXEC

flags : MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS

fd : file descriptor (‑1 for anonymous mapping)

offset : file offset (must be page‑aligned)
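Because offset must be a multiple of the page size, code that needs a region starting at an arbitrary byte typically rounds the offset down and skips the slack. A sketch of that arithmetic; map_range is a hypothetical helper, and the page size is queried with sysconf(_SC_PAGESIZE) rather than assumed to be 4096. With 4 KiB pages, offset 5000 is mapped from 4096 and the returned pointer is base + 904.

```c
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of fd starting at an arbitrary byte
 * offset (assumed non-negative). mmap() needs a page-aligned offset, so map
 * from the page boundary at or below `offset` and return a pointer shifted
 * by the slack. *base and *map_len are what must later go to munmap(). */
static void *map_range(int fd, off_t offset, size_t len, void **base, size_t *map_len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);   /* round down to a page boundary */
    size_t slack = (size_t)(offset - aligned);   /* bytes before the wanted range */

    *map_len = len + slack;
    *base = mmap(NULL, *map_len, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (*base == MAP_FAILED)
        return NULL;
    return (char *)*base + slack;
}
```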

3.4 Usage Example

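#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
int fd = open("example.txt", O_RDONLY);
struct stat sb;
fstat(fd, &sb);                 // note: mmap() fails with EINVAL if sb.st_size == 0
char *data = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);                      // the mapping stays valid after the fd is closed
if (data != MAP_FAILED) {
    write(STDOUT_FILENO, data, sb.st_size);   // page faults pull the file in on demand
    munmap(data, sb.st_size);   // release the mapping
}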

3.5 Considerations

Release the mapping with munmap to avoid memory leaks.

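The mapping remains usable after the descriptor is closed or the underlying file is deleted, until it is unmapped. If the file is truncated, however, accessing pages beyond the new end of file raises SIGBUS.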

Shared mappings require synchronization (mutexes, semaphores) across processes.

4. splice

4.1 Working Principle

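splice moves data between two file descriptors through a pipe buffer, so the data stays entirely in kernel space. At least one of the two descriptors must be a pipe. When splicing from a file into a pipe, the kernel can place references to page‑cache pages in the pipe instead of copying the bytes.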

4.2 Prototype

#include <fcntl.h>
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

4.3 Parameters

fd_in : input descriptor (or pipe)

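off_in : NULL if fd_in is a pipe; for a non‑pipe fd_in, a pointer to the starting offset (updated after the call, file offset left unchanged), or NULL to read from and advance the current file offset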

fd_out : output descriptor (or pipe)

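off_out : NULL if fd_out is a pipe; for a non‑pipe fd_out, a pointer to the starting offset (updated after the call, file offset left unchanged), or NULL to write at and advance the current file offset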

len : number of bytes to move

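flags : bit mask of SPLICE_F_MOVE (only a hint; a no‑op since Linux 2.6.21), SPLICE_F_NONBLOCK (makes the pipe operations non‑blocking), SPLICE_F_MORE (more data will follow; useful when fd_out is a socket), etc.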
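Putting the parameters together, a file-to-socket transfer with splice needs a pipe in the middle: off_in tracks the position in the file, and both pipe ends take NULL offsets. The helper name splice_file_to_fd, the choice to drain the pipe completely after every fill, and the assumption of blocking descriptors are all choices of this sketch. SPLICE_F_MORE is set only while more data is expected, since on a socket it acts like MSG_MORE.

```c
#define _GNU_SOURCE                 /* splice() is a GNU extension */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move up to `count` bytes from file_fd, starting at
 * *off (advanced by the kernel), to a blocking out_fd (e.g. a connected
 * socket) through a private pipe. Returns bytes moved, or -1 on error. */
static ssize_t splice_file_to_fd(int file_fd, loff_t *off, int out_fd, size_t count)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    size_t moved = 0;
    int failed = 0;
    while (moved < count && !failed) {
        /* file -> pipe: off_in is used because file_fd is not a pipe */
        ssize_t in = splice(file_fd, off, p[1], NULL, count - moved, 0);
        if (in == 0)
            break;                           /* end of file */
        if (in < 0) {
            if (errno != EINTR)
                failed = 1;
            continue;
        }
        /* pipe -> out_fd: drain this chunk completely before refilling,
         * so the file -> pipe call never finds the pipe full and blocks */
        while (in > 0) {
            /* hint that more data follows only if the file has more to send */
            unsigned int flags = (moved + (size_t)in < count) ? SPLICE_F_MORE : 0;
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in, flags);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                failed = 1;
                break;
            }
            in -= out;
            moved += (size_t)out;
        }
    }
    close(p[0]);
    close(p[1]);
    return failed ? -1 : (ssize_t)moved;
}
```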

4.4 Usage Example (file → pipe → file)

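#define _GNU_SOURCE             // splice() is a GNU extension
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int in = open("src.txt", O_RDONLY);
int out = open("dst.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644);
int pipefd[2];
pipe(pipefd);
ssize_t n;
// Drain each chunk before splicing more: filling the pipe with the whole file
// first would block for good once the pipe (64 KiB by default) is full.
while ((n = splice(in, NULL, pipefd[1], NULL, 65536, 0)) > 0) {
    while (n > 0) {
        ssize_t m = splice(pipefd[0], NULL, out, NULL, n, 0);
        if (m <= 0) { perror("splice"); exit(EXIT_FAILURE); }
        n -= m;
    }
}
close(in);
close(out);
close(pipefd[0]);
close(pipefd[1]);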

No data is copied to or from user space. Each splice call is still a system call, though, so relaying one chunk through the pipe costs two calls (four context switches).

4.5 Considerations

One descriptor must be a pipe; otherwise EINVAL is returned.

Offsets must be NULL for pipe descriptors; otherwise ESPIPE is returned.

Handle EAGAIN for non‑blocking descriptors.

Linux‑specific; not portable to other OSes.

5. Comparison and Selection

5.1 Performance

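sendfile with SG‑DMA needs two DMA copies, no CPU copies, and two context switches per call, giving the lowest CPU usage. splice also keeps data out of user space and can avoid CPU copies for file‑to‑socket transfers, but it takes two calls per chunk (file → pipe, pipe → socket), doubling the context switches. mmap followed by write needs three copies (two DMA copies plus one CPU copy from the page cache into the socket buffer) and four context switches, plus page‑fault overhead, resulting in higher CPU load.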

5.2 Suitable Scenarios

sendfile – file‑to‑socket transfers (e.g., HTTP static file serving).

mmap – large‑file processing, shared‑memory IPC, random‑access workloads.

splice – pipe‑based data movement, high‑speed file‑to‑file or file‑to‑socket pipelines.

5.3 Selection Guidance

If the goal is direct file‑to‑socket delivery, prefer sendfile.

When the application must read or modify the file contents itself (large files, random access), or when processes need to share memory, choose mmap.

When a pipe is involved or arbitrary descriptor‑to‑descriptor movement is required, use splice.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Deepin Linux. Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
