Mastering the Zero‑Copy Trio: sendfile, mmap, and splice
This article provides a comprehensive, step‑by‑step analysis of Linux zero‑copy mechanisms—sendfile, mmap, and splice—detailing their internal workflows, performance trade‑offs, code examples, and practical selection guidelines for high‑throughput backend development.
1. Introduction to Zero‑Copy
Zero-copy reduces redundant data copies between user space and kernel space, improving throughput and lowering CPU usage. In Linux, the three main implementations are sendfile, mmap, and splice, which are frequent interview topics for backend and kernel developers.
1.1 Definition of Zero‑Copy
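Zero-copy does not eliminate every copy. It removes the CPU copies between user space and kernel space. A traditional read + write transfer performs four copies (two by DMA, two by the CPU), four context switches, and two system calls. A zero-copy call such as sendfile needs one system call and two context switches, and with scatter-gather DMA it needs no CPU copy at all.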
1.2 Traditional I/O Problems
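Consider a simple file download served with read followed by write. The data crosses the user/kernel boundary four times and is copied four times: disk to kernel page cache (DMA), page cache to user buffer (CPU), user buffer to socket buffer (CPU), and socket buffer to network card (DMA). In high-concurrency servers these copies and context switches become the bottleneck.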
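As a baseline for the mechanisms discussed later, the traditional path can be sketched as a plain read/write loop. The helper name copy_read_write and the 4096-byte buffer size are illustrative assumptions, not part of the original article.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd the
 * traditional way. Every iteration costs two system calls and moves
 * the data through the user-space buffer buf. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[4096];                 /* user-space bounce buffer (size is arbitrary) */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);    /* kernel -> user copy */
        if (n == 0)
            return total;                            /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;                            /* interrupted: retry */
            return -1;
        }
        for (ssize_t done = 0; done < n; ) {         /* handle short writes */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done)); /* user -> kernel copy */
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        total += n;
    }
}
```

The two copies marked in the comments are the ones that sendfile, mmap, and splice each try to avoid in a different way.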
1.3 Advantages of Zero‑Copy
Higher system throughput
Lower response latency
Reduced CPU usage
2. sendfile
2.1 Working Principle
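In its original form (Linux 2.1 development kernels, stable in 2.2), sendfile reads data from disk into the kernel page cache by DMA and then copies it to the socket buffer with the CPU. Since Linux 2.4, when the network card supports scatter-gather DMA (SG-DMA), only descriptors holding the address and length of the data are appended to the socket buffer, and the card fetches the data directly from the page cache. This removes the CPU copy and achieves true zero-copy.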
2.2 Prototype
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
2.3 Parameters
out_fd : destination descriptor (socket or any fd since 2.6.33)
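in_fd : source file descriptor (must support mmap-like operations, so typically a regular file and never a socket)
offset : pointer to the start position, which the kernel updates as bytes are sent (NULL means read from, and advance, the current file offset of in_fd)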
count : number of bytes to transfer
2.4 Usage Example
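#include <sys/sendfile.h>
// ...
// client_fd: connected socket; file_fd: open file; file_size: size from fstat
off_t file_offset = 0;
ssize_t sent = sendfile(client_fd, file_fd, &file_offset, file_size);
The transfer needs one system call (two context switches) and at most one CPU copy inside the kernel; with scatter-gather DMA that copy disappears as well. sendfile may transfer fewer bytes than requested, so always check the return value.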
2.5 Considerations
in_fd must be a regular file (it cannot be a socket); out_fd is usually a socket.
Non‑blocking sockets may require retry on EAGAIN.
Large files can block; consider asynchronous I/O or multithreading.
This prototype is Linux-specific. Windows has no sendfile, and macOS and FreeBSD provide a sendfile with a different signature.
3. mmap
3.1 Working Principle
mmap maps a file into the process's virtual address space. The kernel reserves a virtual address range, records it in a vm_area_struct, and loads pages lazily: a page is read from the file only when the first access to it triggers a page fault. Writes to a MAP_SHARED mapping are written back to the file; writes to a MAP_PRIVATE mapping are copy-on-write and never reach the file.
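The difference between the two mapping types can be observed directly. This is a small sketch, not the article's own code: the helper name poke_first_byte is hypothetical, and it assumes fd is open for reading and writing and refers to a file at least one byte long.

```c
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first byte of fd through a mapping
 * created with `flags` (MAP_SHARED or MAP_PRIVATE), then read that byte
 * back from the file itself with pread. Returns the byte now stored in
 * the file, or -1 on error. */
int poke_first_byte(int fd, int flags, char value)
{
    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = value;                   /* first access triggers the page fault */
    if (flags & MAP_SHARED)
        msync(p, 1, MS_SYNC);       /* flush the dirty page to the file */
    munmap(p, 1);

    char c;
    if (pread(fd, &c, 1, 0) != 1)
        return -1;
    return (unsigned char)c;
}
```

With MAP_PRIVATE the file keeps its old byte, because the write lands in a private copy of the page; with MAP_SHARED the new byte is visible in the file.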
3.2 Prototype
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
3.3 Parameters
addr : preferred start address (NULL lets kernel choose)
length : size (rounded up to page size)
prot : PROT_READ, PROT_WRITE, PROT_EXEC
flags : MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS
fd : file descriptor (-1 for anonymous mapping)
offset : file offset (must be page‑aligned)
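Because offset must be page-aligned, mapping a region that starts at an arbitrary byte position takes a little arithmetic. A possible sketch follows; the helper name map_window and its out-parameters are assumptions made for illustration.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of fd starting at the arbitrary
 * byte position `pos`. The mapping begins at the preceding page boundary
 * and the returned pointer is advanced by the difference. *base and
 * *maplen receive the values to pass to munmap later. Returns NULL on
 * failure. */
char *map_window(int fd, off_t pos, size_t len, void **base, size_t *maplen)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = pos - (pos % page);     /* round down to a page boundary */
    size_t delta = (size_t)(pos - aligned); /* bytes skipped inside the page */

    void *p = mmap(NULL, len + delta, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED)
        return NULL;

    *base = p;
    *maplen = len + delta;
    return (char *)p + delta;
}
```

The caller reads through the returned pointer but must unmap with munmap(base, maplen), since munmap expects the page-aligned start of the mapping.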
3.4 Usage Example
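#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
// ... inside main(); error checks on open/fstat omitted for brevity
int fd = open("example.txt", O_RDONLY);
struct stat sb;
fstat(fd, &sb);                       // file size gives the mapping length
char *data = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (data == MAP_FAILED)               // failure is MAP_FAILED, not NULL
    return 1;
close(fd);                            // the mapping stays valid after close
write(STDOUT_FILENO, data, sb.st_size);
munmap(data, sb.st_size);
3.5 Considerations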
Release the mapping with munmap to avoid memory leaks.
Mapping remains usable after the underlying file is deleted, until unmapped.
Shared mappings require synchronization (mutexes, semaphores) across processes.
4. splice
4.1 Working Principle
splice moves data between two file descriptors through a pipe buffer, keeping the data entirely in kernel space. At least one of the two descriptors must be a pipe.
4.2 Prototype
#define _GNU_SOURCE
#include <fcntl.h>
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);
4.3 Parameters
fd_in : input descriptor (or pipe)
off_in : offset for non‑pipe input (NULL for pipe)
fd_out : output descriptor (or pipe)
off_out : offset for non‑pipe output (NULL for pipe)
len : number of bytes to move
flags : SPLICE_F_MOVE, SPLICE_F_MORE, SPLICE_F_NONBLOCK, etc.
4.4 Usage Example (file → pipe → file)
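#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
// ... inside main(); error checks on open/pipe omitted for brevity
int in = open("src.txt", O_RDONLY);
int out = open("dst.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644);
int pipefd[2];
pipe(pipefd);
ssize_t n;
// Fill and drain the pipe alternately. Filling it in a separate loop first
// would block forever once the file exceeds the pipe capacity.
while ((n = splice(in, NULL, pipefd[1], NULL, 4096, SPLICE_F_MORE)) > 0) {
    while (n > 0) {                       // drain exactly what was queued
        ssize_t m = splice(pipefd[0], NULL, out, NULL, n, SPLICE_F_MORE);
        if (m <= 0) return 1;             // write side failed
        n -= m;
    }
}
close(in);
close(out);
close(pipefd[0]);
close(pipefd[1]);
Each splice call is a single system call (two context switches), and the data never enters user space.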
4.5 Considerations
One descriptor must be a pipe; otherwise EINVAL is returned.
Offsets must be NULL for pipe descriptors.
Handle EAGAIN for non‑blocking descriptors.
Linux‑specific; not portable to other OSes.
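The first two considerations above can be checked with a tiny probe. This is only a sketch: the helper name splice_errno is hypothetical, and the error codes in the comments are the ones documented in the splice(2) manual page.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: attempt a one-byte splice and report the errno
 * it fails with, or 0 if the call succeeds.
 *   - neither descriptor is a pipe      -> EINVAL
 *   - non-NULL offset given for a pipe  -> ESPIPE */
int splice_errno(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out)
{
    if (splice(fd_in, off_in, fd_out, off_out, 1, SPLICE_F_NONBLOCK) >= 0)
        return 0;
    return errno;
}
```

Calling it with two regular files, or with an offset pointer on the pipe side, is expected to yield EINVAL and ESPIPE respectively.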
5. Comparison and Selection
5.1 Performance
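sendfile completes a transfer in one system call (two context switches). With scatter-gather DMA it performs two copies, both by DMA; without it, one additional CPU copy. splice also costs two context switches per call and moves page references instead of copying data with the CPU, but a file → pipe → socket relay takes two calls. mmap followed by write needs four context switches and about three copies (two DMA, one CPU), resulting in higher CPU load.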
5.2 Suitable Scenarios
sendfile – file‑to‑socket transfers (e.g., HTTP static file serving).
mmap – large‑file processing, shared‑memory IPC, random‑access workloads.
splice – pipe‑based data movement, high‑speed file‑to‑file or file‑to‑socket pipelines.
5.3 Selection Guidance
If the goal is direct file‑to‑socket delivery, prefer sendfile.
For large files or when processes need to share memory, choose mmap.
When a pipe is involved or arbitrary descriptor‑to‑descriptor movement is required, use splice.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
