Zero‑Copy Techniques in Linux: sendfile, mmap, splice and tee
This article explains the concept of zero‑copy in Linux and compares its four main system calls (sendfile, mmap, splice and tee), covering their APIs, internal mechanisms, performance characteristics and typical use cases, with practical code examples for high‑performance network programming.
In modern networked applications, data‑transfer efficiency directly determines overall performance, and traditional copy‑based I/O becomes a bottleneck under high concurrency and large data volumes. Zero‑copy techniques reduce memory copies and CPU involvement, dramatically improving throughput for file servers, storage systems and streaming services.
1. Zero‑Copy Overview
1.1 What is Zero‑Copy?
Zero‑copy is a set of kernel‑level mechanisms that avoid unnecessary copying of data between user and kernel buffers, allowing data to move directly from the source (e.g., disk) to the destination (e.g., network interface) with minimal CPU work.
1.2 Problems with Traditional I/O
Typical file‑to‑socket transfers involve multiple copies: disk → kernel buffer → user buffer → kernel socket buffer → NIC. Each copy consumes CPU cycles, memory bandwidth and incurs context switches, leading to higher latency and lower scalability.
2. sendfile – Direct File‑to‑Socket Transfer
2.1 API
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

out_fd is usually a connected socket; in_fd must refer to a file that supports mmap‑style operations (a regular file, not a socket); offset specifies the start position (NULL means the current file offset is used and advanced); count limits the number of bytes transferred.
2.2 How It Works
Data is read from the file into a kernel buffer via DMA, then moved directly to the socket’s kernel buffer without entering user space. Modern kernels further reduce copies by keeping only metadata in the socket buffer and letting DMA push the payload to the NIC.
2.3 Example
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/sendfile.h>
#include <sys/stat.h>   /* fstat */
#define PORT 8080
#define FILE_PATH "example.txt"
int main() {
int server_fd = socket(AF_INET, SOCK_STREAM, 0);
// error handling omitted for brevity
struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(PORT) };
bind(server_fd, (struct sockaddr*)&addr, sizeof(addr));
listen(server_fd, 5);
int client_fd = accept(server_fd, NULL, NULL);
int file_fd = open(FILE_PATH, O_RDONLY);
off_t offset = 0;
struct stat st; fstat(file_fd, &st);
/* sendfile may transfer fewer bytes than requested; it advances offset, so loop until done */
while (offset < st.st_size) {
    ssize_t n = sendfile(client_fd, file_fd, &offset, st.st_size - offset);
    if (n <= 0) break;
}
close(file_fd); close(client_fd); close(server_fd);
return 0;
}
3. mmap – Memory‑Mapped Files
3.1 API
#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

prot controls access (PROT_READ, PROT_WRITE, PROT_EXEC, PROT_NONE); flags choose the sharing mode (MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS, MAP_FIXED). The call returns a pointer to the mapped region, or MAP_FAILED on error.
3.2 Zero‑Copy Aspect
When a file is mmap‑ed, the kernel maps the file’s pages into the process’s virtual address space. The actual data is loaded on demand via page faults, and the same physical pages can be shared among multiple processes, eliminating explicit read/write copies.
3.3 Use Cases & Risks
Databases use mmap for fast random access; shared‑memory IPC also relies on mmap. Truncating a mapped file while other processes still hold the mapping can raise SIGBUS, so developers should unmap before truncation or handle the signal.
4. splice – Direct Descriptor‑to‑Descriptor Transfer
4.1 API
#include <fcntl.h>
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

At least one of fd_in or fd_out must be a pipe. Flags such as SPLICE_F_NONBLOCK, SPLICE_F_MORE and SPLICE_F_MOVE control blocking behavior and kernel‑side data movement.
4.2 How It Works
Data moves entirely within kernel buffers: from the input descriptor to a pipe buffer and then to the output descriptor, bypassing user space and reducing context switches.
4.3 Example – Echo Service
#define _GNU_SOURCE   /* expose splice() in <fcntl.h> */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <assert.h>
#include <errno.h>
int main(int argc, char **argv) {
if (argc <= 2) { printf("usage: %s ip port\n", argv[0]); return 1; }
const char *ip = argv[1]; int port = atoi(argv[2]);
int sock = socket(AF_INET, SOCK_STREAM, 0);
int reuse = 1; setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
struct sockaddr_in addr = {0}; addr.sin_family = AF_INET; addr.sin_port = htons(port); inet_pton(AF_INET, ip, &addr.sin_addr);
bind(sock, (struct sockaddr*)&addr, sizeof(addr));
listen(sock, 5);
int conn = accept(sock, NULL, NULL);
if (conn >= 0) {
int pipefd[2]; pipe(pipefd);
/* client -> pipe: the data stays in kernel buffers throughout */
ssize_t n = splice(conn, NULL, pipefd[1], NULL, 32768, SPLICE_F_MORE | SPLICE_F_MOVE);
/* pipe -> client: echo back exactly the bytes just received */
if (n > 0) splice(pipefd[0], NULL, conn, NULL, n, SPLICE_F_MORE | SPLICE_F_MOVE);
close(pipefd[0]); close(pipefd[1]);
close(conn);
}
close(sock);
return 0;
}
5. tee – Duplicating Pipe Data
5.1 API
#include <fcntl.h>
ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);

Both descriptors must refer to pipes; the call copies up to len bytes from fd_in to fd_out without consuming the data, allowing the same stream to be processed by multiple consumers.
5.2 Applications
tee is useful for parallel processing pipelines (e.g., sending a network stream simultaneously to an analyzer and a logger) and for duplicating output to both a terminal and a log file without extra reads.
6. Comparison & Selection
6.1 Performance
sendfile excels at file‑to‑socket transfers, mmap shines for random file access and shared memory, splice is optimal for moving data between descriptors (especially with pipes), and tee provides zero‑copy duplication for pipe‑based pipelines.
6.2 Choosing the Right Tool
Use sendfile for large static file delivery, mmap for high‑frequency file reads/writes or inter‑process sharing, splice for proxy‑style forwarding, and tee when the same data must be consumed by multiple downstream components.
7. Conclusion
Zero‑copy mechanisms are essential for building high‑performance Linux network services. By minimizing data copies, they lower CPU usage, reduce memory bandwidth pressure and improve scalability, making them indispensable in modern high‑throughput, low‑latency applications.