Master Linux Zero‑Copy: Sendfile, Splice, mmap+write, and tee Explained

This article explains how Linux zero‑copy techniques—DMA, sendfile, splice, mmap + write, and tee—reduce CPU involvement in large file and network transfers by moving data directly within kernel space, detailing their workflows, code examples, performance trade‑offs, and suitable use cases.

Liangxu Linux
Background

When a large file is sent over the network with plain read and write calls, the data is copied four times: (1) from disk into a kernel buffer, (2) from the kernel buffer into a user‑space buffer, (3) from the user buffer back into a kernel socket buffer, and (4) from the socket buffer to the NIC. Each copy costs memory bandwidth, and every read or write call adds a switch between user mode and kernel mode.
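
The four-copy path above corresponds to the familiar read/write loop. The sketch below is only an illustration of that baseline; the helper name copy_read_write and the 64 KiB buffer size are assumptions, not something the article prescribes.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd through a
 * user-space buffer. Returns the number of bytes copied, or -1 on error. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[64 * 1024];              /* user-space bounce buffer */
    ssize_t total = 0;

    for (;;) {
        /* Step 2: the kernel copies data into the user buffer. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;             /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Step 3: the kernel copies the user buffer back into kernel space. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Every iteration crosses the user/kernel boundary twice and moves each byte through buf, which is exactly the work the mechanisms below try to avoid.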

DMA – Reducing CPU Work

Direct Memory Access (DMA) can move data between the disk and kernel memory, and between kernel memory and the NIC, without CPU intervention. DMA therefore eliminates the CPU work for steps 1 and 4, but the kernel‑to‑user and user‑to‑kernel copies (steps 2 and 3) still require CPU processing.

Linux Zero‑Copy Mechanisms

sendfile – File‑to‑socket zero‑copy

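sendfile transfers data directly from a file descriptor to a socket descriptor inside the kernel. The data never enters user space, so the two user‑space copies (steps 2 and 3) are replaced by at most one in‑kernel copy from the page cache to the socket buffer.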

Interface

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

out_fd: destination socket descriptor.
in_fd: source file descriptor.
offset: file offset (NULL to use the current offset).
count: number of bytes to transfer.

Example

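#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int input_fd = open("input.txt", O_RDONLY);
    if (input_fd < 0) { perror("open"); return 1; }
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    if (client_fd < 0) { perror("accept"); return 1; }
    /* File -> socket inside the kernel. This sends at most the first 1024 bytes;
     * real code loops until the whole file has been sent. */
    if (sendfile(client_fd, input_fd, NULL, 1024) < 0)
        perror("sendfile");
    close(input_fd);
    close(client_fd);
    close(server_fd);
    return 0;
}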

Best for simple large‑file‑to‑network transfers such as static file servers or streaming services.
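
Because sendfile may transfer fewer bytes than requested, a file server normally calls it in a loop. The following is a minimal sketch of such a loop; the helper name send_file_by_path is hypothetical, and the code assumes out_fd is in blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file at path to out_fd.
 * Assumes out_fd is blocking. Returns 0 on success, -1 on error. */
int send_file_by_path(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* With a non-NULL offset, the kernel reads from *offset and
         * advances it by the number of bytes actually sent. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted, retry */
        if (n <= 0)
            break;                    /* error, or file shrank while sending */
    }

    close(in_fd);
    return offset == st.st_size ? 0 : -1;
}
```

Passing an explicit offset keeps the descriptor's own file position untouched, which is convenient when several transfers share one open file.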

splice – Pipe‑based zero‑copy

splice moves data between two file descriptors entirely inside the kernel, provided that at least one of them is a pipe. This makes it suitable for more complex pipelines (e.g., file → pipe → socket).

Interface

ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

fd_in: source descriptor.
off_in: source offset (NULL for current).
fd_out: destination descriptor.
off_out: destination offset (NULL for current).
len: number of bytes to transfer.
flags: e.g., SPLICE_F_MOVE, SPLICE_F_MORE.

Example

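#define _GNU_SOURCE             /* exposes splice() in <fcntl.h> */
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int input_fd = open("input.txt", O_RDONLY);
    int pipefd[2];
    pipe(pipefd);               /* splice() needs a pipe on one side of each call */
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    /* file -> pipe, then pipe -> socket; sends at most the first 1024 bytes */
    ssize_t n = splice(input_fd, NULL, pipefd[1], NULL, 1024, SPLICE_F_MOVE);
    if (n > 0)
        splice(pipefd[0], NULL, client_fd, NULL, (size_t)n, SPLICE_F_MOVE);
    close(pipefd[0]); close(pipefd[1]);
    close(input_fd);
    close(client_fd);
    close(server_fd);
    return 0;
}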

Ideal when data must flow between files, pipes and sockets with flexible routing.
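
For transfers larger than one pipe buffer, the two splice stages are usually repeated in a loop. The sketch below shows one way to do that under the assumption that both descriptors are blocking; splice_copy is a hypothetical helper name, not a library function.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from in_fd to out_fd through
 * an intermediate pipe. Returns the bytes moved, or -1 on error. */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    ssize_t total = 0;
    while (len > 0) {
        /* Stage 1: source -> pipe (returns at most one pipe buffer's worth). */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0)
            total = -1;
        if (n <= 0)
            break;                    /* error or end of input */

        /* Stage 2: pipe -> destination, draining exactly what was staged. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (m <= 0) {
                total = -1;
                goto done;
            }
            left -= m;
        }
        total += n;
        len -= (size_t)n;
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The pipe acts as an in-kernel staging buffer: pages are attached to it in stage 1 and handed to the destination in stage 2, so the payload is never copied into a user-space buffer.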

mmap + write – Mapped zero‑copy

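Mapping a file into the process address space with mmap lets the process access the file's page‑cache pages directly, so no read copy into a separate user buffer is needed. The mapped region can then be written to a socket with write. That call still copies the data into the socket buffer, but it gives user space the chance to preprocess the data (e.g., compression, encryption) before transmission.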

Interface

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

addr: preferred address (NULL lets the kernel choose).
length: size of the mapping.
prot: protection flags (e.g., PROT_READ).
flags: mapping flags (e.g., MAP_SHARED, MAP_PRIVATE).
fd: file descriptor of the file to map.
offset: offset within the file.

Example

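#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int input_fd = open("input.txt", O_RDONLY);
    if (input_fd < 0) { perror("open"); return 1; }
    struct stat st;
    fstat(input_fd, &st);
    /* Map the whole file read-only; mmap() fails for an empty file. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, input_fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    /* The kernel copies the mapped pages into the socket buffer.
     * write() may send fewer bytes than requested; real code loops. */
    if (write(client_fd, data, st.st_size) < 0)
        perror("write");
    munmap(data, st.st_size);
    close(input_fd);
    close(client_fd);
    close(server_fd);
    return 0;
}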

Provides flexibility for data transformation, but write still performs one CPU copy into the socket buffer, and touching the mapping for the first time can trigger page faults.
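
To make the preprocessing idea concrete, here is a small sketch that upper-cases a file in a private writable mapping before writing it out. The helper name send_uppercased and the choice of upper-casing as the transformation are assumptions for illustration; a real service might compress or encrypt instead. Note that modifying a MAP_PRIVATE mapping triggers copy-on-write, so the modified pages are copied once.

```c
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the file at path, upper-case it in memory,
 * and write the result to out_fd. Returns 0 on success, -1 on error. */
int send_uppercased(int out_fd, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;                    /* mmap() rejects zero-length mappings */
    }
    size_t len = (size_t)st.st_size;

    /* MAP_PRIVATE + PROT_WRITE: changes are copy-on-write and never
     * reach the underlying file. */
    char *data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* User-space preprocessing step. */
    for (size_t i = 0; i < len; i++)
        data[i] = (char)toupper((unsigned char)data[i]);

    int rc = 0;
    for (size_t off = 0; off < len; ) {
        ssize_t w = write(out_fd, data + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        off += (size_t)w;
    }
    munmap(data, len);
    return rc;
}
```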

tee – Zero‑copy duplication of pipe data

tee duplicates data from one pipe into another without consuming the original data, enabling the same stream to be sent to multiple consumers. Both descriptors must refer to pipes, and they must be two different pipes.

Interface

ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);

fd_in: source pipe descriptor.
fd_out: destination pipe descriptor.
len: number of bytes to duplicate.
flags: e.g., SPLICE_F_NONBLOCK.

Example (combined with splice)

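#define _GNU_SOURCE             /* exposes tee() and splice() in <fcntl.h> */
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int src[2], copy[2];
    pipe(src); pipe(copy);      /* tee() needs two different pipes */
    int input_fd = open("input.txt", O_RDONLY);
    int log_fd = open("sent.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* illustrative second consumer */
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    ssize_t n = splice(input_fd, NULL, src[1], NULL, 1024, 0);   /* file -> source pipe */
    if (n > 0) n = tee(src[0], copy[1], (size_t)n, 0);           /* duplicate; source pipe keeps its data */
    if (n > 0) {
        splice(src[0], NULL, client_fd, NULL, (size_t)n, 0);     /* original -> socket */
        splice(copy[0], NULL, log_fd, NULL, (size_t)n, 0);       /* duplicate -> log file */
    }
    close(src[0]); close(src[1]); close(copy[0]); close(copy[1]);
    close(input_fd); close(log_fd); close(client_fd); close(server_fd);
    return 0;
}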

Useful for logging, broadcasting, or any scenario where the same data must reach multiple destinations.
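
The duplicate-then-forward pattern can be wrapped in a small function. This is a sketch under the assumption that src_rd is the read end of a pipe that already holds data and that all descriptors are blocking; tee_forward and drain are hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Move exactly len bytes from the pipe read end pipe_rd into out_fd. */
static int drain(int pipe_rd, int out_fd, size_t len)
{
    while (len > 0) {
        ssize_t n = splice(pipe_rd, NULL, out_fd, NULL, len, SPLICE_F_MOVE);
        if (n <= 0)
            return -1;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: duplicate up to len bytes waiting in src_rd, send
 * the duplicate to log_fd and the original to out_fd.
 * Returns the number of bytes forwarded to each, or -1 on error. */
ssize_t tee_forward(int src_rd, int out_fd, int log_fd, size_t len)
{
    int copy[2];
    if (pipe(copy) < 0)
        return -1;

    /* tee() duplicates without consuming: src_rd still holds the data. */
    ssize_t n = tee(src_rd, copy[1], len, 0);
    if (n > 0 &&
        (drain(copy[0], log_fd, (size_t)n) < 0 ||
         drain(src_rd, out_fd, (size_t)n) < 0))
        n = -1;

    close(copy[0]);
    close(copy[1]);
    return n;
}
```

The duplicate is drained first so that the original data is only consumed once the copy has been delivered; either order works as long as both pipes are emptied.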

Comparison of Zero‑Copy Methods

sendfile – Full zero‑copy, minimal CPU, ideal for file‑to‑socket transfers (static file servers, video streaming).

splice – Full zero‑copy, flexible routing between files, pipes and sockets as long as one side of each call is a pipe; suited for complex pipelines.

mmap + write – Partial zero‑copy, moderate CPU because of user‑space access; best when data needs preprocessing before sending.

tee – Full zero‑copy duplication of pipe data, minimal CPU, perfect for multi‑target broadcasting or logging.

Conclusion

Linux offers several zero‑copy mechanisms: sendfile, splice, mmap + write, and tee. Each balances flexibility, CPU usage and applicability. Choose sendfile for straightforward file‑to‑network transfers, splice for pipe‑based pipelines between files and sockets, mmap + write when preprocessing is required, and tee when the same stream must be delivered to multiple consumers.
