Master Linux Zero‑Copy: Sendfile, Splice, mmap+write, and tee Explained

This article explains how Linux zero‑copy techniques—DMA, sendfile, splice, mmap + write, and tee—reduce CPU involvement in large file and network transfers by moving data directly within kernel space, detailing their workflows, code examples, performance trade‑offs, and suitable use cases.

Liangxu Linux

Dec 9, 2024

Background

When a large file is sent over the network with plain read and write calls, the data is copied four times: (1) from disk into a kernel buffer, (2) from the kernel buffer to a user-space buffer, (3) from the user buffer back to a kernel socket buffer, and (4) from the socket buffer to the NIC. Each copy consumes CPU cycles and memory bandwidth, and the read and write system calls add context-switch overhead.
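As a point of reference, the classic path can be sketched as a read/write loop. This is a minimal illustration rather than code from the article; the helper name `copy_classic` and the 4096-byte buffer size are assumptions made for the example.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Returns the total number of bytes copied, or -1 on error.
 * copy_classic is a hypothetical helper name used only in this sketch. */
ssize_t copy_classic(int in_fd, int out_fd)
{
    char buf[4096];                 /* user-space staging buffer (size is arbitrary) */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);    /* step 2: kernel -> user copy */
        if (n == 0)
            return total;                            /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;                            /* interrupted, retry */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            /* step 3: user -> kernel copy; write may accept fewer bytes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Every chunk crosses the user/kernel boundary twice here, which is exactly the work the mechanisms below try to avoid.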

DMA – Reducing CPU Work

Direct Memory Access (DMA) can move data between the disk and kernel memory, and between kernel memory and the NIC, without CPU intervention. DMA therefore eliminates the CPU work for steps 1 and 4, but the kernel‑to‑user and user‑to‑kernel copies (steps 2 and 3) still require CPU processing.

Linux Zero‑Copy Mechanisms

sendfile – File‑to‑socket zero‑copy

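sendfile transfers data directly from a file descriptor to a socket descriptor inside the kernel, so the data never passes through a user-space buffer. At most one CPU copy remains (page cache to socket buffer), and NICs that support scatter-gather DMA let the kernel skip even that.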

Interface

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

out_fd: destination socket descriptor.

in_fd: source file descriptor.

offset: file offset (NULL to use the current offset).

count: number of bytes to transfer.

Example

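#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>
int main() {
    int input_fd = open("input.txt", O_RDONLY);
    if (input_fd < 0) { perror("open"); return 1; }
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);   /* wait for one client */
    /* Send up to 1024 bytes from the file's current offset; fewer may be sent. */
    if (sendfile(client_fd, input_fd, NULL, 1024) < 0) perror("sendfile");
    close(input_fd);
    close(client_fd);
    close(server_fd);
    return 0;
}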

Best for simple large‑file‑to‑network transfers such as static file servers or streaming services.
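The example above issues a single 1024-byte call. A real sender usually has to loop, because sendfile may transfer fewer bytes than requested. The sketch below sends a byte range using the offset parameter; the helper name `send_range` is hypothetical, and a blocking out_fd is assumed.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Send `count` bytes of in_fd, starting at byte `start`, to out_fd.
 * Returns the number of bytes sent (short only if the file ends early),
 * or -1 on error. Assumes out_fd is in blocking mode.
 * send_range is a hypothetical helper name used only in this sketch. */
ssize_t send_range(int out_fd, int in_fd, off_t start, size_t count)
{
    off_t offset = start;           /* sendfile advances this after each call */
    size_t left = count;

    while (left > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, left);
        if (n == 0)
            break;                  /* reached end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            return -1;
        }
        left -= (size_t)n;
    }
    return (ssize_t)(count - left);
}
```

Because an explicit offset is passed, the file descriptor's own position is left untouched, so several ranges of the same open file can be served independently.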

splice – Pipe‑based zero‑copy

splice moves data between two file descriptors completely inside the kernel, provided at least one of them is a pipe. That makes it suitable for more complex pipelines (e.g., file → pipe → socket).

Interface

ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

fd_in: source descriptor.

off_in: source offset (NULL for current; must be NULL when fd_in is a pipe).

fd_out: destination descriptor.

off_out: destination offset (NULL for current; must be NULL when fd_out is a pipe).

len: number of bytes to transfer.

flags: e.g., SPLICE_F_MOVE, SPLICE_F_MORE.

Example

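#define _GNU_SOURCE            /* exposes splice() in <fcntl.h> */
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
int main() {
    int input_fd = open("input.txt", O_RDONLY);
    int pipefd[2];
    if (input_fd < 0 || pipe(pipefd) < 0) { perror("setup"); return 1; }
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    /* One side of every splice call must be a pipe: file -> pipe, then pipe -> socket. */
    ssize_t n = splice(input_fd, NULL, pipefd[1], NULL, 1024, SPLICE_F_MORE);
    if (n > 0) n = splice(pipefd[0], NULL, client_fd, NULL, n, SPLICE_F_MORE);
    if (n < 0) perror("splice");
    close(pipefd[0]); close(pipefd[1]);
    close(input_fd);
    close(client_fd);
    close(server_fd);
    return 0;
}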

Ideal when data must flow between files, pipes and sockets with flexible routing.
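To move more than one chunk, the two splice steps are typically repeated in a loop with a pipe in the middle. The following is a minimal sketch under that assumption; `splice_relay` is a hypothetical helper name, and both descriptors are assumed to be in blocking mode.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes splice() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Move up to `len` bytes from in_fd to out_fd through a private pipe.
 * Returns the number of bytes moved (short if in_fd reaches EOF), or -1.
 * splice_relay is a hypothetical helper name used only in this sketch. */
ssize_t splice_relay(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    size_t moved = 0;
    int failed = 0;
    while (moved < len && !failed) {
        /* Stage a chunk in the pipe; at most one pipe's worth arrives per call. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len - moved,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n == 0)
            break;                              /* source is exhausted */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            failed = 1;
            break;
        }
        /* Drain exactly that chunk from the pipe into the destination. */
        for (ssize_t left = n; left > 0; ) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                failed = 1;
                break;
            }
            left -= w;
        }
        if (!failed)
            moved += (size_t)n;
    }
    close(p[0]);
    close(p[1]);
    return failed ? -1 : (ssize_t)moved;
}
```

The same helper works in either direction, for example file to socket when serving or socket to file when receiving an upload, since the intermediate pipe satisfies the one-end-must-be-a-pipe rule on both calls.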

mmap + write – Mapped zero‑copy

Mapping a file into the process address space with mmap shares the same pages between kernel and user space. The mapped region can then be written to a socket with write, allowing user‑space preprocessing (e.g., compression, encryption) before transmission.

Interface

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

addr: preferred address (NULL lets the kernel choose).

length: size of the mapping.

prot: protection flags (e.g., PROT_READ).

flags: mapping flags (e.g., MAP_SHARED, MAP_PRIVATE).

fd: file descriptor of the file to map.

offset: offset within the file.

Example

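#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>
int main() {
    int input_fd = open("input.txt", O_RDONLY);
    struct stat st;
    if (input_fd < 0 || fstat(input_fd, &st) < 0) { perror("input.txt"); return 1; }
    /* Map the whole file read-only; mmap fails for a zero-length file. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, input_fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    /* write may accept fewer bytes than requested; a robust sender would loop. */
    if (write(client_fd, data, st.st_size) < 0) perror("write");
    munmap(data, st.st_size);
    close(input_fd);
    close(client_fd);
    close(server_fd);
    return 0;
}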

Provides flexibility for data transformation, but write still makes the CPU copy the mapped pages into the socket buffer, and the usual user-kernel transitions remain.
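The preprocessing case might look like the sketch below, which upper-cases the mapped bytes as a stand-in for real work such as compression or encryption. The helper name `send_uppercased` is hypothetical, and the transformation was chosen only because it is easy to verify.

```c
#include <ctype.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map in_fd, upper-case its bytes in memory, and write them to out_fd.
 * Returns the number of bytes written, or -1 on error.
 * send_uppercased is a hypothetical helper name used only in this sketch. */
ssize_t send_uppercased(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                       /* mmap rejects zero-length mappings */
    size_t len = (size_t)st.st_size;

    /* MAP_PRIVATE with PROT_WRITE: edits are copy-on-write, so the file
     * on disk is never modified, even though in_fd is read-only. */
    char *data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, in_fd, 0);
    if (data == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < len; i++)    /* user-space preprocessing step */
        data[i] = (char)toupper((unsigned char)data[i]);

    size_t off = 0;
    int failed = 0;
    while (off < len) {                 /* write may be partial, so loop */
        ssize_t w = write(out_fd, data + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            failed = 1;
            break;
        }
        off += (size_t)w;
    }
    munmap(data, len);
    return failed ? -1 : (ssize_t)off;
}
```

Note that modifying a private mapping makes the kernel copy each touched page, so in-place transformation gives up part of the page sharing that mmap otherwise provides.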

tee – Zero‑copy duplication of pipe data

tee copies data from one pipe to another without consuming the original data, enabling the same stream to be sent to multiple consumers.

Interface

ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);

fd_in: source pipe descriptor.

fd_out: destination pipe descriptor.

len: number of bytes to duplicate.

flags: e.g., SPLICE_F_NONBLOCK.

Example (combined with splice)

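#define _GNU_SOURCE            /* exposes splice() and tee() in <fcntl.h> */
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
int main() {
    int src[2], copy[2];       /* tee needs two distinct pipes */
    pipe(src); pipe(copy);
    int input_fd = open("input.txt", O_RDONLY);
    int log_fd = open("copy.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);   /* hypothetical second destination */
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = INADDR_ANY, .sin_port = htons(8080) };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 3);
    int client_fd = accept(server_fd, NULL, NULL);
    ssize_t n = splice(input_fd, NULL, src[1], NULL, 1024, 0);        /* fill the source pipe from the file */
    if (n > 0) n = tee(src[0], copy[1], n, 0);                         /* duplicate; src still holds the data */
    if (n > 0) {
        splice(src[0], NULL, client_fd, NULL, n, SPLICE_F_MORE);      /* original -> socket */
        splice(copy[0], NULL, log_fd, NULL, n, 0);                     /* duplicate -> log file */
    }
    close(src[0]); close(src[1]); close(copy[0]); close(copy[1]); close(input_fd); close(log_fd); close(client_fd); close(server_fd);
    return 0;
}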

Useful for logging, broadcasting, or any scenario where the same data must reach multiple destinations.
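The example above duplicates a single chunk. For a continuous stream, tee and splice are usually repeated until the writer closes the source pipe. The sketch below shows one way to do that; `fan_out` and `drain` are hypothetical helper names, and blocking descriptors are assumed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes splice() and tee() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Move exactly `len` bytes from a pipe's read end to out_fd.
 * drain is a hypothetical helper name used only in this sketch. */
static int drain(int pipe_rd, int out_fd, size_t len)
{
    while (len > 0) {
        ssize_t w = splice(pipe_rd, NULL, out_fd, NULL, len, SPLICE_F_MOVE);
        if (w < 0 && errno == EINTR)
            continue;
        if (w <= 0)
            return -1;
        len -= (size_t)w;
    }
    return 0;
}

/* Deliver everything arriving on src_pipe to both out_a and out_b.
 * Returns the number of bytes delivered to each, or -1 on error.
 * fan_out is a hypothetical helper name used only in this sketch. */
ssize_t fan_out(int src_pipe, int out_a, int out_b)
{
    int copy[2];
    if (pipe(copy) < 0)
        return -1;

    ssize_t total = 0;
    for (;;) {
        /* Duplicate whatever is queued; the data stays readable in src_pipe. */
        ssize_t n = tee(src_pipe, copy[1], INT_MAX, 0);
        if (n == 0)
            break;                      /* no data and no writers left */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        /* Consume the duplicate and the original, one destination each. */
        if (drain(copy[0], out_b, (size_t)n) < 0 ||
            drain(src_pipe, out_a, (size_t)n) < 0) {
            total = -1;
            break;
        }
        total += n;
    }
    close(copy[0]);
    close(copy[1]);
    return total;
}
```

Because tee does not consume its input, the original bytes must still be spliced out of the source pipe after each duplication; otherwise the next tee call would see the same data again.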

Comparison of Zero‑Copy Methods

sendfile – Full zero‑copy, minimal CPU, ideal for file‑to‑socket transfers (static file servers, video streaming).

splice – Full zero-copy, flexible routing among files, pipes and sockets as long as one end of each call is a pipe; suited for complex pipelines.

mmap + write – Partial zero‑copy, moderate CPU because of user‑space access; best when data needs preprocessing before sending.

tee – Full zero‑copy duplication of pipe data, minimal CPU, perfect for multi‑target broadcasting or logging.

Conclusion

Linux offers several zero-copy techniques: sendfile, splice, mmap + write, and tee. Each balances flexibility, CPU usage and applicability. Choose sendfile for straightforward file-to-network transfers, splice for pipelines that route data through a pipe, mmap + write when preprocessing is required, and tee when the same stream must be delivered to multiple consumers.

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
