Efficient Memory Sharing with mmap and Zero‑Copy Techniques
This article explains how mmap and zero‑copy mechanisms, combined with DMA and shared‑memory APIs, can dramatically reduce CPU involvement, context switches, and data copies during file and network I/O, thereby improving system performance for high‑throughput applications.
Optimizing system performance is a perennial goal for developers, and using mmap together with zero-copy techniques provides an efficient way to share memory and move data.
1. Introduction
1.1 mmap Technology
mmap (memory map) maps a file or other object into a process's address space, allowing the process to read and write the memory directly without explicit read/write system calls; with a shared mapping, changes are synchronized back to the file, enabling inter-process file sharing.
1.2 What is Zero‑Copy?
Zero-copy avoids redundant copies of data between kernel and user buffers, typically by combining DMA with memory-region mapping so the CPU never has to move the payload itself; this saves CPU cycles and memory bandwidth during network transfers.
2. DMA Technology Details
Direct Memory Access (DMA) lets peripheral devices transfer data directly to/from main memory without CPU intervention, freeing the CPU for other tasks. Modern hardware (disk controllers, NICs, GPUs) widely supports DMA.
2.1 Why DMA?
Before DMA, I/O required CPU‑mediated copies and multiple context switches, incurring high latency and CPU overhead.
2.2 What is DMA?
DMA transfers data between I/O devices and memory autonomously, while the CPU can perform other work.
2.3 Traditional File Transfer Drawbacks
Typical file transfer uses read(file, tmp_buf, len) followed by write(socket, tmp_buf, len), causing four context switches (two per system call) and four data copies (two via DMA, two via the CPU), which degrades performance under high concurrency.
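As a concrete baseline, the sketch below performs the traditional copy loop: each read() drags a chunk from the page cache into a user-space buffer, and each write() copies it back into a kernel buffer. A file-to-file copy stands in here for the file-to-socket case, and error handling is deliberately minimal:

```c
#include <fcntl.h>
#include <unistd.h>

/* Traditional copy: every chunk crosses the user/kernel boundary twice. */
int copy_read_write(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) return -1;

    char tmp_buf[64 * 1024];            /* user-space bounce buffer */
    ssize_t n;
    while ((n = read(in, tmp_buf, sizeof tmp_buf)) > 0) {  /* kernel -> user copy */
        if (write(out, tmp_buf, (size_t)n) != n) {         /* user -> kernel copy */
            n = -1;
            break;
        }
    }
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

Every iteration burns CPU cycles on two memory copies that the zero-copy variants below avoid.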
3. Zero‑Copy Techniques
3.1 How to Implement Zero‑Copy?
Common methods include mmap + write, sendfile, splice, and combinations with DMA scatter/gather.
3.1.1 mmap Approach
Using mmap() maps the file's page cache into the process's address space, so write() can send those pages to a socket without first copying them into a user buffer; this removes the kernel-to-user copy but still leaves one CPU copy inside the kernel (page cache to socket buffer).
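A minimal sketch of the mmap + write pattern, using an arbitrary output descriptor in place of a socket (the function name and simplified error handling are illustrative, not a fixed API):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* mmap + write: the file's pages are mapped into the process, so no
 * read() copy into a user buffer is needed; write() still triggers one
 * CPU copy inside the kernel (page cache -> destination buffer). */
int send_mmap_write(const char *src, int out_fd)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;

    struct stat st;
    if (fstat(in, &st) < 0 || st.st_size == 0) { close(in); return -1; }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, in, 0);
    if (addr == MAP_FAILED) { close(in); return -1; }

    ssize_t n = write(out_fd, addr, (size_t)st.st_size);  /* kernel-side CPU copy */

    munmap(addr, (size_t)st.st_size);
    close(in);
    return n == st.st_size ? 0 : -1;
}
```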
3.1.2 sendfile Approach
sendfile(out_fd, in_fd, &offset, count) transfers data from a file descriptor to a socket entirely within the kernel, eliminating user-space copies and cutting the context switches from four to two; with NICs that support scatter/gather DMA, the remaining in-kernel CPU copy can be eliminated as well.
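A Linux-specific sketch of the sendfile path. The classic use case writes to a socket; this example writes to a regular file instead (supported on Linux since 2.6.33), which keeps the example self-contained:

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* sendfile: the kernel moves data from in_fd's page cache straight to
 * out_fd; user space never touches the bytes. */
int send_whole_file(int out_fd, const char *src)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;

    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return -1; }

    off_t offset = 0;                 /* sendfile advances this for us */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in, &offset, (size_t)(st.st_size - offset));
        if (n <= 0) { close(in); return -1; }
    }
    close(in);
    return 0;
}
```

The loop matters: sendfile may transfer fewer bytes than requested, so production code always retries from the updated offset.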
3.1.3 splice Approach
splice moves data between two file descriptors through an in-kernel pipe buffer, so no CPU copy into user space occurs; at least one of the two descriptors in each splice call must be a pipe.
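A minimal Linux sketch: data flows file → pipe → file entirely inside the kernel, with the pipe serving as the intermediate kernel buffer (the function name is illustrative):

```c
#define _GNU_SOURCE               /* splice() is a GNU/Linux extension */
#include <fcntl.h>
#include <unistd.h>

/* splice: in_fd -> pipe -> out_fd, all inside the kernel; no bytes are
 * copied into user space. */
int copy_splice(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0) return -1;

    while (len > 0) {
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0) break;
        ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
        if (m != n) break;
        len -= (size_t)n;
    }
    close(p[0]);
    close(p[1]);
    return len == 0 ? 0 : -1;
}
```

Each chunk is bounded by the pipe's capacity (64 KiB by default), so larger files simply take more iterations.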
3.2 Zero‑Copy Applications
Frameworks like Netty and Kafka use zero‑copy to achieve high‑throughput data transfer; Kafka, for example, employs FileChannel.transferTo() for efficient log replication.
4. Shared Memory with mmap
Shared memory allows kernel and user space to access the same memory region, eliminating extra copies. mmap can map a file into memory, while System V APIs (shmget, shmat, shmdt, shmctl) provide explicit shared-memory segments.
4.1 mmap Usage
#include <sys/mman.h>
void *addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
if (addr == MAP_FAILED) perror("mmap");
/* use addr */
munmap(addr, length);
4.2 System V Shared Memory
Creating a segment:
#include <sys/ipc.h>
#include <sys/shm.h>
int shmid = shmget(key, size, IPC_CREAT | 0666);
Attaching to a process:
void *shmaddr = shmat(shmid, NULL, 0);
Detaching:
shmdt(shmaddr);
Control operations via shmctl allow querying and removing segments.
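Putting the four calls together, here is a minimal end-to-end sketch: a child process writes a message into a System V segment, and the parent reads it back and removes the segment (the function name and the fork-based setup are illustrative):

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child share one System V segment: the child writes a
 * message, the parent reads it after the child exits. */
int shm_roundtrip(char *out, size_t out_len)
{
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
    if (shmid < 0) return -1;

    pid_t pid = fork();
    if (pid == 0) {                      /* child: producer */
        char *p = shmat(shmid, NULL, 0);
        strcpy(p, "hello from child");
        shmdt(p);
        _exit(0);
    }
    waitpid(pid, NULL, 0);               /* ensure the write has happened */

    char *p = shmat(shmid, NULL, 0);     /* parent: consumer */
    strncpy(out, p, out_len - 1);
    out[out_len - 1] = '\0';
    shmdt(p);
    shmctl(shmid, IPC_RMID, NULL);       /* remove the segment */
    return 0;
}
```

Because both processes map the same physical pages, the message crosses from child to parent with no copy at all; waitpid stands in for the real synchronization (semaphores, futexes) a production design would use.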
Overall, using mmap, zero‑copy system calls, and shared‑memory APIs can dramatically reduce unnecessary data copies and context switches, leading to higher performance in file and network I/O workloads.