Efficient Memory Sharing with mmap and Zero‑Copy Techniques
This article explains how mmap and zero‑copy mechanisms, combined with DMA and shared‑memory APIs, can dramatically reduce CPU involvement, context switches, and data copies during file and network I/O, thereby improving system performance for high‑throughput applications.
Optimizing system performance is a perennial goal for developers, and using mmap together with zero-copy techniques provides an efficient way to share memory and move data.
1. Introduction
1.1 mmap Technology
mmap (memory map) maps a file or other object into a process's address space, allowing the process to read and write the memory directly without explicit read/write system calls; with a shared mapping, changes are synchronized back to the file, enabling inter-process file sharing.
1.2 What is Zero‑Copy?
Zero-copy avoids redundant copies of data between kernel and user buffers, typically by combining DMA with memory-region mapping so the CPU never has to move the payload itself; this saves CPU cycles and memory bandwidth during network transfers.
2. DMA Technology Details
Direct Memory Access (DMA) lets peripheral devices transfer data directly to/from main memory without CPU intervention, freeing the CPU for other tasks. Modern hardware (disk controllers, NICs, GPUs) widely supports DMA.
2.1 Why DMA?
Before DMA, I/O required CPU‑mediated copies and multiple context switches, incurring high latency and CPU overhead.
2.2 What is DMA?
DMA transfers data between I/O devices and memory autonomously, while the CPU can perform other work.
2.3 Traditional File Transfer Drawbacks
Typical file transfer uses read(file, tmp_buf, len) followed by write(socket, tmp_buf, len), causing four context switches (two per system call) and four data copies (two via DMA, two via the CPU), which degrades performance under high concurrency.
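As a concrete baseline, the sketch below performs the traditional copy loop: each read() drags a chunk from the page cache into a user-space buffer, and each write() copies it back into a kernel buffer. A file-to-file copy stands in here for the file-to-socket case, and error handling is deliberately minimal:

```c
#include <fcntl.h>
#include <unistd.h>

/* Traditional copy: every chunk crosses the user/kernel boundary twice. */
int copy_read_write(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) return -1;

    char tmp_buf[64 * 1024];            /* user-space bounce buffer */
    ssize_t n;
    while ((n = read(in, tmp_buf, sizeof tmp_buf)) > 0) {  /* kernel -> user copy */
        if (write(out, tmp_buf, (size_t)n) != n) {         /* user -> kernel copy */
            n = -1;
            break;
        }
    }
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

Every iteration burns CPU cycles on two memory copies that the zero-copy variants below avoid.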
3. Zero‑Copy Techniques
3.1 How to Implement Zero‑Copy?
Common methods include mmap + write, sendfile, splice, and combinations with DMA scatter/gather.
3.1.1 mmap Approach
Using mmap() maps the file's page cache into the process's address space, so write() can send those pages to a socket without first copying them into a user buffer; this removes the kernel-to-user copy but still leaves one CPU copy inside the kernel (page cache to socket buffer).
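A minimal sketch of the mmap + write pattern, using an arbitrary output descriptor in place of a socket (the function name and simplified error handling are illustrative, not a fixed API):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* mmap + write: the file's pages are mapped into the process, so no
 * read() copy into a user buffer is needed; write() still triggers one
 * CPU copy inside the kernel (page cache -> destination buffer). */
int send_mmap_write(const char *src, int out_fd)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;

    struct stat st;
    if (fstat(in, &st) < 0 || st.st_size == 0) { close(in); return -1; }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, in, 0);
    if (addr == MAP_FAILED) { close(in); return -1; }

    ssize_t n = write(out_fd, addr, (size_t)st.st_size);  /* kernel-side CPU copy */

    munmap(addr, (size_t)st.st_size);
    close(in);
    return n == st.st_size ? 0 : -1;
}
```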
3.1.2 sendfile Approach
sendfile(out_fd, in_fd, &offset, count) transfers data from a file descriptor to a socket entirely within the kernel, eliminating user-space copies and cutting the context switches from four to two; with NICs that support scatter/gather DMA, the remaining in-kernel CPU copy can be eliminated as well.
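A Linux-specific sketch of the sendfile path. The classic use case writes to a socket; this example writes to a regular file instead (supported on Linux since 2.6.33), which keeps the example self-contained:

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* sendfile: the kernel moves data from in_fd's page cache straight to
 * out_fd; user space never touches the bytes. */
int send_whole_file(int out_fd, const char *src)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;

    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return -1; }

    off_t offset = 0;                 /* sendfile advances this for us */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in, &offset, (size_t)(st.st_size - offset));
        if (n <= 0) { close(in); return -1; }
    }
    close(in);
    return 0;
}
```

The loop matters: sendfile may transfer fewer bytes than requested, so production code always retries from the updated offset.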
3.1.3 splice Approach
splice moves data between two file descriptors through an in-kernel pipe buffer, so no CPU copy into user space occurs; at least one of the two descriptors in each splice call must be a pipe.
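A minimal Linux sketch: data flows file → pipe → file entirely inside the kernel, with the pipe serving as the intermediate kernel buffer (the function name is illustrative):

```c
#define _GNU_SOURCE               /* splice() is a GNU/Linux extension */
#include <fcntl.h>
#include <unistd.h>

/* splice: in_fd -> pipe -> out_fd, all inside the kernel; no bytes are
 * copied into user space. */
int copy_splice(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0) return -1;

    while (len > 0) {
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0) break;
        ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
        if (m != n) break;
        len -= (size_t)n;
    }
    close(p[0]);
    close(p[1]);
    return len == 0 ? 0 : -1;
}
```

Each chunk is bounded by the pipe's capacity (64 KiB by default), so larger files simply take more iterations.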
3.2 Zero‑Copy Applications
Frameworks like Netty and Kafka use zero‑copy to achieve high‑throughput data transfer; Kafka, for example, employs FileChannel.transferTo() for efficient log replication.
4. Shared Memory with mmap
Shared memory allows kernel and user space to access the same memory region, eliminating extra copies. mmap can map a file into memory, while System V APIs (shmget, shmat, shmdt, shmctl) provide explicit shared-memory segments.
4.1 mmap Usage
#include <sys/mman.h>
void *addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
if (addr == MAP_FAILED) perror("mmap");
/* use addr */
munmap(addr, length);
4.2 System V Shared Memory
Creating a segment:
#include <sys/ipc.h>
#include <sys/shm.h>
int shmid = shmget(key, size, IPC_CREAT | 0666);
Attaching to a process:
void *shmaddr = shmat(shmid, NULL, 0);
Detaching:
shmdt(shmaddr);
Control operations via shmctl allow querying and removing segments.
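Putting the four calls together, here is a minimal end-to-end sketch: a child process writes a message into a System V segment, and the parent reads it back and removes the segment (the function name and the fork-based setup are illustrative):

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child share one System V segment: the child writes a
 * message, the parent reads it after the child exits. */
int shm_roundtrip(char *out, size_t out_len)
{
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
    if (shmid < 0) return -1;

    pid_t pid = fork();
    if (pid == 0) {                      /* child: producer */
        char *p = shmat(shmid, NULL, 0);
        strcpy(p, "hello from child");
        shmdt(p);
        _exit(0);
    }
    waitpid(pid, NULL, 0);               /* ensure the write has happened */

    char *p = shmat(shmid, NULL, 0);     /* parent: consumer */
    strncpy(out, p, out_len - 1);
    out[out_len - 1] = '\0';
    shmdt(p);
    shmctl(shmid, IPC_RMID, NULL);       /* remove the segment */
    return 0;
}
```

Because both processes map the same physical pages, the message crosses from child to parent with no copy at all; waitpid stands in for the real synchronization (semaphores, futexes) a production design would use.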
Overall, using mmap, zero‑copy system calls, and shared‑memory APIs can dramatically reduce unnecessary data copies and context switches, leading to higher performance in file and network I/O workloads.