Why mmap Outperforms io_uring in Real-World I/O – A Deep Linux Memory‑Mapping Guide
mmap, the classic Linux memory-mapping technique, often surpasses the modern asynchronous io_uring framework in common I/O scenarios by eliminating redundant data copies, reducing system calls, and enabling zero-copy access. This article covers its fundamentals, workflow, performance comparisons, practical usage, pitfalls, and code examples.
Introduction
Although io_uring is hailed as the high‑performance asynchronous I/O framework in Linux, real‑world benchmarks sometimes show the older mmap technique delivering better results. The article explores why mmap can beat io_uring, clarifies common misconceptions, and outlines the proper usage boundaries for both technologies.
1. Getting Started with mmap
mmap (memory map) maps a file or device directly into a process’s virtual address space, allowing the program to read or write the file as if it were ordinary memory. The kernel automatically synchronises dirty pages back to the underlying storage.
Function prototype
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
Key parameters:
addr : Suggested start address (usually NULL for automatic selection).
length : Mapping size in bytes (rounded up to the nearest page).
prot : Protection flags (e.g., PROT_READ, PROT_WRITE).
flags : Mapping behavior (e.g., MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS).
fd : File descriptor (use -1 for anonymous mappings).
offset : File offset, must be page‑aligned.
Basic usage steps
Open the target file with open() to obtain fd.
Call mmap() to create the mapping; the return value is the start address of the mapped region.
Read or write the memory directly via the returned pointer.
When finished, release the mapping with munmap().
Close the file descriptor with close().
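A minimal end-to-end sketch of these five steps; the path ./data.txt is a placeholder for any readable file:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* Step 1: open the target file. */
    int fd = open("./data.txt", O_RDONLY);            /* placeholder path */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

    /* Step 2: create the mapping. */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Step 3: access the file as ordinary memory. */
    printf("first byte: 0x%02x\n", ((unsigned char *)p)[0]);

    /* Steps 4 and 5: release the mapping, then the descriptor. */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}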
2. mmap Code Samples
Anonymous private mapping (memory allocation):
// Anonymous private mapping (allocate memory)
void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED) {
    perror("mmap failed");
}
// Release with munmap(mem, 4096) when no longer needed.
File-shared mapping:
int fd = open("file.txt", O_RDWR);
struct stat st;
fstat(fd, &st);                           // query the on-disk size
size_t file_size = st.st_size;            // size used for the mapping
void *file_mem = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
3. mmap I/O Model vs. Traditional I/O
Traditional read/write involves two copies (disk → kernel buffer → user buffer) and multiple system calls. Direct I/O removes the kernel buffer but still requires a copy between user space and the device. mmap eliminates the user‑kernel copy entirely: the file is accessed through the process’s address space, and the kernel only copies data on a page‑fault (lazy loading) or when dirty pages are flushed.
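When durability matters, a program need not wait for the kernel’s lazy flush: msync() forces dirty pages of a MAP_SHARED mapping back to storage. A minimal sketch, continuing the file_mem/file_size sample above:

#include <sys/mman.h>

/* Flush [file_mem, file_mem + file_size) to the underlying file.
 * MS_SYNC blocks until the write-back completes; MS_ASYNC merely
 * schedules it. Addresses returned by mmap() are already page-aligned. */
if (msync(file_mem, file_size, MS_SYNC) != 0)
    perror("msync");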
Key advantages
Zero redundant copies – only one copy from disk to physical memory.
Minimal system‑call overhead – after the initial mmap and final munmap, normal memory accesses incur no extra calls.
Lazy loading – pages are loaded on first access, saving memory for large files.
Automatic sharing – multiple processes can map the same file and share physical pages.
Limitations
Offsets must be page-aligned and lengths are rounded up to whole pages, which wastes space for small or variable-length files (see the alignment sketch after this list).
Heavy random writes generate many dirty pages, causing write‑back overhead.
Small files (< 16 KB) may incur higher latency than simple read/write.
32‑bit address space limits the maximum contiguous mapping size.
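Mapping a sub-range of a file runs into the alignment rule directly: the offset handed to mmap() must sit on a page boundary. A small sketch, assuming only POSIX sysconf(); the helper name page_window is ours:

#include <sys/types.h>
#include <unistd.h>

/* Compute a mappable window covering [offset, offset + length):
 * align the offset down to a page boundary and widen the length,
 * reporting how many leading bytes the caller must skip. */
static void page_window(off_t offset, size_t length,
                        off_t *map_off, size_t *map_len, size_t *skip) {
    long page = sysconf(_SC_PAGESIZE);       /* typically 4096 */
    *map_off = offset & ~((off_t)page - 1);  /* round down to page start */
    *skip    = (size_t)(offset - *map_off);  /* leading bytes to ignore */
    *map_len = length + *skip;               /* kernel rounds this up itself */
}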
4. Performance Comparison with io_uring
The article describes a benchmark on a Xeon Platinum 8380 server (128 GB RAM, 980 Pro NVMe) running Ubuntu 20.04 with kernel 5.15. Three test cases were executed:
Random read/write of 1 KB files (1000 operations).
Sequential read/write of a 1 GB file.
Mixed workload combining the above.
Results (seconds, MB/s):
Small‑file random I/O: mmap 0.8 s (1.25 MB/s) vs. io_uring 1.2 s (0.83 MB/s).
Large‑file sequential I/O: io_uring 2.5 s (400 MB/s) vs. mmap 3.0 s (333 MB/s).
Mixed workload: mmap 2.2 s vs. io_uring 3.0 s.
Interpretation: mmap excels in small‑file random access because it avoids per‑request queue management, while io_uring gains an edge on large sequential streams thanks to batch submission.
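The article does not reproduce the benchmark source. A timing harness along the following lines could approximate the small-file mmap case; the file name, size, and iteration count are illustrative, not the article’s exact setup:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Time 1000 reads of a 1 KB file through a fresh private mapping each time. */
int main(void) {
    struct timespec t0, t1;
    char sink[1024];

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 1000; i++) {
        int fd = open("small.dat", O_RDONLY);          /* illustrative 1 KB file */
        if (fd < 0) { perror("open"); return 1; }
        void *p = mmap(NULL, sizeof sink, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
        memcpy(sink, p, sizeof sink);                  /* fault the page in */
        munmap(p, sizeof sink);
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("1000 x 1 KB via mmap: %.3f s\n", s);
    return 0;
}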
5. Why io_uring Can Lose to mmap
Principle‑level differences
io_uring uses ring buffers (submission and completion queues) to pass I/O requests between user space and the kernel. Submitting work still goes through the io_uring_enter() system call (unless kernel-side submission polling, SQPOLL, is enabled), and every request carries per-entry kernel bookkeeping, even though batching amortises the syscall cost. mmap, by contrast, maps the file once and then performs pure memory accesses with no kernel involvement beyond page faults and write-back.
Scenario suitability
io_uring shines in high‑concurrency workloads with many large I/O operations. For tiny, frequent reads/writes (e.g., configuration files), the overhead of building and submitting queue entries outweighs its benefits, making mmap faster.
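That fixed cost is visible even in the smallest program: one 1 KB read through io_uring requires ring setup, SQE preparation, submission, and completion reaping. A minimal liburing sketch (link with -luring; the file name is illustrative, and this is not necessarily how the article’s benchmark was written):

#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    struct io_uring ring;
    char buf[1024];

    if (io_uring_queue_init(8, &ring, 0) < 0) return 1;   /* create the rings */

    int fd = open("small.dat", O_RDONLY);                 /* illustrative file */
    if (fd < 0) { perror("open"); return 1; }

    /* Fill one submission queue entry describing a 1 KB read at offset 0. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    if (!sqe) return 1;
    io_uring_prep_read(sqe, fd, buf, sizeof buf, 0);

    io_uring_submit(&ring);                               /* io_uring_enter() */

    struct io_uring_cqe *cqe;                             /* reap the completion */
    if (io_uring_wait_cqe(&ring, &cqe) == 0) {
        printf("read returned %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}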
Configuration pitfalls
Improper queue sizing, sub‑optimal request batching, or using tiny I/O sizes can degrade io_uring performance, whereas mmap’s usage is straightforward and less error‑prone.
6. Practical Application Scenarios
Large files that need random access (e.g., databases, media processing).
Small files that are read repeatedly and benefit from in‑process caching.
Inter‑process communication via shared memory or shared file mappings (a sketch follows this list).
Allocating large, page‑aligned buffers (glibc uses mmap for allocations > 128 KB).
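For the IPC case, a shared anonymous mapping survives fork() and gives parent and child one physical page to communicate through; a minimal sketch:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child exchange data through one shared anonymous page. */
int main(void) {
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                     /* child writes into the page... */
        strcpy(shared, "hello from child");
        _exit(0);
    }
    wait(NULL);                            /* ...parent reads once it exits */
    printf("%s\n", shared);

    munmap(shared, 4096);
    return 0;
}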
Unsuitable cases include:
Pure sequential streaming of very large files where the kernel’s page cache would be evicted.
Files larger than the available contiguous virtual address space on 32‑bit systems.
Files on removable or network drives where mmap semantics are unreliable.
7. Using mmap in Real Code
The article provides a complete Objective‑C example that creates a test file, maps it, appends data, and then unmaps it. The core helper functions, which are plain C, are:
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int MapFile(const char *inPathName, void **outDataPtr,
            size_t *outDataLength, size_t appendSize) {
    int fd = open(inPathName, O_RDWR);
    if (fd < 0) return errno;

    struct stat st;
    if (fstat(fd, &st) != 0) { int e = errno; close(fd); return e; }

    /* Grow the file so the appended region is backed by storage. */
    if (ftruncate(fd, st.st_size + appendSize) != 0) {
        int e = errno; close(fd); return e;
    }

    void *ptr = mmap(NULL, st.st_size + appendSize,
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) { int e = errno; close(fd); return e; }

    *outDataPtr = ptr;
    *outDataLength = st.st_size;   /* original size; new data goes after it */
    close(fd);                     /* the mapping outlives the descriptor */
    return 0;
}
void ProcessFile(const char *path) {
    void *addr; size_t len;
    const char *add = " append_key2";
    size_t addlen = strlen(add);
    if (MapFile(path, &addr, &len, addlen) == 0) {
        memcpy((char *)addr + len, add, addlen);   /* write into the appended tail */
        /* The kernel flushes the dirty pages lazily; msync(addr, len + addlen,
           MS_SYNC) would force the write-back before unmapping. */
        munmap(addr, len + addlen);
    }
}
8. Mapping Physical Devices (Advanced)
On embedded platforms, /dev/mem can be opened and mapped to access hardware registers directly, reducing CPU‑mediated copies. The steps are:
Open /dev/mem to obtain a file descriptor.
Call mmap() with the desired physical address and length.
Read/write the returned virtual address to interact with the device.
Unmap with munmap() when finished.
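A sketch of that pattern follows; REG_BASE is a hypothetical, page-aligned register address that would come from the board’s datasheet, and root access on a kernel that permits /dev/mem mappings is assumed:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REG_BASE 0x3F200000UL   /* hypothetical, page-aligned register block */

int main(void) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);   /* O_SYNC: uncached access */
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, REG_BASE);
    if ((void *)regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    uint32_t status = regs[0];   /* read a 32-bit register */
    regs[1] = status | 0x1;      /* write a neighbouring register */

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}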
Conclusion
mmap remains a powerful, low‑overhead I/O mechanism that can outperform modern async frameworks in many scenarios, especially when dealing with small or random‑access workloads. Understanding its workflow, advantages, and limitations enables developers to choose the right tool for high‑performance Linux applications.
