
Why mmap Outperforms io_uring in Real-World I/O – A Deep Linux Memory‑Mapping Guide

mmap, the classic Linux memory‑mapping technique, often surpasses the modern asynchronous io_uring in common I/O scenarios by eliminating redundant data copies, reducing system calls, and enabling zero‑copy access. This article covers mmap's fundamentals, workflow, performance comparisons, practical usage, pitfalls, and code examples.

Deepin Linux

Introduction

Although io_uring is hailed as Linux's high‑performance asynchronous I/O framework, real‑world benchmarks sometimes show the older mmap technique delivering better results. This article explores why mmap can beat io_uring, clarifies common misconceptions, and outlines the proper usage boundaries for both technologies.

1. Getting Started with mmap

mmap (memory‑map) maps a file or device directly into a process's virtual address space, allowing the program to read or write the file as if it were ordinary memory. The kernel automatically synchronises dirty pages back to the underlying storage.

Function prototype

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Key parameters:

addr : Suggested start address (usually NULL for automatic selection).

length : Mapping size in bytes (rounded up to the nearest page).

prot : Protection flags (e.g., PROT_READ, PROT_WRITE).

flags : Mapping behavior (e.g., MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS).

fd : File descriptor (use -1 for anonymous mappings).

offset : File offset, must be page‑aligned.

Basic usage steps

Open the target file with open() to obtain fd.

Call mmap() to create the mapping; the return value is the start address of the mapped region.

Read or write the memory directly via the returned pointer.

When finished, release the mapping with munmap().

Close the file descriptor with close().

2. mmap Code Samples

Anonymous private mapping (memory allocation):

// Anonymous private mapping (allocate memory)
void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED) {
    perror("mmap failed");
}

File‑shared mapping:

int fd = open("file.txt", O_RDWR);
struct stat st;
fstat(fd, &st);                        /* obtain the actual file size */
void *file_mem = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
if (file_mem == MAP_FAILED) perror("mmap failed");

3. mmap I/O Model vs. Traditional I/O

Traditional read/write involves two copies (disk → kernel buffer → user buffer) and multiple system calls. Direct I/O removes the kernel buffer but still requires a copy between user space and the device. mmap eliminates the user‑kernel copy entirely: the file is accessed through the process’s address space, and the kernel only copies data on a page‑fault (lazy loading) or when dirty pages are flushed.

Key advantages

Zero redundant copies – only one copy from disk to physical memory.

Minimal system‑call overhead – after the initial mmap and final munmap, normal memory accesses incur no extra calls.

Lazy loading – pages are loaded on first access, saving memory for large files.

Automatic sharing – multiple processes can map the same file and share physical pages.

Limitations

Mappings are page‑granular: the length is rounded up to a whole page and the offset must be page‑aligned, which makes mmap awkward for many small or variable‑length files.

Heavy random writes generate many dirty pages, causing write‑back overhead.

Small files (< 16 KB) may incur higher latency than simple read / write.

32‑bit address space limits the maximum contiguous mapping size.

4. Performance Comparison with io_uring

The article describes a benchmark on a Xeon Platinum 8380 server (128 GB RAM, 980 Pro NVMe) running Ubuntu 20.04 with kernel 5.15. Three test cases were executed:

Random read/write of 1 KB files (1000 operations).

Sequential read/write of a 1 GB file.

Mixed workload combining the above.

Results (seconds, MB/s):

Small‑file random I/O: mmap 0.8 s (1.25 MB/s) vs. io_uring 1.2 s (0.83 MB/s).

Large‑file sequential I/O: io_uring 2.5 s (400 MB/s) vs. mmap 3.0 s (333 MB/s).

Mixed workload: mmap 2.2 s vs. io_uring 3.0 s.

Interpretation: mmap excels in small‑file random access because it avoids per‑request queue management, while io_uring gains an edge on large sequential streams thanks to batch submission.

5. Why io_uring Can Lose to mmap

Principle‑level differences

io_uring uses ring buffers (submission and completion queues) to pass I/O requests between user space and kernel space. Each submission still requires an io_uring_enter() system call (its cost can be amortised across batched requests) plus per‑request kernel bookkeeping. mmap, by contrast, maps the file once and then performs pure memory accesses, with no kernel involvement beyond page faults and dirty‑page write‑back.

Scenario suitability

io_uring shines in high‑concurrency workloads with many large I/O operations. For tiny, frequent reads/writes (e.g., configuration files), the overhead of building and submitting queue entries outweighs its benefits, making mmap faster.

Configuration pitfalls

Improper queue sizing, sub‑optimal request batching, or using tiny I/O sizes can degrade io_uring performance, whereas mmap’s usage is straightforward and less error‑prone.

6. Practical Application Scenarios

Large files that need random access (e.g., databases, media processing).

Small files that are read repeatedly and benefit from in‑process caching.

Inter‑process communication via shared memory or shared file mappings.

Allocating large, page‑aligned buffers (glibc uses mmap for allocations > 128 KB).

Unsuitable cases include:

Pure sequential streaming of very large files where the kernel’s page cache would be evicted.

Files larger than the available contiguous virtual address space on 32‑bit systems.

Files on removable or network drives where mmap semantics are unreliable.

7. Using mmap in Real Code

The article walks through a complete example that creates a test file, maps it, appends data, and then unmaps it. The core helper functions (plain POSIX C) are:

int MapFile(const char *inPathName, void **outDataPtr,
            size_t *outDataLength, size_t appendSize) {
    int fd = open(inPathName, O_RDWR);
    if (fd < 0) return errno;
    struct stat st;
    if (fstat(fd, &st) != 0) { int err = errno; close(fd); return err; }
    if (ftruncate(fd, st.st_size + appendSize) != 0) {
        int err = errno; close(fd); return err;
    }
    void *ptr = mmap(NULL, st.st_size + appendSize,
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) { int err = errno; close(fd); return err; }
    *outDataPtr = ptr;
    *outDataLength = st.st_size;
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return 0;
}

void ProcessFile(const char *path) {
    void *addr; size_t len;
    const char *add = " append_key2";
    size_t addlen = strlen(add);
    if (MapFile(path, &addr, &len, addlen) == 0) {
        memcpy((char *)addr + len, add, addlen);
        munmap(addr, len + addlen);
    }
}

8. Mapping Physical Devices (Advanced)

On embedded platforms, /dev/mem can be opened and mapped to access hardware registers directly, reducing CPU‑mediated copies. The steps are:

Open /dev/mem to obtain a file descriptor.

Call mmap() with the desired physical address and length.

Read/write the returned virtual address to interact with the device.

Unmap with munmap() when finished.

Conclusion

mmap remains a powerful, low‑overhead I/O mechanism that can outperform modern async frameworks in many scenarios, especially when dealing with small or random‑access workloads. Understanding its workflow, advantages, and limitations enables developers to choose the right tool for high‑performance Linux applications.

mmap I/O model diagram
mmap page alignment illustration
mmap over‑size mapping illustration
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Performance, io_uring, Linux, mmap, zero-copy, memory-mapping
Written by Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.