How mmap Supercharges File I/O by Cutting System Calls and Data Copies
mmap maps a file directly into a process’s virtual memory, which removes the extra copy between kernel and user space and cuts down on read/write system calls. The result is faster I/O and simpler code, at the cost of careful handling of address space limits, page faults, and concurrency.
Why traditional read/write I/O is slow
Traditional I/O copies data twice, and every read/write call is a system call that forces a switch between user mode and kernel mode. For large files, the repeated copies and calls add up to significant CPU and memory overhead. The two copies are:
Disk → kernel page cache
Kernel cache → user space buffer
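To make the per-call cost concrete, here is a minimal sketch that reads a file in fixed-size chunks and counts how many read() calls that takes. The function name count_read_calls and its parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads the whole file at `path` in chunks of `chunk` bytes and returns how
 * many read() calls were issued (including the final one that returns 0 at
 * end of file), or -1 on error. The total byte count is stored in *total. */
long count_read_calls(const char *path, size_t chunk, size_t *total)
{
    assert(path != NULL && total != NULL && chunk > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(chunk);      /* user-space buffer: target of the second copy */
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long calls = 0;
    ssize_t n;
    *total = 0;
    do {
        n = read(fd, buf, chunk);   /* one system call per chunk */
        calls++;
        if (n > 0)
            *total += (size_t)n;
    } while (n > 0);

    free(buf);
    close(fd);
    return n < 0 ? -1 : calls;
}
```

For a 10,000-byte regular file and a 4,096-byte chunk, this issues four read() calls: three that return data and one that reports end of file. A larger buffer lowers the call count, but every byte is still copied from the page cache into buf.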
How mmap bypasses the double‑copy
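mmap maps a file directly into the process’s virtual address space, turning file I/O into ordinary memory accesses. The process’s page tables point at the same physical pages that the kernel page cache uses, so the second copy into a user buffer disappears. Pages are loaded on demand: the first access to a page that is not yet resident triggers a page fault, and the kernel then brings that page in.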
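One way to observe the shared pages is to store a byte through a MAP_SHARED mapping and then fetch the same byte with read(). The sketch below assumes a writable, non-empty file; the function name poke_first_byte is hypothetical. The msync call is there for portability; on Linux the store is normally visible to read() without it, because both paths go through the same page cache.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Overwrites the first byte of the file at `path` through a shared mapping,
 * then reads it back with read(). Returns the byte that read() saw, or -1
 * on error. */
int poke_first_byte(const char *path, char value)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) < 0 || sb.st_size == 0) {   /* mmap rejects a zero length */
        close(fd);
        return -1;
    }
    size_t len = (size_t)sb.st_size;

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    addr[0] = value;                 /* plain memory store, no write() call */
    msync(addr, len, MS_SYNC);       /* portable way to push the change to the file */

    char seen;
    ssize_t n = read(fd, &seen, 1);  /* fd offset is still 0, so this reads byte 0 */

    munmap(addr, len);
    close(fd);
    return n == 1 ? (unsigned char)seen : -1;
}
```

With MAP_PRIVATE instead of MAP_SHARED, the store would go to a private copy-on-write page and the file itself would stay unchanged.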
Reduced system calls
After a single mmap call, the program reads data with normal memory instructions instead of issuing repeated read/write calls. The two functions below contrast the approaches: the traditional version pays one system call per chunk, while the mmap version pays one system call for the whole file.
// Read a file with traditional I/O
void read_file_traditional(const char* filename){
    int fd = open(filename, O_RDONLY);
    ...
    // Loop over the file contents; every iteration costs one read() system call
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        ...
    }
}
// Read a file with mmap
void read_file_mmap(const char* filename){
    int fd = open(filename, O_RDONLY);
    ...
    // A single mmap system call maps the whole file
    char* addr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    unsigned long sum = 0;
    // Access the file contents directly through memory, with no further system calls
    for (size_t i = 0; i < sb.st_size; i++) {
        sum += (unsigned char)addr[i];
    }
}
Simplified programming model
Because the file appears as a memory array, code can search or process data with simple loops, as shown in a file‑search example.
// Search a file with traditional I/O
void search_file_traditional(const char* filename, const char* pattern){
    int fd = open(filename, O_RDONLY);
    char buf[4096];
    ssize_t n;
    size_t plen = strlen(pattern);
    // The buffer has to be managed by hand and refilled in a loop.
    // Note: a match that straddles two chunks is still missed by this simple version.
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        // Look for the pattern inside the current chunk without reading past its end
        for (ssize_t i = 0; i + (ssize_t)plen <= n; i++) {
            if (memcmp(buf + i, pattern, plen) == 0) {
                printf("Found pattern at offset %ld\n", (long)(lseek(fd, 0, SEEK_CUR) - n + i));
            }
        }
    }
    ...
}
// Search a file with mmap
void search_file_mmap(const char* filename, const char* pattern){
    int fd = open(filename, O_RDONLY);
    struct stat sb;
    fstat(fd, &sb);
    size_t plen = strlen(pattern);
    // Map once, then work on memory directly
    char* addr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    // Walk the file contents like an array, stopping where a full match no longer fits
    for (size_t i = 0; i + plen <= (size_t)sb.st_size; i++) {
        if (memcmp(addr + i, pattern, plen) == 0) {
            printf("Found pattern at offset %zu\n", i);
        }
    }
    ...
}
Zero‑copy data transfer
mmap shares the same physical pages between kernel and user, so data is copied only once from disk to memory. This reduces memory usage and CPU overhead.
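As an illustration of working on the data in place, the following sketch counts newline bytes directly in the mapped pages, with no intermediate user buffer. It is one possible error-checked shape for such a function, not a definitive implementation; the name count_lines_mmap is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the number of '\n' bytes in the file at `path`, or -1 on error. */
long count_lines_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {           /* mmap fails on a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)sb.st_size;
    const char *addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = addr;
    const char *end = addr + len;
    /* memchr scans the mapped pages themselves; nothing is copied out */
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;                         /* continue after the newline just found */
    }

    munmap((void *)addr, len);
    return lines;
}
```

The empty-file check matters in practice: mmap with a length of zero fails, so callers that map whole files need to handle that case separately.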
Limitations and cautions
On 32‑bit systems the virtual address space is limited to 4 GB in total, and a process can typically use only 2 to 3 GB of it, so mapping very large files can fragment or exhaust the address space. Frequent small writes may generate many page faults and TLB misses, which can make mmap slower than read/write. Real‑time systems must account for unpredictable page‑fault latency. In high‑concurrency scenarios, where several threads or processes touch the same mapping, explicit synchronization (locks or atomic operations) is required to avoid data races.
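A common way to live within a tight address space is to map a large file one window at a time instead of all at once. The sketch below shows the idea under simple assumptions; the function name sum_file_windowed and the window_pages parameter are hypothetical. The file offset passed to mmap must be a multiple of the page size, which is why the window is expressed in whole pages.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sums every byte of the file at `path`, mapping at most `window_pages`
 * pages at a time. Returns 0 on success and stores the result in *sum,
 * or returns -1 on error. */
int sum_file_windowed(const char *path, size_t window_pages, unsigned long *sum)
{
    assert(path != NULL && sum != NULL && window_pages > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }

    off_t window = (off_t)page * (off_t)window_pages;
    *sum = 0;
    /* `off` only ever advances by whole windows, so it stays page-aligned */
    for (off_t off = 0; off < sb.st_size; off += window) {
        off_t left = sb.st_size - off;
        size_t len = (size_t)(left < window ? left : window);

        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            *sum += p[i];
        munmap(p, len);              /* release this window before mapping the next */
    }

    close(fd);
    return 0;
}
```

The window size is a trade-off: smaller windows keep address space usage low but cost more mmap/munmap calls, which gives back part of the system-call savings described earlier.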
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
