What Happens to mmap-Mapped Files When a Program Crashes? A Deep Dive into Linux Memory Mapping
This article explains the fundamentals of the mmap system call, its internal working mechanism, zero‑copy I/O model, advantages, typical use cases, and detailed usage guidelines—including function prototypes, parameter meanings, mapping steps, and sample code—while also exploring how mmap‑mapped files behave during program crashes.
1. mmap Technology Overview
mmap (memory map) maps a file or other object into a process's address space, so the process can read and write that region as if it were ordinary memory. For shared mappings (MAP_SHARED), the kernel automatically writes modified (dirty) pages back to the file, which simplifies file operations.
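When a program crashes, what happens to an mmap-mapped file depends on the mapping mode, the file system's write-back behavior, and whether only the process or the whole system went down. Changes to a MAP_SHARED mapping are made directly in the kernel's page cache, so a crash of the process alone does not lose them: the kernel still writes those dirty pages back to the file after the process has died. Changes to a MAP_PRIVATE mapping are copy-on-write and never reach the file, so they are lost. Data is at risk when the kernel itself crashes or the machine loses power before dirty pages reach the disk, and a crash in the middle of a multi-step update can leave the file logically inconsistent (for example, a half-written record) even though every byte the program did write is preserved.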
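The process-crash case can be checked directly. The following is a minimal POSIX sketch, not taken from the original article: the hypothetical helper `shared_write_survives_crash()` lets a forked child write through a `MAP_SHARED` mapping and then kills it with `SIGKILL`, so no `msync()`, `munmap()`, or `close()` ever runs in the child; the parent then reads the file back with `pread()`. It assumes a writable `/tmp` and simulates only a process crash, not a kernel crash or power loss, which is the case where unsynced pages can actually be lost.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Simulated crash: a child writes through a MAP_SHARED mapping and is killed
// with SIGKILL before it can call msync(), munmap() or close().
// Returns 1 if the parent can read the bytes back from the file, 0 if not,
// -1 on setup error. The temporary file is created in /tmp (an assumption).
int shared_write_survives_crash(void) {
    static const char msg[] = "written before crash";
    char path[] = "/tmp/mmap_crash_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                                   // file is removed once fd is closed
    if (ftruncate(fd, 4096) != 0) { close(fd); return -1; }

    pid_t pid = fork();
    if (pid < 0) { close(fd); return -1; }
    if (pid == 0) {                                 // child process
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) _exit(1);
        memcpy(p, msg, sizeof msg);                 // page is now dirty in the page cache
        raise(SIGKILL);                             // abrupt death, no cleanup code runs
        _exit(1);                                   // not reached
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) { close(fd); return -1; }
    char buf[sizeof msg];
    ssize_t n = pread(fd, buf, sizeof buf, 0);      // ordinary file read path
    close(fd);
    return WIFSIGNALED(status) && n == (ssize_t)sizeof buf &&
           memcmp(buf, msg, sizeof msg) == 0;
}
```

On Linux the function is expected to return 1, because the child's bytes are already in the shared page cache at the moment it dies.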
2. How mmap Works
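Calling mmap creates a new vm_area_struct (VMA) that describes a range of the process's virtual address space and records which file (and which offset in it) or device backs that range. A process typically has many VMAs, one for each distinct region such as the heap, the stack, the code segment, or an mmap area.
Among other fields, the vm_area_struct stores the region's start and end addresses, its access permissions, the backing file, and a pointer vm_ops to a vm_operations_struct: the table of callbacks (such as the page-fault handler) that the kernel uses to operate on that region.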
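The VMAs of a running process can be inspected from user space through `/proc/self/maps`, where each line corresponds to one `vm_area_struct`. In this Linux-only sketch (`find_vma()` and `describe_new_mapping()` are hypothetical helper names), a one-page read-only anonymous mapping is created and the line describing the region that contains it is copied out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Copies the /proc/self/maps line whose [start, end) range contains addr.
// Each line describes one VMA: "start-end perms offset dev inode [path]".
// Returns 1 if found, 0 otherwise (Linux-specific).
static int find_vma(const void *addr, char *out, size_t outlen) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) return 0;
    char line[4096];
    unsigned long start, end, a = (unsigned long)addr;
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 && a >= start && a < end) {
            line[strcspn(line, "\n")] = '\0';      // drop the trailing newline
            snprintf(out, outlen, "%s", line);
            found = 1;
        }
    }
    fclose(f);
    return found;
}

// Creates a one-page, read-only anonymous mapping, copies the description of
// the VMA that holds it into out, then removes the mapping. Returns 1 on success.
int describe_new_mapping(char *out, size_t outlen) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;
    int found = find_vma(p, out, outlen);
    munmap(p, page);
    return found;
}
```

The returned line has the permission field `r--p` (readable, private); the addresses differ from run to run.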
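When a process touches an address in the mapped area whose page is not yet mapped in its page tables, a page fault occurs. The kernel handles the fault through the region's fault handler (->fault in current kernels; very old kernels used ->nopage), which finds the page in the page cache or reads it from disk and then maps it into the process. For shared mappings, dirty pages are later written back to the file by the kernel's periodic write-back, or explicitly when the program calls msync().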
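Explicit write-back with `msync()` can be sketched as follows. The helper name `write_through_mapping()` is hypothetical, and the sketch assumes the file already contains at least `len` bytes, since writing through a mapping cannot grow a file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Overwrites the start of an existing file through a MAP_SHARED mapping and
// waits until the dirty page has been written back with msync(MS_SYNC).
// Returns 0 on success, -1 on error. The file must already hold >= len bytes.
int write_through_mapping(const char *path, const void *data, size_t len) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || len == 0 || (size_t)st.st_size < len) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            // the mapping keeps its own file reference
    if (p == MAP_FAILED) return -1;
    memcpy(p, data, len);                 // marks the page dirty in the page cache
    int rc = msync(p, len, MS_SYNC);      // block until write-back has completed
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}
```

Without the `msync()` call the data would still reach the file through background write-back; `MS_SYNC` only makes the function wait until it has.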
The mmap implementation can be divided into three stages:
Process initiates the mapping, creating a virtual area in its address space.
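The kernel associates the virtual area with the file (through the file's own mmap handler); no data is read and no page-table entries are created yet.
The process accesses the area, triggering a page fault; the kernel reads the file data into the page cache if it is not already there and maps that physical page into the process's page tables.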
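Stage 3 can be observed by counting page faults. This Linux/POSIX sketch (the function name is hypothetical) uses an anonymous mapping so that no file I/O is involved, and reads the process's minor-fault counter from `getrusage()` before and after touching each page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

// Returns the number of minor page faults taken while writing once to each page
// of a fresh `npages`-page anonymous mapping, or -1 on error. mmap() itself only
// reserves address space; physical pages are attached on first access.
long faults_on_first_touch(size_t npages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    for (size_t i = 0; i < len; i += page)
        p[i] = 1;                         // first write to a page -> page fault
    getrusage(RUSAGE_SELF, &after);
    munmap((void *)p, len);
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical configuration the result is roughly one fault per page; the exact count can vary with kernel features such as transparent huge pages.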
3. mmap I/O Model
mmap provides a zero‑copy technique, reducing the number of data copies between user space and kernel space.
```c
#include <sys/mman.h>

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
```
Key characteristics of mmap:
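Transfers between the disk and the kernel page cache are performed by DMA, without the CPU copying each byte (this is equally true of ordinary read()).
In user space, the mapped file looks like memory, but its pages occupy physical RAM only after they are first accessed.
Sending a mapped file with mmap() followed by write() takes two system calls and therefore at least four user/kernel context switches, whereas sendfile() replaces both read() and write() with a single call.
mmap removes the copy between the kernel page cache and a user-space buffer, but one copy inside the kernel remains: write() still copies the data from the page cache into the destination buffer (for example, a socket buffer).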
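The mmap() + write() pattern can be sketched for a file-to-file copy (the function name `copy_via_mmap()` is hypothetical). The source is never copied into a user buffer, yet `write()` still copies the bytes inside the kernel, which is the kernel-side copy that remains.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Copies src to dst with mmap() + write(). Returns 0 on success, -1 on error.
int copy_via_mmap(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    struct stat st;
    if (fstat(in, &st) != 0) { close(in); return -1; }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }
    int rc = 0;
    if (st.st_size > 0) {                          // mmap() rejects length 0
        size_t len = (size_t)st.st_size;
        char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        if (p == MAP_FAILED) {
            rc = -1;
        } else {
            for (size_t done = 0; done < len; ) {  // write() may be partial
                ssize_t n = write(out, p + done, len - done);
                if (n < 0) { rc = -1; break; }
                done += (size_t)n;
            }
            munmap(p, len);
        }
    }
    close(in);
    if (close(out) != 0) rc = -1;
    return rc;
}
```

When the destination is a socket rather than a file, `sendfile()` performs the same transfer without the separate mapping step.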
4. Advantages of mmap
4.1 Simplified Programming
Developers can treat file data as ordinary memory and skip explicit read() and write() calls and buffer management; the kernel moves data between the file and memory behind the scenes.
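As an illustration of treating file data as memory, the sketch below (hypothetical `count_lines()`) counts newline characters by scanning the mapping with `memchr()`, with no `read()` loop and no user-space buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Counts '\n' bytes in a file by scanning its mapping like an array.
// Returns the count, or -1 on error.
long count_lines(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   // empty file: nothing to map
    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED) return -1;
    long lines = 0;
    const char *end = data + len;
    for (const char *p = data; (p = memchr(p, '\n', (size_t)(end - p))) != NULL; p++)
        lines++;                                    // p++ skips past the newline found
    munmap((void *)data, len);
    return lines;
}
```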
4.2 Lazy Loading via Page Faults
Only the pages that are actually accessed are loaded, which saves physical memory and lets the mmap() call itself return quickly even for very large files.
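Lazy loading can be made visible with `mincore()`, which reports which pages of a range are resident in RAM. This Linux sketch (hypothetical `resident_after_one_touch()`) uses an anonymous mapping, because pages of a file mapping may already be in the page cache from earlier activity; it writes to the first page only and then counts resident pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps `npages` anonymous pages, writes only to the first one, and returns how
// many pages mincore() reports as resident (-1 on error). Linux signature of
// mincore() assumed (unsigned char *vec).
int resident_after_one_touch(size_t npages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    unsigned char *vec = malloc(npages);          // one status byte per page
    if (vec == NULL) { munmap(p, len); return -1; }
    p[0] = 'x';                                   // fault in the first page only
    int resident = -1;
    if (mincore(p, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;               // bit 0 set = page is resident
    }
    free(vec);
    munmap(p, len);
    return resident;
}
```

Typically only the touched page is reported as resident; the untouched pages were never allocated.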
4.3 OS‑Ensured Data Consistency
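For shared mappings, the kernel flushes dirty pages to disk on its own, and the changes survive a crash of the process. Periodic write-back does not guarantee durability at a particular moment, however: call msync() with MS_SYNC when data must be on disk before the program continues (for example, before acknowledging a transaction).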
4.4 Reduced Swap for Read‑Only Access
Pages of a file-backed mapping that have not been modified can simply be dropped under memory pressure and re-read from the file later, whereas anonymous memory has to be written to swap first.
4.5 Performance Gains
For large files or frequent random access, mmap reduces system‑call overhead and CPU usage, improving overall application responsiveness.
5. Typical Application Scenarios
Accelerating read/write of large files by mapping them into memory.
Inter‑process communication via shared memory segments.
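Anonymous mappings (MAP_ANONYMOUS) for large allocations that can be returned to the OS with munmap(); glibc's malloc itself uses mmap for requests above its mmap threshold (128 KiB by default).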
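Inter-process communication through shared memory can be sketched with a `MAP_SHARED | MAP_ANONYMOUS` region created before `fork()`, so that parent and child see the same physical page. The function name is hypothetical, and the sketch relies on `waitpid()` for ordering; processes that access the region concurrently would need real synchronization, such as a process-shared mutex or a semaphore.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// The child stores `value` in a shared anonymous mapping; the parent reads it
// after the child has exited. Returns the value the parent saw, or -1 on error.
int value_from_child(int value) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;
    pid_t pid = fork();
    if (pid < 0) { munmap(shared, sizeof *shared); return -1; }
    if (pid == 0) {               // child: the mapping is shared, not copied
        *shared = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);        // read only after the child has written and exited
    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

With `MAP_PRIVATE` instead of `MAP_SHARED`, the child would write to its own copy-on-write page and the parent would still read 0.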
6. How to Use mmap
6.1 Important Details
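The file offset must be a multiple of the system page size (typically 4 KB); the length need not be, because the kernel rounds the mapping up to whole pages. Within the last page, bytes beyond the end of the file read as zero and writes to them are not saved to the file, while touching a page that lies entirely beyond the end of the file raises SIGBUS. A file must therefore be extended (for example with ftruncate()) before a mapping can be used to grow it.
Closing the file descriptor after mmap() does not invalidate the mapping, because the mapping holds its own reference to the open file rather than to the descriptor.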
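Both rules can be seen in one small sketch (the function name `byte_at()` is hypothetical): it rounds an arbitrary offset down to a page boundary for `mmap()`, indexes forward by the remainder, and closes the descriptor right after mapping while still reading through the mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns the byte at `offset` in `path`, read through mmap(), or -1 on error
// (including offsets at or beyond the end of the file).
int byte_at(const char *path, off_t offset) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - offset % page;       // mmap() offset must be page-aligned
    size_t delta = (size_t)(offset - aligned);
    size_t len = delta + 1;                       // kernel rounds length up to a page
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, aligned);
    close(fd);                                    // mapping stays valid without the fd
    if (p == MAP_FAILED) return -1;
    int value = p[delta];
    munmap(p, len);
    return value;
}
```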
6.2 Function Definition and Parameters
In Linux, the prototype is:
```c
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
```
Parameters:
addr : Desired start address (usually NULL).
length : Size of the mapping in bytes.
prot : Protection flags (e.g., PROT_READ, PROT_WRITE).
flags : Mapping behavior (e.g., MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS).
fd : File descriptor (‑1 for anonymous mappings).
offset : Offset within the file, must be page‑aligned.
Return value is the start address of the mapped region or MAP_FAILED on error.
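The `flags` argument decides whether writes reach the file, which also determines what a crash can leave behind. The sketch below (hypothetical `first_byte_after_write()`) writes through a mapping created with the given flag and then checks the file with `pread()`: with `MAP_SHARED` the file shows the new byte, while with `MAP_PRIVATE` the write went to a private copy-on-write page and the file is unchanged.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Writes 'X' over the first byte of `path` through a one-byte mapping created
// with `flags` (MAP_SHARED or MAP_PRIVATE), unmaps it, and returns the first
// byte as pread() then sees it, or -1 on error (including an empty file).
int first_byte_after_write(const char *path, int flags) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    unsigned char c;
    if (pread(fd, &c, 1, 0) != 1) { close(fd); return -1; }  // need at least 1 byte
    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    p[0] = 'X';   // MAP_SHARED: modifies the file's page in the page cache
                  // MAP_PRIVATE: triggers copy-on-write into a private page
    munmap(p, 1);
    int result = pread(fd, &c, 1, 0) == 1 ? c : -1;
    close(fd);
    return result;
}
```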
6.3 Mapping Process
The kernel creates the virtual area and links it to the file, but it does not read any data or fill in page-table entries at this point. Data is fetched only when the process accesses the region, causing a page fault that finds the needed page in the page cache (or loads it from disk) and maps it into the process.
6.4 Sample Code
```c
// Plain C version of the sample; it also compiles in an Objective-C file
// such as ViewController.m on iOS.
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Extends the file by appendSize bytes and maps the whole file read/write.
// On success, *outPtr is the mapping's base address and *outLen the ORIGINAL
// file size (where the new bytes start). Returns 0 or an errno value.
int MapFile(const char *path, void **outPtr, size_t *outLen, size_t appendSize) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return errno;
    struct stat st;
    if (fstat(fd, &st) != 0) { int err = errno; close(fd); return err; }
    size_t newSize = (size_t)st.st_size + appendSize;
    if (newSize == 0) { close(fd); return EINVAL; }          // mmap() rejects length 0
    if (ftruncate(fd, (off_t)newSize) != 0) { int err = errno; close(fd); return err; }
    void *p = mmap(NULL, newSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { int err = errno; close(fd); return err; }
    close(fd);                                               // the mapping stays valid
    *outPtr = p;
    *outLen = (size_t)st.st_size;
    return 0;
}

void ProcessFile(const char *path) {
    const char *add = " append_key2";
    size_t addSize = strlen(add);
    void *base;
    size_t len;
    if (MapFile(path, &base, &len, addSize) == 0) {
        memcpy((char *)base + len, add, addSize);   // write into the newly added tail
        msync(base, len + addSize, MS_SYNC);        // optional: flush to disk now
        munmap(base, len + addSize);                // unmap from the ORIGINAL base address
    }
}
```
6.5 Unmapping
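After using the mapping, call munmap(void *addr, size_t length) with the address returned by mmap() to release the virtual memory region. munmap() does not by itself force modified pages to disk (use msync() for that), and any mappings still present are removed automatically when the process exits or crashes.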
6.6 Mapping Hardware
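mmap can also map physical device memory (for example via /dev/mem, which normally requires root privileges) into user space, so a program can read and write memory-mapped hardware registers directly with ordinary CPU loads and stores instead of going through read() and write() on a driver. This avoids system-call and copy overhead and is common in embedded systems.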
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
