How mmap Turns Files into Memory: A Deep Dive into Virtual Memory
This article explains how mmap maps file segments into a process's virtual address space, how virtual memory and physical memory interact, and the mechanisms of address translation, paging, and segmentation that enable efficient memory sharing and file I/O in Linux.
1. One‑sentence summary of mmap
mmap lets a process treat a portion of a file as if it were ordinary memory by mapping that portion into the process's virtual address space; the kernel backs the mapping with physical pages on demand.
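As a minimal sketch of that idea, the snippet below uses Python's standard mmap module to read a file through a mapping instead of through read() calls. The helper name read_through_mapping and the temporary file are illustrative assumptions, not part of the original article.

```python
import mmap
import os
import tempfile

def read_through_mapping(data: bytes) -> bytes:
    """Write data to a temporary file, map it, and read it back by slicing."""
    fd, path = tempfile.mkstemp()  # hypothetical scratch file for the demo
    try:
        os.write(fd, data)
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as view:
            # The mapping is indexed like a bytes object; no read() call is issued.
            return view[:len(data)]
    finally:
        os.close(fd)
        os.remove(path)

print(read_through_mapping(b"hello, mmap"))
```

The file must be non-empty, since an empty file cannot be mapped.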
2. Virtual memory and virtual address space
A virtual address space is the set of addresses a process sees. Each virtual address is mapped to a physical address that has been, or will be, allocated to that process. The term “virtual memory” is often used interchangeably with it, although a virtual address space is strictly a range of addresses rather than real memory.
2.1 Principles of virtual space
2.1.1 Physical memory
The physical address space is not necessarily contiguous; it covers both DRAM and I/O registers. Modern systems map device registers, such as those of PCI devices, into the physical address space, so that I/O can be performed with ordinary memory accesses (memory‑mapped I/O).
2.1.2 The three buses
Address bus – transfers addresses.
Data bus – transfers data.
Control bus – transfers commands.
To read memory, the MMU first translates the virtual address issued by the CPU into a physical address. The CPU then sends a read command on the control bus and the physical address on the address bus, and the memory returns the data over the data bus.
2.1.3 Virtual‑to‑physical address translation
Virtual addresses consist of a page (or segment) number and an offset. The MMU looks up the page/segment number in a page/segment table to obtain the corresponding physical page number, then combines it with the offset to form the physical address.
Storage methods
Page‑based storage: the page number indexes a page table to obtain a physical page number; physical address = physical page number × page size + offset.
Segment‑based storage: the segment number indexes a segment table to get the segment base address; physical address = segment base + offset.
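Segment‑page hybrid: combines both methods; the segment number selects a page table, and the page number within that segment selects the physical page.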
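The two basic formulas can be sketched in a few lines. This is a toy model, not how an MMU is implemented: the 4096-byte page size, the dictionary tables, the sample entries, and the segment limit check are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def translate_paged(vaddr: int, page_table: dict) -> int:
    """Paged translation: look up the virtual page number, keep the offset."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError(f"page fault: virtual page {vpn} is not mapped")
    return page_table[vpn] * PAGE_SIZE + offset

def translate_segmented(seg: int, offset: int, segment_table: dict) -> int:
    """Segmented translation: segment base + offset, with an assumed limit check."""
    base, limit = segment_table[seg]
    if offset >= limit:
        raise LookupError(f"offset {offset} is outside segment {seg}")
    return base + offset

page_table = {0: 7, 1: 3}                # virtual page number -> physical page number
segment_table = {0: (0x10000, 0x2000)}   # segment number -> (base, limit)

print(hex(translate_paged(0x1234, page_table)))            # 0x3234
print(hex(translate_segmented(0, 0x123, segment_table)))   # 0x10123
```

In the paged case, 0x1234 splits into virtual page 1 and offset 0x234; page 1 maps to physical page 3, so the result is 3 × 4096 + 0x234 = 0x3234.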
3. mmap mapping
From a coder’s perspective, mmap abstracts away the virtual address space and MMU details: the process simply reads and writes memory, and the OS handles the underlying file‑to‑memory mapping.
When a file is mmap‑ed, the process accesses memory directly; with a shared mapping (MAP_SHARED), modifications are written back to the file on disk, though not necessarily immediately.
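The write-back behavior can be observed with a short sketch, again using Python's mmap module. The helper name patch_file_via_mapping is hypothetical; the file is modified only through slice assignment on the mapping and then re-read through ordinary file I/O on the same descriptor.

```python
import mmap
import os
import tempfile

def patch_file_via_mapping(original: bytes, offset: int, patch: bytes) -> bytes:
    """Overwrite part of a file through a mapping, then re-read it with os.read."""
    fd, path = tempfile.mkstemp()  # hypothetical scratch file for the demo
    try:
        os.write(fd, original)
        # ACCESS_WRITE gives a writable mapping whose changes go to the file.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_WRITE) as view:
            view[offset:offset + len(patch)] = patch  # plain memory write
            view.flush()  # ask the OS to write the modified pages back now
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(original))
    finally:
        os.close(fd)
        os.remove(path)

print(patch_file_via_mapping(b"hello world", 6, b"LINUX"))
```

The flush() call is there because, without an explicit flush, the kernel is free to write dirty pages back at a later time.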
The mapping occupies a contiguous range of virtual addresses.
Before the first access, the addresses in this range are not yet backed by physical pages.
When the process first reads an address in the range, a page fault occurs and the OS allocates physical pages to handle it.
The allocated pages start out empty; the file data is then copied into them.
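The steps above can be observed roughly from user space by counting page faults around the first access to each mapped page. This sketch assumes a Unix-like system, because it relies on the resource module; the helper name touch_mapped_pages is hypothetical. The fault count is printed without an expected value because it depends on the kernel (read-ahead and fault batching can serve several pages per fault).

```python
import mmap
import os
import resource  # Unix-only standard library module
import tempfile

def count_faults() -> int:
    """Minor plus major page faults taken by this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt + usage.ru_majflt

def touch_mapped_pages(n_pages: int):
    """Map a file of n_pages pages, read one byte per page, report fault delta."""
    page = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()  # hypothetical scratch file for the demo
    try:
        os.write(fd, b"\x2a" * (n_pages * page))
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as view:
            before = count_faults()   # mapping exists, pages not yet touched
            first_bytes = [view[i * page] for i in range(n_pages)]
            after = count_faults()    # each first touch may have faulted
        return first_bytes, after - before
    finally:
        os.close(fd)
        os.remove(path)

values, faults = touch_mapped_pages(64)
print("bytes read:", len(values), "page faults during access:", faults)
```

Creating the mapping is cheap regardless of file size; the cost of bringing data into memory is paid page by page as addresses are first touched.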
MaGe Linux Operations
