Analyzing Memory-Mapped File Memory Usage and Monitoring in Linux
This article explains how memory-mapped files are used in a search engine, describes the underlying Linux mechanisms such as page cache, multi-level page tables and radix trees, and presents practical methods—including mincore, /proc smaps, pagemap and system commands—to analyze and monitor their memory consumption at both kernel and process levels.
Background
In modern search engines, loading large index files via memory mapping (mmap) is common, and the in-house ESearch engine follows this practice. Index data is stored in segments, with forward and inverted indexes in separate files, all accessed through mmap. This simplifies loading logic but makes memory usage harder to control.
ESearch's index data accounts for about 60% of the service's resident memory, so monitoring mmap file usage helps both with optimizing memory and with diagnosing I/O issues.
The article introduces two analysis methods: (1) measuring total physical memory occupied by loaded disk file contents, and (2) measuring memory used by a process for those files, using Linux memory‑management concepts.
Principle Analysis
This section briefly covers memory mapping, the page cache, and page tables. The page cache keeps portions of disk data in RAM to reduce I/O latency. Page tables map virtual addresses to physical pages. A single-level table covering the whole address space would need a very large contiguous allocation, so Linux on x86-64 uses a four-level hierarchy (PGD, PUD, PMD, PTE) and only allocates the lower-level tables for address ranges that are actually in use.
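To make the four-level layout concrete, the sketch below splits a virtual address into its table indices and computes how large a flat table would be. It assumes x86-64 with 4 KiB pages, 48-bit virtual addresses, 9 index bits per level and 8-byte entries; the function names are illustrative and do not come from the kernel.

```python
PAGE_SHIFT = 12        # assumption: 4 KiB pages
BITS_PER_LEVEL = 9     # assumption: 512 entries per table (x86-64)
LEVELS = ("pgd", "pud", "pmd", "pte")  # highest level first


def split_virtual_address(vaddr: int) -> dict:
    """Split a 48-bit virtual address into four table indices plus the page offset."""
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    shift = PAGE_SHIFT
    # Walk from the lowest level (PTE) up to the highest (PGD).
    for name in reversed(LEVELS):
        parts[name] = (vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1)
        shift += BITS_PER_LEVEL
    return parts


def flat_table_bytes(addr_bits: int = 48, entry_bytes: int = 8) -> int:
    """Size of a hypothetical single-level table covering the whole address space."""
    return (1 << (addr_bits - PAGE_SHIFT)) * entry_bytes


if __name__ == "__main__":
    print(split_virtual_address(0x7F1234567ABC))
    print(flat_table_bytes() // 2**30, "GiB for a flat table")
```

Under these assumptions a flat table would need 2^36 entries of 8 bytes each, which is why the hierarchy, where unused subtrees are simply absent, saves so much memory.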
Radix trees are used by the page cache to locate cached pages efficiently, similar to multi‑level page tables.
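The lookup idea can be sketched with a toy radix tree keyed by page index (file offset divided by the page size). This is a simplification under stated assumptions: 6 index bits per node and a fixed height of 3, whereas the kernel's tree grows dynamically; the class and method names are hypothetical.

```python
BITS = 6              # assumption: 64 slots per node
SLOTS = 1 << BITS
HEIGHT = 3            # fixed height for the sketch: covers indices below 2**18


class RadixTree:
    """Toy radix tree mapping a page index to a cached page object."""

    def __init__(self):
        self.root = [None] * SLOTS

    def _path(self, index):
        if not 0 <= index < 1 << (BITS * HEIGHT):
            raise IndexError("page index out of range for this sketch")
        # Most significant 6-bit group first, like walking PGD -> PTE.
        return [(index >> (BITS * level)) & (SLOTS - 1)
                for level in range(HEIGHT - 1, -1, -1)]

    def insert(self, index, page):
        node = self.root
        *inner, leaf = self._path(index)
        for slot in inner:
            if node[slot] is None:
                node[slot] = [None] * SLOTS   # allocate child nodes lazily
            node = node[slot]
        node[leaf] = page

    def lookup(self, index):
        node = self.root
        *inner, leaf = self._path(index)
        for slot in inner:
            node = node[slot]
            if node is None:
                return None                   # subtree absent: page not cached
        return node[leaf]
```

As with page tables, only the nodes on the path to a cached page exist, so a sparsely cached large file costs little bookkeeping memory.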
2.1 Memory Mapping
ESearch opens index files with mmap, mapping them into the process's virtual address space. Accessing a virtual address consults the page table. If the PTE is not valid, a page fault occurs: the kernel looks up the page in the page cache, reads it from disk if it is not cached yet, and then updates the page table.
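A minimal user-space sketch of this access pattern, using Python's standard mmap module rather than ESearch's actual loading code (which the article does not show). The helper name and the temporary file are illustrative.

```python
import mmap
import os
import tempfile


def read_byte_via_mmap(path: str, offset: int) -> int:
    """Map a file read-only and return the byte at `offset`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The first touch of a page that has no valid PTE faults it in;
            # later accesses to the same page go straight through the page table.
            return mm[offset]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 8192 + b"\x2a")
    try:
        print(read_byte_via_mmap(tmp.name, 8192))
    finally:
        os.unlink(tmp.name)
```

No explicit read call appears anywhere: the kernel loads pages on demand, which is what makes the loading logic simple and the memory footprint hard to predict.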
2.2 Page Cache vs. Page Table
Two levels of statistics are defined: kernel-level (how many file pages reside in page cache) and process-level (how many of those pages are actually mapped in the process's page tables). Typically, process-level usage is less than or equal to kernel-level.
2.3 mincore System Call
The prototype is int mincore(void *addr, size_t length, unsigned char *vec). For each page in the specified range it reports whether the page is resident in physical memory: the least significant bit of vec[i] is 1 if the page is resident and 0 if it is not (the other bits should be ignored). This makes it easy to detect page-cache residency.
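A hedged sketch of calling mincore from user space through ctypes, assuming Linux with glibc. The function name page_cache_residency is hypothetical. The file is mapped with ACCESS_COPY (a private, writable mapping) only so that ctypes can take the mapping's address; nothing is written to it.

```python
import ctypes
import ctypes.util
import mmap
import os

# Assumption: a Linux libc that exports mincore().
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.POINTER(ctypes.c_ubyte)]
libc.mincore.restype = ctypes.c_int


def page_cache_residency(path):
    """Return (resident_pages, total_pages) for a file, as reported by mincore()."""
    page = mmap.PAGESIZE
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0, 0
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
    try:
        total = (size + page - 1) // page
        vec = (ctypes.c_ubyte * total)()          # one status byte per page
        buf = ctypes.c_char.from_buffer(mm)       # exposes the mapping's address
        try:
            if libc.mincore(ctypes.addressof(buf), size, vec) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
        finally:
            del buf                               # release before closing the map
        # Only the least significant bit of each byte is meaningful.
        resident = sum(b & 1 for b in vec)
        return resident, total
    finally:
        mm.close()
```

Calling page_cache_residency on an index file before and after a query burst gives a rough picture of how much of the file the kernel is keeping in the page cache, independent of what any single process has mapped.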
2.4 /proc Files
Linux exposes per‑process memory information via /proc/<pid> files:
smaps – detailed mapping of each memory region, showing virtual size (Size) and resident size (Rss). Useful for determining how much of a file is actually loaded into the process's page tables.
pagemap – one 64-bit entry per virtual page of the process, derived from its page-table entries. Each entry encodes the page's status; bit 63 indicates whether the page is present in RAM. Reading this file (via read or fread) lets you count the pages in use without the heavy locking incurred by smaps.
maps – lists the address ranges of all mmap’ed files and anonymous memory regions, helping locate the regions of interest before inspecting smaps or pagemap.
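As a sketch of how these files can be consumed, the parser below sums Size and Rss per mapped file from smaps text. Each smaps record starts with the same header line that maps uses (address range, permissions, offset, device, inode, path), so one pass gives both the regions and their resident size. The function name and the optional suffix filter are illustrative.

```python
import re
from collections import defaultdict

# Header line shared by maps and smaps: "start-end perms offset dev inode path".
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$")
# Per-region counters such as "Rss:                 256 kB".
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def rss_by_file(smaps_text, suffix=""):
    """Sum Size and Rss (in kB) per file-backed mapping whose path ends with `suffix`."""
    totals = defaultdict(lambda: {"Size": 0, "Rss": 0})
    current = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            path = header.group(3)
            # Keep file-backed regions only; skip [heap], [stack], anonymous memory.
            wanted = path.startswith("/") and path.endswith(suffix)
            current = path if wanted else None
            continue
        field = FIELD.match(line)
        if field and current and field.group(1) in ("Size", "Rss"):
            totals[current][field.group(1)] += int(field.group(2))
    return dict(totals)


if __name__ == "__main__":
    with open("/proc/self/smaps") as f:
        for path, kb in rss_by_file(f.read()).items():
            print(path, kb)
```

Comparing Rss against Size for each index file shows how much of the mapping the process has actually touched.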
Sample kernel code shows how smaps is generated (taking the page-table lock with pte_offset_map_lock, iterating over pages, and formatting output) and how pagemap entries are filled (using pte_to_pagemap_entry and add_to_pagemap).
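From user space, the entries the kernel fills in can be read back as sketched below. The sketch assumes 8-byte native-endian entries indexed by virtual page number, with bit 63 as the present flag as described above. The helper names are hypothetical, and the address range is expected to come from a maps or smaps header line.

```python
import mmap
import struct

PAGEMAP_ENTRY_BYTES = 8
PRESENT_BIT = 1 << 63     # bit 63: page present in RAM


def pagemap_offset(vaddr, page_size=mmap.PAGESIZE):
    """File offset in /proc/<pid>/pagemap of the entry for `vaddr`."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def is_present(entry):
    """True if the 64-bit pagemap entry has the present bit set."""
    return bool(entry & PRESENT_BIT)


def count_present(pid, start, end, page_size=mmap.PAGESIZE):
    """Count present pages in the virtual address range [start, end) of a process."""
    npages = (end - start) // page_size
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(start, page_size))
        data = f.read(npages * PAGEMAP_ENTRY_BYTES)
    count = len(data) // PAGEMAP_ENTRY_BYTES
    entries = struct.unpack(f"={count}Q", data[:count * PAGEMAP_ENTRY_BYTES])
    return sum(is_present(e) for e in entries)
```

Multiplying the result of count_present by the page size gives a process-level figure comparable to the Rss value that smaps reports for the same region.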
2.5 System Commands
Simple commands can also give insight:
free -m – displays total page cache size.
vmstat – with the -a option, shows active and inactive memory, which includes page cache pages.
top – includes the SHR column, indicating shared memory such as mmap’ed files.
sar -B – reports paging statistics such as pages placed on the free list per second (pgfree/s) and pages reclaimed from the cache per second (pgsteal/s).
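Several of these figures can also be read from /proc/meminfo, which makes them easy to collect from a monitoring script. The sketch below is one possible way to do that; the function names and the choice of fields (Cached, Active(file), Inactive(file), Mapped) are illustrative rather than taken from the article.

```python
def parse_meminfo(text):
    """Return {field: kibibytes} for the kB-valued lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Name:   12345 kB".
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[name] = int(parts[0])
    return values


def page_cache_summary(text):
    """Pick the fields most relevant to file-backed memory (None if absent)."""
    info = parse_meminfo(text)
    keys = ("Cached", "Active(file)", "Inactive(file)", "Mapped")
    return {key: info.get(key) for key in keys}


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(page_cache_summary(f.read()))
```

Cached approximates the system-wide page cache size, while Mapped counts file pages that are currently mapped into some process, which mirrors the kernel-level versus process-level distinction at whole-system scale.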
Summary
To assess the memory usage of memory-mapped files, use process-level tools (top, smaps, pagemap) to measure what is mapped in the process's page tables, and kernel-level tools (mincore, free, vmstat) to measure what is resident in the page cache. The distinction matters because the page cache includes read-ahead pages, while page-table usage reflects only the pages the process has actually accessed.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.