Why Convert madvise MADV_DONTNEED/FREE to Per‑VMA Locks to Eliminate mmap_lock Contention
The article explains how replacing the global mmap_lock read lock with per‑VMA locks for madvise MADV_DONTNEED/FREE removes severe lock contention and priority inversion in Linux memory management, and it details the patches, reasoning, and performance evidence supporting the change.
Problem
mmap_lock (formerly mmap_sem) is a reader-writer lock in the Linux memory manager that covers a process's entire address space, so every thread of the process shares it. User-space memory release paths such as malloc/free and Java GC invoke madvise with MADV_DONTNEED or MADV_FREE. These calls have historically acquired the read side of mmap_lock, and because they occur extremely frequently they become a major source of lock contention.
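For readers who have not seen the call in question, the following is a minimal user-space sketch of the pattern an allocator or GC uses to hand pages back to the kernel. The function name release_and_reread and the fill value are illustrative assumptions, not taken from any allocator; only mmap, madvise, and munmap are real interfaces.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map a private anonymous region, dirty it, then release its pages with
 * MADV_DONTNEED. Returns the first byte read back after the advice, or -1
 * on error. (Hypothetical helper, for illustration only.)
 */
int release_and_reread(size_t len)
{
    assert(len > 0);

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAB, len);              /* touch every page */

    /* This is the call that historically took mmap_lock for reading. */
    if (madvise(buf, len, MADV_DONTNEED) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* On Linux, private anonymous pages are zero-filled on the next access. */
    int first = buf[0];

    munmap(buf, len);
    return first;
}
```

Calling release_and_reread(16384) from a main function returns 0 on Linux, because the dirtied pages were dropped and the next read faults in fresh zero pages. A real allocator issues calls like this from many threads at once, which is where the shared lock becomes a bottleneck.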
Lock semantics and observed contention
Read-lock semantics allow multiple readers to hold the lock concurrently, but once a writer is waiting, every later reader and writer queues behind it. Consider this worst-case scenario:
timestamp1: Thread A acquires the read lock
timestamp2: Thread B attempts to acquire the write lock
timestamp3: Threads C, D, and E attempt to acquire the read lock
In this case, thread B must wait for A, and threads C, D, and E will wait for both A and B. Any write-lock request effectively blocks all later read acquisitions. If the thread holding the read lock is a low-priority GC thread, it can be preempted by higher-priority threads while it still holds the lock, which delays the writer and everyone queued behind it; delays of a few hundred milliseconds have been observed in real workloads.
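The scenario above can be reproduced with a small deterministic model. This is a toy sketch of a fair, first-in-first-out reader-writer lock at a single instant, not the kernel's rwsem implementation; the names req_kind and count_blocked are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum req_kind { REQ_READ, REQ_WRITE };

/*
 * Toy model: requests arrive in array order and nobody has released yet.
 * Request 0 always gets the lock. A later request is granted only if it is
 * a read and every earlier request is also a read. Fills blocked[] and
 * returns how many requests are left waiting.
 */
size_t count_blocked(const enum req_kind *reqs, size_t n, bool *blocked)
{
    size_t nr_blocked = 0;
    bool only_readers_so_far = true;

    assert(reqs != NULL && blocked != NULL);

    for (size_t i = 0; i < n; i++) {
        bool granted = (i == 0) ||
                       (only_readers_so_far && reqs[i] == REQ_READ);

        blocked[i] = !granted;
        if (!granted)
            nr_blocked++;
        if (reqs[i] == REQ_WRITE)
            only_readers_so_far = false;   /* later readers queue behind it */
    }
    return nr_blocked;
}
```

With the sequence read, write, read, read, read (threads A through E), the model reports four blocked requests: B waits for A, and C, D, and E wait behind B. With the writer removed, the same four readers all proceed and nothing is blocked, which is the behavior the per-VMA change aims to get closer to for madvise callers.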
Per‑VMA lock optimization
The key observation is that MADV_DONTNEED / MADV_FREE almost never span more than one VMA. There is therefore no need to hold a lock that covers every VMA in the address space. By acquiring a lock that protects only the VMA being advised, the kernel guarantees that the VMA does not change during the operation while leaving all other VMAs untouched.
The optimization replaces the global mmap_lock read lock with a per‑VMA lock, eliminating the original contention point.
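The decision this implies can be sketched as a single-threaded user-space model. Everything here is hypothetical: the toy_ types and functions are invented for the illustration and are not kernel APIs, the locks are plain counters rather than real synchronization, and the fallback to mmap_lock for ranges that cross a VMA boundary or hit a write-locked VMA is an assumption inferred from the "almost never span more than one VMA" wording, not a description of the actual patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: records read acquisitions so the chosen path is observable. */
struct toy_lock {
    bool write_locked;
    unsigned long read_acquisitions;
};

struct toy_vma {
    unsigned long start, end;        /* covers [start, end) */
    struct toy_lock lock;
};

struct toy_mm {
    struct toy_lock mmap_lock;       /* the address-space-wide lock */
    struct toy_vma *vmas;
    size_t nr_vmas;
};

enum toy_path { TOY_PATH_PER_VMA, TOY_PATH_MMAP_LOCK };

static bool toy_read_trylock(struct toy_lock *l)
{
    if (l->write_locked)
        return false;
    l->read_acquisitions++;
    return true;
}

static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
    for (size_t i = 0; i < mm->nr_vmas; i++)
        if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
            return &mm->vmas[i];
    return NULL;
}

/* Choose which lock protects a DONTNEED-style operation on [start, end). */
enum toy_path toy_madvise_dontneed(struct toy_mm *mm,
                                   unsigned long start, unsigned long end)
{
    assert(start < end);

    struct toy_vma *vma = toy_find_vma(mm, start);

    /* Fast path: the whole range sits inside one VMA that can be locked. */
    if (vma && end <= vma->end && toy_read_trylock(&vma->lock)) {
        /* ... pages in [start, end) would be released here ... */
        return TOY_PATH_PER_VMA;
    }

    /* Assumed slow path: take the address-space-wide lock as before. */
    mm->mmap_lock.read_acquisitions++;
    return TOY_PATH_MMAP_LOCK;
}
```

In this model, advising a range inside one VMA leaves mmap_lock.read_acquisitions at zero, so a pending writer on mmap_lock no longer stalls the caller. Only a range that crosses into a second VMA, or a VMA that is currently write-locked, touches the shared lock.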
Implementation
The change is introduced by a series of three patches. The core idea is to drop the global read lock and protect each VMA individually.
Performance evidence
Lance Yang performed a multithreaded test that combines madvise with mprotect. The benchmark shows a substantial speed‑up for MADV_DONTNEED and MADV_FREE when the per‑VMA lock is used.
Community feedback
Google engineer Lokesh Gidra described the change as “over‑due”. Oracle engineer Lorenzo Stoakes noted that the benefit is “obvious” and wondered why it had not been implemented sooner.
Integration status
The three‑patch series has landed in the mm‑unstable branch and is slated for inclusion in the next stable Linux kernel release.
References
https://lore.kernel.org/linux-mm/CAGsJ_4yeD+-xaNWyaiQSCpbZMDqF73R2AXjzBL1U--cOg6OSjg@mail.gmail.com/
https://lore.kernel.org/linux-mm/[email protected]/
https://lore.kernel.org/linux-mm/CA+EESO6_RBX=nvrWO46aR7Q7xibh8fM-BX2p7_ihcbYyMfpVYQ@mail.gmail.com/
https://lore.kernel.org/linux-mm/[email protected]/
Linux Code Review Hub
A professional Linux technology community and learning platform covering the kernel, memory management, process management, file system and I/O, performance tuning, device drivers, virtualization, and cloud computing.
