Why Convert madv_dontneed/madv_free to Per‑VMA Locks in Linux
The article explains how the traditional mmap_lock read lock creates severe contention and priority inversion for frequent madvise MADV_DONTNEED/FREE calls, and how a per‑VMA locking redesign eliminates this bottleneck, improves performance, and is slated for the next Linux kernel release.
Lock contention caused by madvise MADV_DONTNEED/FREE
mmap_lock (formerly mmap_sem) protects a process's list of VM areas (VMAs). User‑space memory‑release operations such as madvise MADV_DONTNEED or MADV_FREE (used by malloc/free implementations and Java GC) previously acquired the read side of mmap_lock. Because these calls occur very frequently, the read lock becomes a hotspot.
Read locks do not block other readers, but any writer blocks all subsequent readers and writers. When a low‑priority thread holds the read lock, a writer may be forced to wait, creating priority inversion.
Consider this scenario:
timestamp1: Thread A acquires the read lock
timestamp2: Thread B attempts to acquire the write lock
timestamp3: Threads C, D, and E attempt to acquire the read lock
Result: Thread B must wait for A; threads C, D, and E wait for both A and B. If A is a GC thread with a high nice value and is pre‑empted, the delay can reach a few hundred milliseconds (observed in practice).
Per‑VMA lock redesign
The key observation is that in the overwhelming majority of cases MADV_DONTNEED/FREE does not cross VMA boundaries. Therefore the operation can be protected by a lock that is scoped to the single VMA being madvised, eliminating the need to acquire the global mmap_lock read lock.
The implementation consists of three patches that:
Introduce a per‑VMA lock structure.
Replace the global mmap_lock read acquisition in the madvise path with the per‑VMA lock.
Ensure the VMA remains unchanged for the duration of the madvise operation.
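The resulting fast path can be pictured as the sketch below. This is illustrative kernel-style pseudocode, not the actual patches: function names such as lock_vma_under_rcu() and vma_end_read() are modeled on the kernel's existing per-VMA lock API for page faults, and do_madvise_zap() is a hypothetical helper standing in for the real zap logic.

```c
/* Illustrative kernel-style pseudocode (not the actual patches).
 * lock_vma_under_rcu()/vma_end_read() follow the naming of the
 * kernel's existing per-VMA lock API used by the fault path. */
static int madvise_dontneed_fast(struct mm_struct *mm,
                                 unsigned long start, unsigned long end,
                                 int behavior)
{
    struct vm_area_struct *vma;

    /* Look up and read-lock just the one VMA, under RCU,
     * without touching the global mmap_lock. */
    vma = lock_vma_under_rcu(mm, start);
    if (!vma)
        return -EAGAIN;          /* fall back to the mmap_lock path */

    /* The common case: the range stays within a single VMA. */
    if (end > vma->vm_end) {
        vma_end_read(vma);
        return -EAGAIN;          /* crosses a VMA boundary: slow path */
    }

    /* The per-VMA lock keeps the VMA from being modified or freed
     * for the duration of the operation. */
    do_madvise_zap(vma, start, end, behavior); /* hypothetical helper */

    vma_end_read(vma);
    return 0;
}
```

The design choice mirrors the page-fault fast path: take the narrow per-VMA lock optimistically, and bail out to the conservative mmap_lock path whenever the single-VMA assumption does not hold.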
Performance evidence
Lance Yang performed a multi‑threaded benchmark that combines madvise with mprotect. The benchmark shows a substantial speedup for MADV_DONTNEED/FREE when the per‑VMA lock is used, confirming that the global lock was the dominant bottleneck.
Kernel integration
The patches have landed in the mm‑unstable branch and are expected to move to mm‑stable for inclusion in the next Linux kernel release.
References
https://lore.kernel.org/linux-mm/CAGsJ_4yeD+-xaNWyaiQSCpbZMDqF73R2AXjzBL1U--cOg6OSjg/
https://lore.kernel.org/linux-mm/ec77f310-6ded-4f7b-a15b-07855b0bbafb/
https://lore.kernel.org/linux-mm/CA+EESO6_RBX=nvrWO46aR7Q7xibh8fM-BX2p7_ihcbYyMfpVYQ/
https://lore.kernel.org/linux-mm/93385672-927f-4de5-a158-fc3fc0424be0/
