Why Convert madv_dontneed/madv_free to Per‑VMA Locks in Linux
The article explains how the traditional mmap_lock read lock creates severe contention and priority inversion for frequent madvise MADV_DONTNEED/FREE calls, and how a per‑VMA locking redesign eliminates this bottleneck, improves performance, and is slated for the next Linux kernel release.
Lock contention caused by madvise MADV_DONTNEED/FREE
mmap_lock (formerly mmap_sem) protects a process's list of VM areas (VMAs). User‑space memory‑release operations such as madvise MADV_DONTNEED or MADV_FREE (used by malloc/free implementations and Java GC) previously acquired the read side of mmap_lock. Because these calls occur very frequently, the read lock becomes a hotspot.
Read locks do not block other readers, but any writer blocks all subsequent readers and writers. When a low‑priority thread holds the read lock, a writer may be forced to wait, creating priority inversion.
Consider this scenario:
timestamp1: Thread A acquires the read lock
timestamp2: Thread B attempts to acquire the write lock
timestamp3: Threads C, D, and E attempt to acquire the read lock
Result: Thread B must wait for A; threads C, D, and E wait for both A and B. If A is a GC thread with a high nice value and is pre‑empted, the delay can reach a few hundred milliseconds (observed in practice).
Per‑VMA lock redesign
The key observation is that in the overwhelming majority of cases MADV_DONTNEED/FREE does not cross VMA boundaries. Therefore the operation can be protected by a lock that is scoped to the single VMA being madvised, eliminating the need to acquire the global mmap_lock read lock.
The implementation consists of three patches that:
Introduce a per‑VMA lock structure.
Replace the global mmap_lock read acquisition in the madvise path with the per‑VMA lock.
Ensure the VMA remains unchanged for the duration of the madvise operation.
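The resulting fast path can be pictured as the sketch below. This is illustrative kernel-style pseudocode, not the actual patches: function names such as lock_vma_under_rcu() and vma_end_read() are modeled on the kernel's existing per-VMA lock API for page faults, and do_madvise_zap() is a hypothetical helper standing in for the real zap logic.

```c
/* Illustrative kernel-style pseudocode (not the actual patches).
 * lock_vma_under_rcu()/vma_end_read() follow the naming of the
 * kernel's existing per-VMA lock API used by the fault path. */
static int madvise_dontneed_fast(struct mm_struct *mm,
                                 unsigned long start, unsigned long end,
                                 int behavior)
{
    struct vm_area_struct *vma;

    /* Look up and read-lock just the one VMA, under RCU,
     * without touching the global mmap_lock. */
    vma = lock_vma_under_rcu(mm, start);
    if (!vma)
        return -EAGAIN;          /* fall back to the mmap_lock path */

    /* The common case: the range stays within a single VMA. */
    if (end > vma->vm_end) {
        vma_end_read(vma);
        return -EAGAIN;          /* crosses a VMA boundary: slow path */
    }

    /* The per-VMA lock keeps the VMA from being modified or freed
     * for the duration of the operation. */
    do_madvise_zap(vma, start, end, behavior); /* hypothetical helper */

    vma_end_read(vma);
    return 0;
}
```

The design choice mirrors the page-fault fast path: take the narrow per-VMA lock optimistically, and bail out to the conservative mmap_lock path whenever the single-VMA assumption does not hold.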
Performance evidence
Lance Yang performed a multi‑threaded benchmark that combines madvise with mprotect. The benchmark shows a substantial speedup for MADV_DONTNEED/FREE when the per‑VMA lock is used, confirming that the global lock was the dominant bottleneck.
Kernel integration
The patches have landed in the mm‑unstable branch and are expected to move to mm‑stable for inclusion in the next Linux kernel release.
References
https://lore.kernel.org/linux-mm/CAGsJ_4yeD+-xaNWyaiQSCpbZMDqF73R2AXjzBL1U--cOg6OSjg/
https://lore.kernel.org/linux-mm/ec77f310-6ded-4f7b-a15b-07855b0bbafb/
https://lore.kernel.org/linux-mm/CA+EESO6_RBX=nvrWO46aR7Q7xibh8fM-BX2p7_ihcbYyMfpVYQ/
https://lore.kernel.org/linux-mm/93385672-927f-4de5-a158-fc3fc0424be0/
