
Why Convert madvise MADV_DONTNEED/FREE to Per‑VMA Locks to Eliminate mmap_lock Contention

The article explains how replacing the global mmap_lock read lock with per‑VMA locks for madvise MADV_DONTNEED/FREE removes severe lock contention and priority inversion in Linux memory management, and it details the patches, reasoning, and performance evidence supporting the change.

Linux Code Review Hub

Problem

mmap_lock (formerly mmap_sem) is the read‑write semaphore that protects a process's entire address space in the Linux memory manager, so every thread of the process shares it. User‑space memory release paths, such as the free side of malloc implementations and Java GC, invoke madvise with MADV_DONTNEED or MADV_FREE. These calls have historically acquired the read side of mmap_lock, and because they occur extremely frequently they become a major source of lock contention.
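To make the release path concrete, here is a minimal user‑space sketch of the kind of call an allocator or GC issues. The helper name release_and_reread is hypothetical and exists only for illustration; the sketch assumes Linux and a private anonymous mapping, where pages dropped by MADV_DONTNEED read back as zeros on the next access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from the patches): map one anonymous page,
 * dirty it, release it with MADV_DONTNEED, and return the first byte
 * read back afterwards. Returns -1 if any system call fails. */
int release_and_reread(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);            /* fault the page in and dirty it */
    assert(p[0] == 0xAB);

    /* This is the call whose locking the article discusses. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The old contents are gone; the next access gets a zero page. */
    volatile unsigned char *vp = p;
    int first = vp[0];

    munmap(p, len);
    return first;
}
```

Calling release_and_reread() should return 0 on Linux, since the dirtied page was discarded. Every such call goes through the locking described in this article, which is why its frequency matters.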

Lock semantics and observed contention

Read‑lock semantics allow multiple readers to hold the lock concurrently, but once a writer is waiting, all subsequent readers and writers queue behind it. Consider this worst‑case scenario:

 timestamp1: Thread A acquires the read lock
 timestamp2: Thread B attempts to acquire the write lock
 timestamp3: Threads C, D, and E attempt to acquire the read lock
 In this case, thread B must wait for A, and threads C, D, and E must wait for both A and B. Any write‑lock request effectively blocks all later read acquisitions.
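The timeline above can be replayed with a toy model of a fair read‑write lock. This is a sketch under simplifying assumptions, not the kernel's rwsem implementation, which also has optimistic spinning and hand‑off logic; all names here (toy_rwsem, toy_acquire, toy_release) are made up for the example. The only rule it encodes is the one the scenario relies on: a new request is granted only if nobody is already queued.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 8

enum req_kind { REQ_READ, REQ_WRITE };

/* Toy fair read-write lock: a FIFO of blocked requests, no real threads. */
struct toy_rwsem {
    int active_readers;
    bool writer_active;
    enum req_kind queue[MAX_WAITERS];
    int queued;
};

/* Returns true if the request is granted at once, false if it must wait. */
bool toy_acquire(struct toy_rwsem *s, enum req_kind kind)
{
    bool can_run = !s->writer_active && s->queued == 0 &&
                   (kind == REQ_READ || s->active_readers == 0);
    if (can_run) {
        if (kind == REQ_READ)
            s->active_readers++;
        else
            s->writer_active = true;
        return true;
    }
    assert(s->queued < MAX_WAITERS);
    s->queue[s->queued++] = kind;    /* line up behind earlier waiters */
    return false;
}

/* Grant waiters from the head of the queue; returns how many were woken. */
static int toy_wake(struct toy_rwsem *s)
{
    int granted = 0;
    while (s->queued > 0 && !s->writer_active) {
        enum req_kind head = s->queue[0];
        if (head == REQ_WRITE && s->active_readers > 0)
            break;                   /* writer needs all readers gone */
        if (head == REQ_WRITE)
            s->writer_active = true;
        else
            s->active_readers++;
        for (int i = 1; i < s->queued; i++)
            s->queue[i - 1] = s->queue[i];
        s->queued--;
        granted++;
    }
    return granted;
}

/* Release a held lock of the given kind and wake whoever can now run. */
int toy_release(struct toy_rwsem *s, enum req_kind kind)
{
    if (kind == REQ_READ)
        s->active_readers--;
    else
        s->writer_active = false;
    return toy_wake(s);
}
```

Replaying the scenario: A's read is granted, B's write is queued, and the reads from C, D, and E are queued behind B even though A is only a reader. Releasing A wakes exactly one waiter (B), and only B's release lets the three readers through.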

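If the thread holding the read lock is a low‑priority GC thread, higher‑priority threads can preempt it while it still holds the lock, so the pending writer and every reader queued behind it stall until the GC thread runs again. This is a priority inversion, and delays of a few hundred milliseconds have been observed in real workloads.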

Per‑VMA lock optimization

The key observation is that a MADV_DONTNEED or MADV_FREE call almost never spans more than one VMA, so in the common case there is no need to walk the rest of the address space's VMAs. By acquiring a lock that protects only the VMA being advised, the kernel guarantees that this VMA does not change during the operation while leaving all other VMAs available to concurrent operations.

The optimization replaces the global mmap_lock read lock with a per‑VMA lock, eliminating the original contention point.
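The decision this implies can be sketched as a small user‑space model. It is illustrative only: toy_vma, toy_pick_lock, and the lock_mode values are hypothetical names, and the fallback to the mmap_lock read lock for ranges that are not contained in a single VMA is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy VMA covering the half-open address range [start, end). */
struct toy_vma {
    unsigned long start;
    unsigned long end;
};

enum lock_mode {
    LOCK_PER_VMA,     /* fast path: lock only the one VMA */
    LOCK_MMAP_READ    /* assumed slow path: read-lock the whole address space */
};

/* Pick a locking mode for madvise(start, len). The vmas array is assumed
 * to be sorted by start address and non-overlapping. */
enum lock_mode toy_pick_lock(const struct toy_vma *vmas, size_t n,
                             unsigned long start, unsigned long len)
{
    unsigned long end = start + len;
    assert(end >= start);            /* reject address wrap-around */

    for (size_t i = 0; i < n; i++) {
        if (start >= vmas[i].start && start < vmas[i].end) {
            /* Range begins in this VMA: fast path only if it also ends here. */
            return end <= vmas[i].end ? LOCK_PER_VMA : LOCK_MMAP_READ;
        }
    }
    return LOCK_MMAP_READ;           /* no VMA contains start */
}
```

With two adjacent VMAs [0x1000, 0x5000) and [0x5000, 0x9000), a request for 0x2000..0x3000 stays on the per‑VMA path, while a request for 0x4000..0x6000 crosses the boundary and takes the assumed slow path.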

Implementation

The change is introduced by three patches (illustrated by the diagrams below). The core idea is to drop the global read lock and protect each VMA individually.

[Figure: Patch diagram 1]
[Figure: Patch diagram 2]

Performance evidence

Lance Yang performed a multithreaded test that combines madvise with mprotect. The benchmark shows a substantial speed‑up for MADV_DONTNEED and MADV_FREE when the per‑VMA lock is used.
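A test of this general shape might look like the sketch below. It is not Lance Yang's program, which is not reproduced in this article; the function name madvise_under_mprotect, the region sizes, and the iteration count are all assumptions. One thread keeps calling mprotect on its own region (mprotect takes mmap_lock for writing), while the caller times a loop of MADV_DONTNEED calls on a different region.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

struct region { void *addr; size_t len; };

static atomic_bool stop_writer;

/* Writer side: each mprotect() call takes mmap_lock for writing. */
static void *mprotect_loop(void *arg)
{
    struct region *r = arg;
    while (!atomic_load(&stop_writer)) {
        mprotect(r->addr, r->len, PROT_READ);
        mprotect(r->addr, r->len, PROT_READ | PROT_WRITE);
    }
    return NULL;
}

static void *map_anon(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Hypothetical harness: returns the number of successful madvise() calls
 * (or -1 on setup failure) and stores the loop's wall time in *elapsed_ns. */
long madvise_under_mprotect(long iters, long long *elapsed_ns)
{
    assert(elapsed_ns != NULL);
    size_t len = (size_t)sysconf(_SC_PAGESIZE) * 16;
    struct region a = { map_anon(len), len };   /* madvise target */
    struct region b = { map_anon(len), len };   /* mprotect target */
    if (a.addr == MAP_FAILED || b.addr == MAP_FAILED)
        return -1;

    pthread_t writer;
    atomic_store(&stop_writer, false);
    if (pthread_create(&writer, NULL, mprotect_loop, &b) != 0)
        return -1;

    struct timespec t0, t1;
    long ok = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        memset(a.addr, 1, a.len);               /* dirty the pages again */
        if (madvise(a.addr, a.len, MADV_DONTNEED) == 0)
            ok++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    atomic_store(&stop_writer, true);
    pthread_join(writer, NULL);
    munmap(a.addr, len);
    munmap(b.addr, len);

    *elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
                  (t1.tv_nsec - t0.tv_nsec);
    return ok;
}
```

Comparing the elapsed time on kernels with and without the series would show whether the madvise loop still queues behind the mprotect writer. The sketch makes no attempt to control memory layout: the two regions may end up adjacent, and a careful benchmark would separate them and run many more iterations across several threads.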

[Figure: Benchmark result]

Community feedback

Google engineer Lokesh Gidra described the change as “over‑due”. Oracle engineer Lorenzo Stoakes noted that the benefit is “obvious” and wondered why it had not been implemented sooner.

Integration status

The three‑patch series has landed in the mm‑unstable branch and is slated for inclusion in the next stable Linux kernel release.

References

https://lore.kernel.org/linux-mm/CAGsJ_4yeD+-xaNWyaiQSCpbZMDqF73R2AXjzBL1U--cOg6OSjg@mail.gmail.com/

https://lore.kernel.org/linux-mm/[email protected]/

https://lore.kernel.org/linux-mm/CA+EESO6_RBX=nvrWO46aR7Q7xibh8fM-BX2p7_ihcbYyMfpVYQ@mail.gmail.com/

https://lore.kernel.org/linux-mm/[email protected]/

Written by Linux Code Review Hub
