
Analyzing Linux Memory Management Locks and Key Optimization Cases

The article examines the role of various locks in Linux kernel memory management, explains their APIs and sleeping constraints, presents detailed case studies of lock‑related performance patches—including per‑memcg LRU, mmap_lock IO‑fault path, SPF, PVL, fault‑around, unmap, and rmap lockless optimizations—and summarizes common strategies for reducing lock contention and improving scalability.

Linux Kernel Journey

1. Technical Background

Locks are crucial in Linux memory management for protecting concurrent access to critical sections. While they ensure correctness, certain lock usages can become performance bottlenecks.

2. Locks in Memory Management

2.1 PG_locked

Pages are represented by struct page (and, in recent kernels, struct folio) with a flags field. When the PG_locked flag is set, the page is locked: other paths must not operate on it, for example truncating it or starting new I/O, until the bit is cleared.

static inline bool folio_trylock(struct folio *folio) {
    return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
}

void __folio_lock(struct folio *folio) {
    folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE, EXCLUSIVE);
}

static inline void lock_page(struct page *page) {
    struct folio *folio = page_folio(page);
    might_sleep();
    if (!folio_trylock(folio))
        __folio_lock(folio);
}
lock_page may sleep, as the might_sleep() annotation indicates, so it must not be called from non-sleepable contexts. The lock is used during a page fault: the kernel sets PG_locked, initiates I/O, and clears the flag when the I/O completes; tasks waiting for the bit sleep in folio_wait_bit_common, which explains the folio_wait_bit_killable block reason seen in systrace.
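The protocol can be sketched in userspace with C11 atomics: trylock is an atomic test-and-set of one bit in the flags word, and unlock is an atomic clear. This is a simplified analogy with invented names (fake_folio and friends), not kernel code; the real unlock path additionally wakes sleepers on the page waitqueue.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LOCKED_BIT 0

struct fake_folio {
    atomic_ulong flags;   /* stands in for folio->flags */
};

/* Returns true if we took the lock (the bit was previously clear),
 * mirroring folio_trylock()'s test_and_set_bit_lock. */
static bool fake_folio_trylock(struct fake_folio *f)
{
    unsigned long mask = 1UL << PG_LOCKED_BIT;
    unsigned long old = atomic_fetch_or_explicit(&f->flags, mask,
                                                 memory_order_acquire);
    return (old & mask) == 0;
}

static void fake_folio_unlock(struct fake_folio *f)
{
    unsigned long mask = 1UL << PG_LOCKED_BIT;
    /* release ordering publishes the page contents to the next locker;
     * the kernel would also wake waiters sleeping on this bit */
    atomic_fetch_and_explicit(&f->flags, ~mask, memory_order_release);
}
```

A caller that loses the race would, like __folio_lock, go to sleep until the holder clears the bit.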

2.2 lru_lock

The LRU (Least Recently Used) reclamation algorithm keeps its page lists in struct lruvec, protected by a spinlock stored in the same structure:

struct lruvec {
    struct list_head lists[NR_LRU_LISTS];
    spinlock_t lru_lock; // protects LRU lists
};

Functions such as shrink_inactive_list acquire lru_lock to modify the lists.

2.3 mmap_lock

Each process’s virtual memory area (VMA) tree is protected by mmap_lock, a read‑write semaphore inside struct mm_struct:

struct mm_struct {
    ...
    struct rw_semaphore mmap_lock;
};

Typical acquisition APIs include mmap_write_lock_killable:

#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

static inline int __down_write_killable(struct rw_semaphore *sem) {
    return __down_write_common(sem, TASK_KILLABLE);
}

int __sched down_write_killable(struct rw_semaphore *sem) {
    might_sleep();
    rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
    if (LOCK_CONTENDED_RETURN(sem, __down_write_trylock, __down_write_killable)) {
        rwsem_release(&sem->dep_map, _RET_IP_);
        return -EINTR;
    }
    return 0;
}

static inline int mmap_write_lock_killable(struct mm_struct *mm) {
    int ret;
    __mmap_lock_trace_start_locking(mm, true);
    ret = down_write_killable(&mm->mmap_lock);
    __mmap_lock_trace_acquire_returned(mm, true, ret == 0);
    return ret;
}

If the lock cannot be obtained immediately, the caller sleeps in TASK_KILLABLE: an uninterruptible state that can still be woken by a fatal signal, which is why down_write_killable can return -EINTR.

2.4 anon_vma→rwsem

Anonymous pages use anon_vma structures linked to VMAs. A read‑write semaphore protects the red‑black tree inside struct anon_vma:

struct anon_vma {
    struct anon_vma *root;
    struct rw_semaphore rwsem; // W: modification, R: walking the list
    ...
};

Lock acquisition helpers are anon_vma_lock_write and anon_vma_lock_read, which internally call down_write or down_read on the semaphore.

2.5 mapping→i_mmap_rwsem

File‑backed pages are associated with struct address_space. The i_mmap_rwsem protects the i_mmap red‑black tree that links VMAs to the mapping:

struct address_space {
    ...
    struct rw_semaphore i_mmap_rwsem;
    ...
};

Typical acquisition uses i_mmap_lock_read or i_mmap_lock_write. The lock ensures that the page’s mapping pointer does not become NULL during truncation and that the address‑space structure is not freed while traversing the tree.

2.6 shrinker_rwsem

Historically, a global read-write semaphore protected the shrinker_list. In kernel 6.9 the lock was converted to a mutex, but this article analyzes the original shrinker_rwsem behavior (kernel 6.6) and its contention.

DECLARE_RWSEM(shrinker_rwsem);

void register_shrinker_prepared(struct shrinker *shrinker) {
    down_write(&shrinker_rwsem);
    list_add_tail(&shrinker->list, &shrinker_list);
    shrinker->flags |= SHRINKER_REGISTERED;
    shrinker_debugfs_add(shrinker);
    up_write(&shrinker_rwsem);
}

Read‑side paths such as shrink_slab acquire the lock with down_read_trylock and release it via up_read.

3. Typical Optimization Cases

3.1 Per‑memcg LRU lock (Linux 5.11)

Before the patch, a single LRU lock was shared across all memory control groups (memcgs), causing high contention. Alex Shi (Alibaba) introduced a per‑memcg lru_lock, turning a large lock into many small locks. The patchset achieved a 62 % performance improvement.
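The essence of the change, many independent locks instead of one shared lock, can be sketched as follows. The group structure and names are hypothetical, not the patch's actual code: the point is only that work on one group never contends with work on another.

```c
#include <pthread.h>

#define NR_GROUPS 4   /* stand-ins for memcgs */

struct fake_group {
    pthread_mutex_t lock;   /* per-group, like the per-memcg lruvec->lru_lock */
    long nr_pages;
};

static struct fake_group groups[NR_GROUPS];

static void groups_init(void)
{
    for (int i = 0; i < NR_GROUPS; i++) {
        pthread_mutex_init(&groups[i].lock, NULL);
        groups[i].nr_pages = 0;
    }
}

/* Adding a page now only touches the owning group's lock; before the
 * patch, every such operation serialized on one global lock. */
static void group_add_page(int g)
{
    pthread_mutex_lock(&groups[g].lock);
    groups[g].nr_pages++;
    pthread_mutex_unlock(&groups[g].lock);
}
```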

3.2 mmap_lock IO‑fault path optimization (Linux 5.1)

Josef Bacik added a cached page and a retry mechanism so that the mmap_lock is released early during long I/O operations. This reduces priority‑inversion latency for high‑priority threads. The patchset introduced maybe_unlock_mmap_for_io and altered the page‑fault flow to reacquire the lock only when necessary.
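A minimal sketch of the drop-and-retry idea follows, with invented names; in the real kernel the helper is maybe_unlock_mmap_for_io and the retry is signalled via the fault return code.

```c
#include <pthread.h>
#include <stdbool.h>

enum fault_result { FAULT_DONE, FAULT_RETRY };

struct fake_mm { pthread_mutex_t mmap_lock; };

static int nr_ios;
static void fake_io(void) { nr_ios++; }   /* stand-in for reading from disk */

/* Called with mmap_lock held. If slow I/O is needed, drop the lock first,
 * do the I/O, and ask the caller to retry the fault from scratch. */
static enum fault_result handle_fault(struct fake_mm *mm, bool page_uptodate,
                                      void (*slow_io)(void))
{
    if (page_uptodate)
        return FAULT_DONE;                 /* fast path: lock stays held */
    pthread_mutex_unlock(&mm->mmap_lock);  /* never hold mmap_lock over I/O */
    slow_io();
    return FAULT_RETRY;                    /* caller must relock and retry */
}

/* Caller loop: take the lock, attempt the fault, retry once I/O is done. */
static int user_fault(struct fake_mm *mm)
{
    bool uptodate = false;
    int tries = 0;
    enum fault_result r;
    do {
        pthread_mutex_lock(&mm->mmap_lock);
        r = handle_fault(mm, uptodate, fake_io);
        tries++;
        uptodate = true;   /* the I/O populated the page cache */
    } while (r == FAULT_RETRY);
    pthread_mutex_unlock(&mm->mmap_lock);
    return tries;
}
```

The win is that writers (and high-priority faults) can take the lock during the I/O window instead of waiting behind it.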

3.3 Speculative Page‑Fault (SPF) (Linux 5.4‑5.10)

Peter Zijlstra’s initial SPF patch allowed page faults to proceed without holding mmap_lock as long as the VMA did not change. Laurent Dufour later fixed bugs and merged the approach. On Android, SPF reduced application start‑up time by ~6 % on average and up to 20 % for large apps.
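The speculative idea can be modeled with a sequence counter: snapshot it, do the fault work without the lock, and commit only if no writer ran in between (an odd count means a writer is mid-modification). This is a toy model with invented names, not the SPF implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bumped to odd while a writer modifies the VMA, back to even when done,
 * like the kernel's seqcount. */
static atomic_uint vma_seq;

static void vma_modify_begin(void) { atomic_fetch_add(&vma_seq, 1); } /* -> odd */
static void vma_modify_end(void)   { atomic_fetch_add(&vma_seq, 1); } /* -> even */

/* Stand-in for the actual fault work (walking page tables, etc.). */
static void fake_fault_work(void) {}

/* Returns true if the lockless fault can be committed; false means the
 * caller must fall back to the classic mmap_lock path. */
static bool speculative_fault(void (*work)(void))
{
    unsigned int begin = atomic_load(&vma_seq);
    if (begin & 1)
        return false;                        /* writer in progress */
    work();                                  /* done without mmap_lock */
    return atomic_load(&vma_seq) == begin;   /* VMA unchanged: commit */
}
```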

3.4 Per‑VMA Lock (PVL) (Linux 6.4)

Suren Baghdasaryan introduced a per‑VMA lock that replaces the coarse‑grained mmap_lock for VMA modifications. Benchmarks show ~75 % of SPF’s benefit with lower complexity. The patch adds vm_lock_seq and a lightweight vma_lock that can be taken independently for each VMA.

3.5 fault_around optimization (Linux 5.6)

Vinayak Menon added the fault_around_bytes tunable (default 64 KB). When a page fault occurs, the kernel also maps nearby pages that already exist in the page cache, reducing subsequent faults. Test results show noticeable latency reductions for large memory‑intensive workloads.
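A toy model of the windowed mapping logic, assuming 4 KB pages (so the 64 KB default is a 16-page window); the arrays and function names are invented, with `in_cache` and `mapped` standing in for the page cache and the page tables:

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 64
#define FAULT_AROUND_PAGES 16   /* 64 KB / 4 KB */

/* On a fault at index `fault`, also map neighbouring pages in the aligned
 * window that are already resident in the cache, so later touches of those
 * pages never fault. Returns the number of pages mapped. */
static int map_around(const bool in_cache[NR_PAGES], bool mapped[NR_PAGES],
                      size_t fault)
{
    size_t start = fault - fault % FAULT_AROUND_PAGES;  /* align the window */
    size_t end = start + FAULT_AROUND_PAGES;
    int nr_mapped = 0;
    if (end > NR_PAGES)
        end = NR_PAGES;
    for (size_t i = start; i < end; i++) {
        /* the faulting page is brought in regardless; neighbours only if
         * they are already cached (no extra I/O is issued for them) */
        if (i == fault || in_cache[i]) {
            mapped[i] = true;
            nr_mapped++;
        }
    }
    return nr_mapped;
}
```

Each avoided fault is one fewer trip through the fault handler and its locks, which is where the latency win comes from.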

3.6 Unmap optimization (Linux 5.8‑5.9)

Yang Shi (Alibaba) split large munmap operations (>1 GB) into smaller chunks and voluntarily yielded the CPU when the mmap_sem had waiters. This reduced the worst‑case unmap time from ~18 s for 320 GB to a much lower value.
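The chunk-and-yield pattern can be sketched like this; the chunk size, the waiter accounting, and all names are invented for illustration:

```c
#include <sched.h>
#include <stdatomic.h>

#define CHUNK_PAGES 4096

static atomic_int lock_waiters;   /* stand-in for the rwsem's waiter count */

static long pages_unmapped;
static void fake_unmap(long off, long len) { (void)off; pages_unmapped += len; }

/* Tear down a huge range in fixed-size chunks instead of one long critical
 * section, yielding the CPU between chunks whenever someone is queued on
 * the lock. Returns the number of chunks processed. */
static int chunked_unmap(long nr_pages, void (*unmap_chunk)(long, long))
{
    int chunks = 0;
    for (long off = 0; off < nr_pages; off += CHUNK_PAGES) {
        long len = nr_pages - off < CHUNK_PAGES ? nr_pages - off : CHUNK_PAGES;
        unmap_chunk(off, len);
        chunks++;
        if (atomic_load(&lock_waiters) > 0)
            sched_yield();   /* let waiters in between chunks */
    }
    return chunks;
}
```

The total work is unchanged; what shrinks is the longest uninterrupted stretch during which other tasks are locked out.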

3.7 rmap path lock performance (Linux 5.19)

Minchan Kim added try‑lock wrappers for i_mmap_rwsem and anon_vma→rwsem. If the try‑lock fails, the path marks the operation as contended and skips the page, dramatically lowering average rmap latency.

3.8 Shrinker lockless (Linux 6.7)

Qi Zheng (ByteDance) replaced the global shrinker_rwsem with a lock-free design using reference counting and RCU. The change eliminated long-lasting lock contention during memory pressure and improved overall reclaim throughput.
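The reference-counting half of the design can be sketched with C11 atomics; in the kernel, the list traversal itself is additionally protected by RCU, and the final free is deferred through an RCU grace period. All names and layout below are invented.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct fake_shrinker {
    atomic_int refcount;   /* 0 means "being torn down, don't use" */
    bool freed;
};

/* Pin the shrinker unless it is already going away: a CAS loop that only
 * increments a nonzero count, so a dying object cannot be resurrected. */
static bool shrinker_tryget(struct fake_shrinker *s)
{
    int old = atomic_load(&s->refcount);
    while (old > 0) {
        /* on failure, `old` is reloaded with the current value */
        if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
            return true;
    }
    return false;
}

static void shrinker_put(struct fake_shrinker *s)
{
    if (atomic_fetch_sub(&s->refcount, 1) == 1)
        s->freed = true;   /* last reference: kernel would free via RCU */
}
```

Readers pin individual shrinkers instead of serializing on one global lock, so reclaim under pressure no longer queues behind registration and unregistration.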

4. Evolution Directions

Lock-free designs: move to lock-free data structures where possible (e.g., the lockless shrinker, SPF).

Reduce critical-section size: release locks before long I/O or unmap work (IO-fault path, unmap optimization).

Granular locking: replace big locks with per-entity locks (per-memcg LRU, PVL).

Mitigate lock-wait impact: detect contention early and fall back to alternative paths (rmap try-lock, SPF).

Decrease lock-acquisition frequency: pre-map nearby pages (fault_around) or batch operations.

5. References

1. https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/

2. https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/

3. https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/

4. https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/

5. https://lore.kernel.org/all/[email protected]/T/

6. https://lore.kernel.org/lkml/[email protected]/

7. https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/

8. https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/

9. https://elixir.bootlin.com/linux/v6.9.7/C/ident/
