
Unlock Linux Performance: Master Memory Watermarks and OOM Killer

This article explains how Linux memory watermarks, kswapd, direct reclaim, and the OOM Killer interact, provides detailed code examples, shows real‑world case studies, and offers practical tuning steps—including kernel parameters, cgroup limits, and monitoring tools—to prevent system stalls and crashes.

Deepin Linux

Linux Memory Watermarks

Linux divides physical memory into zones (e.g., ZONE_DMA, ZONE_NORMAL, and, on 32-bit systems, ZONE_HIGHMEM). Each zone has three watermarks: min (triggers direct reclaim), low (wakes kswapd), and high (stops kswapd). The watermarks determine whether an allocation takes the fast path or has to pay for reclamation first.

Allocation Path

If free pages are above the low watermark, the allocation succeeds on the fast path. Once free pages fall below low, the allocator wakes kswapd, which reclaims in the background until free pages climb back above the high watermark; allocations between low and high still succeed without blocking. Below min, direct reclaim runs synchronously in the allocating thread.
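The decision described above can be sketched as a small user-space function. This is an illustration only, assuming the fast path is gated on the low watermark and ignoring allocation order, lowmem reserves, and GFP flags; the `Watermarks` struct and `classify` function are hypothetical names, not kernel APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for one zone's watermarks, in pages.
struct Watermarks {
    std::uint64_t min;
    std::uint64_t low;
    std::uint64_t high;
};

enum class AllocPath { Fast, WakeKswapd, DirectReclaim };

// Classify which path an allocation would take for a given free-page count.
// A watermark check is treated as failed when free pages are at or below
// the mark, which is how this sketch models the kernel's comparison.
AllocPath classify(std::uint64_t free_pages, const Watermarks& wm) {
    assert(wm.min <= wm.low && wm.low <= wm.high);
    if (free_pages <= wm.min) return AllocPath::DirectReclaim;
    if (free_pages <= wm.low) return AllocPath::WakeKswapd;
    return AllocPath::Fast;
}
```

With watermarks of 100/125/150 pages, a zone holding 200 free pages stays on the fast path, 120 free pages wakes kswapd, and 100 or fewer forces direct reclaim.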

kswapd and Direct Reclaim

kswapd scans LRU lists and reclaims three page types: file cache (clean pages are dropped, dirty pages are written back), anonymous pages (swapped out), and slab cache (kernel objects freed via shrinkers). If free memory drops below the min watermark, the kernel performs direct reclaim, blocking the allocating thread until enough pages are freed.

Three‑Level Control Mechanism

Watermark Check in Allocation

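/* Simplified sketch of the allocator fast path, not verbatim kernel source. */
struct page *__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
        struct zonelist *zonelist, nodemask_t *nodemask)
{
    struct page *page;
    struct zone *zone;

    /* Walk the zones that are eligible for this allocation. */
    for_each_zone_zonelist(zone, zonelist, gfp_mask) {
        /* The fast path succeeds only while the zone stays above its low watermark. */
        if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, gfp_mask)) {
            page = rmqueue(zone, order);
            if (page)
                return page;
        }
    }
    return NULL; /* caller falls back to the slow path: wake kswapd, then direct reclaim */
}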

kswapd Workflow

kswapd is awakened when free memory falls below the low watermark.

It calls balance_pgdat to iterate over zones.

For each zone, shrink_zone scans LRU lists and reclaims pages.

kswapd stops when the high watermark is reached.
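The wake-at-low, stop-at-high behavior in the steps above is a hysteresis loop. A minimal sketch of that rule follows; `kswapd_should_run` is a hypothetical helper written for illustration, not a kernel function, and it models a single zone only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides whether background reclaim should be active.
// running: whether reclaim was already active before this check.
bool kswapd_should_run(bool running, std::uint64_t free_pages,
                       std::uint64_t low, std::uint64_t high) {
    assert(low <= high);
    if (!running)
        return free_pages < low;   // asleep: wake only once free pages drop below low
    return free_pages < high;      // awake: keep reclaiming until high is reached
}
```

The gap between the two thresholds is what keeps kswapd from flapping: with low = 125 and high = 150, a zone at 140 free pages keeps an already running kswapd busy but would not wake a sleeping one.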

Direct Reclaim Path

When memory is below the min watermark, the kernel calls __alloc_pages_direct_reclaim, which invokes shrink_node (shrink_zone in older kernels) synchronously, blocking the allocating thread. If reclamation still fails, the OOM Killer is invoked.

Role of min_free_kbytes

The kernel parameter min_free_kbytes defines the global minimum free memory (KB). Each zone’s min watermark is derived from this value in proportion to the zone’s size, and the low and high watermarks are in turn computed from min. Increasing min_free_kbytes therefore raises all three watermarks, so kswapd wakes earlier and direct reclaim becomes less likely.
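To make the relationship concrete, here is a rough user-space approximation of how per-zone watermarks could be derived from min_free_kbytes. The formula is an assumption modeled on recent mainline kernels (gap between watermarks = the larger of min/4 and a watermark_scale_factor fraction of the zone); it assumes 4 KiB pages and ignores highmem zones and watermark boosting. The names `ZoneWatermarks` and `derive_watermarks` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

struct ZoneWatermarks { std::uint64_t min, low, high; };  // all in pages

// Approximate the watermarks of one zone.
// Assumptions: 4 KiB pages, no highmem zones, no watermark boosting.
ZoneWatermarks derive_watermarks(std::uint64_t min_free_kbytes,
                                 std::uint64_t zone_managed_pages,
                                 std::uint64_t total_managed_pages,
                                 std::uint64_t watermark_scale_factor = 10) {
    if (total_managed_pages == 0) return {0, 0, 0};
    const std::uint64_t page_kib = 4;
    const std::uint64_t pages_min = min_free_kbytes / page_kib;
    // The zone's share of the global minimum, proportional to its size.
    const std::uint64_t zone_min =
        pages_min * zone_managed_pages / total_managed_pages;
    // Gap between watermarks: the larger of min/4 and a fraction of the zone.
    const std::uint64_t gap = std::max<std::uint64_t>(
        zone_min / 4, zone_managed_pages * watermark_scale_factor / 10000);
    return {zone_min, zone_min + gap, zone_min + 2 * gap};
}
```

For a single zone of 4,194,304 pages (16 GiB) and min_free_kbytes = 335544, the sketch yields min = 83,886, low = 104,857, and high = 125,828 pages. The values reported by /proc/zoneinfo on a real system will differ because memory is split across several zones and part of it is reserved.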

OOM Killer – The Final Safeguard

How OOM Killer Works

When memory is exhausted and reclamation cannot free enough pages, the kernel calls out_of_memory(). It computes an oom_score for each process based mainly on its memory footprint (resident pages, swap usage, and page tables), shifted by the process's oom_score_adj value, then terminates the highest‑scoring process to free memory.

Influencing OOM Decisions

Adjust /proc/[PID]/oom_score_adj (range -1000 to 1000) to protect critical processes (-1000 exempts a process from the OOM Killer) or to make others more likely to be killed (positive values).
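A service can also adjust its own score at startup instead of relying on a shell command. The sketch below writes a value to an oom_score_adj file; both function names are hypothetical, and the path is passed in so the logic can be tried against an ordinary file. Lowering the value on a real process typically requires CAP_SYS_RESOURCE.

```cpp
#include <fstream>
#include <string>

// Build the procfs path for a PID, e.g. "/proc/1234/oom_score_adj".
std::string oom_score_adj_path(long pid) {
    return "/proc/" + std::to_string(pid) + "/oom_score_adj";
}

// Write an adjustment to the given file. Returns false when the value is
// outside [-1000, 1000] or the file cannot be written.
bool write_oom_score_adj(const std::string& path, int adj) {
    if (adj < -1000 || adj > 1000) return false;
    std::ofstream out(path);
    if (!out) return false;
    out << adj << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

A call such as write_oom_score_adj(oom_score_adj_path(pid), -1000) would request full protection for that process; checking the return value matters because the write fails silently otherwise when privileges are missing.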

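The kernel parameter /proc/sys/vm/overcommit_memory controls overcommit behavior: 0 applies a heuristic check (the default), 1 always allows overcommit, and 2 enforces strict accounting against a commit limit. /proc/sys/vm/panic_on_oom decides whether the kernel panics or invokes the killer when memory runs out.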

Practical Tuning

Inspect Watermarks

grep -E '^Node|^ +(min|low|high) ' /proc/zoneinfo
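For monitoring from a program rather than the shell, the same fields can be extracted with a small parser. This is a sketch that assumes the usual /proc/zoneinfo layout ("Node N, zone NAME" headers followed by indented "min", "low", and "high" lines); `ZoneInfo` and `parse_zoneinfo` are hypothetical names.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct ZoneInfo {
    std::string node;   // e.g. "0"
    std::string zone;   // e.g. "Normal"
    std::uint64_t min = 0, low = 0, high = 0;  // in pages
};

// Parse zoneinfo-style text from any input stream.
std::vector<ZoneInfo> parse_zoneinfo(std::istream& in) {
    std::vector<ZoneInfo> zones;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        if (!(fields >> key)) continue;  // skip blank lines
        if (key == "Node") {
            // Header looks like: "Node 0, zone   Normal"
            ZoneInfo z;
            std::string zone_word;
            fields >> z.node >> zone_word >> z.zone;
            if (!z.node.empty() && z.node.back() == ',') z.node.pop_back();
            zones.push_back(z);
        } else if (!zones.empty()) {
            // Per-CPU pageset lines use "high:" (with a colon) and are skipped.
            std::uint64_t value = 0;
            if (key == "min") { if (fields >> value) zones.back().min = value; }
            else if (key == "low") { if (fields >> value) zones.back().low = value; }
            else if (key == "high") { if (fields >> value) zones.back().high = value; }
        }
    }
    return zones;
}
```

Passing a std::ifstream opened on /proc/zoneinfo returns one entry per zone, which makes it straightforward to log the watermarks before and after a tuning change.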

Adjust vm.min_free_kbytes

For a 16 GiB server, a reasonable starting point is vm.min_free_kbytes=335544 (≈328 MiB, ~2 % of RAM).

Temporary change (lost on reboot):

sysctl -w vm.min_free_kbytes=335544

Permanent change: add the line vm.min_free_kbytes = 335544 to /etc/sysctl.conf, then run sysctl -p.
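The arithmetic behind that value can be reproduced for other machine sizes. The helper below is a hypothetical convenience, not part of any tool; it takes the percentage in hundredths so that integer math stays exact.

```cpp
#include <cassert>
#include <cstdint>

// Compute a min_free_kbytes candidate as a percentage of RAM.
// ram_kib:      total RAM in KiB
// percent_x100: percentage in hundredths (200 means 2.00 %)
std::uint64_t min_free_kbytes_for(std::uint64_t ram_kib,
                                  std::uint64_t percent_x100) {
    assert(percent_x100 <= 10000);  // more than 100 % makes no sense
    return ram_kib * percent_x100 / 10000;
}
```

For 16 GiB (16,777,216 KiB) and 2 %, the result is 335,544 KiB. The right percentage depends on the workload, so treat the output as a starting point to validate under load rather than a fixed rule.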

Verify the Effect

grep -E '^Node|^ +(min|low|high) ' /proc/zoneinfo

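The min, low, and high values should now be higher than before the change. Over time, higher free page counts and fewer direct‑reclaim events (for example, slower growth of the allocstall and pgscan_direct counters in /proc/vmstat) indicate successful tuning.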
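Direct-reclaim activity can be tracked by sampling /proc/vmstat before and after a change. The sketch below collects the relevant counters from vmstat-style text; it assumes counter names beginning with "allocstall" or "pgscan_direct", and the exact set of names varies between kernel versions. `direct_reclaim_counters` is a hypothetical name.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Collect counters that hint at direct reclaim from "name value" lines.
// Assumption: names start with "allocstall" or "pgscan_direct".
std::map<std::string, std::uint64_t> direct_reclaim_counters(std::istream& in) {
    std::map<std::string, std::uint64_t> counters;
    std::string name;
    std::uint64_t value = 0;
    while (in >> name >> value) {
        // rfind(prefix, 0) == 0 is a prefix test that works before C++20.
        if (name.rfind("allocstall", 0) == 0 ||
            name.rfind("pgscan_direct", 0) == 0)
            counters[name] = value;
    }
    return counters;
}
```

The counters are cumulative since boot, so the useful signal is the difference between two samples taken some seconds apart, not the absolute value.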

cgroup Memory Limits

Mount the cgroup filesystem (usually auto‑mounted at /sys/fs/cgroup or /sys/fs/cgroup/unified).

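Create a memory cgroup, e.g., mkdir /sys/fs/cgroup/memory/my_group (cgroup v1; on cgroup v2 the group is created directly under /sys/fs/cgroup).

Set memory.limit_in_bytes (e.g., echo 536870912 > memory.limit_in_bytes for 512 MiB). On cgroup v2 the equivalent file is memory.max.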

Add processes by writing their PIDs to tasks (cgroup v1) or cgroup.procs (cgroup v2).
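The steps above can be sketched in code for cases where a launcher sets up the group itself. This example uses the cgroup v1 file names from the text (memory.limit_in_bytes and tasks); the function names are hypothetical, and the root directory is a parameter so the logic can be exercised in a scratch directory. On a real system the root would be /sys/fs/cgroup/memory and the calls need root privileges.

```cpp
#include <cerrno>
#include <cstdint>
#include <fstream>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Write a single value to a control file; returns false on failure.
bool write_control_file(const std::string& file, const std::string& value) {
    std::ofstream out(file);
    if (!out) return false;
    out << value << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Create a cgroup v1 memory group, set its limit, and add one PID.
bool setup_memory_cgroup(const std::string& memory_root,
                         const std::string& group,
                         std::uint64_t limit_bytes, long pid) {
    const std::string dir = memory_root + "/" + group;
    // An already existing group is fine; any other mkdir error is not.
    if (::mkdir(dir.c_str(), 0755) != 0 && errno != EEXIST) return false;
    return write_control_file(dir + "/memory.limit_in_bytes",
                              std::to_string(limit_bytes)) &&
           write_control_file(dir + "/tasks", std::to_string(pid));
}
```

The limit is written before the PID is added so that the process is never inside the group without a cap in place.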

Kernel Parameters for Memory Management

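vm.watermark_scale_factor – controls the distance between the min, low, and high watermarks, in units of 1/10,000 of memory (default 10, i.e. 0.1 %).

vm.swappiness – controls how readily anonymous pages are swapped out relative to reclaiming file cache (0‑100; kernels 5.8 and later accept values up to 200).

vm.min_free_kbytes – sets the global minimum free memory.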

Monitoring Tools

dstat -m – real‑time memory statistics.

vmstat 1 – per‑second memory, swap, and CPU information.

top – interactive view; sort by memory with Shift+M.

Example: High‑Memory Allocation Triggering OOM

#include <cstring>   // std::memset
#include <iostream>
#include <new>       // std::bad_alloc
#include <vector>
#include <unistd.h>  // sleep

void processOrderData(int orderCount) {
    const std::size_t blockSize = 10 * 1024 * 1024; // 10 MiB per order
    std::vector<char*> memoryBlocks; // blocks are deliberately never freed
    try {
        for (int i = 0; i < orderCount; ++i) {
            char* block = new char[blockSize];
            memoryBlocks.push_back(block);
            // Touch every page so the kernel has to back it with physical memory.
            std::memset(block, '0', blockSize);
            if (i % 1000 == 0)
                std::cout << "Processed " << i << " orders" << std::endl;
        }
    } catch (const std::bad_alloc& e) {
        // Rarely reached under default overcommit; the OOM Killer usually acts first.
        std::cerr << "Allocation failed: " << e.what() << std::endl;
    }
    while (true) sleep(1); // keep process alive
}

int main() {
    std::cout << "Start" << std::endl;
    processOrderData(100000); // ~1 TiB requested in total
    return 0;
}

This program illustrates how unchecked memory allocation can exhaust system memory, trigger direct reclaim, and eventually invoke the OOM killer.

Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
