
Why Linux Kernel Appears to ‘Steal’ Your Memory (and How It Manages Physical RAM)

This article explains why the Linux kernel reports less usable memory than the installed physical RAM. It walks through the early memblock allocator, crashkernel reservations, struct page overhead, and the handoff to the buddy system, showing where the kernel consumes memory for its own bookkeeping.

IT Services Circle

Running dmidecode on a system that reports 16384 MB of installed RAM often shows a smaller total in free -m (e.g., 15773 MB). The difference is memory reserved by the Linux kernel for its own bookkeeping.

1. Early memblock allocator

During early boot the kernel uses the memblock allocator to manage the raw memory layout obtained from the firmware. The allocator replaced the older bootmem implementation in 2010 (see commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/memblock.c?id=95f72d1ed41a66f1c1c29c24d479de81a0bea36f).

1.1 Creating the memblock allocator

After the e820 memory detection finishes, setup_arch() calls e820__memory_setup() to store the results in the global e820_table, then invokes e820__memblock_setup() to build the allocator:

// file: arch/x86/kernel/setup.c
void __init setup_arch(char **cmdline_p) {
    ...
    e820__memory_setup();   // save detection results
    ...
    e820__memblock_setup(); // initialise memblock
}

The allocator keeps usable and reserved regions in two separate arrays:

// file: mm/memblock.c
struct memblock memblock __initdata = {
    .memory.regions = memblock_memory_init_regions,
    .memory.cnt      = 1,   /* empty dummy entry */
    .memory.max      = INIT_MEMBLOCK_MEMORY_REGIONS,
    .memory.name     = "memory",

    .reserved.regions = memblock_reserved_init_regions,
    .reserved.cnt      = 1,   /* empty dummy entry */
    .reserved.max      = INIT_MEMBLOCK_RESERVED_REGIONS,
    .reserved.name     = "reserved",

    .bottom_up      = false,
    .current_limit  = MEMBLOCK_ALLOC_ANYWHERE,
};
#define INIT_MEMBLOCK_REGIONS 128
#define INIT_MEMBLOCK_RESERVED_REGIONS INIT_MEMBLOCK_REGIONS
#define INIT_MEMBLOCK_MEMORY_REGIONS   INIT_MEMBLOCK_REGIONS

static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata;

During creation the kernel iterates over each e820 entry, adding usable ranges with memblock_add() and reserving special ranges with memblock_reserve():

// file: arch/x86/kernel/e820.c
void __init e820__memblock_setup(void) {
    int i;

    for (i = 0; i < e820_table->nr_entries; i++) {
        struct e820_entry *entry = &e820_table->entries[i];

        if (entry->type == E820_TYPE_SOFT_RESERVED)
            memblock_reserve(entry->addr, entry->size);

        /* only usable RAM is added to the memory array */
        if (entry->type != E820_TYPE_RAM &&
            entry->type != E820_TYPE_RESERVED_KERN)
            continue;

        memblock_add(entry->addr, entry->size);
    }
    memblock_dump_all();
}

Enabling the kernel boot parameter memblock=debug prints a detailed log, for example:

[    0.010238] MEMBLOCK configuration:
[    0.010239]  memory size = 0x00000003fff78c00 reserved size = 0x0000000003c6d144
[    0.010240]  memory.cnt  = 0x3
[    0.010241]  memory[0x0] [0x0000000000001000-0x000000000009efff] 0x000000000009e000 bytes
[    0.010243]  memory[0x1] [0x0000000000100000-0x00000000bffd9fff] 0x00000000bfeda000 bytes
[    0.010244]  memory[0x2] [0x0000000100000000-0x000000043fffffff] 0x0000000340000000 bytes
[    0.010245]  reserved.cnt  = 0x4
[    0.010246]  reserved[0x0] [0x0000000000000000-0x0000000000000fff] 0x0000000000001000 bytes
[    0.010247]  reserved[0x1] [0x00000000000f5a40-0x00000000000f5b83] 0x0000000000000144 bytes
[    0.010248]  reserved[0x2] [0x0000000001000000-0x000000000340cfff] 0x000000000240d000 bytes
[    0.010249]  reserved[0x3] [0x0000000034f31000-0x000000003678ffff] 0x000000000185f000 bytes
[Figure: memblock layout diagram]

2. Requesting memory from memblock

Before the buddy allocator is active, all early-boot memory requests go through memblock. Two important use cases are the crash-kernel reservation and page-management initialization.

2.1 Crash kernel

Servers typically run a normal kernel plus a second emergency kernel used by kdump. The crash-kernel region is reserved via reserve_crashkernel_low() and reserve_crashkernel():

// file: arch/x86/kernel/setup.c
static int __init reserve_crashkernel_low(void) {
    ...
    low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
    ...
    pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
            (unsigned long)(low_size >> 20),
            (unsigned long)(low_base >> 20),
            (unsigned long)(low_mem_limit >> 20));
    ...
}

static void __init reserve_crashkernel(void) {
    ...
    crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
                                           CRASH_ALIGN, CRASH_ADDR_HIGH_MAX);
    ...
    pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
            (unsigned long)(crash_size >> 20),
            (unsigned long)(crash_base >> 20),
            (unsigned long)(total_mem >> 20));
    ...
}

The reservations appear in dmesg:

[    0.010832] Reserving 128MB of low memory at 2928MB for crashkernel (System low RAM: 3071MB)
[    0.010835] Reserving 128MB of memory at 17264MB for crashkernel (System RAM: 16383MB)

In the example VM, two 128 MB regions (256 MB in total) are reserved for the crash kernel and are unavailable to user programs.

2.2 Page‑management initialization

Linux represents each physical 4 KB page with a struct page object, typically 64 bytes:

// file: include/linux/mm_types.h
struct page {
    unsigned long flags;
    ...
};

The initialization path is:

start_kernel
  -> setup_arch
     -> e820__memory_setup
     -> e820__memblock_setup
     -> x86_init.paging.pagetable_init (native_pagetable_init)
        -> paging_init          // allocate struct page for every page
  -> mm_init
     -> mem_init
        -> memblock_free_all   // hand over usable memory to the buddy system

Linux has used several page-management models; the current default is SPARSEMEM. In this model memory is divided into sections, represented as a two-dimensional array of struct mem_section objects, each eventually leading to the struct page entries:

// file: mm/sparse.c
#ifdef CONFIG_SPARSEMEM_EXTREME
struct mem_section **mem_section;
#else
struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
#endif
EXPORT_SYMBOL(mem_section);
[Figure: SPARSEMEM layout diagram]

If struct page is 64 bytes, the overhead per 4 KB page is 64 / 4096 ≈ 1.56 %. Managing 16 GB of RAM therefore consumes roughly 256 MB for page structures alone.

3. Handing memory to the buddy system

After memblock has built the initial layout and the page structures are allocated, the kernel transfers usable memory to the buddy allocator via memblock_free_all():

// file: mm/memblock.c
void __init memblock_free_all(void) {
    unsigned long pages;
    ...
    pages = free_low_memory_core_early();
    totalram_pages_add(pages);
}

The helper free_low_memory_core_early() first marks the pages of every memblock-reserved range as reserved, then iterates over the remaining free ranges and calls __free_memory_core() to hand them to the buddy system:

// file: mm/memblock.c
static unsigned long __init free_low_memory_core_early(void) {
    unsigned long count = 0;
    phys_addr_t start, end;
    u64 i;

    memmap_init_reserved_pages(); /* mark reserved ranges' pages as reserved */
    for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL)
        count += __free_memory_core(start, end);
    return count;
}

The page count is added to the global _totalram_pages variable:

// file: mm/page_alloc.c
atomic_long_t _totalram_pages __read_mostly;
EXPORT_SYMBOL(_totalram_pages);

4. Summary of kernel memory consumption

The kernel does not expose the entire physical RAM to user space. Reserved regions include:

Crash‑kernel reserve (e.g., 256 MB in the example).

Page‑structure metadata (~1.56 % of total RAM).

NUMA zone and node bookkeeping (not detailed here).

These reservations are recorded by the early memblock allocator and are excluded from the memory handed to the buddy allocator, which is what user‑space tools like free report as available.

Application runtimes incur similar bookkeeping overhead. For example, Go's runtime allocator (derived from TCMalloc) defines an mspan struct containing linked-list pointers and allocation/GC bitmaps; this metadata occupies memory the program cannot use directly:

// src/runtime/mheap.go
type mspan struct {
    next *mspan // next span in the list
    prev *mspan // previous span
    allocBits  *gcBits // allocation bitmap
    gcmarkBits *gcBits // GC mark bitmap
    ...
}

Thus, the difference between the hardware‑specified RAM and the value shown by free is expected and results from essential kernel and runtime bookkeeping.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: memory management, Linux kernel, buddy system, memblock, crashkernel, page struct