Why the Linux Kernel Appears to ‘Steal’ Your Memory (and How It Manages Physical RAM)
The article explains why the Linux kernel reports less usable memory than the physical RAM, detailing the memblock allocator, crashkernel reservations, page‑struct overhead, and the handoff to the buddy system, showing how the kernel consumes memory for its own management.
On a system where dmidecode (run as root) reports 16384 MB of installed RAM, free -m often shows a smaller total (e.g., 15773 MB). The difference is memory that the Linux kernel reserves for itself: early‑boot reservations plus the metadata it needs to manage physical pages.
1. Early memblock allocator
During early boot the kernel uses the memblock allocator to manage the raw memory layout obtained from the firmware. memblock took its current name in 2010 and has since replaced the older bootmem allocator (see commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/memblock.c?id=95f72d1ed41a66f1c1c29c24d479de81a0bea36f).
1.1 Creating the memblock allocator
After the e820 memory detection finishes, setup_arch() calls e820__memory_setup() to store the results in the global e820_table, then invokes e820__memblock_setup() to build the allocator:
// file: arch/x86/kernel/setup.c
void __init setup_arch(char **cmdline_p) {
    ...
    e820__memory_setup();   // save detection results
    ...
    e820__memblock_setup(); // initialise memblock
}
The allocator keeps usable and reserved regions in two separate arrays:
// file: mm/memblock.c
struct memblock memblock __initdata = {
    .memory.regions   = memblock_memory_init_regions,
    .memory.cnt       = 1, /* empty dummy entry */
    .memory.max       = INIT_MEMBLOCK_MEMORY_REGIONS,
    .memory.name      = "memory",
    .reserved.regions = memblock_reserved_init_regions,
    .reserved.cnt     = 1, /* empty dummy entry */
    .reserved.max     = INIT_MEMBLOCK_RESERVED_REGIONS,
    .reserved.name    = "reserved",
    .bottom_up        = false,
    .current_limit    = MEMBLOCK_ALLOC_ANYWHERE,
};
#define INIT_MEMBLOCK_REGIONS 128
#define INIT_MEMBLOCK_RESERVED_REGIONS INIT_MEMBLOCK_REGIONS
#define INIT_MEMBLOCK_MEMORY_REGIONS INIT_MEMBLOCK_REGIONS
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata;
During creation the kernel iterates over each e820 entry, adding usable ranges with memblock_add() and reserving special ranges with memblock_reserve():
// file: arch/x86/kernel/e820.c
void __init e820__memblock_setup(void) {
    int i;
    ...
    for (i = 0; i < e820_table->nr_entries; i++) {
        struct e820_entry *entry = &e820_table->entries[i];
        ...
        if (entry->type == E820_TYPE_SOFT_RESERVED)
            memblock_reserve(entry->addr, entry->size);
        /* only RAM-type entries are added as usable memory */
        if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
            continue;
        memblock_add(entry->addr, entry->size);
    }
    ...
    memblock_dump_all();
}
Booting with the kernel parameter memblock=debug prints the resulting layout, for example:
[ 0.010238] MEMBLOCK configuration:
[ 0.010239] memory size = 0x00000003fff78c00 reserved size = 0x0000000003c6d144
[ 0.010240] memory.cnt = 0x3
[ 0.010241] memory[0x0] [0x0000000000001000-0x000000000009efff] 0x000000000009e000 bytes
[ 0.010243] memory[0x1] [0x0000000000100000-0x00000000bffd9fff] 0x00000000bfeda000 bytes
[ 0.010244] memory[0x2] [0x0000000100000000-0x000000043fffffff] 0x0000000340000000 bytes
[ 0.010245] reserved.cnt = 0x4
[ 0.010246] reserved[0x0] [0x0000000000000000-0x0000000000000fff] 0x0000000000001000 bytes
[ 0.010247] reserved[0x1] [0x00000000000f5a40-0x00000000000f5b83] 0x0000000000000144 bytes
[ 0.010248] reserved[0x2] [0x0000000001000000-0x000000000340cfff] 0x000000000240d000 bytes
[ 0.010249] reserved[0x3] [0x0000000034f31000-0x000000003678ffff] 0x000000000185f000 bytes
2. Requesting memory from memblock
Before the buddy allocator is active, all early‑boot memory requests go through memblock. Two important use‑cases are the crash‑kernel reserve and the page‑management initialization.
2.1 Crash kernel
Servers typically keep a second, emergency kernel loaded alongside the normal one; kdump boots it after a crash to capture a memory dump. Memory for this crash kernel is reserved by reserve_crashkernel_low() and reserve_crashkernel():
// file: arch/x86/kernel/setup.c
static int __init reserve_crashkernel_low(void) {
    ...
    low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
    ...
    pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
            (unsigned long)(low_size >> 20),
            (unsigned long)(low_base >> 20),
            (unsigned long)(low_mem_limit >> 20));
    ...
}
static void __init reserve_crashkernel(void) {
    ...
    crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_HIGH_MAX);
    ...
    pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
            (unsigned long)(crash_size >> 20),
            (unsigned long)(crash_base >> 20),
            (unsigned long)(total_mem >> 20));
    ...
}
The reservations are logged to dmesg:
[ 0.010832] Reserving 128MB of low memory at 2928MB for crashkernel (System low RAM: 3071MB)
[ 0.010835] Reserving 128MB of memory at 17264MB for crashkernel (System RAM: 16383MB)
In the example VM, two 128 MB regions (256 MB in total) are reserved for the crash kernel and are unavailable to user programs.
2.2 Page‑management initialization
Linux represents each physical 4 KB page with a struct page object, typically 64 bytes:
// file: include/linux/mm_types.h
struct page {
    unsigned long flags;
    ...
};
The initialization path is:
start_kernel
  -> setup_arch
       -> e820__memory_setup
       -> e820__memblock_setup
       -> x86_init.paging.pagetable_init (native_pagetable_init)
            -> paging_init            // allocate struct page for every page
  -> mm_init
       -> mem_init
            -> memblock_free_all      // hand over usable memory to the buddy system
Linux has used several page‑management models; the current default is SPARSEMEM. In this model physical memory is divided into sections, each described by a struct mem_section. The sections are organized as a two‑dimensional array, and each one eventually points to the struct page entries for its pages:
// file: mm/sparse.c
#ifdef CONFIG_SPARSEMEM_EXTREME
struct mem_section **mem_section;
#else
struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
#endif
EXPORT_SYMBOL(mem_section);
If struct page is 64 bytes, the overhead per 4 KB page is 64 / 4096 ≈ 1.56 %. Managing 16 GB of RAM therefore consumes roughly 256 MB for page structures alone.
3. Handing memory to the buddy system
After memblock has built the initial layout and the page structures are allocated, the kernel transfers usable memory to the buddy allocator via memblock_free_all():
// file: mm/memblock.c
void __init memblock_free_all(void) {
    unsigned long pages;
    ...
    pages = free_low_memory_core_early();
    totalram_pages_add(pages);
}
The helper free_low_memory_core_early() first marks the pages in memblock's reserved ranges as reserved, then iterates over the remaining free ranges and calls __free_memory_core() to hand them to the buddy system:
// file: mm/memblock.c
static unsigned long __init free_low_memory_core_early(void) {
    unsigned long count = 0;
    phys_addr_t start, end;
    u64 i;
    ...
    memmap_init_reserved_pages(); // mark reserved ranges as reserved pages
    for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL)
        count += __free_memory_core(start, end);
    return count;
}
The page count is added to the global _totalram_pages variable:
// file: mm/page_alloc.c
atomic_long_t _totalram_pages __read_mostly;
EXPORT_SYMBOL(_totalram_pages);
4. Summary of kernel memory consumption
The kernel does not expose the entire physical RAM to user space. Reserved regions include:
Crash‑kernel reserve (e.g., 256 MB in the example).
Page‑structure metadata (~1.56 % of total RAM).
NUMA zone and node bookkeeping (not detailed here).
These reservations are recorded by the early memblock allocator and are excluded from the memory handed to the buddy allocator. Only the pages that reach the buddy allocator are counted in _totalram_pages, which is the total that user‑space tools like free report.
Application runtimes also incur bookkeeping overhead. For example, Go’s memory allocator, which was originally based on TCMalloc, defines an mspan struct that contains linked‑list pointers and allocation/GC bitmaps; this metadata occupies memory that the program cannot use directly:
// src/runtime/mheap.go
type mspan struct {
    next       *mspan  // next span in the list
    prev       *mspan  // previous span
    allocBits  *gcBits // allocation bitmap
    gcmarkBits *gcBits // GC mark bitmap
    ...
}
Thus, the difference between the hardware‑specified RAM and the value shown by free is expected and results from essential kernel and runtime bookkeeping.
IT Services Circle
