
Unlocking Linux Kernel Memory Management: From Virtual Addresses to Physical Memory

This comprehensive guide explores Linux kernel memory management, explaining virtual and physical address concepts, the layout of process and kernel virtual memory spaces on 32‑ and 64‑bit systems, the role of mm_struct and vm_area_struct, and how ELF binaries are mapped into memory, while also detailing DRAM organization and CPU memory access.


Preface

With this article we officially begin a series of source-code analyses of the Linux kernel memory management subsystem. Following the style of previous series, we will use step-by-step diagrams to explain the underlying principles in detail, then gradually dissect the relevant kernel source code. Grounding every explanation in the source code keeps the series clear and verifiable.

The memory management subsystem is arguably the largest and most complex subsystem in the Linux kernel, encompassing many intricate concepts and principles. By following the main thread of memory management, we will also touch on many other core OS components, such as the process management, network, and file subsystems.

Because the memory management subsystem is so large and its concepts are layered, presenting them in a clear, hierarchical manner is challenging. Before writing this series, I spent considerable time thinking about the best entry point.

So, what content is suitable for the opening of this series? I believe starting from the parts developers encounter most frequently in daily work is best—for example, the classes we create, the functions we call, local variables defined inside functions, and heap‑allocated containers (Map, List, Set, …) that must reside somewhere in physical memory.

When we write business logic, we often need to reference these data structures and manipulate them.

Once a program runs, it becomes a process, and all these data structure references appear as virtual memory addresses from the process’s perspective, because both user‑mode and kernel‑mode see only virtual address space; the physical memory is hidden by the OS.

When a process accesses a virtual address, the memory management subsystem translates it into a physical address, allowing the CPU to reach the actual storage location and perform operations on the data.

What exactly is a virtual memory address?

Why does the Linux kernel introduce virtual memory instead of using physical memory directly?

What does the virtual memory space look like?

How does the kernel manage virtual memory?

What is a physical memory address? How is physical memory accessed?

In this article I will answer each of these questions in detail. Let’s get started!

1. What Is a Virtual Memory Address?

The original purpose of an address is to conveniently locate a specific real‑world location, much like a postal address.

For example, when you order a local specialty online, you fill in the recipient’s address and the sender’s address. The delivery person uses the address to locate the real house and deliver the package.

Thus, a shipping address is a virtual concept that maps to a concrete geographic location. It does not exist physically; it is a human‑defined abstraction that points to a real place.

Similarly, in computers, a memory address identifies where data is stored in memory. Memory addresses are divided into virtual and physical addresses. A virtual address is a human‑designed concept, analogous to a shipping address, while a physical address is the actual location on the memory chips, analogous to the real city, street, and house.

So, what does a 64‑bit virtual address look like? For an Intel Core i7 processor, a 64‑bit virtual address is composed of:

Global Page Directory (9 bits)

Upper Page Directory (9 bits)

Middle Page Directory (9 bits)

Page Table Entry (9 bits)

Page Offset (12 bits)

This totals 48 bits, forming the virtual address.

Understanding the overall format of a virtual address is sufficient for now; the detailed meanings of page‑directory entries and page‑table entries will be covered later.

On a 32‑bit system, the format is simpler: Page Directory (10 bits) + Page Table (10 bits) + Page Offset (12 bits), totaling 32 bits.

Each byte in a process’s virtual memory space has a corresponding virtual address; a virtual address uniquely identifies a byte.
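
To make the 9 + 9 + 9 + 9 + 12 split concrete, here is a small user-space sketch (illustrative only, not kernel code) that extracts the five fields from an example 48-bit virtual address:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t vaddr = 0x00007f1234567abcULL;   // an arbitrary example address

    unsigned pgd = (vaddr >> 39) & 0x1ff;     // bits 47..39: global page directory index
    unsigned pud = (vaddr >> 30) & 0x1ff;     // bits 38..30: upper page directory index
    unsigned pmd = (vaddr >> 21) & 0x1ff;     // bits 29..21: middle page directory index
    unsigned pte = (vaddr >> 12) & 0x1ff;     // bits 20..12: page table entry index
    unsigned off = vaddr & 0xfff;             // bits 11..0 : offset within the 4 KB page

    printf("PGD=%u PUD=%u PMD=%u PTE=%u offset=0x%x\n", pgd, pud, pmd, pte, off);
    return 0;
}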

2. Why Use Virtual Addresses to Access Memory?

After learning what a virtual address is, you may wonder why we don’t just use physical addresses directly.

If a program used physical addresses directly, the programmer would need to know the exact location of every variable, manually lay out physical memory, decide how much memory each process gets, handle memory pressure, and avoid address conflicts between processes: an extremely tedious and error-prone task.

In a single‑process embedded system, this might be manageable, but modern operating systems support many processes, making direct physical addressing impractical.

Consider the following simple Java program:

public static void main(String[] args) throws Exception {
    String i = args[0];
    // ...
}

If we launch three JVM processes (a, b, c) with the same code and assume variable i resides at physical address 0x354, all three processes would write to the same physical location, causing address conflicts.

Therefore, we need a mechanism that isolates each process's address space while still letting the OS manage physical memory efficiently. What makes such a mechanism practical is the principle of locality.

Program locality consists of temporal locality (recently executed instructions or accessed data are likely to be used again soon) and spatial locality (data near recently accessed data are likely to be accessed soon).

Because of locality, a process only needs a small portion of physical memory at any given time. The kernel can allocate just enough physical pages to satisfy the active working set, freeing the rest.

Virtual memory provides each process with its own isolated address space, making processes believe they own the entire memory. The kernel handles the actual mapping to physical memory, greatly reducing the programmer’s mental burden.

Kernel threads differ from user-mode tasks in that they have no mm_struct of their own (their mm pointer is NULL). When a kernel thread is scheduled, it simply borrows the address space of the previously running user process (tracked in the active_mm field), avoiding the overhead of allocating a new mm_struct and switching address spaces.
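
A simplified excerpt in the spirit of context_switch() in kernel/sched/core.c illustrates this borrowing (details vary across kernel versions):

// Simplified sketch of the scheduler's address-space handling during a context
// switch: a kernel thread (next->mm == NULL) borrows the previous task's
// active_mm instead of switching to a new address space.
if (!next->mm) {                                   // next is a kernel thread
    next->active_mm = prev->active_mm;             // borrow the previous mm
    mmgrab(prev->active_mm);                       // take a reference on it
    enter_lazy_tlb(prev->active_mm, next);         // no address-space switch needed
} else {
    switch_mm_irqs_off(prev->active_mm, next->mm, next);  // real address-space switch
}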

The distinctions between parent and child processes, between processes and threads, and between kernel threads and user threads all revolve around the mm_struct.

Next, let's look inside the mm_struct to see how the kernel manages a process's virtual memory.

3. How the Kernel Manages a Process's Virtual Memory Space

3.1 How the Kernel Divides User-Mode and Kernel-Mode Virtual Memory

The task_size field in mm_struct defines the boundary between user‑mode and kernel‑mode address spaces.

struct mm_struct {
    unsigned long task_size;   /* size of task vm space */
    ...
};

On a 32‑bit system, user‑mode space is 3 GB (0x00000000‑0xC0000000) and kernel‑mode space is 1 GB (0xC0000000‑0xFFFFFFFF). Thus task_size equals 0xC0000000.

The 32-bit definition can be found in /arch/x86/include/asm/page_32_types.h:

#define TASK_SIZE __PAGE_OFFSET

On a 64-bit system, only the lower 48 bits of an address are actually used. User-mode space occupies the low 128 TB (0x0000000000000000-0x00007FFFFFFFF000) and kernel-mode space occupies the high 128 TB (0xFFFF800000000000-0xFFFFFFFFFFFFFFFF); everything in between is a non-canonical hole. The top of the user-mode space, 0x00007FFFFFFFF000, is the value of task_size on 64-bit:

#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : TASK_SIZE_MAX)
#define TASK_SIZE_MAX ( (1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE )
#define __VIRTUAL_MASK_SHIFT 47

On 64-bit systems the layout of virtual memory is closely related to the page size (default 4 KB).

PAGE_SIZE is defined as:

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
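
As a quick sanity check of the arithmetic behind these macros, a user-space sketch (not kernel code):

#include <stdio.h>

int main(void)
{
    unsigned long page_size     = 1UL << 12;                 // PAGE_SIZE = 4 KB
    unsigned long task_size_max = (1UL << 47) - page_size;   // TASK_SIZE_MAX

    // Prints 0x7ffffffff000: the top of the 128 TB user-mode space, one page
    // below the 2^47 boundary.
    printf("TASK_SIZE_MAX = 0x%lx\n", task_size_max);
    return 0;
}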

3.2 How the Kernel Lays Out Process Virtual Memory

The kernel uses the mm_struct fields to define the locations of various regions:

struct mm_struct {
    unsigned long task_size;
    unsigned long start_code, end_code;   // code segment
    unsigned long start_data, end_data;   // data segment
    unsigned long start_brk, brk;         // heap
    unsigned long start_stack;            // stack
    unsigned long arg_start, arg_end;    // arguments
    unsigned long env_start, env_end;    // environment
    unsigned long mmap_base;             // base of mmap area
    unsigned long total_vm;               // total pages mapped
    unsigned long locked_vm;              // pages locked in memory
    unsigned long pinned_vm;              // pages that cannot be moved
    unsigned long data_vm;                // pages in data segment
    unsigned long exec_vm;                // pages in code segment
    unsigned long stack_vm;               // pages in stack
    ...
};

Code segment – start_code to end_code holds the executable machine code loaded from the binary.

Data segment – start_data to end_data holds initialized global and static variables.

BSS segment – uninitialized globals/static variables are zero‑filled and placed after the data segment.

Heap – start_brk marks the beginning, brk marks the current end. Small allocations (<128 KB) adjust brk via brk() or sbrk().

When malloc requests a small block (below the roughly 128 KB threshold), glibc satisfies it by simply moving the brk pointer upward; larger requests are served from the memory-mapped area via mmap instead.

Memory-mapped area – grows downward from mmap_base. It holds shared libraries, files mapped with mmap, and large heap allocations.

Stack – grows downward from a high address; start_stack records where the stack begins (its high end), while the stack pointer (RSP) tracks the current top, which moves toward lower addresses as the stack grows.

Other fields ( total_vm, locked_vm, pinned_vm, etc.) track page‑level statistics for swapping, locking, and accounting.
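
The layout described above can be observed from user space. The following sketch (a hypothetical example program; exact addresses vary from run to run because of ASLR) prints the addresses of objects living in the code, data, BSS, heap, memory-mapped, and stack regions, which can be cross-checked against cat /proc/<pid>/maps:

#include <stdio.h>
#include <stdlib.h>

int global_initialized = 1;    // data segment
int global_uninitialized;      // BSS segment

int main(void)
{
    int local = 0;                        // stack
    void *small = malloc(64);             // small allocation: served from the heap (brk)
    void *large = malloc(1024 * 1024);    // large allocation: served from the mmap area

    printf("code  : %p\n", (void *)main);
    printf("data  : %p\n", (void *)&global_initialized);
    printf("bss   : %p\n", (void *)&global_uninitialized);
    printf("heap  : %p\n", small);
    printf("mmap  : %p\n", large);
    printf("stack : %p\n", (void *)&local);

    free(large);
    free(small);
    return 0;
}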

3.3 How the Kernel Represents Virtual Memory Areas (VMAs)

Each contiguous region in a process’s address space is described by a struct vm_area_struct (VMA):

struct vm_area_struct {
    unsigned long vm_start;   // start address (inclusive)
    unsigned long vm_end;     // end address (exclusive)
    pgprot_t vm_page_prot;    // page‑level protection bits
    unsigned long vm_flags;  // VMA‑level flags (read/write/exec/...)
    struct anon_vma *anon_vma; // for anonymous mappings
    struct file *vm_file;      // backing file (NULL for anonymous)
    unsigned long vm_pgoff;   // offset within file (in pages)
    void *vm_private_data;    // VMA‑specific data
    const struct vm_operations_struct *vm_ops; // callbacks
    struct vm_area_struct *vm_next, *vm_prev; // linked list
    struct rb_node vm_rb;     // red‑black tree node
    struct mm_struct *vm_mm;  // back‑pointer to owning mm_struct
    ...
};

The kernel maintains VMAs in two structures: a doubly linked list (ordered by address) for efficient traversal, and a red‑black tree for O(log N) lookup.

struct mm_struct {
    struct vm_area_struct *mmap;   /* head of VMA list */
    struct rb_root mm_rb;          /* root of VMA RB-tree */
    ...
};

Typical VMA flags include:

VM_READ – readable

VM_WRITE – writable

VM_EXEC – executable

VM_SHARED – shared between processes

VM_IO – mapped to device I/O memory

VM_RESERVED – cannot be swapped out

VM_SEQ_READ – hints sequential access

VM_RAND_READ – hints random access

These flags influence page‑table entries and kernel behavior such as prefetching and swapping.
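
For instance, creating an anonymous, read-only mapping from user space produces a new VMA whose vm_flags contain VM_READ but not VM_WRITE or VM_EXEC; a minimal sketch (the region shows up as an anonymous, read-only entry in /proc/<pid>/maps):

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    // Private, anonymous, read-only mapping: the kernel creates a VMA covering
    // one page with VM_READ set and VM_WRITE/VM_EXEC cleared.
    void *addr = mmap(NULL, 4096, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mapped at %p; see /proc/%d/maps\n", addr, getpid());
    getchar();                 // pause so the mapping can be inspected from another shell
    munmap(addr, 4096);
    return 0;
}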

3.4 Operations on VMAs

The vm_operations_struct provides callbacks for VMA lifecycle events:

struct vm_operations_struct {
    void (*open)(struct vm_area_struct *area);
    void (*close)(struct vm_area_struct *area);
    vm_fault_t (*fault)(struct vm_fault *vmf);
    vm_fault_t (*page_mkwrite)(struct vm_fault *vmf);
    ...
};

open – called when the VMA is added to the address space.

close – called when the VMA is removed.

fault – invoked on a page‑fault (the page is not present).

page_mkwrite – called when a read‑only page is about to become writable.

Many kernel objects use a similar operations-table pattern: struct file_operations for files, struct proto_ops for sockets, and so on.
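
As an illustration, a driver that supports mmap can supply its own table; a simplified, hypothetical sketch (my_fault and my_vm_ops are made-up names, and error handling is minimal):

#include <linux/mm.h>
#include <linux/gfp.h>

// Hypothetical driver-side fault handler: allocate a page the first time the
// corresponding address in the VMA is touched and hand it back to the kernel.
static vm_fault_t my_fault(struct vm_fault *vmf)
{
    struct page *page = alloc_page(GFP_KERNEL);
    if (!page)
        return VM_FAULT_OOM;
    vmf->page = page;          // the kernel maps this page into the faulting VMA
    return 0;
}

static const struct vm_operations_struct my_vm_ops = {
    .fault = my_fault,
};

// In the driver's mmap handler: vma->vm_ops = &my_vm_ops;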

3.5 How VMAs Are Organized

VMAs are linked via vm_next / vm_prev, forming an address-ordered list, and each VMA also participates in the red-black tree via vm_rb. The head of the list is stored in mm_struct.mmap, while the tree root is mm_struct.mm_rb.
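
A lookup of the VMA covering a given address walks the red-black tree; the following simplified sketch follows the spirit of the kernel's find_vma() (the exact code differs across versions, and recent kernels replace the list and rb-tree with a maple tree):

#include <linux/mm.h>
#include <linux/rbtree.h>

// Simplified sketch of an address-to-VMA lookup over mm->mm_rb: return the
// first VMA whose vm_end lies above addr (it contains addr if vm_start <= addr).
struct vm_area_struct *find_vma_sketch(struct mm_struct *mm, unsigned long addr)
{
    struct rb_node *node = mm->mm_rb.rb_node;
    struct vm_area_struct *vma = NULL;

    while (node) {
        struct vm_area_struct *tmp =
            rb_entry(node, struct vm_area_struct, vm_rb);

        if (tmp->vm_end > addr) {
            vma = tmp;                 // candidate: ends above addr
            if (tmp->vm_start <= addr)
                break;                 // addr falls inside this VMA
            node = node->rb_left;
        } else {
            node = node->rb_right;
        }
    }
    return vma;                        // NULL if addr lies above every VMA
}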

4. How ELF Binaries Are Mapped Into Virtual Memory

When a program is executed, the kernel loads the ELF binary and maps its sections into the process’s address space. The core function is load_elf_binary:

static int load_elf_binary(struct linux_binprm *bprm) {
    // set up mmap_base
    setup_new_exec(bprm);
    // create stack VMA
    retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP), executable_stack);
    // map .text, .data, .bss
    error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, elf_prot, elf_flags, total_size);
    // create heap VMA
    retval = set_brk(elf_bss, elf_brk, bss_prot);
    // map shared libraries
    elf_entry = load_elf_interp(&loc->interp_elf_ex, interpreter, &interp_map_addr, load_bias, interp_elf_phdata);
    // initialize mm_struct fields
    current->mm->end_code = end_code;
    current->mm->start_code = start_code;
    current->mm->start_data = start_data;
    current->mm->end_data = end_data;
    current->mm->start_stack = bprm->p;
    ...
}

Key steps:

Set mmap_base for the process.

Create the stack VMA and set mm->start_stack.

Map the ELF’s code, data, and BSS sections into the appropriate VMAs.

Initialize the heap VMA via set_brk.

Map interpreter (shared libraries) into the memory‑mapped area.

Fill mm_struct fields with the addresses of each region.

5. Kernel Virtual Memory Space

All processes share a common kernel virtual address space. While each process has its own user‑mode space, entering kernel mode gives every process the same view of kernel memory.

Even in kernel mode, the CPU still uses virtual addresses; the kernel simply restricts them to the kernel portion of the address space.

5.1 32-Bit Kernel Virtual Memory Layout

On 32‑bit systems the kernel occupies the top 1 GB (0xC0000000‑0xFFFFFFFF). This region is divided as follows:

5.1.1 Direct-Mapped (Linear) Region

The first 896 MB of kernel space (0xC0000000-0xF8000000) is a direct mapping of physical memory 0-0x38000000, i.e., the first 896 MB of RAM. Subtracting 0xC0000000 from an address in this region yields the corresponding physical address.

This area holds the kernel code, data, BSS, the task_struct of each process, kernel stacks, and other early‑boot structures.
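
Because the mapping is linear, the translation is pure arithmetic; conceptually this is what the kernel's __pa()/__va() helpers do for addresses in this region (simplified sketch using the 32-bit constant above):

#define PAGE_OFFSET_32 0xC0000000UL    // start of the direct-mapped kernel region

// Simplified view of the linear mapping: virtual and physical addresses in
// this region differ by a constant offset.
static inline unsigned long virt_to_phys_sketch(unsigned long vaddr)
{
    return vaddr - PAGE_OFFSET_32;     // e.g. 0xC0100000 -> 0x00100000
}

static inline unsigned long phys_to_virt_sketch(unsigned long paddr)
{
    return paddr + PAGE_OFFSET_32;
}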

5.1.2 ZONE_HIGHMEM

Physical memory above 896 MB is classified as ZONE_HIGHMEM. Because the kernel virtual space only has 128 MB left after the direct‑mapped region, high memory is accessed via dynamic mappings (e.g., vmalloc or kmap).

5.1.3 vmalloc (Dynamic Mapping) Area

From VMALLOC_START to VMALLOC_END (most of the roughly 128 MB left above the direct-mapped region on 32-bit) the kernel provides a virtually contiguous region into which physically non-contiguous pages are mapped. This is used for large allocations that cannot be satisfied from the direct-mapped zone.
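
A typical use is allocating a buffer that must be contiguous in kernel virtual address space but need not be physically contiguous; a kernel-side sketch:

#include <linux/vmalloc.h>
#include <linux/errno.h>

static int vmalloc_example(void)
{
    // 4 MB contiguous in kernel virtual address space; the underlying physical
    // pages may be scattered and are mapped into the vmalloc area via page tables.
    void *buf = vmalloc(4 * 1024 * 1024);

    if (!buf)
        return -ENOMEM;
    /* ... use the buffer ... */
    vfree(buf);
    return 0;
}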

5.1.4 Permanent Mapping (pkmap) Area

Between PKMAP_BASE and FIXADDR_START the kernel maintains a small fixed‑size area ( LAST_PKMAP = 1024 entries) for long‑term mappings of high memory pages via kmap.
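
A high-memory page obtained from the allocator has no permanent kernel virtual address until kmap() installs one in this pkmap window; a sketch in the older kmap() style (newer kernels prefer kmap_local_page()):

#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/string.h>

static void pkmap_example(void)
{
    struct page *page = alloc_page(GFP_HIGHUSER);   // may come from ZONE_HIGHMEM
    void *vaddr;

    if (!page)
        return;
    vaddr = kmap(page);            // long-lived mapping in the pkmap area
    memset(vaddr, 0, PAGE_SIZE);   // the page is now addressable by the kernel
    kunmap(page);                  // release the pkmap slot
    __free_page(page);
}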

5.1.5 Fixed Mapping Area

From FIXADDR_START to FIXADDR_TOP the kernel reserves a set of fixed virtual addresses that can be permanently mapped to any physical page. These addresses are known at compile time and are used for early‑boot structures that need a stable virtual address.

5.1.6 Temporary Mapping Area

The highest part of the kernel address space is used for temporary mappings (e.g., kmap_atomic) needed for short‑lived accesses such as copying data from user space into a page cache page.

size_t iov_iter_copy_from_user_atomic(struct page *page, struct iov_iter *i, unsigned long offset, size_t bytes) {
    char *kaddr = kmap_atomic(page), *p = kaddr + offset;
    // copy data from user buffer to page
    ...
    kunmap_atomic(kaddr);
    return bytes;
}

5.2 64-Bit Kernel Virtual Memory Layout

On 64‑bit systems the kernel virtual space is much larger (128 TB). The layout is simpler:

0xFFFF8000_0000_0000-0xFFFF8800_0000_0000: 8 TB guard hole reserved by the kernel (the canonical-address hole itself lies lower, between the top of user space and 0xFFFF8000_0000_0000).

0xFFFF8800_0000_0000‑0xFFFFC800_0000_0000: 64 TB direct‑mapped region (virtual address minus PAGE_OFFSET gives physical address).

0xFFFFC800_0000_0000‑0xFFFFE800_0000_0000: 32 TB vmalloc area.

0xFFFFEA00_0000_0000-0xFFFFEB00_0000_0000: 1 TB vmemmap area for struct page descriptors (see the sketch below).

0xFFFFFFFF_8000_0000 onward: a 512 MB region for the kernel image (code and data), followed by the module mapping area.

Definitions such as PAGE_OFFSET, VMALLOC_START, VMEMMAP_START, and __START_KERNEL_map can be found in the kernel source under /arch/x86/include/asm/.
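
One practical consequence of the vmemmap area: because the struct page array is virtually contiguous there, converting between a page frame number and its descriptor becomes plain pointer arithmetic. A simplified view (matching the sparsemem-vmemmap model; the real pfn_to_page/page_to_pfn macros live in the memory-model headers):

// With a virtually contiguous struct page array based at VMEMMAP_START
// (exposed as the 'vmemmap' pointer), pfn <-> struct page is simple arithmetic.
#define pfn_to_page_sketch(pfn)    (vmemmap + (pfn))
#define page_to_pfn_sketch(page)   ((unsigned long)((page) - vmemmap))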

6. Physical Memory Addresses

Physical memory consists of DRAM chips organized into modules. Each module contains eight DRAM chips (numbered 0‑7). A DRAM chip stores data in a two‑dimensional array of supercells (each supercell = 1 byte). Access is performed by first sending the row address (RAS) to load an entire row into an internal buffer, then sending the column address (CAS) to retrieve the desired byte.

When the CPU needs to read a physical address, it initiates a read transaction on the system bus, which is translated by the I/O bridge to the memory bus. The memory controller uses the physical address to locate the appropriate memory module, broadcasts the row and column addresses to all eight DRAM chips, and collects the eight bytes (one from each chip) to form a 64‑bit word.

CPU accesses memory in word‑size units (8 bytes on a 64‑bit CPU), but the underlying DRAM provides data one byte at a time from each chip.

Writing follows the reverse path: the CPU places the physical address and data on the bus, the memory controller distributes the row/column addresses, and each DRAM chip writes its corresponding byte.
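
A toy model of the access path described above (purely illustrative; NUM_COLS is an assumed chip geometry, and real memory controllers interleave channels, ranks, and banks in far more complex ways):

#include <stdint.h>

#define NUM_COLS 1024UL   // assumed number of supercell columns per DRAM chip

// Toy model: the 64-bit word holding physical address 'paddr' is assembled from
// 8 chips, chip i supplying byte i. Each chip locates its supercell by a row
// address (RAS) followed by a column address (CAS).
static void decode_toy(uint64_t paddr, uint64_t *row, uint64_t *col, unsigned *byte_in_word)
{
    uint64_t word_index = paddr / 8;        // which 64-bit word on the module
    *row          = word_index / NUM_COLS;  // row broadcast to all 8 chips
    *col          = word_index % NUM_COLS;  // column within the open row
    *byte_in_word = paddr % 8;              // which chip's byte the CPU asked for
}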

Conclusion

This article traced the journey from virtual addresses to physical memory, covering the motivations for virtual memory, the layout of user-mode and kernel-mode address spaces on both 32- and 64-bit Linux, the kernel data structures (mm_struct, vm_area_struct) that manage these spaces, the process of mapping ELF binaries, and finally the hardware organization of DRAM and CPU memory accesses. Tools such as cat /proc/pid/maps, pmap pid, and cat /proc/iomem can be used to inspect the actual memory layout at runtime.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: memory-management, Linux, Virtual Memory
Written by Bin's Tech Cabin

Original articles dissecting source code and sharing personal tech insights. A modest space for serious discussion, free from noise and bureaucracy.