How the Linux Kernel Maps Virtual to Physical Memory: A Deep Dive into the Page‑Table Paging Mechanism

This article traces the evolution from real‑mode segmentation to modern paging, explains segment and page‑table structures on x86 and ARM, dissects kernel macros such as PGDIR_SHIFT and PUD_SHIFT, and demonstrates a kernel module that translates a process's virtual address to its physical address.

Linux Kernel Journey

Early Intel CPUs ran in real mode, where segmentation combined 16‑bit registers into 20‑bit addresses to reach 1 MiB of memory. Protected mode then added segment descriptors to protect memory, and the 80386 introduced paging, together with the control registers (CR0, CR3, etc.) that drive the MMU. The result is the combined segment‑plus‑paging system used today.

1. Segmentation

In real mode, a logical address is formed by a segment register (CS, DS, SS, ES) and an offset (e.g., DS:BX). The 16‑bit segment value is shifted left by 4 bits to give the segment base, and the 16‑bit offset selects a location within the 64 KiB segment. The hardware converts this logical address to a linear address as segment × 16 + offset.

Protected mode replaces the segment base with a segment selector. The selector contains a 2‑bit RPL field for privilege, a TI bit (GDT or LDT), and a 13‑bit index that points to a descriptor in the GDT/LDT. The descriptor provides the segment base, limit and access rights, allowing the CPU to enforce protection.

Six segment registers exist: CS, DS, SS, ES, FS, GS.

Descriptor tables: GDT and LDT, located through the GDTR and LDTR registers respectively.

Address translation: selector → descriptor → base → add offset → linear address.

2. Paging

Paging converts a linear address to a physical address and splits large memory regions into equal‑sized pages (typically 4 KiB). For a 32‑bit address space, the address is divided into a 20‑bit page index and a 12‑bit offset.

2.1 Level‑1 Page Table

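With a single‑level table, the high 20 bits index one flat table of 2^20 entries. Each entry occupies 4 bytes, so the index multiplied by 4 gives the byte offset of the entry within the table, whose physical base address is stored in CR3. The entry holds the physical base of the page frame; adding the low 12‑bit offset to that base yields the final physical address. The drawback is size: 2^20 entries of 4 bytes each need 4 MiB per address space.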

2.2 Level‑2 Page Table

Each page‑directory entry points to a page table; both PDE and PTE are 4 bytes.

The 32‑bit virtual address is split into high 10 bits (page‑directory index), middle 10 bits (page‑table index) and low 12 bits (offset).

Physical address = (page‑table entry & PAGE_MASK) + offset.

2.3 Enabling Paging (Three Steps)

Prepare a page‑directory and the required page tables.

Write the physical address of the page directory into control register CR3.

Set the PG bit (bit 31) in control register CR0.

3. x86 Paging Code Analysis (Linux 5.6.4)

3.1 PGDIR_SHIFT and Related Macros

#define PGDIR_SHIFT 39
#define PTRS_PER_PGD 512
#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE - 1))

The macro pgd_offset(mm, address) returns the linear address of the page‑global‑directory entry that maps address:

#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))

3.2 PUD_SHIFT and Related Macros

#define PUD_SHIFT 30
#define PTRS_PER_PUD 512
#define PUD_SIZE (_AC(1, UL) << PUD_SHIFT)
#define PUD_MASK (~(PUD_SIZE - 1))
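
pud_offset(dir, addr) yields the linear address of the page‑upper‑directory entry. The macro below takes the PGD entry directly; kernels that include the p4d level pass a p4d_t * here instead: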

#define pud_offset(dir,addr) \
    ((pud_t *) pgd_page_vaddr(*(dir)) + (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))

3.3 PMD_SHIFT and Related Macros

#define PMD_SHIFT 21
#define PTRS_PER_PMD 512
#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE - 1))

pmd_offset(dir, addr) returns the linear address of the page‑middle‑directory entry.

3.4 PAGE_SHIFT and Related Macros

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PTRS_PER_PTE 512

4. ARM64 Paging Analysis

In ARMv8, the kernel page‑table base resides in TTBR1_EL1 and the user page‑table base in TTBR0_EL1. With 48‑bit virtual addresses, kernel addresses have the high 16 bits set to all 1 (0xFFFF0000_00000000 – 0xFFFFFFFF_FFFFFFFF), while user addresses have them set to all 0 (0x00000000_00000000 – 0x0000FFFF_FFFFFFFF).

Assuming a 39‑bit virtual address space, 4 KiB pages and a 3‑level page table, the translation proceeds as follows:

Bits [63:39] select the TTBR (kernel vs. user).

Bits [38:30] index the Level‑1 table (PGD).

Bits [29:21] index the Level‑2 table (PMD; the PUD is folded into the PGD when only three levels are used).

Bits [20:12] index the Level‑3 table (PTE).

Bits [11:0] are the offset within the 4 KiB physical page.

Relevant header files define the types and macros used for these levels:

#include <asm/pgtable-types.h>   // defines pgd_t, pud_t, pmd_t, pte_t
#include <asm/pgtable-prot.h>   // permission bits for entries
#include <asm/pgtable-hwdef.h> // hardware‑specific layout, TCR settings
#include <asm/pgtable.h>       // high‑level page‑table helpers

The configuration option CONFIG_PGTABLE_LEVELS controls which levels are present:

CONFIG_PGTABLE_LEVELS=4: pgd → pud → pmd → pte.

CONFIG_PGTABLE_LEVELS=3: pgd(pud) → pmd → pte (no separate pud).

CONFIG_PGTABLE_LEVELS=2: pgd(pud,pmd) → pte (no pud or pmd).

4.1 Macros in head.S

The assembly file arch/arm64/kernel/head.S creates page‑table entries using three macros:

/* Macro to populate the PGD (and possibly PUD) for a block entry */
.macro create_pgd_entry, tbl, virt, tmp1, tmp2
    create_table_entry \tbl, \virt, PGDIR_SHIFT, PTRS_PER_PGD, \tmp1, \tmp2
#if SWAPPER_PGTABLE_LEVELS > 3
    create_table_entry \tbl, \virt, PUD_SHIFT, PTRS_PER_PUD, \tmp1, \tmp2
#endif
#if SWAPPER_PGTABLE_LEVELS > 2
    create_table_entry \tbl, \virt, SWAPPER_TABLE_SHIFT, PTRS_PER_PTE, \tmp1, \tmp2
#endif
.endm

/* Macro to populate block entries in the page table for a virtual range */
.macro create_block_map, tbl, flags, phys, start, end
    lsr \phys, \phys, #SWAPPER_BLOCK_SHIFT
    lsr \start, \start, #SWAPPER_BLOCK_SHIFT
    and \start, \start, #PTRS_PER_PTE - 1   // table index
    orr \phys, \flags, \phys, lsl #SWAPPER_BLOCK_SHIFT // table entry
    lsr \end, \end, #SWAPPER_BLOCK_SHIFT
    and \end, \end, #PTRS_PER_PTE - 1   // table end index
9999: str \phys, [\tbl, \start, lsl #3] // store the entry
    add \start, \start, #1   // next entry
    add \phys, \phys, #SWAPPER_BLOCK_SIZE // next block
    cmp \start, \end
    b.ls 9999b
.endm

/* Macro to create a table entry to the next level */
.macro create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
    lsr \tmp1, \virt, #\shift
    and \tmp1, \tmp1, #\ptrs - 1   // table index
    add \tmp2, \tbl, #PAGE_SIZE
    orr \tmp2, \tmp2, #PMD_TYPE_TABLE // address of next table and entry type
    str \tmp2, [\tbl, \tmp1, lsl #3]
    add \tbl, \tbl, #PAGE_SIZE  // next level table page
.endm

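These macros build the early page tables, two levels (pgd and pmd) in the simplest configuration, in physically contiguous pages: create_table_entry points each new entry at the page immediately following the current table and then advances tbl by PAGE_SIZE.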

5. Hands‑On Practice

A simple user‑space program prints its allocated pointer address:

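#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char *p = malloc(10);

    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    /* Print the PID and the pointer so both can be passed to the module. */
    printf("pid = %d, address = %p\n", (int)getpid(), (void *)p);
    /* Keep the process alive so its page tables can be inspected. */
    for (;;)
        pause();
}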

The following kernel module accepts a PID and a virtual address, walks the page tables, and prints the corresponding physical address:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/page.h>

MODULE_AUTHOR("wang.com");
MODULE_DESCRIPTION("Virtual address to physical address translation");
MODULE_LICENSE("GPL");

static int pid;
static unsigned long va;
module_param(pid, int, 0644);
module_param(va, ulong, 0644);

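static int __init find_pgd_init(void)
{
    unsigned long pa = 0;
    struct pid *p = NULL;
    struct task_struct *pcb_tmp = NULL;
    struct vm_area_struct *vma = NULL;
    pgd_t *pgd_tmp = NULL;
    p4d_t *p4d_tmp = NULL;
    pud_t *pud_tmp = NULL;
    pmd_t *pmd_tmp = NULL;
    pte_t *pte_tmp = NULL;

    /* Dump the paging constants of the running kernel. */
    printk(KERN_INFO "PAGE_OFFSET = 0x%lx\n", PAGE_OFFSET);
    printk(KERN_INFO "PGDIR_SHIFT = %d\n", PGDIR_SHIFT);
    printk(KERN_INFO "PUD_SHIFT = %d\n", PUD_SHIFT);
    printk(KERN_INFO "PMD_SHIFT = %d\n", PMD_SHIFT);
    printk(KERN_INFO "PAGE_SHIFT = %d\n", PAGE_SHIFT);
    printk(KERN_INFO "PTRS_PER_PGD = %d\n", PTRS_PER_PGD);
    printk(KERN_INFO "PTRS_PER_PUD = %d\n", PTRS_PER_PUD);
    printk(KERN_INFO "PTRS_PER_PMD = %d\n", PTRS_PER_PMD);
    printk(KERN_INFO "PTRS_PER_PTE = %d\n", PTRS_PER_PTE);
    printk(KERN_INFO "PAGE_MASK = 0x%lx\n", PAGE_MASK);

    /*
     * Look up the task. For brevity this demo takes no RCU lock, task
     * reference or mmap_sem, so the target process must stay alive and
     * must not change its mappings while the module loads.
     */
    p = find_vpid(pid);
    pcb_tmp = pid_task(p, PIDTYPE_PID);
    if (!pcb_tmp || !pcb_tmp->mm) {
        printk(KERN_INFO "pid %d not found or has no user address space.\n", pid);
        return 0;
    }
    printk(KERN_INFO "pgd = %p\n", pcb_tmp->mm->pgd);

    /* find_vma() returns the first VMA ending above va, so check its start too. */
    vma = find_vma(pcb_tmp->mm, va);
    if (!vma || va < vma->vm_start) {
        printk(KERN_INFO "virt_addr 0x%lx not available.\n", va);
        return 0;
    }

    /* Level 1: page global directory. */
    pgd_tmp = pgd_offset(pcb_tmp->mm, va);
    printk(KERN_INFO "pgd_tmp = %p\n", pgd_tmp);
    printk(KERN_INFO "pgd_val(*pgd_tmp) = 0x%lx\n", pgd_val(*pgd_tmp));
    if (pgd_none(*pgd_tmp)) {
        printk(KERN_INFO "Not mapped in pgd.\n");
        return 0;
    }

    /* The p4d level is folded into the pgd when only four levels are in use. */
    p4d_tmp = p4d_offset(pgd_tmp, va);
    if (p4d_none(*p4d_tmp)) {
        printk(KERN_INFO "Not mapped in p4d.\n");
        return 0;
    }

    /* Level 2: page upper directory. */
    pud_tmp = pud_offset(p4d_tmp, va);
    printk(KERN_INFO "pud_tmp = %p\n", pud_tmp);
    printk(KERN_INFO "pud_val(*pud_tmp) = 0x%lx\n", pud_val(*pud_tmp));
    if (pud_none(*pud_tmp)) {
        printk(KERN_INFO "Not mapped in pud.\n");
        return 0;
    }

    /* Level 3: page middle directory (this demo assumes 4 KiB pages, not huge pages). */
    pmd_tmp = pmd_offset(pud_tmp, va);
    printk(KERN_INFO "pmd_tmp = %p\n", pmd_tmp);
    printk(KERN_INFO "pmd_val(*pmd_tmp) = 0x%lx\n", pmd_val(*pmd_tmp));
    if (pmd_none(*pmd_tmp)) {
        printk(KERN_INFO "Not mapped in pmd.\n");
        return 0;
    }

    /* Level 4: page table entry. */
    pte_tmp = pte_offset_kernel(pmd_tmp, va);
    printk(KERN_INFO "pte_tmp = %p\n", pte_tmp);
    printk(KERN_INFO "pte_val(*pte_tmp) = 0x%lx\n", pte_val(*pte_tmp));
    if (pte_none(*pte_tmp)) {
        printk(KERN_INFO "Not mapped in pte.\n");
        return 0;
    }
    if (!pte_present(*pte_tmp)) {
        printk(KERN_INFO "pte not in RAM.\n");
        return 0;
    }

    /*
     * Use the PFN rather than pte_val() & PAGE_MASK, which would keep
     * high flag bits such as NX, then add the offset within the page.
     */
    pa = ((unsigned long)pte_pfn(*pte_tmp) << PAGE_SHIFT) | (va & ~PAGE_MASK);
    printk(KERN_INFO "virt_addr 0x%lx maps to phys_addr 0x%lx.\n", va, pa);
    return 0;
}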

static void __exit find_pgd_exit(void)
{
    printk(KERN_INFO "Goodbye!\n");
}

module_init(find_pgd_init);
module_exit(find_pgd_exit);

The accompanying Makefile builds the module against the running kernel:

# If KERNELRELEASE is defined, we've been invoked from the kernel build system
ifneq ($(KERNELRELEASE),)
obj-m := lab3.o
else
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

# Recipe lines must start with a tab character
default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
endif

clean:
	rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions *.order *.symvers *.unsigned

To test, run the user‑space program, compile the module, insert it with insmod lab3.ko pid=2630 va=0xa87010 (substituting the PID of the running program and the address it printed), and examine the kernel log via dmesg. The log shows the macro values, each level’s entry address, and the final physical address corresponding to the supplied virtual address.
