How the Linux Kernel Maps Virtual to Physical Memory: A Deep Dive into the Page-Table Paging Mechanism
This article traces the evolution from real‑mode segmentation to modern paging, explains segment and page‑table structures on x86 and ARM, dissects kernel macros such as PGDIR_SHIFT and PUD_SHIFT, and demonstrates a kernel module that translates a process's virtual address to its physical address.
Early Intel CPUs ran in real mode, where segmentation combined 16-bit registers into 20-bit addresses covering 1 MiB. Protected mode then added descriptor-based segmentation to enforce memory protection, and the 80386 introduced paging, bringing with it the MMU and control registers such as CR0 and CR3. Today's x86 processors still use this combined segmentation-plus-paging scheme.
1. Segmentation
In real mode, a logical address is formed by a segment register (CS, DS, SS, ES) and an offset (e.g., DS:BX). The segment register holds a 16-bit segment value; the hardware shifts it left by 4 bits to obtain the segment base, and the 16-bit offset selects a location within the 64 KiB segment. The linear address is therefore segment × 16 + offset.
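As a quick illustration of the real-mode calculation, the following sketch shifts the segment value left by four bits and adds the offset. The function name is illustrative, and the 20-bit wrap-around models an 8086 with 20 address lines (an assumption; later CPUs with the A20 line enabled do not wrap).

```c
#include <stdint.h>

/* Real-mode translation: segment * 16 + offset.
 * The result is masked to 20 bits, assuming an 8086-style
 * address bus where addresses wrap at 1 MiB. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    return linear & 0xFFFFFu;
}
```

For example, B800:0000 maps to linear address 0xB8000, and 1234:0010 maps to 0x12350.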
Protected mode replaces the segment base with a segment selector. The 16-bit selector contains a 13-bit index that points to a descriptor in the GDT or LDT, a TI bit that chooses between the two tables, and a 2-bit RPL field for privilege. The descriptor provides the segment base, limit and access rights, allowing the CPU to enforce protection.
Six segment registers exist: CS, DS, SS, ES, FS, GS.
Descriptor tables: the GDT and LDT, located through the GDTR and LDTR registers respectively.
Address translation: selector → descriptor → base → add offset → linear address.
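The selector layout can be made concrete with a small decoding sketch. The struct and function names below are illustrative, not taken from any kernel header; the 8-byte descriptor size applies to the classic 32-bit GDT/LDT format.

```c
#include <stdint.h>

/* Fields packed into a 16-bit protected-mode segment selector. */
struct selector_fields {
    unsigned index; /* bits 15..3: descriptor index            */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT                 */
    unsigned rpl;   /* bits 1..0: requested privilege level    */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}

/* Byte offset of the descriptor inside its table (8 bytes each). */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8u;
}
```

Decoding the selector 0x0023, for instance, gives index 4, TI 0 (GDT) and RPL 3, so the descriptor sits 32 bytes into the GDT.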
2. Paging
Paging converts a linear address to a physical address and splits large memory regions into equal‑sized pages (typically 4 KiB). For a 32‑bit address space, the address is divided into a 20‑bit page index and a 12‑bit offset.
2.1 Level‑1 Page Table
With a single-level table, the high 20 bits of the linear address index one large table whose physical base address is stored in CR3. Each entry occupies 4 bytes, so the index multiplied by 4 gives the entry's byte offset within the table. The entry holds the physical base address of the page frame; adding the low 12-bit offset to that base yields the final physical address.
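A rough sketch of the single-level lookup follows, modelling the table as an array of 32-bit entries. The function names are illustrative only. Note that a full table needs 2^20 entries × 4 bytes = 4 MiB per address space, which is one motivation for the two-level scheme described next.

```c
#include <stdint.h>

/* High 20 bits of a 32-bit linear address: the page index. */
static uint32_t page_index(uint32_t linear)
{
    return linear >> 12;
}

/* Byte offset of the entry inside the table (4 bytes per entry). */
static uint32_t entry_byte_offset(uint32_t linear)
{
    return page_index(linear) * 4u;
}

/* Look up the entry, keep its frame base (upper 20 bits) and
 * append the 12-bit offset from the linear address. */
static uint32_t translate_one_level(const uint32_t *table, uint32_t linear)
{
    uint32_t entry = table[page_index(linear)];

    return (entry & 0xFFFFF000u) | (linear & 0xFFFu);
}
```

With entry 2 set to 0x00345007 (frame 0x00345000 plus flag bits), the linear address 0x00002ABC translates to 0x00345ABC.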
2.2 Level‑2 Page Table
Each page‑directory entry points to a page table; both PDE and PTE are 4 bytes.
The 32‑bit virtual address is split into high 10 bits (page‑directory index), middle 10 bits (page‑table index) and low 12 bits (offset).
Physical address = (page‑table entry & PAGE_MASK) + offset.
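The two-level walk can be sketched in user space by treating an array of 32-bit words as simulated physical memory. This is a minimal model under stated assumptions: `phys_mem`, `walk_two_level` and the macro names are hypothetical, and only the present bit (bit 0) is checked.

```c
#include <stdint.h>

#define PD_INDEX(va)  (((va) >> 22) & 0x3FFu)  /* high 10 bits   */
#define PT_INDEX(va)  (((va) >> 12) & 0x3FFu)  /* middle 10 bits */
#define PG_OFFSET(va) ((va) & 0xFFFu)          /* low 12 bits    */
#define FRAME_MASK    0xFFFFF000u
#define ENTRY_PRESENT 0x1u

/* phys_mem stands in for RAM and is indexed by physical address / 4.
 * Returns 0 and stores the physical address in *pa on success,
 * or -1 if either level is not present. */
static int walk_two_level(const uint32_t *phys_mem, uint32_t cr3,
                          uint32_t va, uint32_t *pa)
{
    uint32_t pde, pte;

    pde = phys_mem[((cr3 & FRAME_MASK) + PD_INDEX(va) * 4u) / 4u];
    if (!(pde & ENTRY_PRESENT))
        return -1;

    pte = phys_mem[((pde & FRAME_MASK) + PT_INDEX(va) * 4u) / 4u];
    if (!(pte & ENTRY_PRESENT))
        return -1;

    *pa = (pte & FRAME_MASK) + PG_OFFSET(va);
    return 0;
}
```

Placing the page directory at physical 0x0000 and one page table at 0x1000, the address 0x00403ABC selects directory entry 1, table entry 3 and offset 0xABC.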
2.3 Enabling Paging (Three Steps)
Prepare a page‑directory and the required page tables.
Write the physical address of the page directory into control register CR3.
Set the PG bit (bit 31) in control register CR0.
3. x86 Paging Code Analysis (Linux 5.6.4)
3.1 PGDIR_SHIFT and Related Macros
#define PGDIR_SHIFT 39
#define PTRS_PER_PGD 512
#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE - 1))
The macro pgd_offset(mm, address) returns the linear address of the page-global-directory entry that maps address:
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
3.2 PUD_SHIFT and Related Macros
#define PUD_SHIFT 30
#define PTRS_PER_PUD 512
#define PUD_SIZE (_AC(1, UL) << PUD_SHIFT)
#define PUD_MASK (~(PUD_SIZE - 1))
pud_offset(dir, addr) yields the linear address of the page-upper-directory entry:
#define pud_offset(dir,addr) \
	((pud_t *) pgd_page_vaddr(*(dir)) + (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))
3.3 PMD_SHIFT and Related Macros
#define PMD_SHIFT 21
#define PTRS_PER_PMD 512
#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE - 1))
pmd_offset(dir, addr) returns the linear address of the page-middle-directory entry.
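Taken together with the 12-bit page offset covered next, these shifts split a 48-bit virtual address into 9 + 9 + 9 + 9 + 12 bits. The user-space sketch below reproduces the index arithmetic with the constants from this section; `struct va_split` and `split_va` are illustrative names, not kernel APIs.

```c
#include <stdint.h>

/* Shift values for 4-level paging on x86-64, as listed in this section. */
enum {
    PGDIR_SHIFT    = 39,
    PUD_SHIFT      = 30,
    PMD_SHIFT      = 21,
    PAGE_SHIFT     = 12,
    PTRS_PER_TABLE = 512   /* every level has 512 entries */
};

struct va_split {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Extract the four table indices and the page offset from a VA. */
static struct va_split split_va(uint64_t va)
{
    struct va_split s;

    s.pgd    = (unsigned)((va >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1));
    s.pud    = (unsigned)((va >> PUD_SHIFT)   & (PTRS_PER_TABLE - 1));
    s.pmd    = (unsigned)((va >> PMD_SHIFT)   & (PTRS_PER_TABLE - 1));
    s.pte    = (unsigned)((va >> PAGE_SHIFT)  & (PTRS_PER_TABLE - 1));
    s.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
    return s;
}
```

For the address 0xa87010, the split is pgd 0, pud 0, pmd 5, pte 135 (0x87) and offset 0x10.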
3.4 PAGE_SHIFT and Related Macros
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PTRS_PER_PTE 512
4. ARM64 Paging Analysis
In ARMv8, the kernel page-table base resides in TTBR1_EL1 and the user page-table base in TTBR0_EL1. Kernel addresses have their high bits set to all 1s, while user addresses have their high bits set to all 0s; with a 48-bit address space the ranges are 0xFFFF0000_00000000 to 0xFFFFFFFF_FFFFFFFF for the kernel and 0x00000000_00000000 to 0x0000FFFF_FFFFFFFF for user space.
Assuming a 39‑bit virtual address space, 4 KiB pages and a 3‑level page table, the translation proceeds as follows:
Bits [63:39] select the TTBR (kernel vs. user).
Bits [38:30] index the Level-1 table (PGD; the PUD is folded into it).
Bits [29:21] index the Level-2 table (PMD).
Bits [20:12] index the Level-3 table (PTE).
Bits [11:0] are the offset within the 4 KiB physical page.
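The bit ranges listed above can be checked with a short sketch. It assumes the 39-bit, 4 KiB, 3-level configuration from the text and ignores tagged addresses (top-byte-ignore); the enum and function names are illustrative.

```c
#include <stdint.h>

enum ttbr_sel { VA_INVALID = -1, USE_TTBR0 = 0, USE_TTBR1 = 1 };

/* Bits [63:39] must be all 0s (user, TTBR0) or all 1s (kernel, TTBR1);
 * any other pattern is outside both 39-bit ranges. */
static enum ttbr_sel select_ttbr(uint64_t va)
{
    uint64_t top = va >> 39;            /* 25 bits remain */

    if (top == 0)
        return USE_TTBR0;
    if (top == 0x1FFFFFFu)
        return USE_TTBR1;
    return VA_INVALID;
}

/* 9-bit table index for hardware level 1, 2 or 3:
 * level 1 -> bits [38:30], level 2 -> [29:21], level 3 -> [20:12]. */
static unsigned level_index(uint64_t va, int level)
{
    int shift = 12 + 9 * (3 - level);

    return (unsigned)((va >> shift) & 0x1FFu);
}
```

For the kernel address 0xFFFFFFC000081000 this selects TTBR1 with level indices 256, 0 and 129.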
Relevant header files define the types and macros used for these levels:
#include <asm/pgtable-types.h> // defines pgd_t, pud_t, pmd_t, pte_t
#include <asm/pgtable-prot.h> // permission bits for entries
#include <asm/pgtable-hwdef.h> // hardware‑specific layout, TCR settings
#include <asm/pgtable.h> // high-level page-table helpers
The configuration option CONFIG_PGTABLE_LEVELS controls which levels are present:
CONFIG_PGTABLE_LEVELS=4: pgd → pud → pmd → pte.
CONFIG_PGTABLE_LEVELS=3: pgd(pud) → pmd → pte (no separate pud).
CONFIG_PGTABLE_LEVELS=2: pgd(pud,pmd) → pte (no pud or pmd).
4.1 Macros in head.S
The assembly file arch/arm64/kernel/head.S creates page‑table entries using three macros:
/* Macro to populate the PGD (and possibly PUD) for a block entry */
	.macro	create_pgd_entry, tbl, virt, tmp1, tmp2
	create_table_entry \tbl, \virt, PGDIR_SHIFT, PTRS_PER_PGD, \tmp1, \tmp2
#if SWAPPER_PGTABLE_LEVELS > 3
	create_table_entry \tbl, \virt, PUD_SHIFT, PTRS_PER_PUD, \tmp1, \tmp2
#endif
#if SWAPPER_PGTABLE_LEVELS > 2
	create_table_entry \tbl, \virt, SWAPPER_TABLE_SHIFT, PTRS_PER_PTE, \tmp1, \tmp2
#endif
	.endm

/* Macro to populate block entries in the page table for a virtual range */
	.macro	create_block_map, tbl, flags, phys, start, end
	lsr	\phys, \phys, #SWAPPER_BLOCK_SHIFT
	lsr	\start, \start, #SWAPPER_BLOCK_SHIFT
	and	\start, \start, #PTRS_PER_PTE - 1	// table index
	orr	\phys, \flags, \phys, lsl #SWAPPER_BLOCK_SHIFT	// table entry
	lsr	\end, \end, #SWAPPER_BLOCK_SHIFT
	and	\end, \end, #PTRS_PER_PTE - 1	// table end index
9999:	str	\phys, [\tbl, \start, lsl #3]	// store the entry
	add	\start, \start, #1	// next entry
	add	\phys, \phys, #SWAPPER_BLOCK_SIZE	// next block
	cmp	\start, \end
	b.ls	9999b
	.endm

/* Macro to create a table entry to the next level */
	.macro	create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
	lsr	\tmp1, \virt, #\shift
	and	\tmp1, \tmp1, #\ptrs - 1	// table index
	add	\tmp2, \tbl, #PAGE_SIZE
	orr	\tmp2, \tmp2, #PMD_TYPE_TABLE	// address of next table and entry type
	str	\tmp2, [\tbl, \tmp1, lsl #3]
	add	\tbl, \tbl, #PAGE_SIZE	// next level table page
	.endm
Each create_table_entry call links the current table to the page that immediately follows it and then advances tbl by PAGE_SIZE, so the early page tables (pgd and pmd in the two-level case) occupy physically contiguous pages.
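To make the loop in create_block_map easier to follow, here is a C rendering of the same steps. It is only a sketch: the 2 MiB block size (shift of 21) is an assumption matching 4 KiB pages with section mappings, and `BLOCK_SHIFT`, `ENTRIES` and the function name are illustrative stand-ins for the assembler symbols.

```c
#include <stdint.h>

enum { BLOCK_SHIFT = 21, ENTRIES = 512 };   /* assumed: 2 MiB blocks */

/* Fill tbl[] with block entries covering the virtual range [start, end].
 * Mirrors the assembly: compute the first and last table index, build
 * the first entry from flags and the block-aligned physical address,
 * then store entries until the last index has been written. */
static void create_block_map(uint64_t *tbl, uint64_t flags, uint64_t phys,
                             uint64_t start, uint64_t end)
{
    uint64_t idx   = (start >> BLOCK_SHIFT) & (ENTRIES - 1);
    uint64_t last  = (end >> BLOCK_SHIFT) & (ENTRIES - 1);
    uint64_t entry = flags | ((phys >> BLOCK_SHIFT) << BLOCK_SHIFT);

    do {
        tbl[idx] = entry;                       /* str  */
        idx++;                                  /* next entry */
        entry += (uint64_t)1 << BLOCK_SHIFT;    /* next block */
    } while (idx <= last);                      /* cmp + b.ls */
}
```

The do/while form matches the assembly's store-then-compare structure, so at least one entry is always written.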
5. Hands‑On Practice
A simple user‑space program prints its allocated pointer address:
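#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char *p = malloc(10);

    if (p == NULL)
        return 1;
    p[0] = 'x';  /* touch the buffer so its page is actually mapped */
    /* %p is the correct conversion for pointers; 0x%x truncates on 64-bit */
    printf("pid = %d, address = %p\n", (int)getpid(), (void *)p);
    for (;;)
        pause();  /* keep the process alive without spinning */
    return 0;
}
The following kernel module accepts a PID and a virtual address, walks the page tables, and prints the corresponding physical address: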
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/page.h>

MODULE_AUTHOR("wang.com");
MODULE_DESCRIPTION("Virtual address to physical address translation");
MODULE_LICENSE("GPL");

static int pid;
static unsigned long va;
module_param(pid, int, 0644);
module_param(va, ulong, 0644);

/*
 * Demo only: a robust module would hold rcu_read_lock() around the PID
 * lookup and take the mm's mmap lock around find_vma() and the walk.
 */
static int __init find_pgd_init(void)
{
    unsigned long pa = 0;
    struct pid *p = NULL;
    struct task_struct *pcb_tmp = NULL;
    struct vm_area_struct *vma = NULL;
    pgd_t *pgd_tmp = NULL;
    p4d_t *p4d_tmp = NULL;
    pud_t *pud_tmp = NULL;
    pmd_t *pmd_tmp = NULL;
    pte_t *pte_tmp = NULL;

    printk(KERN_INFO "PAGE_OFFSET = 0x%lx\n", PAGE_OFFSET);
    printk(KERN_INFO "PGDIR_SHIFT = %d\n", PGDIR_SHIFT);
    printk(KERN_INFO "PUD_SHIFT = %d\n", PUD_SHIFT);
    printk(KERN_INFO "PMD_SHIFT = %d\n", PMD_SHIFT);
    printk(KERN_INFO "PAGE_SHIFT = %d\n", PAGE_SHIFT);
    printk(KERN_INFO "PTRS_PER_PGD = %d\n", PTRS_PER_PGD);
    printk(KERN_INFO "PTRS_PER_PUD = %d\n", PTRS_PER_PUD);
    printk(KERN_INFO "PTRS_PER_PMD = %d\n", PTRS_PER_PMD);
    printk(KERN_INFO "PTRS_PER_PTE = %d\n", PTRS_PER_PTE);
    printk(KERN_INFO "PAGE_MASK = 0x%lx\n", PAGE_MASK);

    p = find_vpid(pid);
    pcb_tmp = pid_task(p, PIDTYPE_PID);   /* returns NULL if p is NULL */
    if (!pcb_tmp || !pcb_tmp->mm) {       /* kernel threads have no mm */
        printk(KERN_INFO "pid %d not found or has no mm.\n", pid);
        return 0;
    }
    /* %px prints the raw pointer; plain %p is hashed on recent kernels */
    printk(KERN_INFO "pgd = %px\n", pcb_tmp->mm->pgd);

    /* find_vma() returns the first VMA ending above va, so also check
     * that va is not below that VMA's start. */
    vma = find_vma(pcb_tmp->mm, va);
    if (!vma || va < vma->vm_start) {
        printk(KERN_INFO "virt_addr 0x%lx not available.\n", va);
        return 0;
    }

    pgd_tmp = pgd_offset(pcb_tmp->mm, va);
    printk(KERN_INFO "pgd_tmp = %px\n", pgd_tmp);
    printk(KERN_INFO "pgd_val(*pgd_tmp) = 0x%lx\n", pgd_val(*pgd_tmp));
    if (pgd_none(*pgd_tmp)) {
        printk(KERN_INFO "Not mapped in pgd.\n");
        return 0;
    }

    /* Since the 5-level paging rework, pud_offset() takes a p4d_t *;
     * with 4-level paging the p4d level is folded into the pgd. */
    p4d_tmp = p4d_offset(pgd_tmp, va);
    if (p4d_none(*p4d_tmp)) {
        printk(KERN_INFO "Not mapped in p4d.\n");
        return 0;
    }

    pud_tmp = pud_offset(p4d_tmp, va);
    printk(KERN_INFO "pud_tmp = %px\n", pud_tmp);
    printk(KERN_INFO "pud_val(*pud_tmp) = 0x%lx\n", pud_val(*pud_tmp));
    if (pud_none(*pud_tmp)) {
        printk(KERN_INFO "Not mapped in pud.\n");
        return 0;
    }

    pmd_tmp = pmd_offset(pud_tmp, va);
    printk(KERN_INFO "pmd_tmp = %px\n", pmd_tmp);
    printk(KERN_INFO "pmd_val(*pmd_tmp) = 0x%lx\n", pmd_val(*pmd_tmp));
    if (pmd_none(*pmd_tmp)) {
        printk(KERN_INFO "Not mapped in pmd.\n");
        return 0;
    }

    pte_tmp = pte_offset_kernel(pmd_tmp, va);
    printk(KERN_INFO "pte_tmp = %px\n", pte_tmp);
    printk(KERN_INFO "pte_val(*pte_tmp) = 0x%lx\n", pte_val(*pte_tmp));
    if (pte_none(*pte_tmp)) {
        printk(KERN_INFO "Not mapped in pte.\n");
        return 0;
    }
    if (!pte_present(*pte_tmp)) {
        printk(KERN_INFO "pte not in RAM.\n");
        return 0;
    }

    /* Use the frame number rather than pte_val & PAGE_MASK: the raw PTE
     * also carries high flag bits (such as NX). Then add the page offset. */
    pa = ((unsigned long)pte_pfn(*pte_tmp) << PAGE_SHIFT) | (va & ~PAGE_MASK);
    printk(KERN_INFO "virt_addr 0x%lx maps to phys_addr 0x%lx.\n", va, pa);
    return 0;
}

static void __exit find_pgd_exit(void)
{
    printk(KERN_INFO "Goodbye!\n");
}

module_init(find_pgd_init);
module_exit(find_pgd_exit);
The accompanying Makefile builds the module against the running kernel:
# If KERNELRELEASE is defined, we've been invoked from the kernel build system
ifneq ($(KERNELRELEASE),)
obj-m := lab3.o
else
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
# Recipe lines below must begin with a tab character
default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
endif
clean:
	rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions *.order *.symvers *.unsigned
To test, save the module source as lab3.c, compile it, and insert it with insmod lab3.ko pid=2630 va=0xa87010, substituting the PID of the running user-space program and the address it printed. Then examine the kernel log via dmesg. The log shows the macro values, each level's entry address, and the final physical address corresponding to the supplied virtual address.
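When reading the logged pte_val by hand, keep in mind that an x86-64 PTE carries flag bits at both ends: bits 11..0 hold flags such as present and writable, while bit 63 is the no-execute (NX) bit. The sketch below, with illustrative names and a made-up PTE value, shows how the frame address in bits 51..12 combines with the page offset, and why clearing only the low 12 bits is not enough.

```c
#include <stdint.h>

#define PTE_FRAME_MASK 0x000FFFFFFFFFF000ULL  /* bits 51..12: frame address */
#define PAGE_OFFSET_BITS 0xFFFULL             /* bits 11..0: offset in page */

/* Physical address = frame address from the PTE + offset from the VA. */
static uint64_t pte_to_phys(uint64_t pte, uint64_t va)
{
    return (pte & PTE_FRAME_MASK) | (va & PAGE_OFFSET_BITS);
}

/* Clearing only the low 12 bits keeps the high flag bits (e.g. NX),
 * so the result is not a usable physical address. */
static uint64_t naive_page_mask(uint64_t pte)
{
    return pte & ~PAGE_OFFSET_BITS;
}
```

For a hypothetical PTE of 0x8000000123456067 and the virtual address 0xa87010, pte_to_phys gives 0x123456010, whereas naive_page_mask leaves bit 63 set and returns 0x8000000123456000.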