Deep Dive into Linux Kernel Page Table Mapping: Theory, Code Analysis & Practice
This article explains the evolution from segmented to paged addressing, details Linux's multi‑level page‑table structures on x86 and ARM, walks through the key macros and assembly routines, and provides a complete kernel module example that translates a virtual address to its physical counterpart.
Addressing Mechanisms
The original 8086 processor had 16‑bit registers, so a single register could address only 64 KB. Intel introduced segmentation to extend the addressable range to 1 MB (20‑bit addresses). Protected mode later added segment‑level protection, and the 80386 added paging, resulting in the modern segmented‑plus‑paged scheme implemented by the MMU and configured through the control registers.
Segment Addressing
In real mode a logical address such as DS:BX is formed by loading a segment value into a segment register (CS, DS, SS, ES) and an offset into a general register. The hardware shifts the 16‑bit segment value left by 4 bits (multiplies it by 16) and adds the 16‑bit offset to produce a 20‑bit linear address. This mode provides no protection; any program can access the entire address space.
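The real‑mode calculation can be sketched in a few lines of user‑space C. This is only an illustration of the arithmetic; the function name real_mode_linear is hypothetical and not part of any kernel or hardware interface.

```c
#include <stdint.h>

/*
 * Real-mode address formation: linear = segment * 16 + offset.
 * The 8086 has 20 address lines, so the result is truncated to 20 bits.
 * (Illustrative helper; the name is an assumption, not a real API.)
 */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    return linear & 0xFFFFFu; /* keep the low 20 bits */
}
```

For example, segment 0x1234 with offset 0x0010 yields 0x12350, and many different segment:offset pairs alias the same linear address.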
Protected Mode
Segment registers contain a 16‑bit selector rather than a raw base address.
A logical address consists of a selector and a 32‑bit offset. Six segment registers hold selectors (CS, SS, DS, ES, FS, GS).
The selector holds a 13‑bit index that points to a descriptor in the GDT or LDT, a TI bit that chooses between the two tables (0 = GDT, 1 = LDT), and a 2‑bit RPL (requested privilege level) field used in privilege checks.
The descriptor tables are located through the GDTR and LDTR registers.
During translation the selector indexes the descriptor, the descriptor’s base address is added to the offset, and protection checks are applied.
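The selector layout described above can be decoded with simple shifts and masks. The sketch below is user‑space C for illustration only; the struct and function names are hypothetical, and the 8‑byte descriptor size is that of 32‑bit protected mode.

```c
#include <stdint.h>

/* Fields of a 16-bit segment selector (illustrative type, not a kernel one). */
struct selector_fields {
    unsigned index; /* bits 15..3: descriptor index within the table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* Byte offset of the descriptor inside its table: 8 bytes per descriptor. */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8u;
}
```

A selector of 0x23, for instance, decodes to index 4 in the GDT with RPL 3, so its descriptor sits 32 bytes past the table base.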
Paging Addressing
Purpose of Paging
Convert linear addresses to physical addresses.
Divide large memory regions into equal‑sized pages.
Single‑Level Page Table (32‑bit example)
With a 4 KB page size, a 4 GB address space contains 1 048 576 pages. The virtual address is split in two: the high 20 bits select a page‑table entry (PTE), and the low 12 bits are the offset within the page. The high 20 bits are multiplied by 4 (the size of a PTE) to obtain the byte offset within the page table; adding the table base held in CR3 gives the PTE address. The PTE supplies the physical page base, to which the low 12 bits are added to form the final physical address. A table of this kind occupies 1 048 576 × 4 bytes = 4 MB of contiguous memory per address space, which is the cost the two‑level scheme reduces.
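The arithmetic for the single‑level case can be written out as follows. This is a minimal user‑space sketch; pte_address and phys_address are hypothetical helper names, and the PTE is assumed to keep the frame base in its upper 20 bits with flags in the lower 12.

```c
#include <stdint.h>

/* Address of the PTE: table base + (virtual page number * 4 bytes). */
static uint32_t pte_address(uint32_t table_base, uint32_t va)
{
    uint32_t vpn = va >> 12; /* high 20 bits: virtual page number */

    return table_base + vpn * 4u;
}

/* Final physical address: frame base from the PTE + low 12 bits of va. */
static uint32_t phys_address(uint32_t pte, uint32_t va)
{
    return (pte & 0xFFFFF000u) | (va & 0x00000FFFu);
}
```

With a table base of 0x00100000 and the virtual address 0x00403ABC, the PTE lives at 0x0010100C; if that PTE holds 0x0089A067, the physical address is 0x0089AABC.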
Two‑Level Page Table
Page‑directory entries (PDE) store the physical addresses of lower‑level page tables.
Both the page directory and all page tables reside in physical memory.
Each PDE and PTE is 4 bytes; the 32‑bit virtual address is divided into three fields: high 10 bits index the page directory, middle 10 bits index the page table, low 12 bits index within the page.
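A two‑level walk over the 10/10/12 split can be modelled in user space with an array standing in for physical memory. The sketch below is an assumption‑laden model, not kernel code: the helper names are hypothetical, memory is an array of 32‑bit words indexed by physical address divided by 4, and bit 0 of each entry is treated as the present flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1u        /* bit 0: entry is valid */
#define FRAME_MASK    0xFFFFF000u /* upper 20 bits: 4 KB-aligned base */

static unsigned pde_index(uint32_t va)   { return va >> 22; }
static unsigned pte_index(uint32_t va)   { return (va >> 12) & 0x3FFu; }
static unsigned page_offset(uint32_t va) { return va & 0xFFFu; }

/*
 * Walk directory -> table -> page in simulated memory.
 * mem[i] holds the 32-bit word at physical address i * 4.
 * Returns false if either level is marked not present.
 */
static bool walk_two_level(const uint32_t *mem, uint32_t cr3,
                           uint32_t va, uint32_t *pa)
{
    uint32_t pde = mem[((cr3 & FRAME_MASK) >> 2) + pde_index(va)];
    uint32_t pte;

    if (!(pde & ENTRY_PRESENT))
        return false;
    pte = mem[((pde & FRAME_MASK) >> 2) + pte_index(va)];
    if (!(pte & ENTRY_PRESENT))
        return false;
    *pa = (pte & FRAME_MASK) | page_offset(va);
    return true;
}
```

For the address 0x00403ABC the directory index is 1, the table index is 3, and the page offset is 0xABC. Only the page tables that are actually referenced need to exist, which is where the memory saving over a single flat table comes from.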
Enabling Paging (Three Steps)
Prepare the page directory and the page tables.
Write the physical base address of the page directory into control register CR3.
Set the PG bit (bit 31) in control register CR0.
Paging Management Code Analysis
X86 Architecture (Linux 5.6.4)
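On x86‑64, Linux 5.6.4 uses a four‑level paging model unless five‑level paging is in use; in the four‑level case the extra P4D level is folded into the PGD. Key macros define the bit shifts and entry counts for each level (four‑level values shown):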
#define PGDIR_SHIFT 39
#define PTRS_PER_PGD 512
#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE - 1))
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
Similar definitions exist for PUD_SHIFT (30), PMD_SHIFT (21), and PAGE_SHIFT (12); each level has 512 entries (PTRS_PER_* = 512) and the page size is 4 KB. The helper functions pud_offset, pmd_offset, and pte_offset_kernel walk down the hierarchy, each returning a pointer (a kernel virtual address) to the entry at the next level.
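To see what the shift and mask macros do to a concrete address, the following user‑space sketch mirrors the four‑level, 4 KB‑page values. The macro values are copied here as plain constants and table_index is a hypothetical helper; none of this is the kernel's own code.

```c
#include <stdint.h>

/* Four-level x86-64 values with 4 KB pages, copied for illustration. */
#define PGDIR_SHIFT    39
#define PUD_SHIFT      30
#define PMD_SHIFT      21
#define PAGE_SHIFT     12
#define PTRS_PER_TABLE 512u /* same entry count at every level */

/* Index into the table at the level selected by 'shift'. */
static unsigned table_index(uint64_t va, unsigned shift)
{
    return (unsigned)((va >> shift) & (PTRS_PER_TABLE - 1));
}

/* Byte offset within the 4 KB page. */
static unsigned page_offset_of(uint64_t va)
{
    return (unsigned)(va & ((1ull << PAGE_SHIFT) - 1));
}
```

The address 0x8080604005 splits into PGD index 1, PUD index 2, PMD index 3, PTE index 4, and page offset 5. Each index uses 9 bits, so 4 × 9 + 12 = 48 bits of virtual address are translated.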
ARM Architecture
In ARMv8 the kernel page‑table base resides in TTBR1_EL1 and the user‑space base in TTBR0_EL1. With a 39‑bit virtual address space, 4 KB pages, and a three‑level table, the address fields are:
Bits [63:39] select kernel vs. user space (choose TTBR).
Bits [38:30] index Level 1 (L1) table.
Bits [29:21] index Level 2 (L2) table.
Bits [20:12] index Level 3 (L3) table.
Bits [11:0] offset within the physical page.
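The field layout listed above can be expressed as a small decoder. This is a user‑space sketch for the 39‑bit, 4 KB, three‑level configuration only; the struct and function names are hypothetical, and an address counts as valid here only when bits [63:39] are all zeros (TTBR0) or all ones (TTBR1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of a 39-bit ARMv8 virtual address (illustrative type). */
struct va39_fields {
    bool valid;      /* bits [63:39] are all 0s or all 1s */
    bool kernel;     /* true: translated via TTBR1, false: via TTBR0 */
    unsigned l1;     /* bits [38:30] */
    unsigned l2;     /* bits [29:21] */
    unsigned l3;     /* bits [20:12] */
    unsigned offset; /* bits [11:0]  */
};

static struct va39_fields split_va39(uint64_t va)
{
    struct va39_fields f;
    uint64_t top = va >> 39; /* the 25 bits above the translated range */

    f.kernel = (top == 0x1FFFFFFull);
    f.valid = f.kernel || (top == 0);
    f.l1 = (unsigned)((va >> 30) & 0x1FFu);
    f.l2 = (unsigned)((va >> 21) & 0x1FFu);
    f.l3 = (unsigned)((va >> 12) & 0x1FFu);
    f.offset = (unsigned)(va & 0xFFFu);
    return f;
}
```

For example, 0xFFFFFFC000201ABC is a kernel address with L1 index 256, L2 index 1, L3 index 1, and offset 0xABC.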
Header files define pgd_t, pud_t, pmd_t, pte_t and conversion macros such as pgd_val(x) and __pgd(x). The compile‑time constant CONFIG_PGTABLE_LEVELS selects the hierarchy:
4 levels → pgd → pud → pmd → pte
3 levels → pgd(pud) → pmd → pte
2 levels → pgd(pud,pmd) → pte
Assembly Macros in head.S
Three macros generate the early‑boot page‑table structures:
.macro create_pgd_entry, tbl, virt, tmp1, tmp2
    create_table_entry \tbl, \virt, PGDIR_SHIFT, PTRS_PER_PGD, \tmp1, \tmp2
#if SWAPPER_PGTABLE_LEVELS > 3
    create_table_entry \tbl, \virt, PUD_SHIFT, PTRS_PER_PUD, \tmp1, \tmp2
#endif
#if SWAPPER_PGTABLE_LEVELS > 2
    create_table_entry \tbl, \virt, SWAPPER_TABLE_SHIFT, PTRS_PER_PTE, \tmp1, \tmp2
#endif
.endm
.macro create_block_map, tbl, flags, phys, start, end
    lsr \phys, \phys, #SWAPPER_BLOCK_SHIFT
    lsr \start, \start, #SWAPPER_BLOCK_SHIFT
    and \start, \start, #PTRS_PER_PTE - 1              // first table index
    orr \phys, \flags, \phys, lsl #SWAPPER_BLOCK_SHIFT // first table entry
    lsr \end, \end, #SWAPPER_BLOCK_SHIFT
    and \end, \end, #PTRS_PER_PTE - 1                  // last table index
9999: str \phys, [\tbl, \start, lsl #3]                // store the entry
    add \start, \start, #1                             // next entry
    add \phys, \phys, #SWAPPER_BLOCK_SIZE              // next block
    cmp \start, \end
    b.ls 9999b
.endm
.macro create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
    lsr \tmp1, \virt, #\shift
    and \tmp1, \tmp1, #\ptrs - 1                       // table index
    add \tmp2, \tbl, #PAGE_SIZE
    orr \tmp2, \tmp2, #PMD_TYPE_TABLE                  // next table address + type
    str \tmp2, [\tbl, \tmp1, lsl #3]
    add \tbl, \tbl, #PAGE_SIZE                         // advance to next-level table
.endm
When SWAPPER_PGTABLE_LEVELS=3, the PUD step is compiled out and create_pgd_entry writes only two table entries: one at the PGD level and one at the level selected by SWAPPER_TABLE_SHIFT. Because create_table_entry advances tbl by PAGE_SIZE after each entry, the successive tables are physically contiguous.
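The effect of create_table_entry can be modelled in user‑space C to make the contiguity visible. This is a simplified model under stated assumptions, not the boot code itself: memory is an array of 64‑bit words indexed by physical address divided by 8, PAGE_SIZE is taken as 4096, and PMD_TYPE_TABLE is assumed to be the value 3 (the "table" descriptor type).

```c
#include <stdint.h>

#define PAGE_SIZE      4096u
#define PMD_TYPE_TABLE 3u /* assumed descriptor type bits for a table entry */

/*
 * Model of create_table_entry: write one descriptor into the table at
 * physical address 'tbl' that points at the page immediately after it,
 * then return the address of that next-level table.
 * mem[i] holds the 64-bit word at physical address i * 8.
 */
static uint64_t create_table_entry(uint64_t *mem, uint64_t tbl,
                                   uint64_t virt, unsigned shift,
                                   unsigned ptrs)
{
    uint64_t idx = (virt >> shift) & (ptrs - 1u);       /* table index */
    uint64_t desc = (tbl + PAGE_SIZE) | PMD_TYPE_TABLE; /* next table + type */

    mem[tbl / 8u + idx] = desc; /* str tmp2, [tbl, tmp1, lsl #3] */
    return tbl + PAGE_SIZE;     /* add tbl, tbl, #PAGE_SIZE */
}
```

Calling the function twice in a row, first with shift 30 and then with shift 21, for the address 0x40200000 writes the descriptor 0x1003 into slot 1 of the table at 0x0 and 0x2003 into slot 1 of the table at 0x1000, leaving tbl at 0x2000 for the block mappings.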
Hands‑On Practice
A user‑space test program allocates memory and prints its virtual address:
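#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = malloc(10);

    if (p == NULL)
        return 1;
    /* %p is the portable conversion for pointers; %x truncates on 64-bit. */
    printf("address = %p\n", (void *)p);
    while (1)
        ; /* keep the process alive so the module can inspect it */
    return 0;
}
A kernel module receives a PID and a virtual address (module parameters pid and va) and walks that process's page tables to obtain the corresponding physical address. It prints the values of the paging‑related macros, traverses the levels from the PGD down to the PTE, checks that each entry is present, and computes the physical address with the help of PAGE_MASK: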
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/page.h>

static int pid;
static unsigned long va;
module_param(pid, int, 0644);
module_param(va, ulong, 0644);

static int __init find_pgd_init(void)
{
    unsigned long pa = 0;
    struct pid *p;
    struct task_struct *pcb_tmp = NULL;
    struct vm_area_struct *vma;
    pgd_t *pgd_tmp = NULL;
    p4d_t *p4d_tmp = NULL;
    pud_t *pud_tmp = NULL;
    pmd_t *pmd_tmp = NULL;
    pte_t *pte_tmp = NULL;

    printk(KERN_INFO "PAGE_OFFSET = 0x%lx\n", PAGE_OFFSET);
    printk(KERN_INFO "PGDIR_SHIFT = %d\n", PGDIR_SHIFT);
    printk(KERN_INFO "PUD_SHIFT = %d\n", PUD_SHIFT);
    printk(KERN_INFO "PMD_SHIFT = %d\n", PMD_SHIFT);
    printk(KERN_INFO "PAGE_SHIFT = %d\n", PAGE_SHIFT);
    printk(KERN_INFO "PTRS_PER_PGD = %d\n", PTRS_PER_PGD);
    printk(KERN_INFO "PTRS_PER_PUD = %d\n", PTRS_PER_PUD);
    printk(KERN_INFO "PTRS_PER_PMD = %d\n", PTRS_PER_PMD);
    printk(KERN_INFO "PTRS_PER_PTE = %d\n", PTRS_PER_PTE);
    printk(KERN_INFO "PAGE_MASK = 0x%lx\n", PAGE_MASK);

    /* Locking (RCU for the task lookup, the mm semaphore for the VMA
     * lookup and the walk) is omitted to keep the example short. */
    p = find_vpid(pid);
    pcb_tmp = pid_task(p, PIDTYPE_PID);
    if (!pcb_tmp || !pcb_tmp->mm) {
        printk(KERN_INFO "pid %d not found or has no user address space.\n", pid);
        return 0;
    }
    printk(KERN_INFO "pgd = 0x%p\n", pcb_tmp->mm->pgd);

    /* find_vma() returns the first VMA ending above va, so also check
     * that va is not below the start of that VMA. */
    vma = find_vma(pcb_tmp->mm, va);
    if (!vma || va < vma->vm_start) {
        printk(KERN_INFO "virt_addr 0x%lx not available.\n", va);
        return 0;
    }

    pgd_tmp = pgd_offset(pcb_tmp->mm, va);
    if (pgd_none(*pgd_tmp)) return 0;
    /* pud_offset() takes a p4d_t * in this kernel; with four-level
     * paging the P4D level is folded and this step is a pass-through. */
    p4d_tmp = p4d_offset(pgd_tmp, va);
    if (p4d_none(*p4d_tmp)) return 0;
    pud_tmp = pud_offset(p4d_tmp, va);
    if (pud_none(*pud_tmp)) return 0;
    pmd_tmp = pmd_offset(pud_tmp, va);
    if (pmd_none(*pmd_tmp)) return 0;
    pte_tmp = pte_offset_kernel(pmd_tmp, va);
    if (pte_none(*pte_tmp) || !pte_present(*pte_tmp)) return 0;

    /* Frame number from the PTE (flag bits excluded) plus the page offset. */
    pa = ((unsigned long)pte_pfn(*pte_tmp) << PAGE_SHIFT) | (va & ~PAGE_MASK);
    printk(KERN_INFO "virt_addr 0x%lx in RAM is 0x%lx.\n", va, pa);
    return 0;
}

static void __exit find_pgd_exit(void)
{
    printk(KERN_INFO "Goodbye!\n");
}

module_init(find_pgd_init);
module_exit(find_pgd_exit);
MODULE_LICENSE("GPL");
The accompanying Makefile builds the module. Load it with insmod lab3.ko pid=2630 va=0xa87010 (the test program's PID and the address it printed) and inspect dmesg: the log shows the macro values and the resolved physical address, confirming the translation process.
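The final step of the walk, turning a PTE value into a physical address, deserves a closer look on x86‑64, where a PTE carries flag bits both below bit 12 and above the frame number (for example the NX bit in bit 63). The user‑space sketch below shows the arithmetic; pte_to_phys and PTE_FRAME_MASK are hypothetical names, and the mask assumes the frame number occupies bits 51..12.

```c
#include <stdint.h>

/* Bits 51..12 of an x86-64 PTE hold the physical frame base (assumed layout). */
#define PTE_FRAME_MASK 0x000FFFFFFFFFF000ull
#define OFFSET_MASK    0x0000000000000FFFull

/* Physical address = frame base from the PTE + byte offset from the va. */
static uint64_t pte_to_phys(uint64_t pte, uint64_t va)
{
    return (pte & PTE_FRAME_MASK) | (va & OFFSET_MASK);
}
```

With a PTE value of 0x800000012345F067 and the virtual address 0xa87010, the result is 0x12345F010. Masking the same PTE with only ~0xFFF would keep bit 63 and give 0x800000012345F000, which is not a usable physical address.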
