Efficient Memory Management: Paging Mechanism in x86‑64 Architecture
This article explains how the x86‑64 paging mechanism maps virtual addresses to physical memory. It covers the protected‑mode prerequisite, page‑table structures, control‑register settings, and the paging modes (32‑bit, PAE, 4‑level, 5‑level), and it provides concrete kernel code examples and address‑translation demonstrations.
1. Role of Memory Paging
In the x86‑64 architecture, paging splits linear (virtual) addresses into fixed‑size pages and maps each page to a physical address. Paging only works when the CPU is in protected mode (CR0.PE = 1). When enabled, applications see a linear address space that is independent of actual physical locations, similar to a virtual‑reality game where the player perceives virtual positions without knowing the real hardware addresses.
Advantages of Paging
Virtualization: each process gets an isolated address space, improving security and isolation.
Memory sharing: multiple processes can map the same physical page, saving RAM.
Lazy loading: pages are loaded only when accessed, reducing initialization time and memory usage.
Memory protection: pages can be marked read‑only or non‑executable to protect code and data.
2. Enabling and Disabling Paging
Paging is controlled by the PG bit (bit 31) of the CR0 register. When CR0.PG = 0, paging is disabled and linear addresses equal physical addresses. When CR0.PG = 1, paging is active and linear addresses must be translated through paging structures.
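Enabling 32‑bit paging takes three steps, in this order:
Prepare the page directory and page tables.
Write the page‑directory physical address into CR3. Its high 20 bits occupy CR3 bits 31‑12, so the directory must be 4 KB aligned.
Set CR0.PG to 1.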
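The register values involved in the last two steps above can be sketched in C as plain bit arithmetic. This is a minimal sketch under stated assumptions: the function names are hypothetical, and actually loading CR3 and CR0 requires privileged mov instructions, as in the boot code.

```c
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protected-mode enable */
#define CR0_PG (1u << 31)  /* paging enable */

/* CR3 value for 32-bit paging: bits 31-12 hold the page-directory base.
 * The low 12 bits are cleared here (this sketch ignores PWT/PCD). */
static uint32_t cr3_for_page_directory(uint32_t pd_phys)
{
    return pd_phys & 0xFFFFF000u;
}

/* New CR0 value with paging switched on; PE is expected to be set already. */
static uint32_t cr0_with_paging(uint32_t cr0)
{
    return cr0 | CR0_PG;
}
```

With the page directory at 0x100000, the CR3 value is 0x100000 unchanged, and OR‑ing CR0 with 0x80000000 is the same constant the boot code uses.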
Boot Code Example (loader.s)
; os/src/boot/loader.s
; Protected-mode code starts here
[bits 32]
p_mode_start:
mov ax, SELECTOR_DATA
mov ds, ax
mov es, ax
mov ss, ax
mov esp,LOADER_STACK_TOP
mov ax, SELECTOR_VIDEO
mov gs, ax
mov byte [gs:320], 'M'
mov byte [gs:322], 'A'
mov byte [gs:324], 'I'
mov byte [gs:326], 'N'
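call setup_page ; create the page directory and page tables, and initialize the page memory bitmap
sgdt [gdt_ptr] ; save the original GDT
mov ebx, [gdt_ptr + 2]
or dword [ebx + 0x18 + 4], 0xc0000000 ; video segment base + 0xc0000000
add dword [gdt_ptr + 2], 0xc0000000 ; move the GDT base to the high mapping
add esp, 0xc0000000 ; move the stack pointer to the high mapping
mov eax, PAGE_DIR_TABLE_POS
mov cr3, eax ; write the page-directory address into CR3
mov eax, cr0
or eax, 0x80000000 ; set the PG bit
mov cr0, eax
lgdt [gdt_ptr] ; reload the GDT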
mov byte [gs:320], 'V'
mov byte [gs:322], 'i'
mov byte [gs:324], 'r'
mov byte [gs:326], 't'
mov byte [gs:328], 'u'
mov byte [gs:330], 'a'
mov byte [gs:332], 'l'
jmp $
setup_page:
mov ecx, 4096
mov esi, 0
.clear_page_dir:
mov byte [PAGE_DIR_TABLE_POS + esi], 0
inc esi
loop .clear_page_dir
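; Create the page directory: PDE 0 (offset 0x0) and PDE 768 (offset 0xc00) point to the same page table
mov eax, PAGE_DIR_TABLE_POS
add eax, 0x1000 ; eax = physical address of the first page table
mov ebx, eax ; ebx = page-table base, used by .create_pte below
or eax, PG_US_U | PG_RW_W | PG_P
mov [PAGE_DIR_TABLE_POS + 0x0], eax
mov [PAGE_DIR_TABLE_POS + 0xc00], eax
sub eax, 0x1000 ; eax = page-directory address with the same flag bits
mov [PAGE_DIR_TABLE_POS + 4092], eax ; last PDE (1023) points back to the page directory itself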
; Create the first page table (256 entries, mapping the low 1 MiB)
mov ecx, 256
mov esi, 0
mov edx, PG_US_U | PG_RW_W | PG_P
.create_pte:
mov [ebx+esi*4], edx
add edx, 4096
inc esi
loop .create_pte
; Create the remaining kernel page-directory entries (PDE 769 to 1022)
mov eax, PAGE_DIR_TABLE_POS
add eax, 0x2000
or eax, PG_US_U | PG_RW_W | PG_P
mov ebx, PAGE_DIR_TABLE_POS
mov ecx, 254
mov esi, 769
.create_kernel_pde:
mov [ebx+esi*4], eax
inc esi
add eax, 0x1000
loop .create_kernel_pde
ret
Macro Definitions (boot.inc)
PAGE_DIR_TABLE_POS equ 0x100000
PG_P equ 1b
PG_RW_R equ 00b
PG_RW_W equ 10b
PG_US_S equ 000b
PG_US_U equ 100b
3. Paging Modes Supported by Intel‑64
32‑bit paging: CR4.PAE = 0
PAE paging: CR4.PAE = 1, IA32_EFER.LME = 0
4‑level paging: CR4.PAE = 1, IA32_EFER.LME = 1, CR4.LA57 = 0
5‑level paging: CR4.PAE = 1, IA32_EFER.LME = 1, CR4.LA57 = 1
All four modes require CR0.PG = 1. The active mode is then determined by the combination of CR4.PAE, CR4.LA57 and IA32_EFER.LME.
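The flag combinations listed above can be written as a small decision function. This is a sketch for illustration only: the enum and function names are hypothetical, and the flag values are passed in as booleans rather than read from the real registers.

```c
#include <stdbool.h>

enum paging_mode {
    PAGING_DISABLED,
    PAGING_32BIT,
    PAGING_PAE,
    PAGING_4LEVEL,
    PAGING_5LEVEL
};

/* pg = CR0.PG, pae = CR4.PAE, lme = IA32_EFER.LME, la57 = CR4.LA57.
 * Note: PG = 1 with LME = 1 and PAE = 0 is not a usable combination on
 * hardware; this sketch does not model that case separately. */
static enum paging_mode select_paging_mode(bool pg, bool pae, bool lme, bool la57)
{
    if (!pg)
        return PAGING_DISABLED;
    if (!pae)
        return PAGING_32BIT;
    if (!lme)
        return PAGING_PAE;
    return la57 ? PAGING_5LEVEL : PAGING_4LEVEL;
}
```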
3.1 32‑bit Paging
Active when CR0.PG = 1 and CR4.PAE = 0. It uses a two‑level translation (CR3 → PDE → PTE → physical page): one page directory and one page table per lookup. Linear addresses are 32 bits wide, so the linear address space is limited to 4 GiB, and each entry is 4 bytes.
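As a rough illustration of how a 32‑bit linear address feeds this translation, the sketch below splits an address into its page‑directory index (bits 31‑22), page‑table index (bits 21‑12) and page offset (bits 11‑0). The struct and function names are hypothetical.

```c
#include <stdint.h>

struct split32 {
    uint32_t pde_index;  /* bits 31-22: entry in the page directory */
    uint32_t pte_index;  /* bits 21-12: entry in the page table */
    uint32_t offset;     /* bits 11-0: byte within the 4 KB page */
};

static struct split32 split_linear32(uint32_t la)
{
    struct split32 s;
    s.pde_index = la >> 22;
    s.pte_index = (la >> 12) & 0x3FFu;
    s.offset    = la & 0xFFFu;
    return s;
}
```

For example, 0xc0000000 yields page‑directory index 768, which is byte offset 0xc00 in the page directory (768 × 4), the same slot the boot code fills for the kernel mapping.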
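Page‑table entry (PTE) format (selected bits):
P (Present, bit 0) – 1 if the page is in RAM.
RW (bit 1) – 1 for read/write, 0 for read‑only.
US (bit 2) – 1 if user‑mode code can access the page, 0 for supervisor only.
PCD (bit 4) – page‑level cache disable.
Dirty (bit 6) – set by the CPU on a write to the page.
… (other metadata in the low 12 bits).
Because pages are 4 KB aligned, the low 12 bits of a page's physical address are always zero. The PTE therefore stores only the upper 20 address bits (bits 31‑12) and reuses the low 12 bit positions for this metadata.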
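A minimal sketch of reading a 32‑bit PTE follows: mask off the low 12 bits to get the page frame, test individual flag bits, and combine the frame with the page offset of the linear address. The macro and function names are hypothetical; the bit positions are those of the 32‑bit paging format.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (1u << 0)  /* present */
#define PTE_RW  (1u << 1)  /* writable */
#define PTE_US  (1u << 2)  /* user-mode accessible */
#define PTE_PCD (1u << 4)  /* cache disable */
#define PTE_D   (1u << 6)  /* dirty */

/* Physical base of the 4 KB page: PTE bits 31-12. */
static uint32_t pte_frame(uint32_t pte)
{
    return pte & 0xFFFFF000u;
}

static bool pte_has(uint32_t pte, uint32_t flag)
{
    return (pte & flag) != 0;
}

/* Final physical address: frame from the PTE, offset from the linear address. */
static uint32_t pte_translate(uint32_t pte, uint32_t la)
{
    return pte_frame(pte) | (la & 0xFFFu);
}
```

An entry of 0x00123007, for instance, describes a present, writable, user‑accessible page at physical 0x123000.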
3.2 PAE Paging
Physical‑Address Extension widens physical addresses beyond 32 bits (up to 52 bits, depending on the processor) while linear addresses remain 32 bits. Page‑table entries become 8 bytes, and an extra level (PDPTE) is introduced. CR3 now points to a Page‑Directory‑Pointer Table (PDPT) that holds four PDPTEs.
Translation sequence: CR3 → PDPTE → PDE → PTE → physical page. On Intel processors the extra level does not add a memory access to each translation, because the four PDPTEs are loaded into dedicated internal registers when CR3 is written.
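Under PAE the 32‑bit linear address is divided differently: 2 bits select one of the four PDPTEs, two 9‑bit fields index the 512‑entry page directory and page table, and 12 bits remain as the offset. A sketch with hypothetical names:

```c
#include <stdint.h>

struct split_pae {
    uint32_t pdpte_index;  /* bits 31-30: one of four PDPTEs */
    uint32_t pde_index;    /* bits 29-21: entry in the page directory */
    uint32_t pte_index;    /* bits 20-12: entry in the page table */
    uint32_t offset;       /* bits 11-0: byte within the 4 KB page */
};

static struct split_pae split_linear_pae(uint32_t la)
{
    struct split_pae s;
    s.pdpte_index = la >> 30;
    s.pde_index   = (la >> 21) & 0x1FFu;
    s.pte_index   = (la >> 12) & 0x1FFu;
    s.offset      = la & 0xFFFu;
    return s;
}
```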
3.3 4‑Level Paging
Used in long mode (IA‑32e mode), the 64‑bit operating mode. Adds a top‑level PML4 table. Translation chain: CR3 → PML4E → PDPTE → PDE → PTE → physical page. Supports 48‑bit linear addresses and up to 52‑bit physical addresses. Each table entry is 8 bytes, and each table contains 512 entries.
3.4 5‑Level Paging
Activated when CR4.LA57 = 1 in long mode. It extends linear addresses from 48 to 57 bits (128 PiB of virtual address space), which is useful for massive memory workloads (e.g., large cloud platforms). The hierarchy becomes: CR3 → PML5E → PML4E → PDPTE → PDE → PTE → physical page.
4. Detailed Page‑Structure Layout
All paging modes use hierarchical tables, and almost every table occupies one 4 KB page. In 32‑bit mode each entry is 4 bytes (1024 entries per table). In PAE and 4‑/5‑level modes each entry is 8 bytes (512 entries per table). The one exception is the top level of PAE paging: its page‑directory‑pointer table is a 32‑byte structure containing four 8‑byte entries.
The linear address is split into a high part (page‑number) that selects entries in the hierarchy and a low part (page‑offset) that indexes within the final physical page.
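As a worked check of these sizes, the linear‑address width of each mode is the sum of the index bits contributed by every level plus the 12 offset bits. The sketch below does that arithmetic; the helper names are hypothetical, and entry counts are assumed to be powers of two.

```c
#include <stddef.h>

#define PAGE_OFFSET_BITS 12u

/* Number of index bits needed to select one of `entries` table slots. */
static unsigned index_bits(unsigned entries)
{
    unsigned bits = 0;
    while (entries > 1) {
        entries >>= 1;
        bits++;
    }
    return bits;
}

/* Linear-address width = index bits of every level + 12 offset bits. */
static unsigned linear_width(const unsigned *entries_per_level, size_t levels)
{
    unsigned bits = PAGE_OFFSET_BITS;
    for (size_t i = 0; i < levels; i++)
        bits += index_bits(entries_per_level[i]);
    return bits;
}
```

Two levels of 1024 entries give 10 + 10 + 12 = 32 bits; PAE gives 2 + 9 + 9 + 12 = 32 bits; four and five levels of 512 entries give 48 and 57 bits.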
4.1 Translation Process
Starting from the address stored in CR3, the CPU extracts the appropriate bits of the linear address to index the current table. If the entry points to another table, the process repeats; if the entry maps a page, the physical address is formed by concatenating the entry’s address bits with the page‑offset.
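The loop described above can be modelled in software. The following sketch walks a 4‑level hierarchy held in a small array that stands in for physical memory; the array, the names and the simplifications (no permission or reserved‑bit checks) are assumptions made for illustration, not how any kernel implements it.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_P   (1ull << 0)            /* present */
#define ENTRY_PS  (1ull << 7)            /* entry maps a large page */
#define ADDR_MASK 0x000FFFFFFFFFF000ull  /* address bits 51-12 */

/* Hypothetical stand-in for physical memory: table N lives at N * 4096. */
#define SIM_TABLES 8
static uint64_t sim_mem[SIM_TABLES][512];

/* Returns true and stores the physical address on success. Returns false
 * for a non-present entry (where hardware would raise a page fault) or
 * for a table address outside the simulated memory. */
static bool walk_4level(uint64_t cr3, uint64_t la, uint64_t *pa)
{
    uint64_t table = cr3 & ADDR_MASK;

    /* shift = 39 (PML4), 30 (PDPT), 21 (PD), 12 (PT) */
    for (int shift = 39; shift >= 12; shift -= 9) {
        if ((table >> 12) >= SIM_TABLES)
            return false;
        uint64_t entry = sim_mem[table >> 12][(la >> shift) & 0x1FF];
        if (!(entry & ENTRY_P))
            return false;
        bool large = (entry & ENTRY_PS) && (shift == 30 || shift == 21);
        if (shift == 12 || large) {
            /* Entry maps a page: concatenate its address bits with the offset. */
            uint64_t offset_mask = (1ull << shift) - 1;
            *pa = (entry & ADDR_MASK & ~offset_mask) | (la & offset_mask);
            return true;
        }
        table = entry & ADDR_MASK;  /* entry points to the next table */
    }
    return false;  /* unreachable: the shift == 12 iteration always returns */
}
```

The same loop handles 4 KB pages (the walk ends at the PTE) and large pages (the walk ends early at a PDE or PDPTE with PS set, and more low bits of the linear address become the offset).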
4‑level example (48‑bit address): bits 47‑39 → PML4 index, bits 38‑30 → PDPTE index, bits 29‑21 → PDE index, bits 20‑12 → PTE index, bits 11‑0 → offset.
5‑level adds an extra 9‑bit index at bits 56‑48.
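Because every level uses a 9‑bit index, one shift‑and‑mask expression covers both the 4‑level and 5‑level layouts. A sketch with hypothetical helper names, numbering the levels from 1 (PTE index) up to 5 (PML5 index):

```c
#include <stdint.h>

/* Index used at `level`: 1 = PTE, 2 = PDE, 3 = PDPTE, 4 = PML4, 5 = PML5.
 * Level n covers bits (12 + 9n - 1) down to (12 + 9(n - 1)). */
static unsigned table_index(uint64_t la, int level)
{
    return (unsigned)((la >> (12 + 9 * (level - 1))) & 0x1FF);
}

/* Byte offset within the final 4 KB page: bits 11-0. */
static unsigned page_offset(uint64_t la)
{
    return (unsigned)(la & 0xFFF);
}
```

For example, the address 0xffff888000000000 has PML4 index 273 and index 0 at every lower level.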
4.2 Exceptions
If a required entry has the Present (P) bit cleared, or reserved bits are set incorrectly, the CPU raises a page‑fault exception (#PF) and saves the faulting linear address in CR2. Reserved‑bit rules differ per level (e.g., address bits from MAXPHYADDR up to bit 51 are reserved, and the PS bit is reserved at levels where the processor does not support large pages).
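The two fault conditions can be sketched as a per‑entry check. Only the reserved‑address‑bits rule is modelled here; the names are hypothetical, and `maxphyaddr` is assumed to be at most 52.

```c
#include <stdint.h>

#define ENTRY_P (1ull << 0)

enum entry_check { ENTRY_OK, FAULT_NOT_PRESENT, FAULT_RESERVED };

/* Address bits 51 down to maxphyaddr must be zero in a present entry. */
static uint64_t reserved_addr_mask(unsigned maxphyaddr)
{
    uint64_t up_to_51 = (1ull << 52) - 1;
    uint64_t valid    = (1ull << maxphyaddr) - 1;
    return up_to_51 & ~valid;
}

static enum entry_check check_entry(uint64_t entry, unsigned maxphyaddr)
{
    if (!(entry & ENTRY_P))
        return FAULT_NOT_PRESENT;  /* remaining bits are not examined */
    if (entry & reserved_addr_mask(maxphyaddr))
        return FAULT_RESERVED;
    return ENTRY_OK;
}
```

With a 39‑bit physical address width, bits 51‑39 form the reserved mask 0x000FFF8000000000.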
5. Practical Experiments and Kernel Integration
Using the Linux cpuid utility, the author shows how to query the maximum physical (39 bits) and virtual (48 bits) address widths of the CPU, confirming support for 4‑level paging.
$ cpuid | grep address
physical address extensions = true
maximum physical address bits = 0x27 (39)
maximum linear (virtual) address bits = 0x30 (48)
Kernel macros for address conversion (__va, __pa, etc.) are displayed, illustrating how virtual and physical addresses are translated inside the kernel.
#define __PAGE_OFFSET_BASE_L4 0xffff888000000000UL
#define __PAGE_OFFSET __PAGE_OFFSET_BASE_L4
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET))
#define __pa(x) __phys_addr((unsigned long)(x))
A sample program prints the physical address of two kernel virtual addresses, both resolving to 0x220a000:
address: 0x220a000
address: 0x220a000
address: 0x220a000
Additional kernel code is added to arch/x86/mm/init.c to walk and print the entire page‑table hierarchy before and after a CR3 switch. Sample kernel log excerpts show the CR3 value and the PGD, PUD, PMD, and PTE entries, with raw entry values such as 0x80000000022000e3 and 0x22001e3. Each value combines a physical address with flag bits in its low 12 bits; for both values the address part is 0x2200000. Bit 63 set to 1, as in the first value, is the execute‑disable (NX) flag and marks the page as non‑executable.
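The two values quoted above can be decoded with a few masks. This sketch uses the standard x86‑64 entry bit positions; the macro and function names are hypothetical and are not the kernel's own helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define E_P   (1ull << 0)   /* present */
#define E_RW  (1ull << 1)   /* writable */
#define E_A   (1ull << 5)   /* accessed */
#define E_D   (1ull << 6)   /* dirty */
#define E_PS  (1ull << 7)   /* large page */
#define E_G   (1ull << 8)   /* global */
#define E_NX  (1ull << 63)  /* execute disable */

/* Address field of an entry: bits 51-12. */
static uint64_t entry_addr(uint64_t entry)
{
    return entry & 0x000FFFFFFFFFF000ull;
}

static bool entry_has(uint64_t entry, uint64_t flag)
{
    return (entry & flag) != 0;
}
```

Both 0x80000000022000e3 and 0x22001e3 decode to address 0x2200000 with P, RW, A, D and PS set; the first also has NX set and the second has G set. Since PS is set, if these are PMD‑level entries they would each map a 2 MB page starting at 0x2200000, a range that contains the 0x220a000 address printed earlier.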
6. Address‑Space Layout in Linux
On a system with 48‑bit virtual addresses, user space occupies 0x0000000000000000‑0x00007fffffffffff while kernel space occupies 0xffff800000000000‑0xffffffffffffffff. The region between them cannot be mapped: a canonical address must have bits 63‑47 all equal (a sign extension of bit 47), and the CPU raises an exception on any access to a non‑canonical address.
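The canonical‑address rule reduces to checking that the upper bits, from the top implemented bit to bit 63, are all zeros or all ones. A sketch with a hypothetical function name, where `bits` is the linear‑address width (48 for 4‑level paging, 57 for 5‑level):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `la` is canonical for a linear-address width of `bits`
 * (assumed to be between 2 and 63). */
static bool is_canonical(uint64_t la, unsigned bits)
{
    uint64_t top      = la >> (bits - 1);               /* bit (bits-1) .. bit 63 */
    uint64_t all_ones = (1ull << (64 - bits + 1)) - 1;
    return top == 0 || top == all_ones;
}
```

With 48 bits, 0x00007fffffffffff and 0xffff800000000000 are the last user address and the first kernel address, and everything strictly between them fails the check.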
Overall, the article provides a step‑by‑step walkthrough of how x86‑64 paging works, from hardware registers and control bits to kernel‑level table construction, mode selection, address translation, and practical debugging techniques.
