
Efficient Memory Management: Paging Mechanism in x86‑64 Architecture

This article explains how the x86‑64 paging mechanism maps virtual addresses to physical memory. It covers the protected‑mode prerequisites, the page‑table structures, the control‑register settings and the four paging modes (32‑bit, PAE, 4‑level, 5‑level), and includes concrete kernel code examples and address‑translation demonstrations.

Linux Kernel Journey

1. Role of Memory Paging

In the x86‑64 architecture, paging divides the linear (virtual) address space into fixed‑size pages and maps each page to a physical page frame. Paging can only be enabled in protected mode (CR0.PE = 1); trying to set CR0.PG while CR0.PE = 0 raises a general‑protection fault. Once paging is on, applications see a linear address space that is independent of where data actually sits in physical memory, much as a player in a virtual‑reality game moves through virtual positions without knowing the real hardware addresses behind them.

Advantages of Paging

Virtualization: each process gets an isolated address space, improving security and isolation.

Memory sharing: multiple processes can map the same physical page, saving RAM.

Lazy loading: pages are loaded only when accessed, reducing initialization time and memory usage.

Memory protection: pages can be marked read‑only or non‑executable to protect code and data.

2. Enabling and Disabling Paging

Paging is controlled by the PG bit (bit 31) of the CR0 register. When CR0.PG = 0, paging is disabled and linear addresses equal physical addresses. When CR0.PG = 1, paging is active and linear addresses must be translated through paging structures.

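Enabling 32‑bit paging takes three steps:

Prepare the page directory and page tables.

Load the physical address of the page directory into CR3. Only bits 31‑12 (the high 20 bits) are used, so the directory must be 4 KB aligned.

Set CR0.PG to 1.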

Boot Code Example (loader.s)

; os/src/boot/loader.s
; Everything below runs in protected mode
[bits 32]

p_mode_start:
    mov ax, SELECTOR_DATA
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov esp,LOADER_STACK_TOP
    mov ax, SELECTOR_VIDEO
    mov gs, ax

    mov byte [gs:320], 'M'
    mov byte [gs:322], 'A'
    mov byte [gs:324], 'I'
    mov byte [gs:326], 'N'

    call setup_page ; build the page directory and page tables

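    sgdt [gdt_ptr]          ; save the current GDTR (limit + base)
    mov ebx, [gdt_ptr + 2]  ; ebx = GDT base address
    or dword [ebx + 0x18 + 4], 0xc0000000   ; video descriptor (index 3): base bits 31-24 |= 0xc0
    add dword [gdt_ptr + 2], 0xc0000000    ; move the GDT base into the high (kernel) half
    add esp, 0xc0000000                    ; move the stack pointer into the high half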

    mov eax, PAGE_DIR_TABLE_POS
    mov cr3, eax            ; load the page-directory physical address into CR3
    mov eax, cr0
    or eax, 0x80000000      ; set CR0.PG (bit 31)
    mov cr0, eax
    lgdt [gdt_ptr]          ; reload the GDT from its new high address

    mov byte [gs:320], 'V'
    mov byte [gs:322], 'i'
    mov byte [gs:324], 'r'
    mov byte [gs:326], 't'
    mov byte [gs:328], 'u'
    mov byte [gs:330], 'a'
    mov byte [gs:332], 'l'
    jmp $

setup_page:
    ; Zero the 4 KiB page directory, one byte per iteration
    mov ecx, 4096
    mov esi, 0
.clear_page_dir:
    mov byte [PAGE_DIR_TABLE_POS + esi], 0
    inc esi
    loop .clear_page_dir

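    ; Build the page directory: entry 0 and the entry at byte offset 0xc00 (entry 768) point to the same page table
    mov eax, PAGE_DIR_TABLE_POS
    add eax, 0x1000
    mov ebx, eax
    or eax, PG_US_U | PG_RW_W | PG_P
    mov [PAGE_DIR_TABLE_POS + 0x0], eax     ; PDE 0: identity mapping of low memory
    mov [PAGE_DIR_TABLE_POS + 0xc00], eax   ; PDE 768: 0xc0000000 uses the same page table
    sub eax, 0x1000
    mov [PAGE_DIR_TABLE_POS + 4092], eax    ; PDE 1023 points back to the page directory itself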

    ; Fill the first page table: 256 entries map the low 1 MiB (physical 0x0-0xfffff)
    mov ecx, 256
    mov esi, 0
    mov edx, PG_US_U | PG_RW_W | PG_P
.create_pte:
    mov [ebx+esi*4], edx
    add edx, 4096
    inc esi
    loop .create_pte

    ; Create the remaining kernel PDEs (entries 769-1022), page tables from 0x102000 upward
    mov eax, PAGE_DIR_TABLE_POS
    add eax, 0x2000
    or eax, PG_US_U | PG_RW_W | PG_P
    mov ebx, PAGE_DIR_TABLE_POS
    mov ecx, 254
    mov esi, 769
.create_kernel_pde:
    mov [ebx+esi*4], eax
    inc esi
    add eax, 0x1000
    loop .create_kernel_pde
    ret

Macro Definitions (boot.inc)

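PAGE_DIR_TABLE_POS equ 0x100000   ; page directory at physical 1 MiB
PG_P      equ   1b     ; bit 0: present
PG_RW_R   equ  00b     ; bit 1 = 0: read-only
PG_RW_W   equ  10b     ; bit 1 = 1: read/write
PG_US_S   equ  000b    ; bit 2 = 0: supervisor only
PG_US_U   equ  100b    ; bit 2 = 1: user accessible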

3. Paging Modes Supported by Intel‑64

32‑bit paging: CR4.PAE = 0

PAE paging: CR4.PAE = 1, IA32_EFER.LME = 0

4‑level paging: CR4.PAE = 1, IA32_EFER.LME = 1, CR4.LA57 = 0

5‑level paging: CR4.PAE = 1, IA32_EFER.LME = 1, CR4.LA57 = 1

The current mode is determined by the combination of CR0.PG, CR4.PAE, CR4.LA57 and IA32_EFER.LME; all four paging modes require CR0.PG = 1.

3.1 32‑bit Paging

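32‑bit paging is used when CR0.PG = 1 and CR4.PAE = 0. It uses a two‑level translation: CR3 holds the address of the page directory, a page‑directory entry (PDE) selects a page table, and a page‑table entry (PTE) selects the physical page (CR3 → PDE → PTE → physical page). Linear addresses are 32 bits wide, so the address space is limited to 4 GiB, and each entry is 4 bytes.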

Page‑table entry (PTE) format (bits):

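P (Present, bit 0) – 1 if the entry maps a page that is present in physical memory.

RW (Read/Write, bit 1) – 1 for read/write, 0 for read‑only.

US (User/Supervisor, bit 2) – 1 if user‑mode code may access the page.

PCD (Page‑level Cache Disable, bit 4) – 1 disables caching for the page.

Dirty (D, bit 6) – set by the CPU when the page is written.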

… (other metadata in the low 12 bits).

Because pages are 4 KB aligned, the low 12 bits of a page's physical address are always zero. A PTE therefore stores only bits 31‑12 of the address and uses bits 11‑0 for this metadata.

3.2 PAE Paging

Physical Address Extension (PAE) widens physical addresses to up to 52 bits while linear addresses stay 32 bits. Page‑table entries grow to 8 bytes, and an extra level (PDPTE) is introduced: CR3 now points to a Page‑Directory‑Pointer Table (PDPT) that holds four PDPTEs.

Translation sequence: CR3 → PDPTE → PDE → PTE → physical page. The extra level costs no extra memory access per translation, because the processor loads the four PDPTEs into dedicated internal registers when CR3 is loaded.

3.3 4‑Level Paging

4‑level paging is used in IA‑32e (long) mode on 64‑bit processors. It adds a top‑level PML4 table, giving the translation chain CR3 → PML4E → PDPTE → PDE → PTE → physical page. It supports 48‑bit linear addresses and up to 52‑bit physical addresses. Each table entry is 8 bytes, and each table contains 512 entries.

3.4 5‑Level Paging

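5‑level paging is activated when CR4.LA57 = 1 in IA‑32e mode. It extends linear addresses to 57 bits, growing the linear address space from 256 TiB to 128 PiB, which is useful for workloads with very large memory footprints (e.g., large cloud platforms). The hierarchy becomes: CR3 → PML5E → PML4E → PDPTE → PDE → PTE → physical page.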

4. Detailed Page‑Structure Layout

Every paging structure is a 4 KB table, with one exception. In 32‑bit paging each entry is 4 bytes (1024 entries per table). In PAE, 4‑level and 5‑level paging each entry is 8 bytes (512 entries per table). The exception is the PAE page‑directory‑pointer table, a 32‑byte structure containing four 8‑byte entries.

The linear address is split into a high part (page‑number) that selects entries in the hierarchy and a low part (page‑offset) that indexes within the final physical page.

4.1 Translation Process

Starting from the address stored in CR3, the CPU extracts the appropriate bits of the linear address to index the current table. If the entry points to another table, the process repeats; if the entry maps a page, the physical address is formed by concatenating the entry’s address bits with the page‑offset.

4‑level example (48‑bit address): bits 47‑39 → PML4 index, bits 38‑30 → PDPTE index, bits 29‑21 → PDE index, bits 20‑12 → PTE index, bits 11‑0 → offset.

5‑level paging adds an extra 9‑bit index (the PML5 index) at bits 56‑48.

4.2 Exceptions

If a required entry has its Present (P) bit cleared, or has a reserved bit set, the CPU raises a page‑fault exception (#PF) and records the faulting linear address in CR2. Reserved‑bit rules differ per level: for example, bits 51 down to MAXPHYADDR are reserved in every entry, the PS bit is reserved in a PML4E, and the PS bit of a PDPTE is reserved on processors that do not support 1 GiB pages.

5. Practical Experiments and Kernel Integration

Using the Linux cpuid utility, the author shows how to query the maximum physical (39 bits) and virtual (48 bits) address widths of the CPU, confirming support for 4‑level paging.

$ cpuid | grep address
      physical address extensions           = true
      maximum physical address bits          = 0x27 (39)
      maximum linear (virtual) address bits = 0x30 (48)

The kernel's address‑conversion macros (__va, __pa, etc.) show how the kernel translates between virtual and physical addresses in its direct mapping of physical memory:

#define __PAGE_OFFSET_BASE_L4   0xffff888000000000UL
#define __PAGE_OFFSET          __PAGE_OFFSET_BASE_L4
#define PAGE_OFFSET            ((unsigned long)__PAGE_OFFSET)
#define __va(x)                ((void *)((unsigned long)(x) + PAGE_OFFSET))
#define __pa(x)                __phys_addr((unsigned long)(x))

A sample program prints the physical addresses corresponding to kernel virtual addresses; each one resolves to 0x220a000:

address: 0x220a000
address: 0x220a000
address: 0x220a000

Additional kernel code is added to arch/x86/mm/init.c to walk and print the entire page‑table hierarchy before and after a CR3 switch. The sample kernel log excerpts show the CR3 value, the PGD, PUD, PMD and PTE entries, and entry values such as 0x80000000022000e3 and 0x22001e3. In the first value, bit 63 (the XD/NX bit) is set, which marks the mapped memory as non‑executable.

6. Address‑Space Layout in Linux

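On a system with 48‑bit virtual addresses, user space occupies 0x0000000000000000‑0x00007fffffffffff while kernel space occupies 0xffff800000000000‑0xffffffffffffffff. The region between them, 0x0000800000000000‑0xffff7fffffffffff, is not merely unmapped: its addresses are non‑canonical (bits 63‑48 are not all copies of bit 47), so they cannot be mapped at all, and an access to one raises a general‑protection fault instead of a page fault.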

Figure: Paging mode capabilities

Overall, the article provides a step‑by‑step walkthrough of how x86‑64 paging works, from hardware registers and control bits to kernel‑level table construction, mode selection, address translation, and practical debugging techniques.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
