
How Linux Maps Virtual Memory to Physical Memory: Inside Page Tables

This article walks through the Linux kernel's memory‑management subsystem, explaining how virtual memory is linked to physical memory via page tables, covering single‑level, two‑level and multi‑level paging, the structure of page‑table entries, the role of the MMU and TLB, and the complete CPU address‑translation process.

Bin's Tech Cabin

The author previously wrote a series on Linux memory management, covering virtual and physical memory separately; this article connects the two, presenting a unified view of how the kernel maps virtual memory to physical memory.

Virtual memory is an illusion created by the CPU and OS that makes each process think it owns the entire address space (e.g., 3 GB on 32‑bit systems, 128 TB on 64‑bit systems), providing isolation and security.

Each byte in a process's virtual address space has a corresponding virtual address, and each byte in physical memory has a corresponding physical address.

The kernel manages the mapping between virtual and physical memory using page tables. Physical memory is divided into 4 KB pages, each represented by a struct page. Every page is identified by a PFN (Page Frame Number), which uniquely numbers the page frames, and all struct page instances are stored in a global array, mem_map.

1. How Virtual Memory Maps to Physical Memory

When a process requests memory (e.g., via malloc or mmap), the kernel allocates a virtual page. From the process's point of view, a virtual page can be in one of three states:

Unallocated page : The virtual page exists but has not been requested by the process yet.

Allocated but not mapped page : The process owns the virtual page, but it is not yet linked to a physical page; the first access triggers a page fault, upon which the kernel allocates a physical page and establishes the mapping.

Normal page : The page is allocated and mapped to a physical page; the MMU translates its virtual addresses to physical addresses directly.

When a normal page is accessed, the MMU extracts the virtual page number from the virtual address, looks up the corresponding PTE to obtain the physical page's base address, and then adds the in-page offset to locate the exact byte.

The MMU is responsible for translating virtual addresses to physical addresses; this will be detailed later.

2. How the Kernel Uses Page Tables to Manage Mappings

Memory management in the kernel is page‑based. Each physical page is described by struct page. The kernel allocates a top‑level page‑table (pgd) for each process and stores its address in struct mm_struct as pgd. During a context switch, load_new_mm_cr3 loads the process's pgd address (converted to a physical address) into the CR3 register.

typedef struct pglist_data {
    // NUMA node id
    int node_id;
    // Pointer to the array of pages managed by this node
    struct page *node_mem_map;
} pglist_data;

When a process is created via fork, the kernel copies the parent’s mm_struct and its page tables. Functions such as _do_fork, copy_process, dup_mm, and mm_init allocate a new pgd for the child and copy the parent’s mappings.

static int mm_alloc_pgd(struct mm_struct *mm) {
    mm->pgd = pgd_alloc(mm);
    if (unlikely(!mm->pgd))
        return -ENOMEM;
    return 0;
}

Kernel threads do not have their own address space; their active_mm points to the previous user process’s mm_struct, allowing them to reuse the kernel portion of the page tables without extra allocation.

3. Drawbacks of Single‑Level Page Tables

In a 32‑bit system with 4 KB pages, a single‑level page table needs one PTE per virtual page: 2^20 entries of 4 bytes each, i.e. 4 MB of contiguous physical memory just for the table, covering the full 4 GB address space. A contiguous 4 MB block is wasteful and often impossible to find due to fragmentation. Worse, every process needs its own 4 MB table, so the overhead multiplies with the number of processes.

4. Evolution to Multi‑Level Page Tables

Multi‑level paging reduces memory consumption by allocating page tables lazily. A two‑level scheme uses a page directory (1024 entries, 4 KB) whose entries point to page tables (each also 1024 entries, 4 KB). Only the page tables covering regions the process actually maps are allocated, dramatically cutting the memory required.

4.1 Two‑Level Page Tables

The virtual address is split into three fields: page‑directory index (10 bits), page‑table index (10 bits), and offset within the page (12 bits). The CR3 register holds the physical address of the page‑directory. The MMU first uses the directory index to locate a PDE, then the table index to locate a PTE, and finally adds the offset to obtain the physical byte.

typedef unsigned long pteval_t;
typedef struct { pteval_t pte; } pte_t;

PTE layout (32‑bit): bits 0‑11 are flags (present, read/write, user, etc.), bits 12‑31 store the high part of the physical page address.

#define _PAGE_BIT_PRESENT   0
#define _PAGE_BIT_RW        1
#define _PAGE_BIT_USER      2
#define _PAGE_BIT_PWT       3
#define _PAGE_BIT_PCD       4
#define _PAGE_BIT_ACCESSED  5
#define _PAGE_BIT_DIRTY     6
#define _PAGE_BIT_PAT       7
#define _PAGE_BIT_GLOBAL    8

Page‑directory entries (PDE) have a similar layout; the PS (Page Size) bit (bit 7) indicates whether the entry points to a large 4 MB page instead of a lower‑level table.

4.2 Four‑Level Page Tables (64‑bit)

64‑bit Linux uses a four‑level hierarchy: PGD → PUD → PMD → PTE. Each level contains 512 entries, indexed by 9 bits of the virtual address. A 48‑bit virtual address is divided as follows: PGD index (9 bits), PUD index (9 bits), PMD index (9 bits), PTE index (9 bits), and offset (12 bits). The CR3 register holds the physical address of the PGD.

typedef unsigned long pteval_t;
typedef struct { pteval_t pte; } pte_t;

typedef unsigned long pmdval_t;
typedef struct { pmdval_t pmd; } pmd_t;

typedef unsigned long pudval_t;
typedef struct { pudval_t pud; } pud_t;

typedef unsigned long pgdval_t;
typedef struct { pgdval_t pgd; } pgd_t;

Each PT entry maps a 4 KB page; each PMD entry maps a 2 MB page; each PUD entry maps a 1 GB page; each PGD entry maps 512 GB. Large pages are indicated by the PS bit (bit 7) and are treated as special PTEs.

Helper macros extract indices from a virtual address:

#define pgd_index(address)   (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define pud_index(address)   (((address) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
#define pmd_index(address)   (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1))
#define pte_index(address)   (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE-1))

Lookup functions walk the hierarchy:

static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address) {
    return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(address);
}
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) {
    return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}
static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address) {
    return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
}

5. The Complete CPU Address‑Translation Process

The MMU performs the page‑table walk described above. To speed up translation, the MMU first checks the TLB (Translation Lookaside Buffer), a small hardware cache of recent PTEs. On a TLB hit, translation finishes immediately; on a miss, the MMU walks the page tables, caches the resulting PTE in the TLB, and proceeds.

After obtaining the physical address, the CPU looks for the data in its caches (L1/L2/L3). If the data is not cached, the address is sent over the system bus to the memory controller, which reads the appropriate DRAM cell and returns the data to the CPU.

Conclusion

This article linked the previously separate discussions of virtual and physical memory management, explained how the kernel uses page tables to create a mapping between the two, detailed the structures of page‑table entries and directory entries for both 32‑bit and 64‑bit systems, described the evolution from single‑level to multi‑level paging, and finally walked through the complete CPU address‑translation flow involving the MMU and TLB.
