
Understanding Linux HugePages: Kernel Page Tables, TLB, and How to Enable and Use Them

This article explains the Linux kernel's four‑level page‑table mechanism, the performance impact of TLB misses, why using 2 MB HugePages improves address‑translation efficiency, and provides step‑by‑step instructions for reserving and allocating HugePages via boot parameters, sysfs, and mmap.

Refining Core Development Skills

Hello, I'm Fei! If you have ever deployed an Oracle database, you have probably seen Oracle's recommendation to enable HugePages for performance.

This article dives into why HugePages improve performance, the underlying kernel mechanisms, and how to enable them.

1. The Problem with the Four‑Level Page Table

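To understand HugePages, we first review Linux's four‑level page table. On x86‑64 with 48‑bit virtual addresses, the kernel translates a virtual address to a physical address by walking four tables (PGD, PUD, PMD, PTE). Each level is indexed by 9 bits of the address, and the remaining 12 bits are the offset within the page, which is why the smallest mapping unit is a 4 KB page (2^12 bytes).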
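The bit layout described above can be sketched in a few lines of C. This is only an illustration of the 9+9+9+9+12 split, assuming the x86‑64 four‑level layout; the struct and function names (va_parts, split_va) are made up for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Indices used by a four-level walk with 4 KB pages (x86-64 layout assumed). */
struct va_parts {
    unsigned pgd, pud, pmd, pte;  /* 9 bits each: index into one table level */
    unsigned offset;              /* low 12 bits: byte within the 4 KB page  */
};

/* Split a 48-bit virtual address into its table indices and page offset. */
static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    /* 48-bit canonical addresses: bits 47..63 are all 0 or all 1. */
    assert((va >> 47) == 0 || (va >> 47) == 0x1FFFF);

    p.offset = (unsigned)(va & 0xFFF);          /* bits 0-11  */
    p.pte    = (unsigned)((va >> 12) & 0x1FF);  /* bits 12-20 */
    p.pmd    = (unsigned)((va >> 21) & 0x1FF);  /* bits 21-29 */
    p.pud    = (unsigned)((va >> 30) & 0x1FF);  /* bits 30-38 */
    p.pgd    = (unsigned)((va >> 39) & 0x1FF);  /* bits 39-47 */
    return p;
}
```

With a 2 MB page the PTE index disappears: the low 21 bits (the 9 PTE bits plus the 12 offset bits) become the offset inside the page.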

When a translation is not cached, it requires up to four memory reads to fetch the page‑table entries, plus the final data access: up to five memory accesses for a single load or store.

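The Translation Lookaside Buffer (TLB) caches recent translations so that most accesses skip this walk, but it is small (tens to a few thousand entries). A process with a large address space (e.g., one needing 40 GB) therefore suffers many TLB misses, and every miss pays for the full page‑table walk.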

2. How HugePages Help

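Replacing 4 KB pages with 2 MB pages reduces the number of pages for a 40 GB workload from about 10 million to about 20 thousand, so a much larger share of the working set fits in the TLB and the hit rate improves dramatically. It also shortens the walk on a miss: a 2 MB page is mapped directly by a PMD entry, so the PTE level is skipped and the four‑level walk becomes a three‑level walk.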
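The page counts quoted above are easy to check. The sketch below does the arithmetic and also estimates how much memory a TLB can cover; the helper names and the 1024‑entry TLB used in the usage note are illustrative assumptions, not figures from any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

#define KB (1024ULL)
#define MB (1024ULL * KB)
#define GB (1024ULL * MB)

/* Number of pages of `page_size` bytes needed to map `bytes` of memory. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;  /* round up */
}

/* Memory that `entries` TLB entries can cover without a miss. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}
```

For 40 GB, pages_needed(40 * GB, 4 * KB) gives 10,485,760 pages and pages_needed(40 * GB, 2 * MB) gives 20,480. With a hypothetical 1024‑entry TLB, tlb_reach grows from 4 MB with 4 KB pages to 2 GB with 2 MB pages.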

Key conclusion: using 2 MB HugePages can greatly boost virtual‑to‑physical address translation performance.

3. Reserving HugePages

There are two reservation methods:

Boot‑time reservation: add hugepagesz=2M hugepages=512 (example values) to the kernel command line, for example in /boot/grub/grub.cfg.

Runtime reservation: write the desired number of pages to /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages.
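As a rough sketch, the runtime reservation can also be done from a C program instead of a shell. The helper names below are hypothetical. Only the sysfs path comes from the text above. The write needs root privileges. Reading the value back is worthwhile because the kernel may reserve fewer pages than requested when memory is fragmented.

```c
#include <assert.h>
#include <stdio.h>

/* sysfs file for the 2 MB pool, as named in the article. */
#define NR_HUGEPAGES_PATH \
    "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"

/* Write the requested pool size; returns 0 on success, -1 on error. */
static int write_count(const char *path, unsigned long count)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                  /* typically a permission error */
    int ok = fprintf(f, "%lu\n", count) > 0;
    if (fclose(f) != 0)             /* buffered data is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the current pool size; returns -1 on error. */
static long read_count(const char *path)
{
    long value = -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Ask for `want` pages and report how many are actually in the pool. */
static long reserve_hugepages(const char *path, unsigned long want)
{
    if (write_count(path, want) != 0)
        return -1;
    return read_count(path);
}
```

A call such as reserve_hugepages(NR_HUGEPAGES_PATH, 512) returns the pool size after the request, or -1 if the file could not be written.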

4. Allocating HugePages in User Space

Allocation consists of opening a file on a mounted hugetlbfs file system (under /mnt/huge in this example) and calling mmap on it:

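#include <fcntl.h>
#include <sys/mman.h>

#define MAP_LENGTH (10UL * 1024 * 1024) // example size (assumed): five 2 MB pages

int main(void)
{
    // open (or create) a file on the hugetlbfs mount
    int fd = open("/mnt/huge/hugepage...", O_CREAT | O_RDWR, 0755);
    if (fd < 0)
        return 1;

    // map the file; the mapping is backed by huge pages
    void *addr = mmap(0, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return addr == MAP_FAILED;
}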
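The example above needs a hugetlbfs mount. As a related variant (not the method this article walks through), Linux also accepts the MAP_HUGETLB flag on an anonymous mmap, which draws from the same reserved pool without a file. The sketch below assumes a 2 MB default huge page size and falls back to ordinary pages when no huge pages are available; map_buffer and HUGE_2MB are names invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS and MAP_HUGETLB */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_2MB (2UL * 1024 * 1024)   /* assumed huge page size */

/*
 * Map at least `len` bytes, rounded up to a multiple of 2 MB, preferring
 * huge pages. *used_huge is set to 1 if the MAP_HUGETLB mapping succeeded
 * and to 0 if the call fell back to ordinary pages. *mapped_len receives
 * the rounded length, which is what munmap() should later be given.
 * Returns NULL if both attempts fail.
 */
static void *map_buffer(size_t len, size_t *mapped_len, int *used_huge)
{
    assert(len > 0 && mapped_len != NULL && used_huge != NULL);

    size_t rounded = (len + HUGE_2MB - 1) & ~(HUGE_2MB - 1);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
    *used_huge = (p != MAP_FAILED);

    if (p == MAP_FAILED)       /* e.g. ENOMEM: no huge pages reserved */
        p = mmap(NULL, rounded, PROT_READ | PROT_WRITE, flags, -1, 0);

    *mapped_len = rounded;
    return p == MAP_FAILED ? NULL : p;
}
```

The fallback keeps the program usable on machines where nobody has reserved huge pages, at the cost of silently losing the TLB benefit, so callers that care should check used_huge.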

4.1 Kernel Initialization of HugePages

During boot the kernel runs hugetlb_init (registered via subsys_initcall), which:

Initialises the default hstate structures for each supported hugepage size.

Allocates free hugepages and populates the hugepage_freelists linked list.

Creates the sysfs entries under /sys/kernel/mm/hugepages and per‑node directories.

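In simplified form, the per‑node allocation loop looks like:

for (i = 0; i < h->max_huge_pages_node[nid]; ++i) {
    page = alloc_fresh_huge_page(h, gfp_mask, nid, &node_states[N_MEMORY], NULL);
    if (!page)
        break;              /* allocation failed: stop filling this node */
    free_huge_page(page);   /* "freeing" the fresh page puts it on hugepage_freelists */
}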
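The idea behind this step, as described in the list above, is that each freshly allocated hugepage is immediately released into a free list, so the pool starts out full and later faults only need to pop a page from it. A toy user‑space model of that pattern might look as follows. All toy_* names are hypothetical stand‑ins, malloc plays the role of the page allocator, and none of this is kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a huge page on a free list. */
struct toy_page {
    struct toy_page *next;
};

/* Hypothetical stand-in for one node's part of the pool. */
struct toy_pool {
    struct toy_page *free_list;  /* plays the role of hugepage_freelists */
    unsigned long free_pages;    /* number of pages currently on the list */
};

/* Put a page on the free list, as releasing a fresh huge page does. */
static void toy_enqueue(struct toy_pool *pool, struct toy_page *page)
{
    page->next = pool->free_list;
    pool->free_list = page;
    pool->free_pages++;
}

/* Try to add `want` pages; stop at the first failed allocation.
 * Returns how many pages were actually added. */
static unsigned long toy_fill_pool(struct toy_pool *pool, unsigned long want)
{
    unsigned long i;

    assert(pool != NULL);
    for (i = 0; i < want; ++i) {
        struct toy_page *page = malloc(sizeof(*page));
        if (!page)
            break;
        toy_enqueue(pool, page);
    }
    return i;
}

/* Take one page for a faulting mapping; NULL when the pool is empty. */
static struct toy_page *toy_dequeue(struct toy_pool *pool)
{
    struct toy_page *page = pool->free_list;

    if (page) {
        pool->free_list = page->next;
        pool->free_pages--;
    }
    return page;
}
```

Filling the pool up front is what lets a reservation made with hugepages=512 be satisfied later without searching for contiguous memory at fault time.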

4.2 Mapping via hugetlbfs

A file created in hugetlbfs is given hugetlbfs_file_operations as its file operations (hugetlb_file_setup builds the same kind of struct file internally for MAP_HUGETLB and SHM_HUGETLB mappings). The mmap operation of this file points to hugetlbfs_file_mmap, which ultimately calls hugetlb_reserve_pages to reserve the required hugepages.
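The dispatch described above relies on a table of function pointers attached to the file. The following toy model, with made‑up toy_* names, shows the shape of that mechanism: a generic mmap layer follows the file's operations pointer, and the hugetlbfs‑specific handler accounts for the mapping in whole huge pages. It is a sketch of the pattern, not of the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file;

/* Per-filesystem operations table, in the spirit of struct file_operations. */
struct toy_file_ops {
    long (*mmap)(struct toy_file *file, size_t len);
};

struct toy_file {
    const struct toy_file_ops *f_op;  /* chosen when the file is created   */
    size_t hp_size;                   /* huge page size of the backing pool */
    long reserved;                    /* huge pages reserved so far         */
};

/* Stand-in for hugetlbfs_file_mmap: reserve whole huge pages for `len`. */
static long toy_hugetlbfs_mmap(struct toy_file *file, size_t len)
{
    long pages = (long)((len + file->hp_size - 1) / file->hp_size);

    file->reserved += pages;          /* stand-in for hugetlb_reserve_pages */
    return pages;
}

static const struct toy_file_ops toy_hugetlbfs_ops = {
    .mmap = toy_hugetlbfs_mmap,
};

/* Generic layer: knows nothing about huge pages, just follows the pointer. */
static long toy_do_mmap(struct toy_file *file, size_t len)
{
    assert(file != NULL && file->f_op != NULL && file->f_op->mmap != NULL);
    return file->f_op->mmap(file, len);
}
```

Mapping 10 MB of a toy file with a 2 MB page size reserves five pages in this model, and no page is handed out until the memory is actually touched.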

4.3 Page‑Fault Handling for HugePages

When the process first touches an address in the mapping, a page fault occurs. handle_mm_fault dispatches to hugetlb_fault, which calls hugetlb_no_page when no page is mapped at that address yet. This function:

Takes a free hugepage from hugepage_freelists.

Creates a hugepage PTE with make_huge_pte and installs it via set_huge_pte_at .

After this the process can access the allocated hugepage memory.
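One way to observe this from user space is to compare the hugepage counters in /proc/meminfo before and after touching the mapping: HugePages_Rsvd reflects pages reserved at mmap time, and HugePages_Free drops once a fault has actually taken a page from the pool. The small parser below is a sketch for reading those counters; meminfo_value is a hypothetical helper name, and the path is a parameter so the function can also be pointed at a saved copy of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the numeric value of `key` (for example "HugePages_Free") from a
 * file in /proc/meminfo format, or -1 if the file cannot be opened or the
 * key is not present. The file is reopened on every call so that each
 * call sees a fresh snapshot.
 */
static long meminfo_value(const char *path, const char *key)
{
    char line[256];
    long result = -1;
    size_t klen;
    FILE *f;

    assert(path != NULL && key != NULL);
    klen = strlen(key);

    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof line, f)) {
        long v;

        /* Match "key:" exactly, so "HugePages" does not match "HugePages_Free". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &v) == 1) {
            result = v;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling meminfo_value("/proc/meminfo", "HugePages_Free") right after mmap and again after writing to every 2 MB of the mapping should show the free count falling as faults are served.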

5. Summary

Linux uses virtual memory that must be translated to physical addresses. The TLB caches page‑table entries to speed up this translation, but its limited size makes many small pages inefficient. Switching to 2 MB HugePages reduces the number of pages, improves TLB hit rates, and cuts translation overhead, which is especially beneficial for memory‑intensive workloads such as Oracle databases.

For small workloads (a few hundred megabytes) the benefit is minimal, so enabling HugePages may not be worthwhile.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
