When Huge Pages Hurt Performance: Risks and Best Practices on NUMA Systems

This article explains the origins and mechanics of Huge Pages, why they are not a universal solution, how they can degrade performance on NUMA architectures, and provides practical testing methods and mitigation strategies for developers and system administrators.

dbaplus Community

Jul 6, 2016

Background

Understanding the interaction between CPU caches, the Translation Lookaside Buffer (TLB), and memory management is essential before using Huge Pages. The TLB caches a limited number of virtual‑to‑physical address translations. On a typical Linux system with 4 KB pages and a 64‑entry TLB, only about 256 KB of hot data can be covered, which becomes a bottleneck for memory‑intensive workloads.

What Is a Huge Page?

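Huge Pages increase the OS page size (e.g., from 4 KB to 16 MB; the available sizes depend on the architecture, with 2 MB and 1 GB being common on x86‑64). With the same 64‑entry TLB, 16 MB pages allow the TLB to map up to 1 GB of hot data, which reduces the number of TLB misses and the associated page‑walk latency for workloads whose hot data exceeds what 4 KB pages can cover.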
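The coverage figures above follow from one multiplication: TLB entries times page size. A minimal sketch of that arithmetic, assuming the 64‑entry TLB used in the text (the function name tlb_reach is only illustrative):

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory a TLB can map at once: entries * page size."""
    return entries * page_size


# Figures from the text: a 64-entry TLB with 4 KB and with 16 MB pages.
small_reach = tlb_reach(64, 4 * KB)
huge_reach = tlb_reach(64, 16 * MB)

print(small_reach // KB, "KB covered with 4 KB pages")
print(huge_reach // GB, "GB covered with 16 MB pages")
```

Real TLBs are often split into several levels and may have separate entries for each page size, so the result is an order-of-magnitude guide rather than an exact figure.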

When Can Huge Pages Provide Benefit?

Huge Pages are advantageous only when:

The application’s hot data set is larger than the TLB can cover with small pages, i.e. it spans more than 64 small (4 KB) pages in the example above.

A significant fraction of total execution time is spent handling TLB‑miss‑induced page walks. This fraction can be measured with profiling tools such as oprofile or perf.

If the TLB‑miss cost is low, enlarging pages yields little or no performance gain.
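The two conditions can be written down as a quick screening check. The sketch below is only an illustration: the function name and the 5 % threshold for a "significant" page‑walk share are assumptions, not values from the article, and the page‑walk share itself has to come from a profiler.

```python
def worth_testing_huge_pages(hot_bytes: int,
                             walk_fraction: float,
                             tlb_entries: int = 64,
                             page_size: int = 4096,
                             min_walk_fraction: float = 0.05) -> bool:
    """Return True when both conditions from the text hold.

    hot_bytes         -- size of the hot data set in bytes
    walk_fraction     -- share of run time spent in TLB-miss page walks (0..1),
                         as measured by a profiler
    min_walk_fraction -- assumed cut-off for "significant"; tune per workload
    """
    hot_pages = -(-hot_bytes // page_size)      # ceiling division
    exceeds_tlb = hot_pages > tlb_entries       # condition 1: hot set > TLB reach
    walks_matter = walk_fraction >= min_walk_fraction  # condition 2
    return exceeds_tlb and walks_matter


# A 128 KB hot set fits in 64 x 4 KB entries, so Huge Pages cannot help much.
print(worth_testing_huge_pages(128 * 1024, walk_fraction=0.30))
# A 512 MB hot set with 20 % of time in page walks is a candidate for testing.
print(worth_testing_huge_pages(512 * 1024 * 1024, walk_fraction=0.20))
```

A True result only means the workload is a candidate for testing; it says nothing about the NUMA effects discussed next.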

Impact of Huge Pages on NUMA Systems

On Non‑Uniform Memory Access (NUMA) architectures, Huge Pages can introduce two major overheads:

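Increased CPU contention for the same page – A Huge Page holds far more data than a 4 KB page, so more CPUs end up accessing the same page. Write‑intensive workloads then experience more cache‑write conflicts, leading to bus saturation, reduced CPU efficiency, and frequent cache invalidations.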

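False sharing across NUMA nodes – A page is placed on a single NUMA node as a unit. Because a Huge Page is large, data belonging to threads on different nodes can end up on the same Huge Page, or a single data structure can straddle two Huge Pages that sit on different NUMA zones. With 4 KB pages the same data could have been placed next to the thread that uses it. Threads on different CPUs then have to fetch remote data over the interconnect, increasing latency and interconnect congestion.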

Experimental results from “Large Pages May Be Harmful on NUMA Systems” show performance drops of up to 10 % for certain workloads.

Example: two 1.5 MB arrays allocated consecutively may be placed such that the second array is split between two 2 MB Huge Pages that belong to different NUMA zones. Accesses to the split part travel over the inter‑connect, negating the locality benefits of a large page.
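The layout in this example can be checked with a little arithmetic. The sketch below assumes the two arrays sit back to back starting at offset 0 of a page‑aligned region (an assumption made for illustration) and computes how many bytes of each array fall on each page, for 2 MB and for 4 KB pages.

```python
KB = 1024
MB = 1024 * KB


def bytes_per_page(start: int, length: int, page_size: int) -> dict:
    """Map page index -> bytes of the range [start, start + length) on that page."""
    result = {}
    pos, end = start, start + length
    while pos < end:
        page = pos // page_size
        page_end = (page + 1) * page_size
        chunk = min(end, page_end) - pos   # bytes of the range on this page
        result[page] = chunk
        pos += chunk
    return result


ARRAY = 3 * MB // 2                        # 1.5 MB per array

# 2 MB Huge Pages: array B starts on page 0, next to array A.
a_huge = bytes_per_page(0, ARRAY, 2 * MB)
b_huge = bytes_per_page(ARRAY, ARRAY, 2 * MB)
print("A on huge pages:", a_huge)
print("B on huge pages:", b_huge)
print("pages shared by A and B:", set(a_huge) & set(b_huge))

# 4 KB pages: the 1.5 MB boundary is page-aligned, so no page is shared.
a_small = bytes_per_page(0, ARRAY, 4 * KB)
b_small = bytes_per_page(ARRAY, ARRAY, 4 * KB)
print("shared 4 KB pages:", len(set(a_small) & set(b_small)))
```

With 2 MB pages, the first 0.5 MB of the second array shares a page with the first array and the remaining 1 MB lands on the next page. If those two pages sit on different NUMA nodes, one part of the array is always remote for whichever thread uses it.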

Mitigation Strategies

Research‑stage ideas

Replicate read‑heavy pages in each NUMA zone (page replication) to keep accesses local.

Detect pages with high cache‑miss rates, then split or rearrange them so that data with the same CPU affinity resides on the same page.

Practical approach

Because precise per‑page access metrics require hardware Performance Monitoring Unit (PMU) support, the most reliable method is thorough testing.
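In practice such testing comes down to comparing the same metric across runs with Huge Pages enabled and disabled. A minimal sketch of that comparison follows; the latency samples are made‑up placeholders, not measurements, and the function name is only illustrative.

```python
from statistics import median


def relative_change(baseline: list, candidate: list) -> float:
    """Relative change of the candidate median versus the baseline median.

    For a latency metric a positive result means the candidate is slower.
    """
    base = median(baseline)
    return (median(candidate) - base) / base


# Placeholder latency samples in milliseconds (NOT real measurements).
small_pages_ms = [10.0, 10.4, 9.8, 10.2, 10.1]
huge_pages_ms = [11.0, 11.3, 10.9, 11.2, 11.1]

change = relative_change(small_pages_ms, huge_pages_ms)
print(f"median latency change with Huge Pages: {change:+.1%}")
```

The median is used here because it is less sensitive to outlier runs than the mean. Repeating each configuration several times matters, since run‑to‑run noise can be as large as the effect being measured.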

Testing Methods

Actual testing: Run the workload with Huge Pages enabled and disabled on a production‑like system, then compare latency, throughput, or other relevant metrics.

Theoretical estimation: Measure the proportion of execution time spent in TLB‑miss page walks (using oprofile or perf). That proportion is an upper bound on the fraction of run time Huge Pages can save:

potential_gain <= tlb_miss_time / total_time

How close a workload gets to this bound depends on how much TLB coverage grows, new_coverage / old_coverage, where new_coverage is the memory range covered by a Huge Page (e.g., 16 MB) and old_coverage is the range covered by a normal page (4 KB). The larger that ratio relative to the hot data set, the more page walks disappear. This estimate ignores the NUMA‑specific overheads; a low value suggests disabling Huge Pages.
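Whatever estimate is used, the share of time spent in page walks caps the achievable speed‑up: even if every TLB miss disappeared, only that share of the run time would be saved. A minimal sketch of this ceiling, using hypothetical timings (the function name is an illustration, not part of any tool):

```python
def max_speedup(tlb_miss_time: float, total_time: float) -> float:
    """Upper bound on the speed-up if all page-walk time were eliminated.

    Both arguments use the same unit (e.g. seconds or CPU cycles).
    """
    if not 0 <= tlb_miss_time < total_time:
        raise ValueError("need 0 <= tlb_miss_time < total_time")
    fraction = tlb_miss_time / total_time   # share of time spent in page walks
    return 1.0 / (1.0 - fraction)           # old time / best possible new time


# Hypothetical profile: 2 s of a 20 s run spent in TLB-miss page walks.
bound = max_speedup(2.0, 20.0)
print(f"speed-up can be at most {bound:.3f}x")
```

The bound ignores the NUMA‑specific costs described earlier, so the real result can be lower, and can even be a slowdown.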

If hardware PMU is unavailable, the calibrator tool can be used together with oprofile to approximate the page‑walk cost.

Conclusion

Huge Pages are not a universal optimization. On NUMA systems they can cause measurable slow‑downs due to increased cache contention and false sharing. Careful profiling of TLB‑miss costs and systematic testing, both empirical and analytical, are required before enabling Huge Pages in production.

References

Huge pages part 5: A deeper look at TLBs and costs

About Huge Page

TLB on Wikipedia

Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems

Large Pages May Be Harmful on NUMA Systems
