How Linux Huge Pages Can Boost Database Throughput on Kubernetes by Up to 8×
This article explains how Linux page size—from the default 4 KB to 2 MB or 1 GB huge pages—affects database performance, details the role of TLB cache hits and misses, presents benchmark results showing up to an eight‑fold throughput increase, and offers practical guidance for configuring huge pages on Kubernetes nodes.
Linux page sizes and TLB impact
On x86-64 Linux three page sizes are available: 4 KB (default), 2 MB and 1 GB. Small pages minimise internal fragmentation for tiny allocations, while large pages reduce the total number of page-table entries and therefore the number of Translation Lookaside Buffer (TLB) entries required to map a memory region. Each memory access must translate a virtual address to a physical address; the CPU caches recent translations in the TLB. A TLB hit is resolved in hardware in roughly a cycle, whereas a miss triggers a page-table walk. On x86-64 that walk is performed by the CPU's hardware page walker rather than by kernel code, but it has to read up to four levels of page tables (five with 5-level paging), so it can be orders of magnitude slower than a hit when those entries are not already in the CPU caches. Databases perform millions of memory accesses, so TLB miss rates directly affect read/write latency, especially for wide rows that span many 4 KB pages.
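To make the page-table argument concrete, the following is a minimal sketch that counts how many pages, and therefore leaf page-table entries, are needed to map a region at each page size. The 64 GiB region and the helper name pages_needed are illustrative assumptions, not values from the benchmark.

```shell
#!/usr/bin/env bash
# pages_needed <region_bytes> <page_bytes>
# Prints the number of pages required to map the region.
pages_needed() {
  local region=$1 page=$2
  # Ceiling division: a partially used page still needs its own entry.
  echo $(( (region + page - 1) / page ))
}

KIB=1024
MIB=$((1024 * KIB))
GIB=$((1024 * MIB))
region=$((64 * GIB))   # hypothetical region size, chosen for illustration

echo "4 KB pages: $(pages_needed "$region" $((4 * KIB)))"
echo "2 MB pages: $(pages_needed "$region" $((2 * MIB)))"
echo "1 GB pages: $(pages_needed "$region" "$GIB")"
```

For this region the counts are 16777216, 32768 and 64 respectively, which is why a small TLB covers far more of the dataset once huge pages are in use.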
Benchmark methodology
Benchmarks were executed on an AMD EPYC 7J1C3 @ 2.55 GHz processor. One hundred million rows were pre‑loaded into DRAM and accessed via IPC (no TCP). Three row sizes were tested:
128 B (fits in a single 4 KB page)
8 KB (spans two 4 KB pages)
16 KB (spans four 4 KB pages)
For each row size the workload was run with three Linux page-size configurations: 4 KB, 2 MB and 1 GB. The database client used 128 concurrent connections against a single-node server, and the entire dataset resided in RAM.
Results
Using 2 MB pages instead of 4 KB increased throughput dramatically:
128 B rows – up to 8× higher
8 KB rows – up to 8× higher
16 KB rows – up to 5× higher
Switching from 2 MB to 1 GB pages yielded a modest additional gain of 1 %–21 % depending on row width (all rows still fit within a single 2 MB page, so the benefit comes from reduced TLB miss probability).
CPU TLB characteristics
Typical entry counts for modern CPUs:
Intel Ice Lake
4 KB L1 TLB – 64 entries
2 MB L1 TLB – 32 entries
1 GB L1 TLB – 8 entries
L2 TLB (4 KB + 2 MB) – 1 024 entries
L2 TLB (4 KB + 1 GB) – 1 024 entries
AMD EPYC Zen 3
L1 TLB (4 KB + 2 MB + 1 GB) – 64 entries total
L2 TLB (4 KB + 2 MB) – 512 entries
Because the L1 TLB holds only a few dozen 4 KB entries, workloads with wide rows or high concurrency quickly exhaust the cache, causing frequent misses. Switching to 2 MB pages effectively expands the address range covered by each TLB entry, dramatically reducing miss rates.
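The effect can be expressed as "TLB reach", the amount of memory a fully populated TLB can translate without a miss (entries multiplied by page size). The sketch below applies that formula to the 64-entry L1 TLB listed above for Zen 3; the helper name tlb_reach_kb is an assumption made for this example.

```shell
#!/usr/bin/env bash
# tlb_reach_kb <entries> <page_size_kb>
# Prints the memory covered by a fully populated TLB, in KB.
tlb_reach_kb() {
  echo $(( $1 * $2 ))
}

entries=64   # L1 TLB entry count quoted in the text for AMD EPYC Zen 3

# Page sizes in KB: 4 KB, 2 MB, 1 GB
for page_kb in 4 2048 1048576; do
  echo "page=${page_kb} KB reach=$(tlb_reach_kb "$entries" "$page_kb") KB"
done
```

With 64 entries the reach is 256 KB for 4 KB pages, 128 MB for 2 MB pages and 64 GB for 1 GB pages.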
Optimizing Kubernetes nodes for databases
Kubernetes does not allocate huge pages itself; they must be pre-allocated on the host OS before pods start, after which the kubelet advertises them as schedulable node resources.
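Disable Transparent Huge Pages (THP) to avoid unpredictable memory usage:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Allocate the desired number of huge pages. Each 2 MB page reserves 2 MB of RAM, so on a node with 256 GB RAM a count of 131072 would reserve all of memory; a count such as 98304 (192 GB) leaves room for the OS and other workloads:
echo 98304 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
Both commands must be run as root and do not persist across reboots. Adjust the count to the database's needs.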
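After writing these settings it is worth checking what the kernel actually did, because it may reserve fewer huge pages than requested when physical memory is fragmented. The following is a small sketch that reads the active THP mode and the HugePages counters; the helper names thp_mode and hugepage_stat are assumptions for this example, and the optional file arguments exist only so the helpers can be pointed at sample files.

```shell
#!/usr/bin/env bash
# thp_mode [file]
# Prints the active THP mode, i.e. the value shown in [brackets].
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' \
    "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# hugepage_stat <field> [meminfo_file]
# Prints one counter such as HugePages_Total from /proc/meminfo.
hugepage_stat() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Only query the live system when the files are present.
if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
  echo "THP mode:        $(thp_mode)"
fi
if [ -r /proc/meminfo ]; then
  echo "HugePages_Total: $(hugepage_stat HugePages_Total)"
  echo "HugePages_Free:  $(hugepage_stat HugePages_Free)"
  echo "Hugepagesize:    $(hugepage_stat Hugepagesize) kB"
fi
```

If HugePages_Total is lower than the count that was written, the reservation was only partially satisfied.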
If 1 GB pages are required, add a kernel boot parameter (e.g., default_hugepagesz=1G hugepagesz=1G hugepages=64) and reboot.
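Before adding the boot parameter, one can check whether the CPU supports 1 GB pages at all and, after the reboot, whether the pool was created. This is a sketch under the assumption of an x86-64 host, where support is advertised through the pdpe1gb CPU flag; the helper name supports_1g_pages is hypothetical.

```shell
#!/usr/bin/env bash
# supports_1g_pages [cpuinfo_file]
# Succeeds if the CPU flags include pdpe1gb (1 GB page support on x86-64).
supports_1g_pages() {
  grep -qw pdpe1gb "${1:-/proc/cpuinfo}"
}

if [ -r /proc/cpuinfo ]; then
  if supports_1g_pages; then
    echo "CPU advertises pdpe1gb: 1 GB pages are supported"
  else
    echo "pdpe1gb flag not found: 1 GB pages are unavailable on this CPU"
  fi
fi

# After rebooting with the hugepagesz=1G parameters, the pool appears here.
pool=/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
if [ -r "$pool" ]; then
  echo "1 GB pages reserved: $(cat "$pool")"
fi
```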
Expose the huge-page resources to pods via the resources.limits and resources.requests fields, e.g.:
resources:
  limits:
    hugepages-2Mi: "4Gi"
  requests:
    hugepages-2Mi: "4Gi"
Label and taint nodes that are provisioned with huge pages (e.g., node-role.kubernetes.io/db=true) and use pod node selectors or affinity rules so that the database pod lands on a suitable node.
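Putting the resource fields and node placement together, the sketch below prints a complete pod manifest from a shell function. The pod name, container name, image, memory size and mount path are hypothetical placeholders; the label and taint key reuse the node-role.kubernetes.io/db=true example from the text. Kubernetes requires huge-page requests to equal limits, which the manifest reflects.

```shell
#!/usr/bin/env bash
# Node preparation (run against a real cluster; <node> is a placeholder):
#   kubectl label nodes <node> node-role.kubernetes.io/db=true
#   kubectl taint nodes <node> node-role.kubernetes.io/db=true:NoSchedule

# render_db_pod <image> <hugepages_quantity>
# Prints a pod manifest that requests 2 MB huge pages and targets db nodes.
render_db_pod() {
  local image=$1 huge=$2
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: db-hugepages-example   # hypothetical name
spec:
  nodeSelector:
    node-role.kubernetes.io/db: "true"
  tolerations:
  - key: node-role.kubernetes.io/db
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: db
    image: ${image}
    resources:
      requests:
        memory: 8Gi            # hypothetical regular-memory size
        hugepages-2Mi: ${huge}
      limits:
        memory: 8Gi
        hugepages-2Mi: ${huge}
    volumeMounts:
    - name: hugepage
      mountPath: /dev/hugepages
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
}

render_db_pod "example.com/db:latest" 4Gi
```

The output can be piped to kubectl apply -f - once the node has been labeled and tainted as shown in the comments.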
What can and cannot be controlled
Cannot control: row/record width, total row count, database working-set size, query concurrency, CPU TLB size.
Can control: huge-page size configured on each node (2 MB or 1 GB instead of the default 4 KB), number of huge pages allocated, pod memory requests/limits, node labeling/tainting to ensure placement on a node with the appropriate huge-page configuration.
Practical recommendations
For most OLTP workloads, configure 2 MB huge pages on dedicated database nodes; in the benchmark above this yielded up to an 8× throughput increase for 128 B and 8 KB rows and up to 5× for 16 KB rows.
Consider 1 GB pages only if the workload consistently accesses rows larger than 2 MB or if the node has abundant RAM; the gain over 2 MB pages is modest (1 %–21 %).
Always disable THP to prevent memory waste and unpredictable latency.
Allocate enough huge pages to cover the database’s active working set while leaving headroom for other system components.
Use node labels/taints and pod affinity to schedule database pods onto nodes that have been prepared with the required huge‑page configuration.
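The sizing recommendation can be turned into simple arithmetic. The sketch below converts a working-set size into a huge-page count and checks that it fits on the node with some headroom. The 96 GiB working set and 32 GiB reserve are hypothetical figures, and the helper names are assumptions for this example.

```shell
#!/usr/bin/env bash
# hugepage_count <size_gib> <page_mib>
# Prints the number of huge pages needed to back size_gib of memory.
hugepage_count() {
  echo $(( ($1 * 1024 + $2 - 1) / $2 ))
}

# fits_on_node <size_gib> <node_ram_gib> <reserve_gib>
# Succeeds if the huge-page pool plus the reserve fits in node RAM.
fits_on_node() {
  [ $(( $1 + $3 )) -le "$2" ]
}

working_set_gib=96   # hypothetical active working set
node_ram_gib=256     # node size used as the example in the text
reserve_gib=32       # hypothetical headroom for OS and other pods

if fits_on_node "$working_set_gib" "$node_ram_gib" "$reserve_gib"; then
  echo "nr_hugepages (2 MB): $(hugepage_count "$working_set_gib" 2)"
else
  echo "working set plus reserve exceeds node RAM" >&2
fi
```

For comparison, hugepage_count 256 2 gives 131072, which is the count that would consume the whole 256 GB node.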
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
