Single Linux Kernel Commit Delivers 40× Memory Allocation Speedup
Intel's kernel test robot found that a single commit tightening THP/PMD alignment raised 1-byte malloc throughput by 3889% (roughly 40×). The result shows how a small memory-management change can have an outsized effect, although it was measured on a synthetic benchmark.
In performance engineering, improvements usually come from either better hardware or more efficient software. Intel's kernel test robot recently reported that a one-line change to the Linux kernel increased memory-allocation throughput by 3889% (almost 40×), a result the community welcomed enthusiastically.
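As a quick check of the arithmetic: a 3889% increase means the new throughput is the old throughput plus 38.89 times itself, which is where the "almost 40×" figure comes from. The helper name below is purely illustrative.

```python
def percent_gain_to_multiplier(percent_gain: float) -> float:
    """Convert a relative improvement in percent into a throughput multiplier."""
    # A gain of +100% doubles throughput, so the baseline itself counts as 1x.
    return 1.0 + percent_gain / 100.0


multiplier = percent_gain_to_multiplier(3889.0)
print(f"{multiplier:.2f}x")  # 39.89x
```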
Performance boost revealed by the will‑it‑scale test
The result appeared in the will-it-scale benchmark, run on a four-socket system with Intel Xeon Platinum 8380H processors (Cooper Lake architecture) and 224 threads in total (28 cores and 56 threads per CPU). The test measured 1-byte malloc throughput, which rose nearly 40-fold after the change.
Commit details
Commit hash: d4148aeab412432bf928f311eca8a2ba52bb05df
Change summary: Adjusts the memory-management (mm) and memory-mapping (mmap) code, specifically the rule that decides when a mapping is aligned to a Page Middle Directory (PMD) boundary for Transparent Huge Pages (THP).
Deep memory‑management optimization
The modification targets the Linux kernel's memory subsystem, focusing on how mappings are aligned to PMD boundaries for THP. It addresses a case where forcing PMD alignment onto irregularly sized mappings appeared to cause TLB and cache-aliasing problems, leading to performance regressions.
Earlier, commit efa7df3e3bb5 began aligning large anonymous mappings to THP boundaries. While this could improve performance for sufficiently large mappings, it also introduced regressions in benchmarks such as cactusBSSN, where slowdowns of up to 600% were reported.
Fixing the regression
Analysis showed that certain workloads created many mappings of 4632 kB each. Before commit efa7df3e3bb5 these merged into one larger region. After it, each mapping was placed separately on its own PMD boundary, leaving gaps between them, which triggered the TLB and cache-aliasing issues and slowed the benchmark.
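To make the fragmentation concrete, here is a minimal sketch of the address arithmetic. It assumes a 2 MiB PMD size (x86-64 with 4 KiB base pages) and a simplified bottom-up placement; the kernel's real address-space search is more involved, and the function names are hypothetical. It models only the gaps, not the TLB or cache behavior.

```python
KIB = 1024
PMD_SIZE = 2 * 1024 * KIB   # assumption: 2 MiB PMD (x86-64, 4 KiB base pages)
MAPPING_SIZE = 4632 * KIB   # mapping size named in the text


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def layout(count: int, size: int, alignment: int) -> list[tuple[int, int]]:
    """Place `count` mappings one after another, aligning each start address."""
    regions = []
    cursor = 0
    for _ in range(count):
        start = align_up(cursor, alignment)
        regions.append((start, start + size))
        cursor = start + size
    return regions


def gaps_kib(regions: list[tuple[int, int]]) -> list[int]:
    """Unused space, in KiB, between each pair of consecutive mappings."""
    return [(nxt[0] - cur[1]) // KIB for cur, nxt in zip(regions, regions[1:])]


# Without forced alignment the mappings are contiguous and can merge.
print(gaps_kib(layout(4, MAPPING_SIZE, 1)))         # [0, 0, 0]
# With every start rounded up to a PMD boundary, each pair is separated.
print(gaps_kib(layout(4, MAPPING_SIZE, PMD_SIZE)))  # [1512, 1512, 1512]
```

The 1512 KiB gap is simply 2048 KiB minus the 536 KiB left over when 4632 KiB is divided by the PMD size.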
The fix requires the mapping size to be an exact multiple of the PMD size, not merely "greater than or equal to" the PMD size, before THP alignment is applied. Mappings of irregular size such as 4632 kB are therefore no longer forced onto separate PMD boundaries, which removes the slowdown and restores the original performance.
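The two conditions can be compared side by side in a short sketch. This is a user-space model of the rule as described here, not the kernel source; the function names are hypothetical and the 2 MiB PMD size is an assumption that holds for x86-64 with 4 KiB base pages.

```python
KIB = 1024
PMD_SIZE = 2 * 1024 * KIB  # assumption: 2 MiB PMD (x86-64, 4 KiB base pages)


def thp_align_before_fix(length: int) -> bool:
    """Earlier rule: align any anonymous mapping at least one PMD in size."""
    return length >= PMD_SIZE


def thp_align_after_fix(length: int) -> bool:
    """Stricter rule: align only when the size is a whole number of PMDs."""
    return length > 0 and length % PMD_SIZE == 0


for kib in (1024, 4632, 4096):
    length = kib * KIB
    print(kib, thp_align_before_fix(length), thp_align_after_fix(length))
# 1024 False False
# 4632 True False
# 4096 True True
```

Only the 4632 KiB case changes: it used to be aligned and no longer is, while a 4 MiB mapping still qualifies for THP alignment.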
Real‑world significance of the boost
It is important to note that the dramatic speedup was observed in a synthetic test case; real‑world workloads may not achieve the same magnitude of improvement. Nevertheless, the optimization demonstrates the substantial potential for future Linux kernel enhancements, especially in scenarios demanding highly efficient memory management.
Conclusion: Small change, huge impact
This Linux kernel optimization shows that even a single line of code can profoundly affect system performance. While the 40× gain was measured on a synthetic benchmark, the lessons learned provide valuable guidance for future kernel work, promising better efficiency and lower resource consumption for high‑load, high‑performance applications.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
