How openEuler 24.03 LTS’s Dynamic Composite Page Boosts Memory Performance

The openEuler 24.03 LTS release introduces a dynamic composite page (large folio) that retains 4 KB base‑page compatibility while enabling 64 KB page performance gains, reducing TLB misses and memory overhead, and delivering double‑digit benchmark improvements for big‑data, Kafka, MySQL, I/O and memory‑allocation workloads.

Linux Code Review Hub

Jul 25, 2024

In the earlier openEuler 20.03 LTS release, the default base‑page size on ARM64 was 64 KB. Compared with traditional 4 KB pages, this gave faster page‑table walks and lower TLB miss rates, which improved database performance. However, 64 KB pages caused ecosystem compatibility issues and higher memory fragmentation, prompting a return to the 4 KB default in openEuler 22.03. The new dynamic composite page (large folio) in openEuler 24.03 LTS resolves this trade‑off: a single kernel binary keeps 4 KB compatibility while delivering 64 KB performance gains, with no application changes required.
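A back‑of‑the‑envelope calculation shows why the page size matters for TLB behaviour. The sketch below computes TLB reach (number of entries times page size). The 1,024‑entry TLB is an assumption chosen for illustration; it is not a figure from the article or from any specific CPU.

```python
KIB = 1024
MIB = KIB * KIB


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered when every TLB entry maps one page of page_size bytes."""
    return entries * page_size


# Hypothetical TLB size, used only to make the comparison concrete.
ASSUMED_TLB_ENTRIES = 1024

reach_4k = tlb_reach_bytes(ASSUMED_TLB_ENTRIES, 4 * KIB)
reach_64k = tlb_reach_bytes(ASSUMED_TLB_ENTRIES, 64 * KIB)

print(f"4 KB pages : {reach_4k // MIB} MiB of reach")
print(f"64 KB pages: {reach_64k // MIB} MiB of reach")
print(f"ratio      : {reach_64k // reach_4k}x")
```

Whatever the real entry count is, the ratio between the two page sizes stays at 16, which is why a working set that overflows the TLB with 4 KB pages may fit with 64 KB mappings.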

The architecture replaces traditional struct page management with struct folio, which can represent one page or several contiguous pages. A large folio (one that spans multiple pages) improves LRU management efficiency, reduces page‑fault frequency, and enables large‑page mappings that cut page‑table walk overhead and TLB misses.
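The effect on page‑fault frequency can be illustrated with a toy model. The `Folio` class below is a hypothetical stand‑in, not the kernel's struct folio; it only captures the idea that a folio of order n covers 2^n base pages, and it assumes each page fault installs exactly one folio.

```python
from dataclasses import dataclass

BASE_PAGE_SIZE = 4096  # 4 KB base page, as in the article


@dataclass(frozen=True)
class Folio:
    """Toy model of a folio: 2**order contiguous base pages."""
    order: int

    @property
    def nr_pages(self) -> int:
        return 1 << self.order

    @property
    def size(self) -> int:
        return self.nr_pages * BASE_PAGE_SIZE


def faults_to_populate(region_bytes: int, folio: Folio) -> int:
    """Faults needed if each fault maps one folio (ceiling division)."""
    return -(-region_bytes // folio.size)


small = Folio(order=0)  # a single 4 KB page
large = Folio(order=4)  # 16 pages = 64 KB

region = 1 << 20  # populate 1 MiB of memory
faults_small = faults_to_populate(region, small)
faults_large = faults_to_populate(region, large)
print(faults_small, faults_large)
```

Under these assumptions the 1 MiB region takes 256 faults with single pages and 16 with 64 KB folios; real kernels fall back to smaller orders when contiguous memory is unavailable, so the actual saving varies.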

Built on Linux 6.6 LTS, openEuler 24.03 adds extensive large‑folio support to the memory‑management subsystem and file systems. Huawei’s kernel team contributed patches to enable large‑folio handling in memory allocation, reclamation, page cache, and filesystems such as XFS, while many other subsystems still lack large‑folio support.

Key enhancements include:

Memory allocation now aligns virtual memory areas to multiple large‑folio sizes (64 KB, 2 MB) for both anonymous and code segments, supports transparent code‑segment large pages, and introduces per‑CPU caches for 64 KB large folios on ARM64.

Core kernel mechanisms (fork, munmap, mlock, madvise) are updated to handle large folios, and features like memory migration, NUMA balancing, and swap now operate on large‑folio granularity, improving performance.

Software‑hardware co‑optimizations enable batched TLB flushes and splitting of large pages to arbitrary orders, and they use ARM64’s contiguous‑bit feature so that a 64 KB range can be cached as a single “compressed” TLB entry, further lowering miss rates.
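The link between VMA alignment and the contiguous bit can be sketched in a few lines. With a 4 KB granule, ARM64's contiguous hint covers 16 adjacent page‑table entries (64 KB). The check below is a simplified assumption of the eligibility rules: it only looks at alignment and physical contiguity and ignores the requirement that permissions and attributes match across all 16 entries. The function names are illustrative and are not kernel APIs.

```python
BASE_PAGE = 4 * 1024
CONT_PTES = 16                       # 16 x 4 KB entries = 64 KB
CONT_SIZE = BASE_PAGE * CONT_PTES


def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (size is a power of two)."""
    return (addr + size - 1) & ~(size - 1)


def can_use_contiguous_hint(vaddr: int, pfns: list[int]) -> bool:
    """True if 16 PTEs starting at vaddr could form one contiguous block.

    Simplified conditions: the virtual address and the first page frame
    are both 64 KB aligned, and the frames are physically consecutive.
    """
    if len(pfns) != CONT_PTES:
        return False
    if vaddr % CONT_SIZE != 0 or pfns[0] % CONT_PTES != 0:
        return False
    return all(pfns[i] == pfns[0] + i for i in range(CONT_PTES))


# A mapping request at an unaligned address is moved up to a 64 KB boundary.
start = align_up(0x7F0000001000, CONT_SIZE)

aligned_frames = list(range(0x1230, 0x1240))     # starts on a 16-frame boundary
misaligned_frames = list(range(0x1231, 0x1241))  # contiguous but misaligned

print(hex(start))
print(can_use_contiguous_hint(start, aligned_frames))
print(can_use_contiguous_hint(start, misaligned_frames))
```

This is why aligning the virtual area first matters: without a 64 KB‑aligned start, a physically contiguous 64 KB folio still could not be mapped with the hint.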

File‑system improvements focus on the iomap framework and ext4:

iomap now batches block mappings at the large‑folio level, reducing iteration overhead during dirty‑folio writeback.

ext4’s buffered‑write path reserves blocks in bulk for large I/O, and the switch to iomap eliminates the old buffer‑head layer, allowing extent allocation and I/O grouping at large‑folio granularity.

ext4’s large‑folio support removes the need for jbd2 ordered‑mode journaling, boosting read/write throughput.
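The benefit of mapping blocks per folio rather than per block can be pictured as run coalescing. The sketch below is a conceptual illustration only, not the iomap API: it groups the dirty block numbers of one folio into consecutive runs and counts one mapping lookup per run. The block numbers are made up.

```python
def coalesce_blocks(blocks: list[int]) -> list[tuple[int, int]]:
    """Group block numbers into (start, length) runs of consecutive blocks."""
    runs: list[tuple[int, int]] = []
    for blk in sorted(set(blocks)):
        if runs and blk == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)   # extend the current run
        else:
            runs.append((blk, 1))            # start a new run
    return runs


# A 64 KB folio over 4 KB blocks holds 16 blocks. Assume all are dirty and
# laid out contiguously on disk starting at (hypothetical) block 1000.
dirty = list(range(1000, 1016))
per_block_lookups = len(dirty)
per_run_lookups = len(coalesce_blocks(dirty))
print(per_block_lookups, per_run_lookups)

# A fragmented layout needs one lookup per extent instead.
print(coalesce_blocks([1000, 1001, 1002, 2000, 2001]))
```

In the contiguous case 16 per‑block iterations collapse into a single run, which is the kind of saving the batched writeback path is aiming for.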

Dynamic composite pages expose multi‑level control interfaces (system‑wide, container‑level, process‑level) so that only critical applications enable large folios, avoiding unnecessary memory consumption for others.
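The article does not name the control files, so the following is only a sketch under stated assumptions. Upstream Linux exposes a system‑wide transparent huge page switch at `/sys/kernel/mm/transparent_hugepage/enabled`, and newer upstream kernels add per‑size directories such as `hugepages-64kB`; whether openEuler 24.03 uses these exact paths for its system‑wide level is an assumption, and the container‑level and process‑level interfaces are not shown here. These files mark the active choice with square brackets, which the helper below extracts.

```python
import re
from typing import Optional

# Upstream system-wide THP switch.
SYSTEM_WIDE = "/sys/kernel/mm/transparent_hugepage/enabled"
# Per-size switch as found in newer upstream kernels; its presence on a
# given openEuler system is an assumption.
PER_SIZE_64K = "/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled"


def active_mode(sysfs_text: str) -> Optional[str]:
    """Return the bracketed option, e.g. 'madvise' from 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    return match.group(1) if match else None


def read_mode(path: str) -> Optional[str]:
    """Read a control file and return its active mode, or None if unavailable."""
    try:
        with open(path, encoding="ascii") as handle:
            return active_mode(handle.read())
    except OSError:
        return None


for label, path in (("system-wide", SYSTEM_WIDE), ("64 KB folios", PER_SIZE_64K)):
    print(f"{label}: {read_mode(path)}")
```

On a machine without these files the script prints `None` for each entry rather than failing, so it is safe to run as a quick check before tuning anything.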

Benchmark results demonstrate the impact:

HiBench (Spark) workloads show ~10% average improvement.

Kafka benchmark: producer bandwidth +26%, consumer bandwidth +11%.

MySQL sysbench: iTLB miss rate reduced by up to 10×, overall performance +3%.

FIO I/O: read throughput +59%, write throughput +239%.

will‑it‑scale memory‑allocation test: page‑fault operations for anonymous and shared pages improve ~100% thanks to batch allocation of large folios and per‑CPU caches.
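Percentage gains of very different sizes are easier to compare as multipliers over the baseline. The snippet below converts the throughput and page‑fault figures quoted above; the labels are shorthand for this example, and the iTLB figure is left out because it describes a miss‑rate reduction rather than a throughput gain.

```python
def speedup_factor(percent_gain: float) -> float:
    """Convert a '+N%' improvement into a multiplier over the baseline."""
    return 1 + percent_gain / 100


# Gains as reported in the benchmark list above.
reported_gains = {
    "Kafka producer bandwidth": 26,
    "Kafka consumer bandwidth": 11,
    "fio read throughput": 59,
    "fio write throughput": 239,
    "will-it-scale page faults": 100,
}

factors = {name: speedup_factor(gain) for name, gain in reported_gains.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x baseline")
```

Read this way, the +239% write result means roughly 3.4 times the baseline throughput, and the ~100% page‑fault result means about twice as many operations per second.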

Future work will continue to target data‑center scenarios, further integrate with chip‑level TLB optimizations, and contribute the enhancements upstream to the Linux community.


