Unlocking ARM64 Memory: How Virtual Addresses Map to Physical Memory
This article explains the fundamentals of Linux arm64 memory management, covering virtual and physical memory concepts, MMU operation, page table structures, address translation steps, page fault handling, and practical C++ examples for allocation, mapping, and performance optimization using huge pages and pre‑paging techniques.
1. Linux Memory Management Basics
Before diving into Linux arm64 memory operation, it is essential to master basic concepts such as physical memory, virtual memory, and the role of the Memory Management Unit (MMU).
1.1 Virtual Memory vs Physical Memory
Physical memory is the actual RAM installed in the hardware, e.g., 8 GB, 16 GB, etc., directly accessed by the CPU. Virtual memory is the per-process logical address space managed by the OS: it gives each process an isolated view of memory, and by swapping pages out to disk it lets the system handle situations where physical memory is insufficient.
1.2 Memory Management Unit (MMU)
The MMU translates virtual addresses to physical addresses by consulting page tables. It also provides protection by enforcing access permissions (read‑only, read‑write, executable) and can raise exceptions on illegal accesses.
The Translation Lookaside Buffer (TLB) caches recent page‑table entries to speed up translation; a hit avoids a costly memory lookup.
2. arm64 System Architecture
2.1 64‑bit Addressing Capability
arm64 uses 64‑bit registers and pointers, theoretically supporting up to 2⁶⁴ bytes of address space. In practice the usable virtual address space is smaller (e.g., 48 bits), but it still far exceeds the 4 GB limit of 32‑bit architectures, enabling large‑scale data processing and high‑performance applications.
2.2 Register Set
arm64 provides 31 general‑purpose 64‑bit registers, allowing more data to stay in registers and reducing memory traffic compared with 32‑bit CPUs.
2.3 Instruction‑Set Optimizations (NEON SIMD)
The NEON SIMD extension processes multiple data elements per instruction, accelerating multimedia, audio, and signal‑processing workloads.
3. Page Tables: The Bridge Between Virtual and Physical Addresses
3.1 arm64 Page‑Table Structure
arm64 supports both four‑level and three‑level page tables. The four‑level hierarchy (PGD → PUD → PMD → PTE) covers a 48‑bit virtual address space, with each level indexed by 9 bits and a 12‑bit page offset.
PGD (Page Global Directory) is the top‑level entry point. Each PGD entry points to a PUD, which points to a PMD, which finally points to a PTE that contains the physical page‑frame number and permission bits.
3.2 Page‑Table Operations
When a process is created, the kernel allocates and initializes each level of the page table. On a page‑fault, the kernel allocates a physical page, updates the corresponding PTE, and resumes the process. During context switches, the kernel saves the current PGD base address and loads the new process’s PGD.
4. Underlying Logic of Virtual Addresses
4.1 Virtual‑Address Layout
In Linux arm64 the virtual address space is split into kernel space (0xFFFF000000000000–0xFFFFFFFFFFFFFFFF) and user space (0x0000000000000000–0x0000FFFFFFFFFFFF). User space contains code, data, heap, and stack segments.
4.2 Translation Process (Four‑Level Walk)
1. Read the PGD base address from the translation table base register (TTBR0_EL1 for user-space addresses).
2. Index the PGD with the highest 9 bits of the virtual address (bits 47–39) to obtain the PUD's physical address.
3. Index the PUD with the next 9 bits (bits 38–30) to obtain the PMD's physical address.
4. Index the PMD with the following 9 bits (bits 29–21) to obtain the PTE table's physical address.
5. Index the PTE table with the next 9 bits (bits 20–12) to obtain the physical page‑frame number.
6. Combine the page‑frame number with the lowest 12 bits (the page offset) to form the final physical address.
If any level is missing or lacks permission, the MMU raises an exception and the kernel handles the fault.
5. Practical Applications and Case Studies
5.1 Memory Allocation and Release
On arm64, applications typically use malloc (which may invoke brk / sbrk or mmap) and free for heap management, or mmap / munmap for file‑backed mappings. The article provides a complete C++ example that demonstrates allocation, usage, and cleanup.
5.2 Performance‑Optimization Cases
Case 1: Huge Pages – Using 2 MB pages reduces the number of page‑table entries and TLB misses, which can substantially reduce database query latency. The example shows how to configure huge pages via vm.nr_hugepages and measure TLB hit rates.
Case 2: Page‑Table Optimisation for Real‑Time Systems – Pre‑paging loads critical pages (e.g., sensor data) before they are needed, avoiding page‑fault latency in industrial control loops. Permission‑bit tuning (read‑only for sensor buffers) further reduces overhead.
6. Reference Implementations (C++)
The article includes three self‑contained C++ programs:
A four‑level page‑table simulator that prints each translation step.
A performance benchmark comparing normal 4 KB pages with 2 MB huge pages, showing TLB hit‑rate improvements.
A page‑table manager with pre‑paging and permission optimisation for an embedded control scenario.
All three programs can be compiled with g++ -std=c++11 … -o … and executed on a Linux host.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
