Understanding CPU Cache, Memory Hierarchy, and Virtual Memory
This article explains the memory structures in modern computers: how the CPU reads and writes data through its SRAM caches (L1–L3) using various mapping schemes, how the MESI protocol keeps those caches consistent across cores, and how virtual memory, built on multi-level page tables and accelerated by a TLB, abstracts DRAM main memory, provides isolation, and enables swapping.
Overview
Modern computers have two main types of memory: SRAM and DRAM. DRAM implements main memory, while SRAM is used for CPU caches (L1, L2, L3).
SRAM is called "static" because data is retained as long as power is supplied; DRAM is called "dynamic" because it stores data in capacitors that need periodic refresh.
Typical access latencies: L1 – 4 CPU cycles, L2 – 11 cycles, L3 – 39 cycles, DRAM – 107 cycles.
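The gap between L1 and DRAM latency is why access patterns matter in practice: code that walks memory in cache-line order is far faster than code that strides across lines, even when both do identical work. A minimal C sketch (the array size and function names are illustrative):

```c
#define N 1024
static int grid[N][N];

/* Row-major traversal: consecutive accesses fall within the same
   cache line, so most reads are L1 hits. */
long sum_rows(void) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}

/* Column-major traversal: each access jumps N*sizeof(int) bytes,
   touching a new cache line almost every time. */
long sum_cols(void) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += grid[i][j];
    return s;
}
```

Both functions return the same sum; on typical hardware the row-major version is several times faster because it exploits spatial locality.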
CPU Cache
(1) Cache Structure
CPU cache is SRAM placed inside the CPU. When the CPU needs data it first checks the cache; if the data is not present, it is fetched from main memory. Cache is much smaller than main memory, so a mapping between cache lines and memory addresses is required.
Direct‑mapped cache uses the address modulo the number of cache blocks to select a line:
cache line index = (memory address) mod (number of blocks in the cache)
The index selects a line; each line also stores a valid bit, a tag (the high-order address bits identifying which memory block currently occupies the line), and the data block itself.
Set‑associative cache groups lines into sets; each set contains multiple lines (e.g., 2‑way set‑associative). Fully‑associative cache allows any memory block to be placed in any cache line.
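The direct-mapped scheme above amounts to splitting an address into tag, index, and offset fields. A minimal sketch, assuming a hypothetical 16 KiB direct-mapped cache with 64-byte lines (256 lines); the sizes and function names are illustrative:

```c
#include <stdint.h>

/* Hypothetical geometry: 64-byte lines, 256 lines = 16 KiB direct-mapped. */
#define LINE_BITS  6   /* log2(64)  -> byte offset within the line */
#define INDEX_BITS 8   /* log2(256) -> which cache line to check   */

/* Low bits: position of the byte inside the cache line. */
uint32_t cache_offset(uint32_t addr) { return addr & ((1u << LINE_BITS) - 1); }

/* Middle bits: the "mod number of blocks" index selecting the line. */
uint32_t cache_index(uint32_t addr)  { return (addr >> LINE_BITS) & ((1u << INDEX_BITS) - 1); }

/* High bits: stored as the tag to disambiguate addresses that share an index. */
uint32_t cache_tag(uint32_t addr)    { return addr >> (LINE_BITS + INDEX_BITS); }
```

A lookup compares `cache_tag(addr)` against the tag stored in line `cache_index(addr)`; a match with the valid bit set is a hit.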
(2) Cache Read/Write Operations
On a read miss, the cache controller fetches the required block from lower‑level memory. Write‑through writes data to both cache and main memory immediately; write‑back writes only to cache and updates main memory when the block is evicted.
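The write-back policy can be sketched with a dirty bit: writes touch only the cache, and modified data reaches main memory when the line is evicted or explicitly flushed. A toy single-line cache over a tiny memory, with all sizes and names hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 4

static uint8_t memory[16];          /* backing store (main memory)  */
static uint8_t line[LINE_SIZE];     /* the one cached line          */
static int     line_base = -1;      /* memory offset of that line   */
static bool    dirty     = false;   /* modified since it was loaded? */

static void load_line(int base) {
    if (line_base >= 0 && dirty)    /* eviction: flush the dirty line */
        memcpy(&memory[line_base], line, LINE_SIZE);
    memcpy(line, &memory[base], LINE_SIZE);
    line_base = base;
    dirty = false;
}

void cache_write(int addr, uint8_t val) {
    int base = addr - addr % LINE_SIZE;
    if (base != line_base)
        load_line(base);
    line[addr - base] = val;   /* write only the cache...            */
    dirty = true;              /* ...and remember to flush it later  */
}

void cache_flush(void) {
    if (line_base >= 0 && dirty)
        memcpy(&memory[line_base], line, LINE_SIZE);
    dirty = false;
}
```

Under write-through, the `memcpy` to memory would instead happen inside every `cache_write`, trading extra memory traffic for simpler consistency.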
(3) Coherence and MESI Protocol
Each core of a multi-core CPU has its own private caches, so copies of the same memory line can exist in several caches at once; a coherence protocol is required to keep them consistent. The MESI protocol defines four states per cache line: Modified (M), Exclusive (E), Shared (S), and Invalid (I). It guarantees that at most one core holds a modified copy of a line while all other cores observe a consistent view.
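The MESI transitions can be sketched as a small state machine for one line in one core's cache. This is a simplified model that omits the bus transactions and snooping hardware; the function names are illustrative:

```c
/* The four MESI states for one cache line in one core's cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Local write: the line becomes MODIFIED; from S or I the core
   would first broadcast an invalidate to the other caches. */
mesi_t on_local_write(mesi_t s) {
    (void)s;
    return MODIFIED;
}

/* Another core reads the line: a Modified copy is written back to
   memory and demoted to SHARED; Exclusive also drops to SHARED. */
mesi_t on_remote_read(mesi_t s) {
    switch (s) {
    case MODIFIED:
    case EXCLUSIVE:
        return SHARED;
    default:
        return s;
    }
}

/* Another core writes the line: every other copy must be invalidated. */
mesi_t on_remote_write(mesi_t s) {
    (void)s;
    return INVALID;
}
```

The invariant the protocol maintains is visible in these rules: a line can be MODIFIED in at most one cache, because any remote write invalidates all other copies.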
Virtual Memory
(1) Virtual‑to‑Physical Mapping
Programs use virtual addresses (VA) which the Memory Management Unit (MMU) translates to physical addresses (PA) via page tables. A page table entry (PTE) contains the physical page number, a valid bit, and permission bits.
When a page is not present in RAM, a page fault occurs and the OS loads the page from disk (swap).
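The translation step can be sketched with a single-level table: split the virtual address into a virtual page number (VPN) and an offset, check the PTE's valid bit, and signal a page fault when the page is absent. A minimal sketch assuming 4 KiB pages and a hypothetical 16-entry table:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4 KiB pages: the low 12 bits are the page offset. */
#define PAGE_SHIFT 12
#define NUM_PAGES  16

typedef struct {
    uint64_t pfn;     /* physical frame number        */
    bool     present; /* valid bit: page is in RAM?   */
} pte_t;

static pte_t page_table[NUM_PAGES];

/* Translate VA -> PA; returns false to signal a page fault, in which
   case the OS would load the page from swap and retry the access. */
bool translate(uint64_t va, uint64_t *pa) {
    uint64_t vpn    = va >> PAGE_SHIFT;
    uint64_t offset = va & ((1u << PAGE_SHIFT) - 1);
    if (vpn >= NUM_PAGES || !page_table[vpn].present)
        return false;                       /* page fault */
    *pa = (page_table[vpn].pfn << PAGE_SHIFT) | offset;
    return true;
}
```

Note that the offset passes through unchanged; only the page number is remapped, which is what lets the OS place any virtual page in any physical frame.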
(2) Multi‑Level Page Tables
To avoid huge single‑level page tables, modern systems use hierarchical page tables (e.g., the 4‑level page tables introduced in Linux 2.6.11). Each level indexes into the next, so table memory is allocated only for regions of the address space that are actually in use.
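The hierarchical lookup amounts to extracting one index per level from the virtual address. A sketch assuming the x86-64-style layout (a 12-bit page offset followed by four 9-bit indices, PTE/PMD/PUD/PGD in Linux terms); the helper name is illustrative:

```c
#include <stdint.h>

#define OFFSET_BITS 12   /* 4 KiB pages              */
#define LEVEL_BITS   9   /* 512 entries per table    */

/* level 0 = lowest table (PTE), level 3 = top-level table (PGD).
   Each 9-bit field selects one of 512 entries at that level. */
uint64_t level_index(uint64_t va, int level) {
    return (va >> (OFFSET_BITS + level * LEVEL_BITS))
           & ((1u << LEVEL_BITS) - 1);
}
```

A page-table walk reads the level-3 index to find a level-2 table, and so on down to the PTE, which finally supplies the physical frame number.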
(3) TLB Acceleration
The Translation Lookaside Buffer (TLB) caches recent PTEs, allowing the MMU to translate virtual addresses to physical addresses with fewer memory accesses.
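A TLB lookup can be sketched as a small cache keyed by virtual page number; on a hit the page-table walk is skipped entirely. A direct-mapped toy version, with sizes and names hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 16   /* hypothetical: real TLBs hold tens to hundreds of entries */

typedef struct {
    uint64_t vpn;     /* virtual page number   */
    uint64_t pfn;     /* physical frame number */
    bool     valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SIZE];

/* Hit: return the cached translation without touching the page tables. */
bool tlb_lookup(uint64_t vpn, uint64_t *pfn) {
    tlb_entry_t *e = &tlb[vpn % TLB_SIZE];
    if (e->valid && e->vpn == vpn) {
        *pfn = e->pfn;
        return true;
    }
    return false;
}

/* Miss path: after the page-table walk, the MMU caches the result. */
void tlb_fill(uint64_t vpn, uint64_t pfn) {
    tlb[vpn % TLB_SIZE] = (tlb_entry_t){ .vpn = vpn, .pfn = pfn, .valid = true };
}
```

Because most programs reuse a small set of pages, even a small TLB turns the multi-access page-table walk into a single fast lookup for the common case.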
(4) Why Virtual Memory?
Virtual memory provides isolation, simplifies allocation, enables memory protection, and allows programs to run on machines with less physical RAM by swapping inactive pages to disk.
Conclusion
CPU caches (L1‑L3) speed up memory access, while cache coherence protocols like MESI keep data consistent across cores. Virtual memory, implemented with multi‑level page tables and accelerated by the TLB, abstracts physical memory, improves security, and enables efficient use of limited RAM.
Tencent Cloud Developer