Why Cache Memory Matters: From Code Layout to Multi‑Level Caches
This article explains why cache memory is essential for modern CPUs, how different loop orders affect cache hits, the structure of direct‑mapped, set‑associative and fully‑associative caches, multi‑level cache hierarchies, and the policies that govern cache allocation and updates.
Why Cache Memory Is Needed
CPU registers operate in sub‑nanosecond time while main memory (DDR) accesses take around 65 ns, creating a performance gap of two orders of magnitude. To bridge this gap, a small, fast storage layer called cache memory is placed between the CPU and main memory.
Impact of Loop Order on Cache Behaviour
Consider two equivalent C snippets that fill a 10 × 128 array with ones:

```c
int arr[10][128];
for (i = 0; i < 10; i++)
    for (j = 0; j < 128; j++)
        arr[i][j] = 1;
```

and the transposed version:

```c
int arr[10][128];
for (i = 0; i < 128; i++)
    for (j = 0; j < 10; j++)
        arr[j][i] = 1;
```

Both produce the same result, but the first order visits memory sequentially, so successive accesses fall within the same cache line; the second jumps a full row (128 ints) between accesses, touching a different cache line almost every time and incurring far more misses.
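The difference is easy to demonstrate. A minimal sketch, with the array enlarged to 1024 × 1024 (an assumption, purely so the cache effect is measurable; the 10 × 128 array from the text fits in cache either way):

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define ROWS 1024
#define COLS 1024
static int a[ROWS][COLS], b[ROWS][COLS];

/* Row-major fill: consecutive writes land in the same cache line. */
static void fill_row_major(int arr[ROWS][COLS]) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            arr[i][j] = 1;
}

/* Column-major fill: each write lands one full row (4 KB) away. */
static void fill_col_major(int arr[ROWS][COLS]) {
    for (int i = 0; i < COLS; i++)
        for (int j = 0; j < ROWS; j++)
            arr[j][i] = 1;
}
```

Timing each function with `clock()` typically shows the row-major version running several times faster, even though both perform exactly the same number of stores and produce identical arrays.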
Memory Hierarchy and Cache Placement
When a CPU reads or writes a variable, it first checks the cache. If the data is present (a hit), the operation completes quickly; otherwise (a miss) the data is fetched from main memory into the cache.
Multi‑Level Cache
Modern CPUs use several cache levels: L1 (fastest, smallest), L2, and L3 (larger, slower). Speed decreases while capacity increases across levels, but all remain much faster than main memory.
In the Cortex‑A53, the L1 cache is split into separate instruction (I‑Cache) and data (D‑Cache) caches, and each core has its own L1. All cores in a cluster share an L2 cache, and the clusters share an L3.
Direct‑Mapped Cache
A direct‑mapped cache maps each memory address to exactly one cache line using three fields:
offset: selects a byte within a cache line.
index: selects one of the cache's lines.
tag: stored alongside the line to verify a hit.
Example: 64 B cache, 8 B line → 8 lines. Offset = 3 bits, index = 3 bits, tag = remaining bits (e.g., 42 bits for a 48‑bit address).
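The field extraction amounts to a few shifts and masks. A minimal sketch for the 64 B / 8 B-line example (the helper names are illustrative, not from the source):

```c
#include <stdint.h>

/* 64 B cache with 8 B lines: 3 offset bits, 3 index bits. */
#define OFFSET_BITS 3
#define INDEX_BITS  3

static uint64_t line_offset(uint64_t addr) {
    return addr & ((1ull << OFFSET_BITS) - 1);       /* low 3 bits */
}
static uint64_t line_index(uint64_t addr) {
    return (addr >> OFFSET_BITS) & ((1ull << INDEX_BITS) - 1);
}
static uint64_t line_tag(uint64_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);       /* remaining high bits */
}
```

For address 0x2A (binary 101010), this yields offset 2, index 5, tag 0.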
Cache Thrashing
With a direct‑mapped cache, addresses that share the same index overwrite each other. Accessing 0x00, 0x40, 0x80 repeatedly causes continuous misses—a phenomenon called cache thrashing.
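A quick check that those three addresses really collide, assuming the 64 B / 8 B-line geometry from the example above:

```c
#include <stdint.h>

/* 64 B direct-mapped cache, 8 B lines: the index is bits [5:3]. */
static unsigned dm_index(uint64_t addr) {
    return (unsigned)((addr >> 3) & 0x7);
}
```

0x00, 0x40, and 0x80 all map to index 0, so each access evicts the previous line and every access misses.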
Set‑Associative Cache (Two‑Way Example)
Dividing the cache into multiple “ways” reduces thrashing. A two‑way set‑associative cache of the same 64 B size has four sets of two lines each. The index now selects a set (2 bits), and the tag is compared against both lines in the set.
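The lookup can be sketched as a toy model (real hardware compares all ways in parallel; the one-bit LRU here is the simplest possible replacement scheme, an assumption for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* 64 B, two-way: 4 sets of 2 lines, 8 B per line. */
typedef struct { uint64_t tag[2]; bool valid[2]; int lru; } set_t;
static set_t sets[4];

/* Returns true on a hit; on a miss, refills the LRU way. */
static bool cache_access(uint64_t addr) {
    set_t *s = &sets[(addr >> 3) & 0x3];   /* 2 index bits */
    uint64_t tag = addr >> 5;              /* bits above offset + index */
    for (int w = 0; w < 2; w++)
        if (s->valid[w] && s->tag[w] == tag) {
            s->lru = 1 - w;                /* the other way becomes LRU */
            return true;
        }
    int victim = s->lru;
    s->tag[victim] = tag;
    s->valid[victim] = true;
    s->lru = 1 - victim;
    return false;
}
```

Alternating accesses to 0x00 and 0x40, which thrash a direct-mapped cache, now miss only once each: both lines stay resident in the two ways of the same set.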
Fully Associative Cache
All lines belong to a single set; the hardware compares the tag with every line. This eliminates thrashing but increases hardware cost.
Four‑Way Set‑Associative Example
For a 32 KB cache with 32 B lines and 4 ways:
Each way holds 8 KB, i.e. 256 lines of 32 B → 256 sets.
Offset = 5 bits (32 B line).
Index = 8 bits (256 sets).
Tag = 35 bits (48‑bit address – offset – index).
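The arithmetic generalizes to any geometry. A small sketch that derives the field widths (helper names are illustrative):

```c
/* Derive offset/index/tag widths from cache geometry. */
typedef struct { unsigned offset, index, tag; } fields_t;

static unsigned ilog2(unsigned x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

static fields_t cache_fields(unsigned cache_bytes, unsigned line_bytes,
                             unsigned ways, unsigned addr_bits) {
    unsigned sets = cache_bytes / (line_bytes * ways);
    fields_t f;
    f.offset = ilog2(line_bytes);
    f.index  = ilog2(sets);
    f.tag    = addr_bits - f.offset - f.index;
    return f;
}
```

Plugging in the 32 KB / 32 B / 4-way / 48-bit example reproduces the numbers above: offset 5, index 8, tag 35.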
Cache Allocation Policies
Read allocation : On a read miss, a cache line is allocated and the data is loaded from main memory (default behavior).
Write allocation: On a write miss, a line is first allocated and filled from memory (as in a read miss) before the write is applied to the cache; without write-allocate, the write bypasses the cache and updates only main memory.
Cache Update Policies
Write‑through : On a write hit, both the cache line and main memory are updated immediately, keeping them always consistent.
Write‑back : On a write hit, only the cache line is updated and a “dirty” bit is set. The modified line is written back to main memory only when it is evicted.
Write‑Back Miss Example
Assume a 64 B direct‑mapped cache with 8 B lines, using write‑allocate and write‑back. A read from address 0x2A misses, and the line currently occupying that index (index 5) has its dirty bit set, so its old contents (e.g., 0x11223344) are first written back to their original main‑memory address.
After the write‑back, the new line is filled from main memory, the dirty bit is cleared, and the requested byte (offset 0x2 within the line) is returned to the CPU.
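The whole sequence can be sketched as a toy byte-level model (a sketch, assuming a 256 B "main memory"; the function names and structure are illustrative, not from the source):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy write-allocate + write-back model: 64 B direct-mapped cache
   (8 lines x 8 B) in front of a 256 B "main memory". */
static uint8_t mem[256];
typedef struct { uint64_t tag; bool valid, dirty; uint8_t data[8]; } line_t;
static line_t lines[8];

static uint8_t cache_read(uint64_t addr) {
    unsigned idx = (addr >> 3) & 7;
    line_t *l = &lines[idx];
    uint64_t tag = addr >> 6;
    if (!l->valid || l->tag != tag) {              /* miss */
        if (l->valid && l->dirty) {                /* evict: write back first */
            uint64_t victim = (l->tag << 6) | ((uint64_t)idx << 3);
            memcpy(&mem[victim], l->data, 8);
        }
        memcpy(l->data, &mem[addr & ~7ull], 8);    /* refill from memory */
        l->tag = tag; l->valid = true; l->dirty = false;
    }
    return l->data[addr & 7];
}

static void cache_write(uint64_t addr, uint8_t v) {
    (void)cache_read(addr);                        /* write-allocate */
    line_t *l = &lines[(addr >> 3) & 7];
    l->data[addr & 7] = v;
    l->dirty = true;                               /* memory not updated yet */
}
```

Writing to 0x6A and then reading 0x2A exercises the scenario: both addresses share index 5, so the read miss forces the dirty line for 0x6A back to memory before the line for 0x2A is loaded.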
dbaplus Community
