
Why Cache Memory Matters: From Code Layout to Multi‑Level Caches

This article explains why cache memory is essential for modern CPUs, how different loop orders affect cache hits, the structure of direct‑mapped, set‑associative and fully‑associative caches, multi‑level cache hierarchies, and the policies that govern cache allocation and updates.


Why Cache Memory Is Needed

CPU registers operate in sub‑nanosecond time while main memory (DDR) accesses take around 65 ns, creating a performance gap of two orders of magnitude. To bridge this gap, a small, fast storage layer called cache memory is placed between the CPU and main memory.

Impact of Loop Order on Cache Behaviour

Consider two equivalent C snippets that fill a 10 × 128 array with ones:

int arr[10][128];
for (int i = 0; i < 10; i++)
    for (int j = 0; j < 128; j++)
        arr[i][j] = 1;

and the transposed version:

int arr[10][128];
for (int i = 0; i < 128; i++)
    for (int j = 0; j < 10; j++)
        arr[j][i] = 1;

Both produce the same result, but the first order walks the array in its row-major storage order, so consecutive accesses fall within the same cache line. The transposed order jumps a full row (128 × 4 B = 512 B) between accesses, touching a different cache line each time, which causes far more cache misses and noticeably worse performance.

Memory Hierarchy and Cache Placement

When a CPU reads or writes a variable, it first checks the cache. If the data is present (a hit), the operation completes quickly; otherwise (a miss) the data is fetched from main memory into the cache.

Multi‑Level Cache

Modern CPUs use several cache levels: L1 (fastest, smallest), L2, and L3 (larger, slower). Speed decreases while capacity increases across levels, but all remain much faster than main memory.

In the Cortex‑A53, L1 is split into separate instruction (I‑Cache) and data (D‑Cache) caches, and each core has its own L1. All cores in a cluster share an L2, and all clusters share an L3.

Direct‑Mapped Cache

A direct‑mapped cache maps each memory address to exactly one cache line using three fields:

offset : selects a byte within a cache line.

index : selects one of the cache’s lines.

tag : stored alongside the line to verify a hit.

Example: 64 B cache, 8 B line → 8 lines. Offset = 3 bits, index = 3 bits, tag = remaining bits (e.g., 42 bits for a 48‑bit address).

Cache Thrashing

With a direct‑mapped cache, addresses that share the same index overwrite each other. Accessing 0x00, 0x40, 0x80 repeatedly causes continuous misses—a phenomenon called cache thrashing.

Set‑Associative Cache (Two‑Way Example)

Dividing the cache into multiple “ways” reduces thrashing. A two‑way set‑associative cache with the same 64 B size has four sets of two lines each. The index now selects a set (2 bits), and the tag is compared against both lines in the set.

Fully Associative Cache

All lines belong to a single set; the hardware compares the tag with every line. This eliminates thrashing but increases hardware cost.

Four‑Way Set‑Associative Example

For a 32 KB cache with 32 B lines and 4 ways:

Way size = 32 KB / 4 = 8 KB → 8 KB / 32 B = 256 lines per way → 256 sets.

Offset = 5 bits (32 B line).

Index = 8 bits (256 sets).

Tag = 35 bits (48‑bit address – offset – index).

Cache Allocation Policies

Read allocation : On a read miss, a cache line is allocated and the data is loaded from main memory (default behavior).

Write allocation : On a write miss, the line is first allocated (read‑allocate) before the write is performed; otherwise the write updates only main memory.

Cache Update Policies

Write‑through : On a write hit, both the cache line and main memory are updated immediately, keeping them always consistent.

Write‑back : On a write hit, only the cache line is updated and a “dirty” bit is set. The modified line is written back to main memory only when it is evicted.

Write‑Back Miss Example

Assume a 64 B direct‑mapped cache with 8 B lines, using write‑allocate and write‑back. A read from address 0x52 misses; the line it indexes has its dirty bit set, so the old data (e.g., 0x11223344) is first written back to its original main‑memory address.

After the write‑back, the new line is fetched from main memory, the dirty bit is cleared, and the requested data at address 0x52 is returned to the CPU.

Original Source

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Cache · Memory Hierarchy · CPU performance · write-back · Set Associative · cache mapping
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
