
Master the Three Classic Cache Mapping Strategies: Theory and Practical Implementation

This article explains why cache is critical for computer performance and provides a thorough analysis of the three classic cache mapping strategies—direct‑mapped, fully associative, and set‑associative—detailing their mechanisms, advantages, disadvantages, concrete examples, and guidance on selecting the appropriate method for different system scenarios.


Cache acts as a high‑speed pathway that accelerates data access, dramatically improving CPU efficiency. This article explores the three classic cache mapping strategies and how they work behind the scenes.

1. Direct Mapping: Simple Locator

Direct mapping assigns each main‑memory block a fixed cache line using the modulo operation (block number % number of cache lines). For example, with 8 cache lines, blocks 0, 8, 16 map to line 0; blocks 1, 9, 17 map to line 1, and so on. Each block has a unique, fixed location.
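The placement rule is a single modulo operation; here is a minimal sketch using the hypothetical 8‑line geometry from the example above:

    #include <stdio.h>

    #define NUM_LINES 8   /* hypothetical direct-mapped cache with 8 lines */

    int main(void)
    {
        /* Blocks 0, 8, 16 all collide on line 0; blocks 1, 9, 17 share line 1. */
        unsigned blocks[] = { 0, 1, 8, 9, 16, 17 };
        for (unsigned i = 0; i < sizeof blocks / sizeof blocks[0]; i++)
            printf("block %2u -> line %u\n", blocks[i], blocks[i] % NUM_LINES);
        return 0;
    }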

1.1 Detailed Workflow

When the CPU accesses data, the address is split into three fields: index (used to select the cache line), tag (identifies the memory block), and block offset (selects the byte within the line). The CPU checks the tag stored in the selected line; if it matches and the line is valid, a cache hit occurs and the data is read directly. Otherwise, a miss triggers a fetch from main memory and updates the line’s tag and valid bit.
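That lookup can be modeled in C as below; the struct layout and field widths are illustrative assumptions (chosen to match the 32‑line, 512‑byte‑line case study in section 1.4), not a hardware description:

    #include <stdbool.h>
    #include <stdint.h>

    #define OFFSET_BITS 9                    /* 512-byte lines */
    #define INDEX_BITS  5                    /* 32 lines */
    #define NUM_LINES   (1u << INDEX_BITS)

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint8_t  data[1u << OFFSET_BITS];
    };

    static struct cache_line cache[NUM_LINES];

    /* Returns true on a hit. On a miss the caller fetches the block from
     * main memory, refills cache[index], and sets its tag and valid bit. */
    bool dm_lookup(uint32_t addr)
    {
        uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
        uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

        return cache[index].valid && cache[index].tag == tag;
    }

Only one line is ever examined, which is why a direct‑mapped lookup needs just a single comparator.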

1.2 Main‑Memory and Cache Address Formats

Main‑memory address fields: block number (S bits) and block offset (W bits). Cache address fields: line number (m bits) and line offset (w bits); since a block and a line are the same size, w = W. Because the line number is implied by an entry's position, each block‑table entry needs to hold only the tag, i.e. the upper S − m bits of the block number, so the table stores 2^m × (S − m) bits.

1.3 In‑Depth Advantages and Disadvantages

Advantages: hardware is extremely simple, low cost, and address translation is fast because only one line is examined. Disadvantages: high conflict rate because each block can occupy only one line, leading to poor utilization and reduced hit rate. For instance, repeatedly accessing blocks 0 and 8 forces them to replace each other in line 0, lowering performance.

1.4 Case Study

Cache size 16 KB → 14‑bit address bus.

Block (line) size 512 B → 9‑bit line offset.

Thus line number = 14 − 9 = 5 bits, giving 32 cache lines.

Main memory 1 MB → 20‑bit address.

Block size 512 B → 9‑bit block offset, leaving an 11‑bit block number; total 2048 blocks.

With 5 of the 11 block‑number bits consumed by the line number, the tag is 11 − 5 = 6 bits, so block table capacity = 32 × 6 bits.

Under direct mapping, address CDE8FH (1100 1101 1110 1000 1111B) therefore splits into tag 110011, line number 01111 (line 15), and block‑inside‑line offset 010001111.
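The split is easy to check mechanically; a small sketch using the field widths of this case study:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t addr   = 0xCDE8F;             /* 20-bit main-memory address */
        uint32_t offset = addr & 0x1FF;        /* low 9 bits                 */
        uint32_t line   = (addr >> 9) & 0x1F;  /* next 5 bits                */
        uint32_t tag    = addr >> 14;          /* top 6 bits                 */

        /* Prints: tag=0x33 line=15 offset=0x8F */
        printf("tag=0x%X line=%u offset=0x%X\n", tag, line, offset);
        return 0;
    }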

2. Fully Associative Mapping: Flexible Data Box

Fully associative mapping allows any memory block to be placed in any cache line, eliminating fixed correspondence and greatly reducing conflict.

2.1 Search Process

The CPU splits the address into tag and offset, then compares the tag against the tag stored in every cache line simultaneously. A matching tag with a valid line yields a hit; otherwise, a miss causes the block to be loaded into a chosen line and the tag updated.
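Hardware performs those comparisons simultaneously with one comparator per line; software can only model that with a loop. A sketch under the same illustrative geometry as before:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES   32
    #define OFFSET_BITS 9    /* 512-byte blocks, as in the earlier sketches */

    struct fa_line {
        bool     valid;
        uint32_t tag;        /* the full block number: no index bits are split off */
    };

    static struct fa_line cache[NUM_LINES];

    bool fa_lookup(uint32_t addr)
    {
        uint32_t tag = addr >> OFFSET_BITS;

        for (unsigned i = 0; i < NUM_LINES; i++)   /* parallel in hardware */
            if (cache[i].valid && cache[i].tag == tag)
                return true;
        return false;  /* miss: load into a free line, or evict one (e.g. LRU) */
    }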

2.2 Main‑Memory Address Format

If main memory has 2^n units divided into 2^s blocks of 2^w units each, the memory address consists of s + w bits.

The cache holds 2^m lines, each of size 2^w units, so the cache address consists of m + w bits; the entire s‑bit block number serves as the tag.

2.3 Comprehensive Evaluation

Pros: highest possible hit rate because any block can occupy any line, ideal for workloads with random or highly scattered accesses. Cons: hardware is complex and costly due to the need for many comparators, and lookup time is slower because every line must be examined.

3. Set‑Associative Mapping: The Harmonizer

Set‑associative mapping combines the simplicity of direct mapping with the flexibility of full associativity. The cache is divided into multiple sets; each set contains several lines (ways). A memory block first maps to a specific set via modulo, then can occupy any line within that set.

3.1 Actual Working Mechanism

The CPU extracts the set index from the address, selects the corresponding set, and then compares the tag with the tags of all lines in that set. A matching tag with a valid line results in a hit; otherwise, the block is fetched from memory and placed in an empty or evicted line within the set.
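A sketch of the two‑step lookup; the geometry (4 ways, 8 sets, 512‑byte lines) is hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS        4
    #define SET_BITS    3
    #define NUM_SETS    (1u << SET_BITS)
    #define OFFSET_BITS 9

    struct sa_line {
        bool     valid;
        uint32_t tag;
    };

    static struct sa_line cache[NUM_SETS][WAYS];

    bool sa_lookup(uint32_t addr)
    {
        /* Step 1: the index picks one set, exactly as in direct mapping. */
        uint32_t set = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
        uint32_t tag = addr >> (OFFSET_BITS + SET_BITS);

        /* Step 2: search that set's ways, as in fully associative mapping. */
        for (unsigned way = 0; way < WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return true;
        return false;  /* miss: fill an invalid way or evict one within the set */
    }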

3.2 Performance Assessment

Set‑associative caches reduce conflict rates compared with direct mapping while keeping hardware complexity lower than fully associative caches. They achieve a good balance of hit rate, access speed, and cost, making them the dominant choice in modern CPUs.

4. Application Scenarios and Selection Strategy

4.1 Suitability Analysis for Different Scenarios

Direct mapping is suitable for low‑cost, small‑scale or embedded systems where access patterns are regular. Fully associative mapping fits high‑performance computing systems that demand the highest hit rates and can tolerate higher cost and complexity. Set‑associative mapping is widely used in desktops, servers, and mobile devices because it balances performance and cost.

4.2 Key Considerations

Cost: direct mapping is cheapest, fully associative is most expensive, set‑associative is intermediate. Performance: fully associative offers the highest hit rate, direct mapping provides the fastest lookup, set‑associative balances both. Cache size also influences choice—small caches may benefit from full associativity, while larger caches typically adopt set‑associativity to limit conflicts.

5. Cache Mapping Case Study

5.1 Real‑World Example

Intel Core 2 L1 and L2 caches both use 64‑byte cache lines; the L1 cache is 8‑way set‑associative with 64 sets, yielding 64 B × 8 × 64 = 32 KB. Each 4 KB memory page spans 64 cache lines.

5.2 Selecting Cache Set by Index

In a set‑associative cache, a given memory block can reside in only one specific set. The set index is derived from bits 6–11 of the physical address, selecting which of the 64 sets may hold the line. For example, address 0x800010a0 maps to set 2.
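The extraction is a shift and a mask; a quick check of the example address, using the geometry stated above (64‑byte lines, 64 sets):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t addr = 0x800010a0;
        unsigned set  = (addr >> 6) & 0x3F;   /* bits 6-11 select the set */

        /* Prints: 0x800010a0 -> set 2 */
        printf("0x%llx -> set %u\n", (unsigned long long)addr, set);
        return 0;
    }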

5.3 Searching for Matching Tag in a Set

All tags within the selected set are compared in parallel; a matching valid tag yields a cache hit, otherwise the request proceeds to L2 or main memory. Larger geometries work the same way: a 16‑way L2 with 4096 sets holds 64 B × 16 × 4096 = 4 MB and needs a 12‑bit set index and, with 36‑bit physical addresses, an 18‑bit tag.
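The arithmetic can be verified directly; the 36‑bit physical address width is an assumption consistent with the 18‑bit tag above (Core 2 processors use 36‑bit physical addressing):

    #include <stdio.h>

    int main(void)
    {
        unsigned line_bytes = 64, ways = 16, sets = 4096;
        unsigned offset_bits = 6, set_bits = 12, phys_bits = 36;

        /* Prints: size = 4096 KB, tag bits = 18 */
        printf("size = %u KB, tag bits = %u\n",
               line_bytes * ways * sets / 1024,
               phys_bits - set_bits - offset_bits);
        return 0;
    }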

When a set becomes full, an existing line must be evicted before a new one can be stored. Programs can mitigate evictions by arranging data so that accesses spread evenly across sets. For instance, stepping through an array of 512‑byte elements in 4 KB strides keeps hitting the same set, because the set‑index bits repeat every 4 KB; at most 8 such elements fit before they start evicting one another.
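A sketch of why the 4 KB stride hurts: bits 6–11 repeat every 4096 bytes, so every element lands in the same set (the starting address below is arbitrary):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Ten addresses 4 KB apart: all print set 0, so they compete for
         * the 8 ways of a single set and begin evicting each other. */
        for (uint64_t addr = 0x10000; addr < 0x10000 + 10 * 4096; addr += 4096)
            printf("0x%llx -> set %llu\n",
                   (unsigned long long)addr,
                   (unsigned long long)((addr >> 6) & 0x3F));
        return 0;
    }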

Cache lines also store coherence state (the MESI protocol). L1 instruction‑cache lines are only ever Invalid or Shared, since code is not modified in place; L1 data and L2 cache lines can be Modified, Exclusive, Shared, or Invalid. Intel's caches are inclusive: the contents of L1 are duplicated in L2.
