Unlocking CPU Secrets: How Processors Execute, Cache, and Multithread

This article explains the core components and operation of a CPU, covering the fetch‑decode‑execute cycle, instruction sets, registers, pipeline and superscalar designs, multi‑core and hyper‑threading concepts, and the hierarchy of caches from registers to L3, providing a comprehensive fundamentals overview.

Programmer DD

Apr 30, 2020

The CPU is the brain of a computer.

1. To execute a program, the system first loads it into memory. The CPU then repeatedly fetches an instruction, decodes it to determine its type and operands, and executes it, continuing until the program exits.

2. The fetch‑decode‑execute sequence constitutes the basic CPU cycle.

3. Instruction sets differ between CPU architectures. The hardware instruction set is defined by the CPU itself, whereas the higher-level operations offered by language libraries are ultimately translated into those hardware instructions. Binaries built for one architecture (e.g., x86) cannot run natively on another (e.g., ARM) without emulation or translation.

4. Registers store key variables and temporary data because accessing memory is much slower than executing instructions. The CPU provides specific instructions to move data between memory and registers.

Simple arithmetic and logical operations such as addition, subtraction, NOT, AND, and OR typically complete in a single cycle. Multiplication and especially division take several cycles (or, on very simple CPUs, are built from sequences of shifts and adds), making them slower.

5. Special registers include:

PC (Program Counter): holds the address of the next instruction to fetch.

Stack Pointer: points to the top of the current stack in memory. The stack holds one frame per active function call, containing the parameters, local variables, and temporary values that are not kept in registers.

PSW (Program Status Word): contains control bits such as CPU priority and mode (kernel or user).

6. Context switching saves the register state of the current process to memory and restores it when the process is resumed.

7. Pipelining separates the fetch, decode, and execute stages into independent units, allowing overlapping execution of multiple instructions.

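8. A superscalar architecture goes further by providing multiple fetch, decode, and execution units: several instructions are fetched and decoded at once, and each is dispatched to an execution unit as soon as one is free, so more than one instruction can complete per cycle.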

9. Kernel vs. user mode is controlled by a bit in the PSW register.

10. Kernel mode can execute all instructions and use all hardware features.

11. User mode restricts execution to a subset of instructions and hardware features; instructions that perform I/O or change memory protection settings are generally disallowed.

12. System calls allow user‑mode code to request privileged operations, causing a trap into the kernel and later returning to user mode.

13. Hardware traps also cause the CPU to switch to kernel mode to handle exceptions.
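The fetch‑decode‑execute loop, the program counter, and the user/kernel mode bit described above can be sketched as a toy interpreter. This is only an illustration: the instruction names (`LOAD`, `ADD`, `STORE`, `OUT`, `SYSCALL`, `HALT`), the two registers, and the `ToyCPU` class are all made up for this example and do not correspond to any real instruction set.

```python
from dataclasses import dataclass, field

KERNEL, USER = 0, 1  # hypothetical encoding of the PSW mode bit


class Trap(Exception):
    """Raised when the toy CPU hits a condition the kernel must handle."""


@dataclass
class ToyCPU:
    memory: list                      # holds both instructions and data
    pc: int = 0                       # program counter
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})
    mode: int = USER                  # mode bit of the PSW
    log: list = field(default_factory=list)  # stands in for an output device

    def step(self):
        instr = self.memory[self.pc]  # fetch
        self.pc += 1                  # PC now points at the next instruction
        op, *args = instr             # decode into opcode and operands
        if op == "LOAD":              # LOAD reg, addr
            self.regs[args[0]] = self.memory[args[1]]
        elif op == "ADD":             # ADD dst, src
            self.regs[args[0]] += self.regs[args[1]]
        elif op == "STORE":           # STORE reg, addr
            self.memory[args[1]] = self.regs[args[0]]
        elif op == "OUT":             # privileged I/O instruction
            if self.mode != KERNEL:
                raise Trap("privileged instruction in user mode")
            self.log.append(self.regs[args[0]])
        elif op == "SYSCALL":         # ask the kernel to do the I/O
            self.syscall(args[0])
        elif op == "HALT":
            return False
        else:
            raise Trap(f"illegal instruction {op!r}")
        return True

    def syscall(self, reg):
        saved_mode, self.mode = self.mode, KERNEL  # trap into kernel mode
        try:
            self.log.append(self.regs[reg])        # kernel performs the I/O
        finally:
            self.mode = saved_mode                 # return to user mode

    def run(self):
        while self.step():
            pass


# Addresses 0-5 hold instructions, 6-8 hold data (a=2, b=3, c=0).
program = [
    ("LOAD", "r0", 6), ("LOAD", "r1", 7), ("ADD", "r0", "r1"),
    ("STORE", "r0", 8), ("SYSCALL", "r0"), ("HALT",),
    2, 3, 0,
]
cpu = ToyCPU(memory=program)
cpu.run()
print(cpu.memory[8], cpu.log)  # prints: 5 [5]
```

Running a program that uses `OUT` directly while in user mode raises `Trap`, which mirrors how a real CPU refuses privileged instructions outside kernel mode and hands control to the operating system.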

Basic composition of a CPU

1. The CPU performs arithmetic such as c=a+b.

2. Operations involve input (a, b), processing (addition), and output (c).

3. The CPU uses various registers to store data:

MAR : Memory Address Register, holds the address of data to be accessed.

MDR : Memory Data Register, holds the data read from or to be written to memory.

AC : Accumulator, stores intermediate arithmetic or logical results.

PC : Program Counter, holds the address of the next instruction.

CIR : Current Instruction Register, holds the instruction currently being executed.

4. The Arithmetic Logic Unit (ALU) performs basic arithmetic and logical operations.

5. The Control Unit (CU) directs data between memory and the ALU and generates control signals.

The CU knows what operation to perform based on the instruction decoded.

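Example: to add two numbers, each operand is read from memory through the MDR and passed to the ALU (the first operand typically waits in the accumulator); the ALU adds them, and the result travels back through the MDR to be written to memory.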
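The register transfers for c=a+b can be traced with a small simulation of an accumulator machine. This is a minimal sketch under stated assumptions: the `(opcode, address)` instruction format and the function name `run_accumulator_machine` are invented for illustration, and a real CPU performs these transfers in hardware under the control unit's signals.

```python
def run_accumulator_machine(memory):
    """Run a tiny program and return the final register values.

    Assumption: instructions are (opcode, address) tuples stored in the same
    memory list as the data they operate on.
    """
    reg = {"PC": 0, "MAR": 0, "MDR": None, "CIR": None, "AC": 0}
    while True:
        # Fetch: PC -> MAR, memory[MAR] -> MDR, MDR -> CIR, then PC advances.
        reg["MAR"] = reg["PC"]
        reg["MDR"] = memory[reg["MAR"]]
        reg["CIR"] = reg["MDR"]
        reg["PC"] += 1

        # Decode: the control unit splits the instruction into its parts.
        opcode, address = reg["CIR"]

        # Execute: the control unit routes data between memory and the ALU.
        if opcode == "HALT":
            return reg
        reg["MAR"] = address
        if opcode == "LOAD":
            reg["MDR"] = memory[reg["MAR"]]
            reg["AC"] = reg["MDR"]
        elif opcode == "ADD":
            reg["MDR"] = memory[reg["MAR"]]
            reg["AC"] = reg["AC"] + reg["MDR"]  # the ALU does the addition
        elif opcode == "STORE":
            reg["MDR"] = reg["AC"]
            memory[reg["MAR"]] = reg["MDR"]
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Addresses 0-3 hold the program; 4, 5, 6 hold a, b, and c.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
final = run_accumulator_machine(memory)
print(memory[6], final["AC"], final["PC"])  # prints: 5 5 4
```

Every memory access goes through the MAR (which address) and the MDR (which value), which is why those two registers appear in both the fetch and the execute steps.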

CPU multi‑core and multi‑threading

1. The number of physical CPUs is limited by the number of sockets on the motherboard; each CPU can have multiple cores, and each core can support multiple hardware threads.

2. The OS treats each core as an independent CPU.

3. Hyper‑threading presents each core as multiple logical CPUs (e.g., 1 core → 2 threads).

4. Effective hyper‑threading requires OS optimizations.

5. A core with hyper‑threading generally achieves higher throughput than the same core without it, but each hardware thread delivers less performance than a full dedicated core.

6. Threads on the same core share its resources; only one thread can use a given resource at a time.

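7. Hardware multi‑threading does not give each thread a full core's worth of parallel execution. Instead, the core holds the state of both threads and can switch between them or interleave their instructions within nanoseconds, so pipeline slots that one thread would leave idle are filled by the other, improving throughput.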

8. Scheduling can lead to inefficiencies, such as two threads on the same core while another core remains idle.
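The difference between logical CPUs and physical cores, and the scheduling inefficiency from the last point, can be illustrated with a short sketch. `os.cpu_count()` is a real standard-library call that reports logical CPUs; everything else here is hypothetical, including the `place_tasks` helper and the assumed layout in which logical CPU `i` belongs to core `i // threads_per_core` (real systems number their logical CPUs in various ways).

```python
import os


def logical_cpus():
    """Number of logical CPUs the OS reports (hardware threads, not cores)."""
    return os.cpu_count() or 1


def place_tasks(n_tasks, n_cores, threads_per_core=2, core_aware=True):
    """Return the logical CPU chosen for each task.

    Assumed (hypothetical) layout: logical CPU i lives on core
    i // threads_per_core.
    """
    total = n_cores * threads_per_core
    if n_tasks > total:
        raise ValueError("more tasks than logical CPUs")
    if not core_aware:
        # Naive: fill logical CPUs in order, so both hardware threads of
        # core 0 are used before core 1 gets any work.
        return list(range(n_tasks))
    # Core-aware: use the first hardware thread of every core before
    # falling back to the second hardware threads.
    order = [
        core * threads_per_core + thread
        for thread in range(threads_per_core)
        for core in range(n_cores)
    ]
    return order[:n_tasks]


def busy_cores(placement, threads_per_core=2):
    """Set of physical cores that received at least one task."""
    return {cpu // threads_per_core for cpu in placement}


naive = place_tasks(2, n_cores=2, core_aware=False)
aware = place_tasks(2, n_cores=2, core_aware=True)
print(naive, busy_cores(naive))  # prints: [0, 1] {0}
print(aware, busy_cores(aware))  # prints: [0, 2] {0, 1}
```

With the naive placement both tasks share core 0 while core 1 sits idle; the core-aware placement gives each task its own core. This is the kind of topology awareness an operating system scheduler needs in order to use hyper‑threading well.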

CPU caches

1. Registers sit at the top of the memory hierarchy: they are the fastest storage (typically under 1 ns latency) and the smallest (under 1 KB in total).

2. Below the registers, the cache hierarchy consists of the L1, L2, and L3 caches, each larger and slower than the one before it.

3. L1 cache is per‑core and split into instruction cache (L1‑icache) and data cache (L1‑dcache).

4. L2 cache stores recently used memory data, anticipating future accesses.

5. L2 may be per‑core or shared; L1 is private to each core but shared among its threads.
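Why keeping recently used data close to the core pays off can be seen in a small cache simulation. This is a rough sketch, not a model of any real processor: the 64-byte line size, the fully associative layout, the least-recently-used (LRU) eviction policy, and the function name `hit_rate` are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict


def hit_rate(addresses, n_lines, line_size=64):
    """Fraction of accesses served from a toy fully associative LRU cache.

    Assumptions: byte addresses, `line_size`-byte cache lines, and room for
    `n_lines` lines in the cache.
    """
    cache = OrderedDict()  # keys are line numbers, ordered oldest first
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True             # miss: bring the line in
            if len(cache) > n_lines:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


# Reading 4-byte values back to back: 16 values share each 64-byte line.
sequential = range(0, 1024, 4)
# Jumping a full line every time: no two accesses share a line.
strided = range(0, 1024 * 64, 64)
# Looping twice over four lines.
loop = [0, 64, 128, 192] * 2

print(hit_rate(sequential, n_lines=8))  # prints: 0.9375
print(hit_rate(strided, n_lines=8))     # prints: 0.0
print(hit_rate(loop, n_lines=4))        # prints: 0.5
print(hit_rate(loop, n_lines=2))        # prints: 0.0
```

The last two lines show the effect of capacity: when the working set fits in the cache, the second pass hits every time, while a cache that is too small evicts each line just before it is needed again.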


Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
