Understanding CPU Architecture: From Instruction Cycle to Multicore Caches
This article explains how a CPU executes programs through the fetch‑decode‑execute cycle, describes instruction sets, registers, pipelines, superscalar and multithreaded designs, and outlines the storage hierarchy from registers through the L1, L2, and L3 caches.
About CPU and Program Execution
The CPU is the brain of a computer. Running a program ultimately means executing a large number of instructions, including some that are not directly part of the program's own logic.
When a program is loaded into memory, the CPU fetches an instruction, decodes it to determine its type and operands, executes it, and then repeats this fetch‑decode‑execute cycle until the program terminates.
The three steps of fetch, decode, and execute constitute a basic CPU cycle.
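The cycle described above can be sketched as a small interpreter loop. This is a minimal illustration only: the opcodes (LOAD, ADD, STORE, HALT), the tuple instruction format, and the dictionary used as memory are assumptions made for the example and do not correspond to any real instruction set.

```python
# Toy machine: the instruction set here is hypothetical, not a real CPU's ISA.
def run(program, memory):
    """Run (opcode, operand) tuples until HALT and return the accumulator."""
    pc = 0   # program counter: index of the next instruction
    acc = 0  # accumulator: holds the running result
    while True:
        instruction = program[pc]      # fetch
        pc += 1
        opcode, operand = instruction  # decode
        if opcode == "LOAD":           # execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Compute c = a + b on the toy machine.
memory = {"a": 2, "b": 3, "c": 0}
program = [("LOAD", "a"), ("ADD", "b"), ("STORE", "c"), ("HALT", None)]
result = run(program, memory)
print(result, memory["c"])
```

The loop repeats fetch, decode, and execute until it meets HALT, which stands in for program termination.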
Each CPU has its own instruction set, which determines which programs it can run; for example, x86 CPUs cannot execute ARM programs and vice‑versa.
Registers are fast storage inside the CPU used to hold key variables and temporary data. The CPU provides specific instructions to move data between memory and registers and to perform basic arithmetic and logical operations; multiplication and division typically take more cycles than addition and subtraction because they are more complex operations. In addition to general-purpose registers, several special registers are visible to the programmer:
The PC (Program Counter) stores the address of the next instruction to be fetched.
The SP (Stack Pointer) points to the top of the current stack, which holds function parameters, local variables, and temporary data.
The PSW (Program Status Word) contains control bits such as the CPU priority and the mode (kernel or user).
During a context switch, the CPU saves the registers related to the current process to memory and restores them when the process is resumed.
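As a rough sketch of that save-and-restore step, the snippet below models the register set as a plain object and copies it into and out of a per-process record. The `Registers`, `Process`, and `context_switch` names, the four general-purpose registers, and the numeric values are all hypothetical; a real operating system does this in low-level kernel code, not in Python.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Registers:
    pc: int = 0   # program counter
    sp: int = 0   # stack pointer
    psw: int = 0  # program status word
    general: list = field(default_factory=lambda: [0] * 4)


@dataclass
class Process:
    name: str
    saved: Registers = field(default_factory=Registers)  # copy kept in memory


def context_switch(cpu, current, nxt):
    """Save the live registers for `current`, return the restored set for `nxt`."""
    current.saved = copy.deepcopy(cpu)  # save the outgoing process's state
    return copy.deepcopy(nxt.saved)     # restore the incoming process's state


# Process A is running; process B was suspended earlier at pc=200.
cpu = Registers(pc=100, sp=5000, psw=0, general=[1, 2, 3, 4])
proc_a = Process("A")
proc_b = Process("B", Registers(pc=200, sp=9000))

cpu = context_switch(cpu, proc_a, proc_b)  # A -> B
cpu.pc += 4                                # B makes some progress
cpu = context_switch(cpu, proc_b, proc_a)  # B -> A
print(cpu.pc, cpu.general, proc_b.saved.pc)
```

After the second switch, process A continues with exactly the register values it had when it was suspended, while B's progress is preserved in its saved record.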
Modern CPUs use separate units for fetching, decoding, and executing, forming a pipeline: while one unit executes instruction n, the previous unit decodes instruction n+1 and the fetch unit reads instruction n+2.
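The overlap can be visualized with a small schedule generator. The sketch assumes an idealized three-stage pipeline in which every stage takes exactly one clock cycle and nothing ever stalls; `pipeline_schedule` is a hypothetical helper written for this illustration.

```python
STAGES = ["fetch", "decode", "execute"]


def pipeline_schedule(num_instructions):
    """For each clock cycle, map each stage to the instruction index it holds (or None)."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            # An instruction reaches a stage `depth` cycles after it was fetched.
            instr = cycle - depth
            row[stage] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule


schedule = pipeline_schedule(4)
for cycle, row in enumerate(schedule):
    print(cycle, row)
```

In cycle 2 the execute stage holds instruction 0 while decode holds instruction 1 and fetch holds instruction 2, matching the description above. Under these idealized assumptions four instructions finish in 6 cycles, compared with 12 if each instruction had to complete all three stages before the next one started.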
Superscalar architectures further duplicate these units, allowing multiple instructions to be fetched, decoded, and executed in parallel.
Basic CPU Components
The CPU performs arithmetic (e.g., c=a+b) and logical operations. It uses registers such as MAR (Memory Address Register), MDR (Memory Data Register), AC (Accumulator), PC, and CIR (Current Instruction Register) to hold addresses, data, and intermediate results.
The Arithmetic Logic Unit (ALU) carries out the actual calculations, while the Control Unit (CU) directs data between registers, the ALU, and memory based on instruction signals.
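To make the roles of these registers more concrete, the sketch below traces the register transfers for one instruction at a time on a hypothetical accumulator machine. The instruction format (an opcode paired with a memory address), the two ALU operations, and the memory layout are assumptions for illustration; `step` plays the part of the control unit and `alu` the part of the ALU.

```python
def alu(op, x, y):
    """Hypothetical ALU with one arithmetic and one logical operation."""
    if op == "ADD":
        return x + y
    if op == "AND":
        return x & y
    raise ValueError(f"unsupported ALU operation: {op}")


def step(regs, memory):
    """Control unit: move data between registers, memory, and the ALU for one instruction."""
    # Fetch: the address goes out via MAR, the instruction comes back via MDR.
    regs["MAR"] = regs["PC"]
    regs["MDR"] = memory[regs["MAR"]]
    regs["CIR"] = regs["MDR"]
    regs["PC"] += 1
    # Decode: split the current instruction into operation and operand address.
    op, addr = regs["CIR"]
    # Execute: read the operand through MAR/MDR, then let the ALU update AC.
    regs["MAR"] = addr
    regs["MDR"] = memory[regs["MAR"]]
    regs["AC"] = alu(op, regs["AC"], regs["MDR"])


# Addresses 0-1 hold instructions; addresses 10-11 hold the operands a=2 and b=3.
memory = {0: ("ADD", 10), 1: ("ADD", 11), 10: 2, 11: 3}
regs = {"PC": 0, "MAR": 0, "MDR": 0, "AC": 0, "CIR": None}
step(regs, memory)
step(regs, memory)
print(regs)
```

After two steps the accumulator holds a+b, and MAR, MDR, and CIR still show the last address, data word, and instruction that passed through them.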
CPU Multicore and Multithreading
The number of physical CPUs is limited by motherboard sockets; each CPU can have multiple cores, and each core can support multiple threads.
Each core appears as an independent CPU to the operating system. In hyper‑threaded CPUs, each core presents multiple logical CPUs, sharing the core's resources.
Hardware threads on the same core share its execution resources, so they compete for the execution engine rather than each having their own. However, while one thread is stalled (for example, waiting on memory), another can use the resources that would otherwise sit idle, which improves overall throughput.
Effective multithreading requires OS support and optimization.
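One way to see what the operating system reports is to ask for the CPU count from a program. The snippet below uses Python's standard `os.cpu_count()`, which returns the number of logical CPUs (so hardware threads are counted individually) or `None` if the value cannot be determined. The standard library does not expose the physical core count, so that part is left out here.

```python
import os

# Logical CPUs as seen by the OS; on a hyper-threaded machine this is
# usually larger than the number of physical cores.
logical = os.cpu_count()

if logical is None:
    print("logical CPU count could not be determined")
else:
    print(f"logical CPUs visible to the OS: {logical}")
```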
CPU Caches
The fastest storage is the CPU registers (<1 KB). Below the registers sits a hierarchy of caches: L1, L2, and L3. Each core has a private L1 cache, split into an instruction cache (L1‑icache) and a data cache (L1‑dcache). The L2 cache is larger and slower than L1 and holds recently used memory data; the L3 cache is larger still and may be shared among cores.
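The lookup order through this hierarchy can be sketched with a toy model. Everything specific here is an assumption for illustration: the capacities (2, 4, and 8 entries), the least-recently-used eviction, and the rule that a value is copied into every level above the one where it was found. Real caches work on fixed-size lines and use policies that vary by processor.

```python
from collections import OrderedDict


class CacheLevel:
    """Tiny cache with least-recently-used eviction (a simplifying assumption)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, least recently used first

    def lookup(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        return False

    def fill(self, address, value):
        self.lines[address] = value
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry


def read(address, levels, memory):
    """Search L1, then L2, then L3, then memory; return (value, where it was found)."""
    for i, level in enumerate(levels):
        if level.lookup(address):
            value = level.lines[address]
            for upper in levels[:i]:  # copy into the faster levels that missed
                upper.fill(address, value)
            return value, level.name
    value = memory[address]
    for level in levels:
        level.fill(address, value)
    return value, "memory"


levels = [CacheLevel("L1", 2), CacheLevel("L2", 4), CacheLevel("L3", 8)]
memory = {addr: addr * 10 for addr in range(16)}
trace = [read(addr, levels, memory)[1] for addr in [0, 1, 0, 2, 3, 0]]
print(trace)
```

In this access pattern the second read of address 0 is served by L1, but by the third read it has been evicted from the small L1 and is found in the larger L2 instead.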
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
