
What Your CPU Actually Does All Day: The Four Core Tasks

The article explains that a CPU continuously repeats four stages (fetch, decode, execute and write‑back). It also covers the role of registers, the differences between CISC and RISC instruction sets, how function calls build and tear down stack frames, and why user‑mode coroutines are far lighter than kernel threads.

IT Services Circle

May 9, 2026


CPU instruction execution cycle

A CPU runs an endless loop of four stages: Fetch → Decode → Execute → Write‑Back. Fetch reads the instruction at the address held in the program counter (PC) into the instruction register (IR) and advances the PC to the next instruction. Decode interprets the opcode and identifies the operand registers. Execute performs the operation in the ALU, the load/store unit or the branch unit. Write‑Back stores the result in a register or in memory. The clock paces every stage, so clock frequency (GHz) is one factor in CPU speed; how many instructions complete per cycle also matters.
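To make the four stages concrete, here is a minimal sketch in C of a toy CPU. The instruction set (LOADI, ADD, DEC, JNZ, HALT), its 32‑bit encoding and the names ToyCpu, toy_run and demo_program are all hypothetical, invented for illustration. Real CPUs overlap these stages in a pipeline instead of running them strictly one after another, so treat this only as a model of the cycle.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy ISA, invented for illustration. Each instruction is one
// 32-bit word: opcode (8 bits) | dst (8) | src1 (8) | src2 or immediate (8).
enum { OP_HALT, OP_LOADI, OP_ADD, OP_DEC, OP_JNZ };

#define ENC(op, d, s1, s2) \
    (((uint32_t)(op) << 24) | ((uint32_t)(d) << 16) | ((uint32_t)(s1) << 8) | (uint32_t)(s2))

typedef struct {
    uint32_t pc;      // program counter: index of the next instruction
    uint32_t ir;      // instruction register: the instruction being executed
    int64_t reg[4];   // general-purpose registers r0..r3
} ToyCpu;

// Sums 5 + 4 + 3 + 2 + 1 into r0.
static const uint32_t demo_program[] = {
    ENC(OP_LOADI, 0, 0, 0),  // 0: r0 = 0
    ENC(OP_LOADI, 1, 0, 5),  // 1: r1 = 5
    ENC(OP_ADD,   0, 0, 1),  // 2: r0 = r0 + r1
    ENC(OP_DEC,   1, 0, 0),  // 3: r1 = r1 - 1
    ENC(OP_JNZ,   1, 0, 2),  // 4: if r1 != 0, jump to 2
    ENC(OP_HALT,  0, 0, 0),  // 5: stop
};

// Runs until HALT and returns how many instructions were executed (HALT included).
size_t toy_run(ToyCpu *cpu, const uint32_t *mem) {
    size_t executed = 0;
    for (;;) {
        // Fetch: copy the word at PC into IR, then advance PC.
        cpu->ir = mem[cpu->pc];
        cpu->pc += 1;
        executed++;

        // Decode: split IR into opcode and operand fields.
        uint8_t op = (uint8_t)(cpu->ir >> 24);
        uint8_t d  = (uint8_t)(cpu->ir >> 16);
        uint8_t s1 = (uint8_t)(cpu->ir >> 8);
        uint8_t s2 = (uint8_t)cpu->ir;

        // Execute: compute a result or make a branch decision.
        int64_t result = 0;
        int writes_back = 1;
        switch (op) {
        case OP_LOADI: result = s2; break;
        case OP_ADD:   result = cpu->reg[s1] + cpu->reg[s2]; break;
        case OP_DEC:   result = cpu->reg[d] - 1; break;
        case OP_JNZ:   writes_back = 0;
                       if (cpu->reg[d] != 0) cpu->pc = s2;
                       break;
        default:       return executed;   // OP_HALT or an unknown opcode
        }

        // Write-back: store the result in the destination register.
        if (writes_back) cpu->reg[d] = result;
    }
}
```

Running toy_run on demo_program leaves 15 in r0 after 18 instructions (HALT included); the JNZ step shows how a branch simply overwrites the PC that Fetch will read next.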

Registers – the fastest storage inside the core

General‑purpose registers (RAX, RBX, RCX, RDX, …) hold data and addresses.

Instruction pointer (RIP/PC) points to the next instruction.

Stack pointer (RSP) points to the current top of the stack.

Base pointer (RBP) marks the base of the current stack frame.

Flags register (RFLAGS) stores status bits such as zero and carry.

Segment registers (CS, DS, SS, …) are largely legacy in 64‑bit mode; FS and GS are still used, for example to locate thread‑local storage.

Register access takes well under 1 ns, compared with roughly 50‑100 ns for main memory, because registers sit inside the core and never cross the memory bus.
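The flags register mentioned above is easiest to picture as a few bits the ALU sets as a side effect of each operation, which conditional branches such as jz (jump if ZF is set) and jc (jump if CF is set) then read. A minimal C model, assuming only a 64‑bit ADD and only the zero and carry flags; the Flags type and add_with_flags are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model of two RFLAGS bits after a 64-bit integer ADD.
typedef struct {
    bool zf;   // zero flag: set when the result is 0
    bool cf;   // carry flag: set when an unsigned add carries out of bit 63
} Flags;

uint64_t add_with_flags(uint64_t a, uint64_t b, Flags *f) {
    uint64_t r = a + b;   // unsigned arithmetic wraps modulo 2^64
    f->zf = (r == 0);
    f->cf = (r < a);      // the sum wrapped around, so a carry left the top bit
    return r;
}
```

For example, adding 1 to UINT64_MAX yields 0 with both ZF and CF set, while 2 + 3 leaves both clear.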

CISC vs RISC instruction sets

Complex Instruction Set Computing (CISC) – e.g., Intel x86/AMD64 – provides hundreds to thousands of variable‑length instructions (1‑15 bytes), some of which perform multi‑step work, such as read‑modify‑write on memory, in a single instruction. Reduced Instruction Set Computing (RISC) – e.g., ARM, RISC‑V, MIPS – provides a smaller, more regular set of mostly fixed‑length instructions (usually 4 bytes), each doing one simple operation; only explicit load and store instructions touch memory.

Example – increment a memory location:

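inc dword ptr [rax]     ; x86-64 (CISC): read, add 1 and write back in one instruction

ldr w0, [x1]            // AArch64 (RISC): load the 32-bit value at [x1]
add w0, w0, #1          // AArch64 (RISC): add 1 in a register
str w0, [x1]            // AArch64 (RISC): store the result back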

Modern x86 CPUs internally translate CISC instructions into RISC‑style micro‑ops (μops) before execution, achieving a “CISC outside, RISC inside” design.
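The μop translation can be modelled as a decode step that expands one CISC instruction into the same three simple steps the RISC version spells out. This is a hypothetical sketch: the Uop type, the UOP_* kinds and the function names are invented, and real μop formats are undocumented and differ between vendors and generations.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical micro-op format, loosely modelled on "CISC outside, RISC inside";
// real uop encodings are internal to each CPU and not publicly specified.
typedef enum { UOP_LOAD, UOP_ADDI, UOP_STORE } UopKind;

typedef struct {
    UopKind kind;
    int tmp;      // index of an internal temporary register
    int32_t imm;  // immediate operand (used by UOP_ADDI)
} Uop;

// "Decode" the CISC instruction `inc dword ptr [addr]` into three RISC-style uops.
size_t decode_inc_mem(Uop out[3]) {
    out[0] = (Uop){UOP_LOAD,  0, 0};  // t0 = load32 [addr]
    out[1] = (Uop){UOP_ADDI,  0, 1};  // t0 = t0 + 1
    out[2] = (Uop){UOP_STORE, 0, 0};  // store32 [addr] = t0
    return 3;
}

// Execute a uop sequence against one memory word; each uop does one simple step.
void execute_uops(const Uop *uops, size_t n, int32_t *addr) {
    int32_t tmp[4] = {0};             // internal temporaries, invisible to the programmer
    for (size_t i = 0; i < n; i++) {
        switch (uops[i].kind) {
        case UOP_LOAD:  tmp[uops[i].tmp] = *addr; break;
        case UOP_ADDI:  tmp[uops[i].tmp] += uops[i].imm; break;
        case UOP_STORE: *addr = tmp[uops[i].tmp]; break;
        }
    }
}
```

Decoding once and then executing the three uops turns a memory word holding 41 into 42, just as the single inc instruction would.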

Function calls and stack frames (x86‑64 System V ABI)

When a function is called, the caller, the CPU's call and ret instructions, and the compiler‑generated prologue and epilogue together carry out a fixed sequence of six steps (T1 to T5‑b) that build and later dismantle a stack frame.

T1 – Parameter passing : the first six integer or pointer arguments are placed in registers RDI, RSI, RDX, RCX, R8 and R9. In the example add(3,4), a=3 goes to RDI and b=4 to RSI. The stack is unchanged at this point.

T2 – call instruction : the CPU pushes the return address (the address of the instruction after call) onto the stack and jumps to the callee entry point.

T3 – Prologue : push rbp saves the caller’s base pointer; mov rbp, rsp establishes a new frame base for the callee.

T4 – Allocate locals : sub rsp, N reserves space for local variables (e.g., c). Leaf functions may skip this step and use the 128‑byte red zone below RSP instead.

T5‑a – Execute body : perform the addition; the result is placed in RAX, the conventional integer return‑value register.

T5‑b – Epilogue : leave (equivalent to mov rsp, rbp; pop rbp) discards the locals and restores the caller's base pointer; ret then pops the return address into RIP, which returns to the caller and puts RSP back where it was before the call.
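The six steps can be traced with a small simulator. It is a hypothetical sketch rather than real machine code: the Machine type, the use of 8‑byte slot indices instead of byte addresses, and the entry address 0x1000 are all assumptions. Real compilers may skip parts of the sequence, for example omitting the frame pointer when optimizing.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64

// Hypothetical simulator: 8-byte stack slots, indices stand in for addresses,
// and the stack grows toward index 0 (i.e. downward).
typedef struct {
    uint64_t stack[STACK_SLOTS];
    size_t rsp, rbp;
    uint64_t rip, rdi, rsi, rax;
} Machine;

static void push(Machine *m, uint64_t v) { m->stack[--m->rsp] = v; }
static uint64_t pop(Machine *m) { return m->stack[m->rsp++]; }

// Simulates add(a, b) called from a caller whose next instruction is return_addr.
uint64_t simulate_add_call(Machine *m, uint64_t a, uint64_t b, uint64_t return_addr) {
    // T1: arguments go in RDI and RSI; the stack is untouched.
    m->rdi = a;
    m->rsi = b;

    // T2: call pushes the return address and jumps to the callee.
    push(m, return_addr);
    m->rip = 0x1000;                          // hypothetical entry address of add()

    // T3: prologue (push rbp; mov rbp, rsp).
    push(m, m->rbp);
    m->rbp = m->rsp;

    // T4: sub rsp, 8 reserves one slot for the local c.
    m->rsp -= 1;

    // T5-a: body. c = a + b, then the result goes in RAX.
    m->stack[m->rbp - 1] = m->rdi + m->rsi;   // [rbp-8] = c
    m->rax = m->stack[m->rbp - 1];

    // T5-b: epilogue. leave (mov rsp, rbp; pop rbp), then ret (pop into rip).
    m->rsp = m->rbp;
    m->rbp = pop(m);
    m->rip = pop(m);
    return m->rax;
}
```

Starting from rsp = STACK_SLOTS, a call with (3, 4) returns 7 and leaves RSP back at its starting value, RBP restored to the caller's value and RIP at the return address.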

Typical stack‑frame layout (high address → low address):

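+---------------------------+
| Stack arguments (7th on)  |  ← pushed by the caller if needed, at RBP + 16 and up
+---------------------------+
| Return address            |  ← pushed by call, at RBP + 8
+---------------------------+
| Saved RBP                 |  ← RBP points here after the prologue
+---------------------------+
| Callee-saved registers    |  ← RBX, R12-R15, only if the callee uses them
| Local variables           |
+---------------------------+
| Outgoing argument space   |  ← RSP (top of stack); the stack grows downward
+---------------------------+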

Common pitfalls:

Assuming all arguments are passed on the stack – on x86‑64 System V (Linux, macOS) the first six integer or pointer arguments go in registers, and Windows x64 uses four (RCX, RDX, R8, R9).

Returning a pointer to a local variable – the stack frame disappears after ret, leaving a dangling pointer.
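The second pitfall and two common fixes, sketched in C. The function names are hypothetical; the broken version appears only as a comment because using its result is undefined behaviour.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// BROKEN, shown only as a comment: buf lives in the callee's stack frame,
// which is released by `ret`, so the caller would get a dangling pointer.
//
//   char *make_greeting_bad(const char *name) {
//       char buf[64];
//       snprintf(buf, sizeof buf, "hello, %s", name);
//       return buf;
//   }

// Fix 1: the caller owns the buffer, so it outlives this function's frame.
// Returns the length snprintf reports (excluding the terminating NUL).
int make_greeting_into(char *buf, size_t cap, const char *name) {
    return snprintf(buf, cap, "hello, %s", name);
}

// Fix 2: allocate on the heap; the caller must free() the result.
char *make_greeting_heap(const char *name) {
    size_t cap = strlen("hello, ") + strlen(name) + 1;   // +1 for the NUL
    char *buf = malloc(cap);
    if (buf != NULL) snprintf(buf, cap, "hello, %s", name);
    return buf;
}
```

Either way, the storage no longer lives in a frame that ret tears down: in the first fix it belongs to the caller's frame, in the second to the heap.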

Coroutines – user‑mode context switching

Key insight: a coroutine switch saves the current stack pointer, base pointer, instruction pointer and the callee‑saved registers, then restores another coroutine's saved state. It is essentially the same save‑and‑restore mechanism as a function call, applied to a whole execution context.

Implementation steps (illustrated by Go's goroutines):

Allocate a small independent stack (initially 2 KB, grows on demand).

When a goroutine is switched out, save its stack pointer, program counter, frame pointer and a few other values into its saved‑context record (gobuf in the Go runtime).

Restore the next goroutine’s saved values; the CPU continues execution at the saved RIP.
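Go performs this switch in hand‑written assembly inside its runtime, which is not reproduced here. As a stand‑in, the sketch below uses the ucontext functions getcontext, makecontext and swapcontext, which glibc still provides on Linux although POSIX.1‑2008 removed them; every other name is hypothetical. glibc's swapcontext also saves and restores the signal mask through a system call, so it is slower than the register‑only switches used by the Go runtime and dedicated coroutine libraries, but the save‑and‑restore mechanism is the same.

```c
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

// Minimal sketch of user-mode context switching with the ucontext API.
// swapcontext() saves the current registers (including the stack and
// instruction pointers) into one ucontext_t and loads another.

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];   // the coroutine's own, separately allocated stack
static char trace[32];
static size_t trace_len;

static void record(char c) {
    if (trace_len + 1 < sizeof trace) trace[trace_len++] = c;   // keep room for the NUL
}

static void coroutine_body(void) {
    for (char c = '1'; c <= '3'; c++) {
        record(c);                          // do one step of work
        swapcontext(&co_ctx, &main_ctx);    // yield: save our context, resume main
    }
    // If the loop ever finishes, returning resumes uc_link (main_ctx).
}

// Alternates between the main loop and the coroutine; returns the interleaved trace.
const char *run_ping_pong(void) {
    memset(trace, 0, sizeof trace);
    trace_len = 0;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;             // where to continue when coroutine_body returns
    makecontext(&co_ctx, coroutine_body, 0);

    for (char c = 'a'; c <= 'c'; c++) {
        record(c);
        swapcontext(&main_ctx, &co_ctx);    // resume the coroutine where it last yielded
    }
    return trace;
}
```

run_ping_pong() returns "a1b2c3": each swapcontext call freezes one context and resumes the other exactly where it left off, with no kernel scheduler deciding who runs next.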

Advantages over kernel threads:

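Trigger : a thread switches when its time slice expires or when it blocks; a coroutine switches at an explicit yield / await point. (Go's scheduler also switches goroutines when they block on channels or I/O and, since Go 1.14, can preempt long‑running goroutines asynchronously.)

Cost : a thread switch goes through the kernel and its scheduler and typically costs on the order of microseconds; a coroutine switch stays in user space and can be as cheap as saving and restoring a handful of registers.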

Stack size : a thread typically reserves several megabytes of stack (8 MB is a common Linux default); goroutines start with a few kilobytes and grow dynamically.

Maximum count : typically thousands of threads per process vs. millions of coroutines.

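Goroutine switches are therefore far cheaper, mainly because they never enter the kernel. Threads of the same process share an address space, so switching between them does not flush the TLB either; the savings come from skipping the kernel transition and scheduler.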

Summary of the mental model

The CPU executes code via the four‑stage cycle (Fetch‑Decode‑Execute‑Write‑Back), which forms the basis for pipelining, out‑of‑order execution and branch prediction.

Registers are ultra‑fast on‑chip storage used for all intermediate data and address manipulation.

High‑level code compiles to sequences of loads, stores, register moves, ALU operations and branches.

Function calls are implemented as stack‑frame construction, register‑based argument passing, return‑address handling and stack‑frame teardown.

Coroutines are essentially saved + restored execution contexts performed in user mode, making them far lighter than kernel threads.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
