Fundamentals of Computer Architecture: CPU, Memory Hierarchy, Caches, and Compilers
This article provides a comprehensive overview of how computers operate, covering CPU instruction cycles, memory organization, endianness, compiler translation, operating‑system interaction, cache levels, storage tiers, and the principles of temporal and spatial locality that drive modern performance optimizations.
The piece begins by introducing the historical development of computers, from early mechanical devices to modern processors, and explains that all computers follow the von Neumann model, in which a CPU executes instructions stored in memory.
It describes the basic components of a computer: the CPU (central processing unit) and RAM (random‑access memory). The CPU fetches instructions and data from RAM, uses registers for temporary storage, and repeatedly performs a fetch‑decode‑execute cycle controlled by the program counter (PC).
Memory is organized into addressable units, each identified by a binary address transmitted over address and data buses. Binary numbers are represented with high voltage for ‘1’ and low voltage for ‘0’, and the article illustrates how read and write operations are performed on individual bytes.
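The read and write operations described above can be sketched in code. The following is a minimal illustrative model, assuming byte-addressable RAM represented as a Python `bytearray`; the function names `read_byte` and `write_byte` are hypothetical, not from the article:

```python
# Minimal sketch: byte-addressable RAM modeled as a bytearray.
# Addresses and the 8-bit values stored at them are plain integers here;
# real hardware transmits them as voltage levels on address and data buses.
ram = bytearray(256)              # 256 addressable bytes, all initially 0

def write_byte(address: int, value: int) -> None:
    ram[address] = value & 0xFF   # keep only the low 8 bits, like a real data bus

def read_byte(address: int) -> int:
    return ram[address]

write_byte(0x10, 0b1010_0001)     # store the bit pattern 10100001 at address 0x10
assert read_byte(0x10) == 0xA1
```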
The instruction cycle is detailed: (1) fetch the instruction at the address in PC, (2) increment PC, (3) execute the instruction, and (4) repeat. The role of BIOS as the immutable first‑stage program is also mentioned.
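The four-step cycle can be made concrete with a toy interpreter. This is an illustrative sketch of an invented two-byte instruction format, not any real ISA; the opcodes `LOAD`, `ADD`, and `HALT` are assumptions for the example:

```python
# A toy fetch-decode-execute loop. Each instruction is two bytes:
# an opcode followed by an operand. The program counter (PC) is
# incremented during the fetch step, exactly as in the article's cycle.
LOAD, ADD, HALT = 0x01, 0x02, 0xFF

# Program: load 5 into the accumulator, add 7, then halt.
memory = [LOAD, 5, ADD, 7, HALT, 0]

pc, acc = 0, 0
while True:
    opcode, operand = memory[pc], memory[pc + 1]   # 1) fetch at address in PC
    pc += 2                                        # 2) increment PC
    if opcode == LOAD:                             # 3) execute
        acc = operand
    elif opcode == ADD:
        acc += operand
    elif opcode == HALT:
        break
    # 4) repeat

print(acc)  # 12
```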
Endianness is explained, contrasting big‑endian (most‑significant byte first) with little‑endian (least‑significant byte first) and noting the practical implications for data exchange across different architectures.
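The contrast between the two byte orders can be seen directly with Python's standard `struct` module, which lets you pack the same 32-bit value under either convention:

```python
import struct

# The 32-bit value 0x12345678 stored under each byte order.
value = 0x12345678
big = struct.pack(">I", value)      # big-endian: most-significant byte first
little = struct.pack("<I", value)   # little-endian: least-significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Programs exchanging raw bytes across architectures must agree on a byte
# order; network protocols conventionally use big-endian ("network order").
assert struct.unpack(">I", big)[0] == struct.unpack("<I", little)[0] == value
```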
Compilers are introduced as programs that translate high‑level source code into machine instructions. Simple examples illustrate how a factorial function can be written recursively, then transformed into an iterative form, and how a compiler may optimise repeated expressions:
if x = 0
    compute_this()
else
    compute_that()

function factorial(n)
    if n > 1
        return factorial(n - 1) * n
    else
        return 1

function factorial(n)
    result ← 1
    while n > 1
        result ← result * n
        n ← n - 1
    return result

t1 ← x + y
i ← t1 + 1
j ← t1

The operating system's role is covered, explaining that compiled programs must make system calls to perform I/O, and that different OSes expose different APIs, making binaries non‑portable across platforms.
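The dependence on system calls can be shown even from Python, whose `os.write` is a thin wrapper around the underlying `write()` system call on POSIX systems. A minimal sketch:

```python
import os

# Even high-level I/O ultimately funnels into an OS system call. Here we
# bypass Python's buffered I/O layer and invoke the write() system call
# directly via os.write, passing file descriptor 1 (standard output).
n = os.write(1, b"hello from a system call\n")

# os.write returns the number of bytes actually written.
print(n)  # 25
```

A binary compiled against this POSIX interface would not run on an OS exposing a different API (e.g. `WriteFile` on Windows), which is exactly why compiled programs are not portable across platforms.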
Cache hierarchy is examined in depth. L1 cache (≈10 KB) sits inside the CPU and provides near‑register speed; L2 cache (≈200 KB) offers larger capacity with slightly higher latency; modern CPUs also include L3 cache, which is larger still but slower. The article shows how these caches dramatically reduce the proportion of memory accesses that must reach RAM.
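The payoff of this hierarchy can be quantified with the standard average-memory-access-time (AMAT) recurrence. The latencies and miss rates below are illustrative assumptions, not figures from the article:

```python
# Average memory access time for a three-level cache, computed top-down:
# AMAT = hit_time_L1 + miss_rate_L1 * (hit_time_L2 + miss_rate_L2 * (...)).
# All numbers below are assumed for illustration.
l1_hit, l2_hit, l3_hit, ram = 1, 4, 12, 100   # access times in CPU cycles
m1, m2, m3 = 0.10, 0.40, 0.50                 # miss rates at each level

amat = l1_hit + m1 * (l2_hit + m2 * (l3_hit + m3 * ram))
print(f"{amat:.2f} cycles")  # 3.88 cycles, vs. 100 if every access hit RAM
```

Even with these modest hit rates, the effective access time is a small multiple of the L1 latency rather than the RAM latency, which is the "dramatic reduction" the article describes.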
Beyond caches, the storage hierarchy is discussed: RAM (first‑level volatile storage), disks/SSDs (second‑level), and network or archival storage (third‑level). It highlights the latency differences—from nanoseconds for CPU registers to milliseconds for disks and hundreds of milliseconds for remote cloud storage.
Temporal and spatial locality principles are introduced, explaining that programs tend to reuse recently accessed data and access nearby addresses, which justifies the use of multi‑level caches to keep hot data close to the CPU.
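Spatial locality can be demonstrated with a small simulation. The following sketch models a hypothetical direct-mapped cache with 64-byte lines (the parameters are assumptions, not from the article) and compares sequential byte accesses against widely strided ones:

```python
# Sketch: a tiny direct-mapped cache used to show why spatial locality pays
# off. Sequential byte accesses reuse each fetched line 63 more times;
# accesses strided by 4 KB never reuse a line and miss every time.
LINE = 64   # bytes per cache line
SETS = 16   # number of lines the cache can hold

def hit_rate(addresses):
    cache = {}                                # set index -> tag currently cached
    hits = 0
    for addr in addresses:
        line = addr // LINE
        tag, index = line // SETS, line % SETS
        if cache.get(index) == tag:
            hits += 1                         # line already present: a cache hit
        else:
            cache[index] = tag                # miss: fetch the whole line
    return hits / len(addresses)

sequential = list(range(4096))                # walk memory byte by byte
strided = [i * 4096 for i in range(4096)]     # jump 4 KB on every access

print(hit_rate(sequential))  # 0.984375 (63 of every 64 accesses hit)
print(hit_rate(strided))     # 0.0      (every access misses)
```

The same mechanism explains temporal locality: a recently used line is still resident, so repeating an access soon afterwards hits in cache instead of going to RAM.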
Finally, the article summarises that understanding and exploiting these hardware characteristics—CPU instruction cycles, memory organization, cache levels, and locality—are essential for writing efficient software and for system designers to bridge the growing gap between processor speed and memory latency.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.