Fundamentals of Computer Architecture, CPU, Memory Hierarchy and Compilers
This article explains the basic principles of how computers work, covering CPU and memory organization, instruction sets, endianness, compiler role, operating system interaction, cache levels, storage technologies, and performance optimization techniques.
To solve many problems, countless machines have been invented; computers range from Mars rovers to nuclear submarine controllers. Since von Neumann's 1945 model, almost all computers follow the same principles, and this article explores those fundamentals.
Architecture : A computer manipulates data according to instructions and consists mainly of a processor (CPU) and memory (RAM). Memory stores both instructions and data, while the CPU fetches them, executes operations, and writes results back.
Memory : Memory is divided into many addressable cells, each holding a small amount of data identified by a binary address. Access is performed via signal lines that transmit bits as high (1) or low (0) voltage.
Binary numbers use base‑2 representation: each digit (bit) is either 0 or 1, so a group of eight bits (one byte) can encode 256 distinct values.
Data is transferred over buses: an 8‑bit address bus selects a cell, and an 8‑bit data bus carries the byte to or from that cell. The CPU continuously exchanges data with RAM, fetching instructions and storing results.
CPU : The CPU contains registers for fast internal storage and can perform simple arithmetic, move data, and control program flow. Typical operations include copying data between memory locations and adding register values. The set of all possible operations is the instruction set, which the compiler translates from high‑level code into numeric codes stored in RAM.
Program execution follows a fetch‑decode‑execute cycle controlled by the program counter (PC): (1) fetch instruction at PC, (2) increment PC, (3) execute instruction, (4) repeat. The initial PC points to a BIOS routine that loads basic functionality.
CPU architectures differ (e.g., x86 vs. ARM), leading to different instruction sets and binary encodings. 32‑bit and 64‑bit architectures evolved to address larger memory spaces, and endianness (big vs. little) determines byte order in multi‑byte values.
Compilers : Programmers write code in high‑level languages, which compilers translate into CPU instructions. A simple branch, for example,

    if x = 0
        compute_this()
    else
        compute_that()

becomes a handful of comparison and jump instructions. Likewise, a factorial function can be expressed recursively:

    function factorial(n)
        if n > 1
            return factorial(n - 1) * n
        else
            return 1

or iteratively:

    function factorial(n)
        result ← 1
        while n > 1
            result ← result * n
            n ← n - 1
        return result

and the compiler may rewrite one form into the other for efficiency. It can also eliminate redundant calculations: the sequence

    i ← x + y + 1
    j ← x + y

computes x + y twice, so the compiler introduces a temporary that is computed once:

    t1 ← x + y
    i ← t1 + 1
    j ← t1

Compiled programs must also interact with an operating system via system calls for I/O, file access, networking, etc. Different OSes provide incompatible system calls, so binaries compiled for Windows cannot run on macOS even on the same x86 CPU.
Optimization : Modern compilers apply hundreds of optimization rules to improve performance, such as eliminating redundant calculations or converting recursion to iteration.
Script Languages : Languages like JavaScript, Python, and Ruby are interpreted at runtime, which is slower than compiled code but allows rapid development without a separate compilation step.
Reverse Engineering : Disassemblers translate binary code back into human‑readable instructions, enabling analysis of software behavior, security research, and, unfortunately, piracy.
Open Source : Open‑source software allows anyone to inspect and modify source code, improving security transparency compared to closed‑source systems.
Memory Hierarchy : To bridge the speed gap between fast CPU registers and slower RAM, caches are introduced. Level‑1 cache (≈10 KB) is integrated into the CPU and is ~100× faster than RAM. Level‑2 cache (≈200 KB) is larger but slower, and many CPUs also include a Level‑3 cache. These caches exploit temporal and spatial locality to reduce costly RAM accesses.
Below the caches, RAM serves as the primary volatile storage (1–10 GB typical). When RAM fills, data is swapped to secondary storage (hard disks or SSDs). Hard disks are much slower (≈1 ms latency) than RAM (≈100 ns), and tertiary storage (tapes, optical media) is slower still, used mainly for archival.
Storage technology trends show SSDs replacing spinning disks for faster access, and hybrid drives combining SSD and HDD layers. Improving cache sizes and leveraging locality remain key strategies for performance.
Conclusion : Any computable task can be expressed as simple CPU instructions. Compilers translate high‑level code into these instructions, and performance depends heavily on the CPU‑memory hierarchy, cache utilization, and storage technology.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.