
Unlocking Computer Fundamentals: From CPU Basics to Assembly Language Explained

Explore the essential building blocks of modern computing, covering CPU architecture, memory hierarchy, binary operations, compression techniques, operating system fundamentals, and assembly language, with clear explanations, diagrams, and code examples that demystify how hardware and software interact at the lowest level.

macrozheng

CPU

Every programmer dreams of becoming a "big shot," but focusing only on frameworks overlooks the essential foundations of computing. Understanding the CPU, the core component of a computer, is crucial for long‑term growth.

CPU Internal Process

The CPU processes each instruction in three stages: it fetches the instruction from main memory, decodes what it means, and then executes the required operation.

What the CPU interprets in this process is machine‑language code, the final form every program is translated into before it runs.

The CPU consists of two main parts: the Control Unit and the Arithmetic Logic Unit (ALU).

Control Unit: extracts and decodes instructions from memory.

ALU: performs arithmetic and logical operations.

The CPU is the computer’s brain. Internally it combines registers (such as the Program Counter) with the Control Unit, the ALU, and a clock, and it works together with main memory and I/O devices.

Registers

| Register | Function |
| --- | --- |
| Accumulator | Stores running data and results of calculations. |
| Flag Register | Reflects the processor’s state and results of operations. |
| Program Counter | Holds the address of the next instruction to execute. |
| Base Register | Stores the start address of a memory segment. |
| Index Register | Stores an offset relative to the base address. |
| General‑Purpose Register | Stores arbitrary data. |
| Instruction Register | Holds the currently executing instruction (cannot be accessed directly by programmers). |
| Stack Register | Points to the start of the stack area. |

Only the Program Counter, Accumulator, Flag Register, Instruction Register, and Stack Register exist as a single instance; other registers usually have multiple copies.

Program Counter

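The Program Counter (PC) stores the address of the next instruction. When a program starts, the PC points to its first instruction (e.g., address 0100). After each instruction the PC advances to the following one (by 1 in this simplified model; on a real CPU, by the length of the instruction just fetched), unless a jump instruction loads a different address into it.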
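The fetch, decode, execute cycle and the role of the PC can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the two-byte instruction format, the opcode names (`OP_LOAD`, `OP_ADD`, `OP_JUMP`, `OP_HALT`), and the `toy_run` function are invented here and do not correspond to any real CPU.

```c
#include <stdint.h>

/* Hypothetical instruction format: one opcode byte followed by one operand byte. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_JUMP = 3 };

typedef struct {
    uint8_t pc;   /* program counter: address of the next instruction */
    int     acc;  /* accumulator: holds the running result */
} ToyCpu;

/* Runs the program stored in a 256-byte memory image, starting at `start`.
 * Returns the accumulator when OP_HALT is reached, or -1 on an unknown
 * opcode or when max_steps is exhausted. */
int toy_run(const uint8_t mem[256], uint8_t start, int max_steps) {
    ToyCpu cpu = { start, 0 };
    for (int step = 0; step < max_steps; ++step) {
        /* Fetch: read the instruction the PC points at. */
        uint8_t opcode  = mem[cpu.pc];
        uint8_t operand = mem[(uint8_t)(cpu.pc + 1)];
        cpu.pc = (uint8_t)(cpu.pc + 2);  /* advance by the instruction length */

        /* Decode and execute. */
        switch (opcode) {
        case OP_LOAD: cpu.acc = operand;  break;
        case OP_ADD:  cpu.acc += operand; break;
        case OP_JUMP: cpu.pc = operand;   break;  /* a jump overwrites the PC */
        case OP_HALT: return cpu.acc;
        default:      return -1;
        }
    }
    return -1;
}
```

With the program `LOAD 5; JUMP 6; ADD 100; ADD 7; HALT` placed at address 0, the jump skips `ADD 100`, so the accumulator ends at 12.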

Memory

Main memory is the primary storage the CPU reads from and writes to while a program runs. It is normally built from RAM.

Memory is built from many integrated circuits and comes in three major types:

RAM (Random Access Memory) – volatile, loses data when power is removed.

ROM (Read‑Only Memory) – non‑volatile, data persists without power.

Cache – a small, fast memory (L1, L2, L3) placed between CPU and RAM.

Memory Operations

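A memory IC is powered through its VCC (+5 V) and GND (0 V) pins. To write a byte, the address is selected on pins A0-A9, the data is placed on D0-D7, and the WR (write) signal is set to 1. To read, the address is selected and the RD (read) signal is set to 1, after which the stored byte appears on D0-D7. With 10 address pins and 8 data pins, such a chip stores 1,024 bytes (1 KB).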
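The read and write signalling can be modelled in a few lines of C. This is a rough sketch rather than a description of real hardware timing: the `MemoryIc` type and `mem_cycle` function are hypothetical names, and the pins are reduced to plain function arguments.

```c
#include <stdint.h>

#define ADDR_PINS 10
#define MEM_SIZE  (1u << ADDR_PINS)   /* A0-A9 select one of 1024 cells */

typedef struct {
    uint8_t cells[MEM_SIZE];          /* each cell is 8 bits wide (D0-D7) */
} MemoryIc;

/* One simplified bus cycle.
 * addr: value on the address pins, data: value on the data pins,
 * wr / rd: the write and read signals (0 or 1).
 * Returns the byte present on the data pins after the cycle. */
uint8_t mem_cycle(MemoryIc *ic, uint16_t addr, uint8_t data, int wr, int rd) {
    addr &= MEM_SIZE - 1;             /* only the low 10 address bits are wired */
    if (wr) {
        ic->cells[addr] = data;       /* WR = 1: store the byte */
    }
    if (rd) {
        data = ic->cells[addr];       /* RD = 1: drive the stored byte out */
    }
    return data;
}
```

Writing `0xAB` to address `0x3FF` with `wr = 1` and then reading the same address with `rd = 1` returns `0xAB`.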

Virtual Memory and Disk Interaction

When RAM is insufficient, the operating system uses part of the disk as virtual memory (on Windows, the page file). Windows manages this by paging: memory is divided into fixed-size pages, 4 KB each on x86, and pages are swapped between RAM and disk as needed.
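With 4 KB pages, any address splits into a page number and an offset inside that page. The sketch below shows the arithmetic only; the function names are made up for illustration, and it assumes 32-bit addresses.

```c
#include <stdint.h>

#define PAGE_SIZE  4096u   /* 4 KB pages */
#define PAGE_SHIFT 12      /* 2^12 = 4096 */

/* Which page an address falls in. */
uint32_t page_number(uint32_t vaddr) {
    return vaddr >> PAGE_SHIFT;
}

/* Position of the address inside its page (0..4095). */
uint32_t page_offset(uint32_t vaddr) {
    return vaddr & (PAGE_SIZE - 1);
}

/* How many whole pages are needed to hold `bytes` bytes. */
uint32_t pages_needed(uint32_t bytes) {
    return bytes / PAGE_SIZE + (bytes % PAGE_SIZE != 0);
}
```

For example, address `0x12345` lies in page `0x12` at offset `0x345`, and a 4,097-byte buffer needs two pages.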

Binary

Computers use binary (base‑2) numbers to represent all data. Each bit is either 0 or 1, and the value of a binary number is calculated using powers of two.

For example, the binary <code>00100111</code> equals decimal 39 (32 + 4 + 2 + 1).
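The same conversion can be written as a short C function. This is a minimal sketch; `binary_to_decimal` is a name chosen here, and it assumes the input fits in an `unsigned`.

```c
/* Converts a string of '0' and '1' digits to its numeric value.
 * Each step doubles the running value and adds the next bit, which is
 * equivalent to summing the power of two for every 1 bit.
 * Characters other than '0' and '1' are skipped. */
unsigned binary_to_decimal(const char *bits) {
    unsigned value = 0;
    for (; *bits != '\0'; ++bits) {
        if (*bits == '0' || *bits == '1') {
            value = value * 2 + (unsigned)(*bits - '0');
        }
    }
    return value;
}
```

Calling `binary_to_decimal("00100111")` returns 39.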

Shift Operations and Two’s Complement

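A left shift moves every bit toward the high end and fills the vacated low bits with zeros. A right shift can be logical (the vacated high bits are filled with zeros) or arithmetic (they are filled with copies of the sign bit). Negative numbers are represented in two’s complement: to negate a value, invert all of its bits and add 1.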
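These operations can be tried on 8-bit values in C. The helper names below are invented for this sketch. Because right-shifting a negative signed integer is implementation-defined in C, the arithmetic shift is emulated with unsigned operations, and shift counts are assumed to be between 0 and 7.

```c
#include <stdint.h>

/* Left shift: zeros enter on the right, high bits fall off. */
uint8_t shift_left(uint8_t x, unsigned n) {
    return (uint8_t)(x << n);
}

/* Logical right shift: zeros enter on the left. */
uint8_t shift_right_logical(uint8_t x, unsigned n) {
    return (uint8_t)(x >> n);
}

/* Arithmetic right shift: copies of the sign bit (bit 7) enter on the left. */
uint8_t shift_right_arithmetic(uint8_t x, unsigned n) {
    uint8_t shifted = (uint8_t)(x >> n);
    if (x & 0x80u) {
        shifted |= (uint8_t)~(0xFFu >> n);  /* set the n vacated high bits */
    }
    return shifted;
}

/* Two's complement negation: invert all bits, then add 1. */
uint8_t twos_complement(uint8_t x) {
    return (uint8_t)(~x + 1u);
}
```

For instance, `shift_left(0x27, 2)` yields `0x9C` (39 becomes 156), and `twos_complement(0x27)` yields `0xD9`, the 8-bit pattern for -39. Shifting `0xB4` (the pattern for -76) right by two gives `0x2D` logically but `0xED` (the pattern for -19) arithmetically.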

Compression Algorithms

Compression reduces file size by removing redundancy. Two main categories are lossless (e.g., RLE, Huffman, LZW) and lossy (e.g., JPEG, MPEG). Compression can also be symmetric (encoding and decoding have similar complexity) or asymmetric.

Run‑Length Encoding (RLE)

RLE stores each character followed by its repeat count. The string <code>AAAAAABBCDDEEEEEF</code> (17 characters) becomes <code>A6B2C1D2E5F1</code> (12 characters), about 71 % of the original size, a saving of roughly 29 %.
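A minimal RLE encoder for this character-plus-count format might look as follows. `rle_encode` is a hypothetical helper written for this article's example, not a standard function, and it makes no attempt to handle input that already contains digits.

```c
#include <stdio.h>

/* Encodes `in` as <character><run length> pairs into `out`.
 * Returns the encoded length, or -1 if `out` is too small. */
int rle_encode(const char *in, char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';
    while (*in != '\0') {
        /* Count how many times the current character repeats. */
        size_t run = 1;
        while (in[run] == in[0]) {
            ++run;
        }
        /* Append the character and its count. */
        int n = snprintf(out + used, out_size - used, "%c%zu", in[0], run);
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;
        }
        used += (size_t)n;
        in += run;
    }
    return (int)used;
}
```

Encoding `AAAAAABBCDDEEEEEF` with this function produces `A6B2C1D2E5F1` and returns 12. Note that text with few repeats grows under this scheme, since every single character still costs two.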

Huffman Coding

Huffman builds a binary tree based on symbol frequencies, assigning shorter codes to more frequent symbols. For the same example, Huffman can compress the data to 5 bytes (40 bits), a 71 % reduction.
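The 40-bit figure can be checked without building the tree explicitly: repeatedly merge the two smallest weights, and the sum of all merged weights equals the total encoded length. The function below is a sketch under that approach; `huffman_total_bits` is a name invented here, it supports at most 64 symbols, and the result does not include the code table that a real file would also need to store.

```c
#include <stddef.h>

/* Returns the number of bits needed to Huffman-encode a message whose
 * symbols occur with the given frequencies. Returns 0 if n > 64. */
unsigned long huffman_total_bits(const unsigned *freq, size_t n) {
    unsigned long w[64];
    size_t count = 0;
    unsigned long total = 0;

    if (n > 64) {
        return 0;
    }
    for (size_t i = 0; i < n; ++i) {
        if (freq[i] > 0) {
            w[count++] = freq[i];
        }
    }
    if (count == 1) {
        return w[0];  /* a lone symbol still needs one bit per occurrence */
    }
    while (count > 1) {
        /* Find the two smallest weights: a is the smallest, b the next. */
        size_t a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (size_t i = 2; i < count; ++i) {
            if (w[i] < w[a])      { b = a; a = i; }
            else if (w[i] < w[b]) { b = i; }
        }
        /* Merge them into one tree node; every symbol below it gains one bit. */
        unsigned long merged = w[a] + w[b];
        total += merged;
        w[a] = merged;
        w[b] = w[count - 1];
        --count;
    }
    return total;
}
```

For the frequencies in the example string (A: 6, B: 2, C: 1, D: 2, E: 5, F: 1) the function returns 40, matching the 5 bytes quoted above.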

| File Type | Before | After | Compression Ratio (After / Before) |
| --- | --- | --- | --- |
| Text | 14862 bytes | 4119 bytes | 28 % |
| Image | 96062 bytes | 9456 bytes | 10 % |
| EXE | 24576 bytes | 4652 bytes | 19 % |

Operating System

An OS abstracts hardware and provides APIs for applications. Windows, Linux, and macOS each expose different system calls, making direct porting non‑trivial.

Taking Windows as the example, key OS features include:

32‑bit and 64‑bit versions.

Win32 API (for 32‑bit) and Win64 API (for 64‑bit).

Graphical User Interface (GUI).

WYSIWYG printing.

Multitasking via time‑slicing.

Network and database middleware.

Plug‑and‑Play device driver installation.

Process Creation and API Calls

Applications call OS services through APIs such as <code>MessageBox()</code> (found in <code>user32.dll</code>). Different OSes have different APIs, which is why Windows programs cannot run unchanged on Linux.
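The portability problem shows up as soon as code calls such an API directly. The sketch below hides the difference behind a hypothetical `show_message` wrapper: on Windows it calls `MessageBoxA` (the program must be linked against `user32`), and elsewhere it falls back to printing on standard error, which is only a stand-in chosen for this illustration.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Shows a message to the user. Returns 0 on success, -1 on failure. */
int show_message(const char *title, const char *text) {
#ifdef _WIN32
    /* Windows: the API lives in user32.dll; it returns 0 on failure. */
    return MessageBoxA(NULL, text, title, MB_OK) != 0 ? 0 : -1;
#else
    /* Other systems have no MessageBox(); this fallback just prints. */
    return fprintf(stderr, "[%s] %s\n", title, text) > 0 ? 0 : -1;
#endif
}
```

A call such as `show_message("Demo", "Hello")` then compiles on both platforms, but only because the OS-specific call is isolated behind the preprocessor check.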

Assembly Language and Native Code

A CPU can only execute native (machine) code. Assembly language uses mnemonic opcodes (e.g., <code>mov</code>, <code>add</code>) and operands to represent machine instructions, making them readable for humans.

Typical assembly syntax: <code>opcode destination, source</code>. For example, <code>mov eax, ebx</code> copies the value of <code>ebx</code> into <code>eax</code>.

Compiling C to Assembly (Borland C++ 5.5 Example)

<code>// Sample C code
int AddNum(int a, int b) {
    return a + b;
}

void MyFunc() {
    int c;
    c = AddNum(123, 456);
}
</code>

Compiling with <code>bcc32 -c -S Sample4.c</code> produces <code>Sample4.asm</code>:

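<code>_AddNum proc near
    push ebp                      ; save the caller's frame pointer
    mov ebp, esp                  ; start a new stack frame
    mov eax, dword ptr [ebp+8]    ; eax = a (first argument)
    add eax, dword ptr [ebp+12]   ; eax = a + b (the return value)
    pop ebp                       ; restore the caller's frame pointer
    ret                           ; pop the return address and jump to it
_AddNum endp

_MyFunc proc near
    push ebp
    mov ebp, esp
    push 456                      ; second argument (b) is pushed first
    push 123                      ; first argument (a)
    call _AddNum                  ; push the return address, jump to _AddNum
    add esp, 8                    ; discard the two 4-byte arguments
    pop ebp
    ret
_MyFunc endp
</code>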

Function Call Mechanism

When <code>MyFunc</code> calls <code>AddNum</code>:

Arguments are pushed onto the stack (right‑to‑left order).

<code>call _AddNum</code> pushes the return address and jumps to <code>AddNum</code>.

<code>AddNum</code> computes the result in <code>eax</code> and executes <code>ret</code>, which pops the return address and jumps back to it.

After the call, the caller cleans the argument space (e.g., <code>add esp, 8</code>).
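The stack traffic in these steps can be replayed with a small simulation. Everything here is an assumption made for illustration: the `Machine` struct, the helper functions, and the placeholder `RETURN_ADDR` value are not part of the compiler output. The simulated stack grows downward in 4-byte slots, as on x86.

```c
#include <stdint.h>

#define STACK_BYTES 256u
#define RETURN_ADDR 0x1234u  /* hypothetical address of the instruction after call */

typedef struct {
    uint32_t mem[STACK_BYTES / 4];  /* stack memory, one element per 4-byte slot */
    uint32_t esp, ebp, eax;
} Machine;

static void machine_init(Machine *m) {
    m->esp = m->ebp = STACK_BYTES;  /* empty stack: esp sits just past the end */
    m->eax = 0;
}
static void push(Machine *m, uint32_t v) { m->esp -= 4; m->mem[m->esp / 4] = v; }
static uint32_t pop(Machine *m) { uint32_t v = m->mem[m->esp / 4]; m->esp += 4; return v; }
static uint32_t load(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

static void add_num(Machine *m) {
    push(m, m->ebp);                    /* push ebp                  */
    m->ebp = m->esp;                    /* mov ebp, esp              */
    m->eax  = load(m, m->ebp + 8);      /* mov eax, [ebp+8]   (a)    */
    m->eax += load(m, m->ebp + 12);     /* add eax, [ebp+12]  (b)    */
    m->ebp = pop(m);                    /* pop ebp                   */
    (void)pop(m);                       /* ret: pop the return address */
}

static uint32_t my_func(Machine *m) {
    push(m, m->ebp);                    /* push ebp                  */
    m->ebp = m->esp;                    /* mov ebp, esp              */
    push(m, 456);                       /* push 456                  */
    push(m, 123);                       /* push 123                  */
    push(m, RETURN_ADDR);               /* call: push return address */
    add_num(m);                         /*       and jump to AddNum  */
    m->esp += 8;                        /* add esp, 8                */
    m->ebp = pop(m);                    /* pop ebp                   */
    return m->eax;                      /* result is left in eax     */
}
```

After `machine_init(&m)`, calling `my_func(&m)` returns 579, and `m.esp` is back at its starting value, which shows that the pushes and pops balance.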

Registers Used in Calls

<code>ebp</code> – frame pointer, points to the base of the current stack frame.

<code>esp</code> – stack pointer.

<code>eax</code> – accumulator, holds return values.

<code>ebx, ecx, edx, esi, edi</code> – general‑purpose registers.

Global vs. Local Variables

Global variables are placed in the <code>_DATA</code> (initialized) or <code>_BSS</code> (uninitialized) sections. Local variables reside in registers when possible; otherwise they are allocated on the stack.

<code>_DATA segment dword public use32 'DATA'
    _a1 dd 1
    _a2 dd 2
    _a3 dd 3
    _a4 dd 4
    _a5 dd 5
_DATA ends

_BSS segment dword public use32 'BSS'
    _b1 db 4 dup(?)
    _b2 db 4 dup(?)
    _b3 db 4 dup(?)
    _b4 db 4 dup(?)
    _b5 db 4 dup(?)
_BSS ends
</code>

In a function, the compiler may allocate up to five integer locals to registers (<code>eax, edx, ecx, ebx, esi</code>) and the rest to stack slots like <code>[ebp-4]</code>, <code>[ebp-8]</code>, etc.

Control Flow: Loops and Branches

A C <code>for</code> loop such as:

<code>for (int i = 0; i < 10; ++i) {
    MySub();
}
</code>

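is compiled to assembly using <code>xor ebx, ebx</code> (initialize <code>i</code> to 0), <code>call _MySub</code> at the label <code>L4</code>, <code>inc ebx</code>, <code>cmp ebx, 10</code>, and <code>jl short L4</code> to jump back to the label and repeat while <code>i < 10</code>.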
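Written back in C with `goto`, the instruction sequence looks like this. The sketch mirrors the assembly one line at a time; the body of `MySub` and the `calls` counter are placeholders added so the effect can be observed. Note that the comparison sits at the bottom of the loop, so the body runs once before the first check.

```c
static int calls = 0;

/* Placeholder body: the article does not show what MySub does. */
static void MySub(void) {
    ++calls;
}

/* Returns how many times MySub was called by the loop. */
int run_loop(void) {
    int i;
    calls = 0;
    i = 0;                  /* xor ebx, ebx                */
L4:
    MySub();                /* call _MySub                 */
    ++i;                    /* inc ebx                     */
    if (i < 10) goto L4;    /* cmp ebx, 10 / jl short L4   */
    return calls;
}
```

`run_loop()` returns 10, the same number of iterations as the original `for` loop.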

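Conditional statements use <code>cmp</code> followed by a conditional jump (such as <code>jle</code> or <code>jge</code>) to select the appropriate branch, with an unconditional <code>jmp</code> to skip over the branch that was not taken.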

Multithreading Pitfalls

When multiple threads modify a shared global variable without synchronization, race conditions can occur. For example, two threads executing:

<code>counter *= 2;
</code>

may both read the original value before either writes its result back, so the counter ends up doubled once instead of twice. Proper locking or atomic operations are required to avoid this.
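The lost update can be reproduced deterministically by replaying the bad interleaving in a single thread, and one possible fix is a compare-and-swap loop from C11 `<stdatomic.h>`. Both functions below are sketches with names chosen here; a mutex around the multiplication would work equally well.

```c
#include <stdatomic.h>

/* Single-threaded replay of the problematic interleaving:
 * both "threads" read before either one writes. */
int lost_update(int counter) {
    int a = counter;    /* thread A reads the counter       */
    int b = counter;    /* thread B reads the same old value */
    counter = a * 2;    /* thread A writes its result       */
    counter = b * 2;    /* thread B overwrites it           */
    return counter;
}

/* Doubles the counter atomically. If another thread changed the value
 * between the read and the write, the exchange fails, `old` is refreshed
 * with the current value, and the loop retries. */
void atomic_double(atomic_int *counter) {
    int old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old * 2)) {
        /* retry with the refreshed value in `old` */
    }
}
```

Starting from 100, `lost_update` returns 200 although two doublings were intended, while two calls to `atomic_double` leave the counter at 400.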
