
Why Understanding CPU, Memory, and Threads Is Crucial for Modern Software

This article explains the fundamental concepts of how programs run on a computer, covering CPU, memory, I/O, system software layers, process scheduling, virtual memory, multithreading models, synchronization mechanisms, and common pitfalls that affect concurrency and performance.

21CTO

Aug 17, 2017

1.1 Starting from Hello World

Purpose: to understand the basic flow of program execution on a computer, from compilation and static linking, through how the operating system loads programs and performs dynamic linking, to runtime libraries and standard library implementations.

1.2 Core Components Remain Unchanged

The three most critical parts of a computer: CPU, memory, and I/O control chips.

Early computers had no complex graphics; CPU and memory ran at the same frequency on a single bus.

As CPU frequency increased, memory could not keep up, so the system bus was run at a frequency memory could handle and the CPU communicated with it through a frequency multiplier.

With the advent of graphical interfaces, graphics chips required heavy data exchange with memory and CPU, prompting the design of a fast north‑bridge chip and later a south‑bridge for slower devices such as disks, USB, and keyboards.

1.3 High Position, Broad View

System software is the software that manages the computer itself. It is divided into two parts:

Platform components: OS kernel, drivers, runtime libraries.

Development tools: compilers, assemblers, linkers.

The system software architecture uses a layered structure where each layer communicates via interfaces; lower layers provide services, upper layers consume them. Apart from hardware and applications, everything else is an intermediate layer that abstracts the layer below, keeping applications relatively independent from hardware.

Development tools and applications sit at the same layer because both use the same Application Programming Interface (API), which is provided by the runtime library. On Windows the runtime library provides the Windows API; on Linux the glibc runtime library provides the POSIX API.

1.4 What the Operating System Does

The OS provides abstract interfaces and manages hardware resources (CPU, memory, and I/O devices).

1.4.1 Prevent the CPU from Idling

Multiprogramming: a monitor program starts another waiting program whenever the current one does not need the CPU, but it has no notion of priority. Time-sharing: each program runs for a while and then voluntarily yields the CPU, which is the early prototype of the modern OS. Multitasking: in modern OSes the operating system controls all hardware resources and runs applications as processes with isolated address spaces. CPU allocation is preemptive, and because time slices are short, many processes appear to run at once even on a single CPU (parallel at the macro level, serial at the micro level).

Device Drivers

The OS abstracts hardware and presents a uniform access model to runtime libraries and applications. In UNIX, devices are accessed like regular files; in Windows, graphics use GDI, audio uses DirectX, and disks appear as file systems. The OS’s hardware drivers handle the low‑level details.

Disk structure: multiple platters, each side divided into tracks and sectors (typically 512 bytes). LBA numbers sectors sequentially from 0.
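To make the sequential numbering concrete, here is a minimal sketch that converts a cylinder/head/sector position into an LBA using the conventional formula. The geometry values (4 sides, 63 sectors per track, 1024 tracks per side) are made up for illustration, and the function names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector, the typical value mentioned above


def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a cylinder/head/sector position to a sequential sector number.

    By convention CHS sector numbers start at 1, while LBA starts at 0.
    """
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def disk_capacity_bytes(cylinders, heads_per_cylinder, sectors_per_track):
    """Total capacity is simply the number of sectors times the sector size."""
    return cylinders * heads_per_cylinder * sectors_per_track * SECTOR_SIZE


# Hypothetical geometry, chosen only for this example.
HEADS, SECTORS_PER_TRACK, CYLINDERS = 4, 63, 1024

first_sector = chs_to_lba(0, 0, 1, HEADS, SECTORS_PER_TRACK)     # very first sector
next_side = chs_to_lba(0, 1, 1, HEADS, SECTORS_PER_TRACK)        # first sector on the next side
next_cylinder = chs_to_lba(1, 0, 1, HEADS, SECTORS_PER_TRACK)    # first sector on the next cylinder
capacity = disk_capacity_bytes(CYLINDERS, HEADS, SECTORS_PER_TRACK)
```

With LBA, the OS and applications only deal with a flat sector index; translating it back to a physical position is left to the disk hardware and driver.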

1.5 What If Memory Is Insufficient

Early computers ran programs directly in physical memory, leading to problems:

Address spaces were not isolated, allowing malicious programs to corrupt others.

Low memory‑usage efficiency because the whole program had to be loaded; swapping caused heavy data movement.

Program load addresses were nondeterministic, requiring relocation.

One solution is to add an intermediate layer: treat the addresses a program uses as virtual addresses that are mapped to physical memory, which provides isolation.

1.5.1 Isolation

A 32‑bit address space spans 0x00000000–0xFFFFFFFF (4 GB). Physical space may be smaller (e.g., 512 MB). Each process gets its own virtual space, ensuring isolation.

1.5.2 Segmentation

Map a contiguous virtual region to a physical region, solving address‑space isolation and nondeterministic load addresses, but not memory‑usage efficiency.
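The mapping can be sketched as a base-plus-offset translation with a bounds check. This is an illustrative model only: the class name, the sizes, and the physical base addresses are assumptions, and real hardware performs this check in the memory-management circuitry rather than in software.

```python
class Segment:
    """One contiguous virtual region mapped onto one contiguous physical region."""

    def __init__(self, virtual_base, physical_base, size):
        self.virtual_base = virtual_base
        self.physical_base = physical_base
        self.size = size

    def translate(self, virtual_address):
        offset = virtual_address - self.virtual_base
        if not 0 <= offset < self.size:
            # An access outside the segment is rejected, which is what isolates programs.
            raise MemoryError(f"address {virtual_address:#x} is outside the segment")
        return self.physical_base + offset


MB = 1024 * 1024

# Two hypothetical programs that both believe they start at virtual address 0.
program_a = Segment(virtual_base=0x0, physical_base=0x00100000, size=10 * MB)
program_b = Segment(virtual_base=0x0, physical_base=0x00B00000, size=100 * MB)

addr_a = program_a.translate(0x1000)  # lands inside A's physical region
addr_b = program_b.translate(0x1000)  # same virtual address, different physical address
```

Because each program is written against its own virtual range, the loader is free to place it anywhere in physical memory without relocation. The whole segment still has to be resident, which is why segmentation does not help memory-usage efficiency.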

1.5.3 Paging

Divide the address space into fixed-size pages (commonly 4 KB). Frequently used pages stay in memory; others reside on disk and are loaded on demand. Pages in the virtual space are virtual pages (VP), pages in physical memory are physical pages (PP), and pages on disk are disk pages (DP). A page fault occurs when a process touches a page that is not in physical memory; the OS then loads it from disk and maps it. Each page can carry its own permissions for protection, and the MMU translates virtual addresses to physical ones.
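A toy model of this translation is sketched below. It assumes 4 KB pages and a page table held in a dictionary, and it "handles" a page fault by simply handing out the next free physical page; the class and attribute names are hypothetical, and real systems do this work in the MMU and the kernel.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the text


class PagedMemory:
    """Toy page table: virtual page number -> physical page number."""

    def __init__(self):
        self.page_table = {}
        self.next_free_physical_page = 0
        self.page_faults = 0

    def translate(self, virtual_address):
        # Split the address into a virtual page number and an offset within the page.
        virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
        if virtual_page not in self.page_table:
            # Page fault: the page is not resident, so "load" it into a free physical page.
            self.page_faults += 1
            self.page_table[virtual_page] = self.next_free_physical_page
            self.next_free_physical_page += 1
        return self.page_table[virtual_page] * PAGE_SIZE + offset


memory = PagedMemory()
first = memory.translate(0x5004)   # first touch of virtual page 5: page fault
second = memory.translate(0x5008)  # same page again: no new fault
third = memory.translate(0x2010)   # first touch of virtual page 2: another fault
```

Only the pages a program actually touches consume physical memory, which is the efficiency gain that segmentation could not provide.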

1.6 Many Hands Make Light Work

1.6.1 Thread Basics

What Is a Thread

A thread, sometimes called a lightweight process (LWP), is the smallest unit of program execution flow. A standard thread consists of a thread ID, the current instruction pointer, a register set, and a stack. The threads of a process share its process-level resources, such as open files and signals.

Thread vs. Process Resources

Thread‑private: thread ID, registers, stack, thread‑local storage. Shared across threads: global variables, heap, opened files, code.
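The split between private and shared data can be observed with a short sketch. It uses Python's `threading.local` as a stand-in for thread-local storage and a module-level list as shared global data; the variable and function names are chosen for this example only.

```python
import threading

shared_names = []                    # module-level data: visible to every thread
thread_private = threading.local()   # thread-local storage: one copy per thread


def worker(name):
    thread_private.name = name                 # stored in this thread's private slot
    shared_names.append(thread_private.name)   # written to data all threads share


def run_workers(names):
    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread never set thread_private.name, so it sees nothing there.
    return sorted(shared_names), hasattr(thread_private, "name")


names_seen, main_thread_has_name = run_workers(["a", "b", "c"])
```

Every worker's write to the shared list is visible afterwards, while the values stored in thread-local storage never leak into the main thread.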

Thread Scheduling and Priority

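Threads always appear to run concurrently, whether on a multiprocessor or a single-processor system. When the number of threads does not exceed the number of processors, they truly run in parallel; otherwise the OS lets threads take turns on each processor, which is called thread scheduling. A scheduled thread is in one of three states: running, ready, or waiting. The interval for which a running thread may keep the processor is its time slice. Mainstream schedulers combine priority scheduling with round-robin. Threads that wait frequently are IO-bound; threads that rarely wait are CPU-bound. A low-priority thread can starve if higher-priority threads always take the processor first; schedulers mitigate this by gradually raising the priority of threads that have waited a long time (priority aging).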
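Starvation and aging can be illustrated with a small deterministic simulation. This is a simplified model, not how any particular kernel schedules: every thread is always ready, each tick runs the thread with the highest effective priority, and the `aging_step` parameter and function name are assumptions made for the sketch.

```python
def simulate(base_priorities, ticks, aging_step=0):
    """Run the highest-priority thread each tick and return the order of execution.

    Threads that are passed over gain `aging_step` priority per tick; a thread
    that runs drops back to its base priority.
    """
    effective = dict(base_priorities)
    history = []
    for _ in range(ticks):
        # Ties are broken alphabetically so the result is reproducible.
        chosen = max(sorted(effective), key=lambda name: effective[name])
        history.append(chosen)
        for name in effective:
            if name == chosen:
                effective[name] = base_priorities[name]
            else:
                effective[name] += aging_step
    return history


threads = {"high": 10, "low": 1}
without_aging = simulate(threads, ticks=6)             # "low" never runs: starvation
with_aging = simulate(threads, ticks=6, aging_step=3)  # "low" eventually gets a turn
```

Without aging the low-priority thread is never chosen; with aging its effective priority climbs each tick it waits until it overtakes the high-priority thread once.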

Preemptive vs. Non‑Preemptive Threads

Preemptive threads are forced to yield after their time slice. Early systems used non‑preemptive threads, which voluntarily yielded when waiting for I/O or explicitly.
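Non-preemptive scheduling can be mimicked with generators, where each `yield` is a point at which a task voluntarily gives up the processor. This is only an analogy for the idea, with hypothetical task names, and not a model of a real thread library.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # voluntarily give up the processor


def run_cooperatively(tasks):
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task yields
        except StopIteration:
            continue               # task finished: drop it
        ready.append(current)      # otherwise put it back at the end of the queue


log = []
run_cooperatively([task("A", 2, log), task("B", 3, log)])
```

A task that never reaches a `yield` would keep the processor forever, which is the weakness that forced time slices in preemptive systems remove.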

Linux Threads

Linux treats all execution entities as tasks. Each task looks like a single-threaded process; threads are tasks that share memory space.

System Calls

fork – duplicate the current task.

exec – replace the current image with a new executable.

clone – create a child task starting at a specified point (used for threads).

fork uses copy‑on‑write to share memory until a write occurs.
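The effect of fork can be seen from Python on Unix-like systems through `os.fork`. In this sketch the child changes its copy of a variable and reports it through a pipe, while the parent's copy stays untouched. The function name is hypothetical, and the example will not run on Windows, where `os.fork` is unavailable.

```python
import os


def fork_demo():
    value = 1
    read_fd, write_fd = os.pipe()
    pid = os.fork()                      # duplicate the current task
    if pid == 0:
        # Child: works on its own copy of the address space.
        try:
            os.close(read_fd)
            value += 41                  # the write lands in the child's private copy only
            os.write(write_fd, str(value).encode())
            os.close(write_fd)
        finally:
            os._exit(0)                  # leave immediately without running parent cleanup
    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_value = int(reader.read())
    os.waitpid(pid, 0)
    return value, child_value


parent_value, child_value = fork_demo()
```

The child's write does not disturb the parent, and thanks to copy-on-write only the pages the child actually modifies need to be copied.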

1.6.2 Thread Safety

Concurrent threads can modify shared global variables and heap data, so data consistency is critical.

Race Conditions and Atomic Operations

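Incrementing a variable (e.g., ++i) is not a single step on many architectures: it compiles to a read, an increment, and a write back, so two threads can interleave and lose an update. An operation performed by a single instruction that cannot be interrupted is called atomic. Windows provides the Interlocked API for atomic operations.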
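The lost update is easiest to see by replaying one unlucky interleaving by hand. The first function below does that with plain variables standing in for registers; the second shows the usual remedy of guarding the increment with a lock. Both function names are made up for this sketch.

```python
import threading


def lost_update():
    """Replay one bad interleaving of two threads that each execute ++i."""
    i = 0
    register_a = i      # thread A reads i (0)
    register_b = i      # thread B reads i (0) before A has written back
    register_a += 1     # A increments its private copy
    register_b += 1     # B increments its private copy
    i = register_a      # A writes 1
    i = register_b      # B also writes 1: A's increment is lost
    return i


def locked_increments(thread_count=4, per_thread=10_000):
    """Make each read-modify-write indivisible by holding a lock around it."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(per_thread):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Two increments produced a final value of 1 in the replay, whereas the locked version always ends at `thread_count * per_thread`.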

Synchronization and Locks

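Synchronization ensures that while one thread is accessing a piece of data, no other thread may access it until the first has finished. Locks are the most common way to achieve this. A lock is an advisory (non-mandatory) mechanism: each thread is expected to acquire the lock before accessing the resource and release it afterward.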

Semaphores and Mutexes

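A binary semaphore has only two states, taken and free, and therefore allows exclusive access; a counting semaphore initialized to N permits up to N concurrent accesses. A mutex is similar to a binary semaphore, except that it must be released by the same thread that acquired it, whereas a semaphore can be released by any thread.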
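A counting semaphore can be tried out with `threading.Semaphore`. The sketch below starts more workers than the semaphore admits and records the highest number that were inside the guarded region at once; the function name and the short sleep that stands in for real work are assumptions.

```python
import threading
import time


def peak_concurrency(limit, worker_count):
    """Return the largest number of workers seen inside the guarded region."""
    semaphore = threading.Semaphore(limit)
    state_lock = threading.Lock()   # protects the two counters below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with semaphore:             # at most `limit` threads get past this point
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)        # pretend to use the shared resource
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


peak_with_two = peak_concurrency(limit=2, worker_count=6)
peak_with_one = peak_concurrency(limit=1, worker_count=4)
```

With a limit of 1 the semaphore behaves like a binary semaphore and the peak is exactly 1.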

Read‑Write Locks

Read‑write locks allow multiple readers or a single writer, improving efficiency for read‑heavy workloads.
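Python's standard library has no read-write lock, so the following is a minimal sketch built on `threading.Condition` to show the rule "many readers or one writer". The class and method names are hypothetical, and this simple version can starve writers when readers keep arriving.

```python
import threading


class ReadWriteLock:
    """Minimal read-write lock: many readers or one writer, never both."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers only wait for an active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed now

    def acquire_write(self, timeout=None):
        """Return True once exclusive access is held, False if the timeout expires."""
        with self._cond:
            free = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout
            )
            if free:
                self._writer = True
            return free

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()      # wake both readers and writers
```

Two readers can hold the lock together, but a writer has to wait until the last reader leaves, which is what makes the scheme efficient for read-heavy data.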

Condition Variables

Threads can wait on a condition variable and be awakened when the condition occurs, enabling coordinated waiting.
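The wait-and-wake pattern can be sketched with `threading.Condition`. Here a consumer waits until the "queue is not empty" condition holds and a producer signals after adding an item; the `Mailbox` class is a made-up name for this example.

```python
import threading
from collections import deque


class Mailbox:
    """A tiny queue whose consumers sleep until an item arrives."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()          # wake one waiting consumer

    def get(self):
        with self._cond:
            while not self._items:       # re-check the condition after every wake-up
                self._cond.wait()        # releases the lock while sleeping
            return self._items.popleft()


def demo():
    mailbox = Mailbox()
    received = []
    consumer = threading.Thread(
        target=lambda: received.extend(mailbox.get() for _ in range(3))
    )
    consumer.start()                     # consumer starts first and has to wait
    for item in ("x", "y", "z"):
        mailbox.put(item)
    consumer.join()
    return received
```

The `while` loop around `wait()` matters: a woken thread must confirm the condition still holds, because another thread may have consumed the item first.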

Reentrancy and Thread‑Safe Functions

A reentrant function can be entered again before a previous execution finishes without adverse effects. Requirements: it uses no static or global non-const data, returns no pointers to such data, relies only on caller-provided arguments, calls no non-reentrant functions, and does not depend on locks over singleton resources.
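The difference is visible even without threads. In the sketch below the first function keeps its result in module-level state and returns a reference to it, so a second call silently changes what the first caller holds; the second function works only on what the caller passes in. Both functions are invented for illustration.

```python
_scratch = []  # module-level state shared by every call


def doubled_non_reentrant(values):
    """Not reentrant: uses global data and returns a reference to it."""
    _scratch.clear()
    _scratch.extend(v * 2 for v in values)
    return _scratch


def doubled_reentrant(values, out):
    """Reentrant: relies only on its arguments and caller-provided storage."""
    out.extend(v * 2 for v in values)
    return out


first_bad = doubled_non_reentrant([1, 2])
second_bad = doubled_non_reentrant([10])   # overwrites what first_bad refers to

first_good = doubled_reentrant([1, 2], [])
second_good = doubled_reentrant([10], [])  # independent of the first call
```

A reentrant function is safe to call from several threads at once, which is why it is a strong form of thread safety.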

Over‑Optimization Pitfalls

Even with locks, compiler optimizations (e.g., keeping a variable in a register) can break thread safety, leading to inconsistent results.

CPU Reordering and Barriers

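CPUs that execute out of order may reorder instructions, so operations can take effect in a different order from the one written in the program, which can break code that depends on that order across threads. Memory barriers (e.g., PowerPC's lwsync) prevent reordering across the barrier.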

1.6.3 Thread Internals

Thread concurrency is achieved via multiple processors or OS scheduling. User‑level threads may not map one‑to‑one with kernel threads.

One‑to‑One Model

Each user thread maps to a unique kernel thread, providing true concurrency but limited by kernel thread count and context‑switch overhead.

Many‑to‑One Model

Multiple user threads share a single kernel thread; context switches are fast, but a blocked user thread blocks all.

Many‑to‑Many Model

Multiple user threads map to a pool of kernel threads, balancing concurrency and scalability.

Author: 目不识丁
