How Does Code Transform Into Machine Instructions? A Step‑by‑Step Compiler Guide

This article walks through how a compiler turns human‑readable source code into binary machine instructions, covering tokenization, parsing, abstract syntax tree construction, code generation, optimization, and linking, while highlighting the role of LLVM as a portable backend.

Liangxu Linux

Aug 20, 2024

The programs you write are just text; a compiler translates that text into binary instructions the CPU can execute. In an IDE, clicking the build button is what starts this compilation pipeline.

Tokenization

The compiler first scans the source text, discards whitespace and line breaks, and groups the remaining characters into meaningful units called tokens (e.g., keywords like int, identifiers like main, and literals like 0 and 5).
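As a rough illustration of this step, here is a minimal tokenizer sketch for a tiny C‑like language. The token categories, the keyword set, and the function name tokenize are assumptions made for this example, not the design of any particular compiler.

```python
import re

# Hypothetical token categories for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|[+\-*/=<>]"),
    ("PUNCT", r"[(){};,]"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"int", "if", "else", "return"}
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace and line breaks carry no meaning here
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words are identifiers with special status
        tokens.append((kind, text))
    return tokens


for token in tokenize("int main() { return 0; }"):
    print(token)
```

Note that main comes out as an ordinary identifier, while int and return are classified as keywords.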

Parsing and AST Construction

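Next, the compiler parses the token stream according to the language grammar. For example, when it encounters an if token, it applies the grammar rule for an if statement: the if keyword, an opening parenthesis, a boolean expression, a closing parenthesis, and a statement block. From this it builds a hierarchical representation called an Abstract Syntax Tree (AST). The AST records the structure of the statement, typically as an if node whose children are the condition and the statement block; punctuation such as the parentheses guides the parser but is usually not kept in the tree.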
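The parsing step can be sketched with a small recursive‑descent parser. Everything here is an assumption for illustration: the grammar covers only if statements, assignments, and simple comparisons, and the tuple‑based AST shape, the Parser class, and the lex helper are hypothetical.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def lex(source):
    # Simplified lexer: returns bare token strings and ignores whitespace.
    return re.findall(r"\d+|[A-Za-z_]\w*|==|[(){};=<>]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_statement(self):
        # The leading token selects which grammar rule to apply.
        if self.peek() == "if":
            return self.parse_if()
        return self.parse_assignment()

    def parse_if(self):
        # if_stmt := "if" "(" expression ")" block
        self.expect("if")
        self.expect("(")
        condition = self.parse_expression()
        self.expect(")")
        body = self.parse_block()
        return ("if", condition, body)  # parentheses are not stored in the AST

    def parse_block(self):
        # block := "{" statement* "}"
        self.expect("{")
        statements = []
        while self.peek() != "}":
            statements.append(self.parse_statement())
        self.expect("}")
        return statements

    def parse_assignment(self):
        # assignment := IDENT "=" expression ";"
        name = self.advance()
        if not IDENT_RE.fullmatch(name):
            raise SyntaxError(f"expected a variable name, found {name!r}")
        self.expect("=")
        value = self.parse_expression()
        self.expect(";")
        return ("assign", name, value)

    def parse_expression(self):
        # expression := operand (("<" | ">" | "==") operand)?
        left = self.parse_operand()
        if self.peek() in ("<", ">", "=="):
            op = self.advance()
            return ("binop", op, left, self.parse_operand())
        return left

    def parse_operand(self):
        token = self.advance()
        if token.isdigit():
            return ("num", int(token))
        if IDENT_RE.fullmatch(token):
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")


ast = Parser(lex("if (a > 5) { b = 1; }")).parse_statement()
print(ast)
# prints ('if', ('binop', '>', ('var', 'a'), ('num', 5)), [('assign', 'b', ('num', 1))])
```

Each parse_ method corresponds to one grammar rule, which is why this style is a common starting point for hand‑written parsers.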

Code Generation

The compiler then traverses the AST and emits instructions for each node. For a simple program, the generated assembly can mirror the source logic almost line for line.
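To make the traversal concrete, the sketch below walks a tuple‑based AST and emits instructions for a made‑up stack machine. The instruction names (PUSH, LOAD, STORE, ADD, SUB, MUL) and the AST shape are assumptions for this example and do not correspond to any real CPU.

```python
# Hypothetical mapping from source operators to stack-machine instructions.
BINARY_OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(node, out=None):
    """Post-order walk: emit code for the children first, then for the node."""
    if out is None:
        out = []
    kind = node[0]
    if kind == "num":
        out.append(f"PUSH {node[1]}")
    elif kind == "var":
        out.append(f"LOAD {node[1]}")
    elif kind == "binop":
        _, op, left, right = node
        generate(left, out)
        generate(right, out)
        out.append(BINARY_OPS[op])  # operands are already on the stack
    elif kind == "assign":
        _, name, value = node
        generate(value, out)
        out.append(f"STORE {name}")
    else:
        raise ValueError(f"unknown AST node kind: {kind!r}")
    return out


# AST for the statement: x = a + 2 * 3;
tree = ("assign", "x", ("binop", "+", ("var", "a"), ("binop", "*", ("num", 2), ("num", 3))))
print("\n".join(generate(tree)))
# prints:
# LOAD a
# PUSH 2
# PUSH 3
# MUL
# ADD
# STORE x
```

The multiplication is emitted before the addition because it sits deeper in the tree, so operator precedence is already encoded in the AST by the time code generation runs.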

Real compilers often do not emit final machine code at this stage. Instead, they produce an intermediate representation (IR) that a back‑end later translates into machine code for a specific CPU architecture (e.g., x86 or ARM).

Target‑Specific Back‑Ends and LLVM

If you design a new language and write the whole compiler yourself, you need a separate back‑end for each target architecture. LLVM removes most of that work by providing a common intermediate representation: your compiler's front‑end emits LLVM IR, and LLVM handles the rest, optimizing the IR and generating machine code for the chosen CPU.
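The idea of one IR feeding several back‑ends can be sketched as follows. This is a toy model, not LLVM's actual IR or API: the tuple‑based IR, the fixed register mappings, and the functions emit_x86 and emit_arm are all assumptions for illustration, and the emitted assembly is deliberately simplified.

```python
# Toy IR for "return 2 + 3", using virtual registers v0..v2 (hypothetical format).
IR = [
    ("const", "v0", 2),
    ("const", "v1", 3),
    ("add", "v2", "v0", "v1"),
    ("ret", "v2"),
]

# Naive fixed register assignment; a real back-end runs a register allocator.
X86_REGS = {"v0": "eax", "v1": "ecx", "v2": "edx"}
ARM_REGS = {"v0": "w0", "v1": "w1", "v2": "w2"}


def emit_x86(ir):
    """Emit simplified x86 assembly (Intel syntax) from the toy IR."""
    lines = []
    for op, *args in ir:
        if op == "const":
            dst, value = args
            lines.append(f"mov {X86_REGS[dst]}, {value}")
        elif op == "add":
            dst, a, b = args
            # x86 add takes two operands, so copy the first input into dst.
            lines.append(f"mov {X86_REGS[dst]}, {X86_REGS[a]}")
            lines.append(f"add {X86_REGS[dst]}, {X86_REGS[b]}")
        elif op == "ret":
            (src,) = args
            if X86_REGS[src] != "eax":
                lines.append(f"mov eax, {X86_REGS[src]}")  # return value in eax
            lines.append("ret")
        else:
            raise ValueError(f"unknown IR op: {op!r}")
    return lines


def emit_arm(ir):
    """Emit simplified AArch64 assembly from the same toy IR."""
    lines = []
    for op, *args in ir:
        if op == "const":
            dst, value = args
            lines.append(f"mov {ARM_REGS[dst]}, #{value}")
        elif op == "add":
            dst, a, b = args
            # AArch64 add takes three operands, so no extra copy is needed.
            lines.append(f"add {ARM_REGS[dst]}, {ARM_REGS[a]}, {ARM_REGS[b]}")
        elif op == "ret":
            (src,) = args
            if ARM_REGS[src] != "w0":
                lines.append(f"mov w0, {ARM_REGS[src]}")  # return value in w0
            lines.append("ret")
        else:
            raise ValueError(f"unknown IR op: {op!r}")
    return lines


print("x86:", emit_x86(IR))
print("ARM:", emit_arm(IR))
```

The front‑end only has to produce the IR once; supporting another CPU means adding another emit function rather than touching the tokenizer, parser, or AST.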

Optimization

Before and during code generation, the compiler applies optimizations such as dead‑code elimination. For example, if a variable a is assigned but its value is never read, the compiler can drop the assignment, so the corresponding instruction never appears in the final output.
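A minimal sketch of this kind of dead‑code elimination is shown below. It assumes a straight‑line list of side‑effect‑free instructions in a hypothetical (destination, operation, operands) format; production compilers work on control‑flow graphs and must account for side effects.

```python
def eliminate_dead_stores(instructions, live_out):
    """Drop assignments whose result is never read.

    instructions: list of (dest, op, operands); operands are variable names
                  (str) or integer constants. Assumed to have no side effects.
    live_out:     names whose values are still needed after the last instruction.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so we know, at each assignment, whether its value is used later.
    for dest, op, operands in reversed(instructions):
        if dest not in live:
            continue  # nobody reads this value: the instruction is dead
        kept.append((dest, op, operands))
        live.discard(dest)  # this assignment satisfies the later reads
        live.update(name for name in operands if isinstance(name, str))
    kept.reverse()
    return kept


program = [
    ("a", "const", (5,)),        # a = 5   (never read afterwards)
    ("b", "const", (10,)),       # b = 10
    ("result", "add", ("b", 1)), # result = b + 1
]

for instruction in eliminate_dead_stores(program, live_out={"result"}):
    print(instruction)
# prints:
# ('b', 'const', (10,))
# ('result', 'add', ('b', 1))
```

The assignment to a disappears because nothing after it reads a, which matches the behavior described above.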

Object Files and Linking

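Each compiled source file becomes an object file containing its binary instructions, along with a table of the symbols it defines and the symbols it still needs. A linker then combines all object files, resolves references between them (for example, a call to a function defined in another file), and produces the final executable program.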
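Symbol resolution can be illustrated with a toy linker. The object‑file layout (a dictionary with a "defines" entry), the tuple instructions, and the link function are assumptions for this sketch; real object formats such as ELF also carry sections, relocation records, and much more.

```python
def link(object_files):
    """Combine toy object files into one image and resolve CALL targets."""
    symbol_table = {}
    image = []

    # Pass 1: lay out every defined symbol and record its address in the image.
    for obj in object_files:
        for name, code in obj["defines"].items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_table[name] = len(image)
            image.extend(code)

    # Pass 2: replace symbolic call targets with the addresses from pass 1.
    resolved = []
    for instruction in image:
        if instruction[0] == "CALL":
            target = instruction[1]
            if target not in symbol_table:
                raise ValueError(f"undefined symbol: {target}")
            resolved.append(("CALL", symbol_table[target]))
        else:
            resolved.append(instruction)
    return resolved, symbol_table


# Two hypothetical object files: main calls helper, which lives in another file.
main_o = {"defines": {"main": [("CALL", "helper"), ("RET",)]}}
util_o = {"defines": {"helper": [("PUSH", 42), ("RET",)]}}

executable, symbols = link([main_o, util_o])
print(symbols)     # prints {'main': 0, 'helper': 2}
print(executable)  # prints [('CALL', 2), ('RET',), ('PUSH', 42), ('RET',)]
```

Two passes are used because a call may refer to a symbol defined in a later object file, so every address has to be known before any reference is patched. Leaving out util_o would raise an undefined symbol error, much like the errors a real linker reports.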
