Unlocking the Secrets of C Compilation: From Source to Executable
This article explains the fundamental concepts behind C language compilation, covering lexical, syntax, and semantic analysis, GCC options, file types, static and dynamic linking, ELF structure, and the loader process that turns source code into a runnable program.
The Essence of Language
In the 1950s, Noam Chomsky proposed that language is built from recursive, repeating sub‑structures. This idea underpins our modern understanding of language structure and implies that a finite set of grammatical rules can describe an infinite number of sentences.
The same principle applies when learning or designing a programming language: the first concern is its grammar and semantics.
How Compilers Work
High‑level languages rely on a compiler to produce binary executables. The compiler can be divided at the intermediate‑code boundary into front‑end and back‑end processing.
Front‑end : lexical analysis, syntax analysis, semantic analysis.
Back‑end : code optimisation, target code generation, output of the object or executable file.
The front‑end’s main job is to turn source text into an abstract syntax tree (AST), from which the intermediate code handed to the back‑end is generated.
Lexical Analysis
Lexical analysis (scanning) is the first stage, performed by a lexer. It converts the source file into a stream of lexical units (tokens) such as identifiers, keywords, operators, and constants.
Lexers typically use regular expressions and finite‑state machines to produce a token list for later stages.
This stage can also remove comments, detect identifiers, numbers, etc., centralising language‑specific rules.
Syntax Analysis
Syntax analysis (parsing) is the second stage, performed by a parser. It transforms the token list into an abstract syntax tree (AST) and checks syntactic correctness, often using context‑free grammars and top‑down or bottom‑up algorithms.
The AST provides a representation that is easier for later stages to process.
Semantic Analysis
Semantic analysis is the third stage, performed by a semantic analyser. It examines the AST to understand what the code means, checking types, scopes, declarations, and other constraints. Symbol tables and type checkers are commonly used.
The semantic analyser typically annotates the AST and lowers it into an intermediate representation (IR) for further optimisation.
GCC Compiler Suite
GCC (GNU Compiler Collection) is the most widely used C/C++ compiler on Linux, released under the GPL and a core part of the GNU project.
GCC supports many CPU architectures (x86, ARM, MIPS, etc.) and is the default C compiler on most GNU/Linux distributions. It is invoked from the command line with the gcc command.
Common Options
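-g : generate debugging information (usable by GDB).
-L <dir> : add a directory to the library search path.
-l <name> : link against the library lib<name>.
-o <filename> : name of the output file.
-O : enable basic optimisation (same as -O1).
-O2 : enable further optimisations beyond -O1.
-c : compile and assemble, but do not link.
-v : verbose output (show the commands the driver runs).
-Wall : enable a broad set of commonly useful warnings (not literally all of them; -Wextra adds more).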
Common File Types
.c : C source file.
.h : header file.
.i : preprocessed source file.
.o : object file.
.a : static library archive.
.so : shared library.
.s : assembly file (not run through the preprocessor).
.S : assembly file that is run through the C preprocessor before assembling.
C Program Compilation Flow
The gcc command performs four core steps:
Preprocessing (handled by cpp).
Compilation (handled by the compiler proper, cc1, which the gcc driver invokes).
Assembly (handled by as from Binutils).
Linking (handled by ld from Binutils).
1. Preprocessing
cpp processes directives such as #include and #define, expands macros, removes comments, adds line markers (file name and line number), and retains #pragma directives.
$ gcc -E -I . hello.c -o hello.i
-E : stop after preprocessing.
-I : specify include directory.
-o : output file name.
Opening hello.i shows the fully expanded source.
2. Compilation
The compilation stage performs lexical, syntax, and semantic analysis and generates assembly code.
$ gcc -S -I . hello.i -o hello.s
-S : stop after compilation (produce assembly).
3. Assembly
as translates assembly into machine code, producing .o object files. When a program consists of multiple source files, each is assembled into its own object file before linking.
$ gcc -c hello.s -o hello.o
# or
$ as hello.s -o hello.o
$ file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
4. Linking
Linking resolves references between modules and produces an ELF executable. Two linking modes exist:
Static linking : uses .a archives; a fully static executable requires the -static option.
$ gcc -static hello.c -o hello
Dynamic linking : uses .so shared libraries; this is the default mode.
$ gcc hello.c -o hello
Static Linking
Static libraries (lib*.a) are archives of one or more object files. The linker extracts the objects it needs and copies them into the final ELF, making the executable self‑contained but larger.
Static linking performs three core tasks: address and storage allocation, symbol resolution, and relocation.
ld scans the input object files and .a archives, gathering their symbol tables and sections.
ld merges all tables and sections into global ones.
ld resolves symbols, adjusts sections, and assigns final addresses, patching each reference accordingly (relocation).
Dynamic Linking
Dynamic libraries (lib*.so) are shared objects that can be used by multiple programs. The executable records only references to these libraries; the libraries themselves are loaded at run time.
Dynamic linking reduces executable size and memory usage but introduces environment dependencies. Use ldd to list an executable’s shared‑library dependencies.
$ ldd {executable}
Managing Shared Libraries
Shared libraries use versioned filenames (e.g., libc.so.6 → libc-2.17.so). The system searches for libraries in the following order:
Directories specified by -L options.
Paths in the LIBRARY_PATH environment variable.
Paths known to ldconfig.
Default system directories ( /usr/lib, /usr/lib64).
The ldconfig tool manages the library cache and the symbolic links to versioned libraries. It reads /etc/ld.so.conf and writes the cache file /etc/ld.so.cache; /etc/ld.so.preload lists libraries that the dynamic linker preloads. Common ldconfig options include -v, -n, -N, -X, -f, -C, -r, -p, and -V.
At run time, the LD_LIBRARY_PATH environment variable adds search paths for the dynamic linker, while LD_PRELOAD forces specific libraries to be loaded before all others.
Program Loading and Execution
Running a program involves two phases:
Obtain an executable through preprocessing, compilation, assembly, and linking.
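The kernel maps the ELF into memory and, for dynamically linked programs, hands control to the dynamic linker ( /lib/ld-linux.so.X), which loads the required shared libraries. Execution then starts at the ELF entry point (conventionally _start), whose C runtime startup code eventually calls main().
After the linker produces an ELF file, loading is largely a matter of mapping the ELF's loadable segments into the process’s virtual address space.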
