What Happens Inside an ELF File? From Source Code to a Running Process
This article explains how a C/C++ program is compiled into an ELF executable, details the ELF file structure, describes the kernel's validation and segment mapping, covers dynamic linking, and shows how the process starts execution from the _start entry point, providing practical commands and debugging tips.
1. ELF Overview
ELF (Executable and Linkable Format) is the standard binary format for Linux executables, object files (.o) and shared libraries (.so). It is the format the kernel loads for native programs; scripts starting with #! are handed to their interpreter, which is itself normally an ELF executable.
1.1 Simple program
#include <stdio.h>

int main(void)
{
    printf("Hello Linux\n");
    return 0;
}

Compile with gcc test.c -o a.out. The resulting a.out is an ELF file; file a.out shows architecture, type, interpreter and other metadata.
2. ELF File Structure
2.1 File Header
The file header resides at the beginning of the file and contains the magic number (0x7F 'E' 'L' 'F'), file type, machine architecture, entry address, version and offsets to the program and section header tables. The kernel uses this information to verify the file before loading.
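The header fields listed above can be read directly with the structures from <elf.h>. The following is a minimal sketch, assuming a 64-bit ELF file with the same byte order as the host; the function names read_elf64_header and print_elf64_header are illustrative, not part of any library.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads the ELF64 file header from `path` into `out`.
   Returns 0 on success, -1 if the file cannot be read or is not ELF64. */
int read_elf64_header(const char *path, Elf64_Ehdr *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t got = fread(out, 1, sizeof(*out), fp);
    fclose(fp);
    if (got != sizeof(*out))
        return -1;

    /* e_ident[0..3] must be 0x7F 'E' 'L' 'F'. */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;

    /* Assumption: this sketch only handles the 64-bit layout. */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    return 0;
}

/* Prints the type, machine, entry address and header table offsets. */
void print_elf64_header(const Elf64_Ehdr *h)
{
    printf("type:    %u\n", (unsigned)h->e_type);
    printf("machine: %u\n", (unsigned)h->e_machine);
    printf("entry:   0x%llx\n", (unsigned long long)h->e_entry);
    printf("phoff:   %llu (%u entries)\n",
           (unsigned long long)h->e_phoff, (unsigned)h->e_phnum);
    printf("shoff:   %llu (%u entries)\n",
           (unsigned long long)h->e_shoff, (unsigned)h->e_shnum);
}
```

Calling read_elf64_header("a.out", &h) followed by print_elf64_header(&h) from a small main should print roughly the same fields as readelf -h a.out, in numeric form.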
2.2 Program Header Table
The Program Header Table describes loadable segments (type PT_LOAD) that the kernel maps into the process address space. Each entry specifies virtual address, file offset, size and permission flags (r‑x, rw‑, etc.).
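To make the table concrete, the sketch below walks the program headers of a file and prints each PT_LOAD entry with its permission flags. It assumes a 64-bit ELF file in host byte order; list_load_segments and phdr_flags_to_string are hypothetical helper names chosen for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Converts p_flags into an "rwx"-style string; buf must hold 4 bytes. */
void phdr_flags_to_string(Elf64_Word flags, char buf[4])
{
    buf[0] = (flags & PF_R) ? 'r' : '-';
    buf[1] = (flags & PF_W) ? 'w' : '-';
    buf[2] = (flags & PF_X) ? 'x' : '-';
    buf[3] = '\0';
}

/* Prints every PT_LOAD entry of an ELF64 file.
   Returns the number of PT_LOAD entries, or -1 on error. */
int list_load_segments(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, 1, sizeof(eh), fp) != sizeof(eh) ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(fp);
        return -1;
    }

    int count = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* Entry i starts at e_phoff + i * e_phentsize. */
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(fp, pos, SEEK_SET) != 0 ||
            fread(&ph, 1, sizeof(ph), fp) != sizeof(ph)) {
            fclose(fp);
            return -1;
        }
        if (ph.p_type != PT_LOAD)
            continue;

        char perms[4];
        phdr_flags_to_string(ph.p_flags, perms);
        printf("LOAD vaddr=0x%llx offset=0x%llx filesz=%llu memsz=%llu %s\n",
               (unsigned long long)ph.p_vaddr,
               (unsigned long long)ph.p_offset,
               (unsigned long long)ph.p_filesz,
               (unsigned long long)ph.p_memsz, perms);
        count++;
    }

    fclose(fp);
    return count;
}
```

The output can be compared against the LOAD lines of readelf -l a.out.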
2.3 Section Header Table
The Section Header Table lists static sections used by linkers and analysis tools: .text (machine code), .rodata (read‑only constants), .data (initialized globals), .bss (zero‑filled uninitialized globals), .dynamic, .symtab, .debug_*, etc.
.text : executable code, mapped r‑x.
.rodata : constant data, mapped r‑‑.
.data : initialized variables, mapped rw‑.
.bss : uninitialized variables, occupies no file space; kernel allocates zero‑filled memory.
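As an illustration of the four sections above, the following sketch declares one object of each kind. The placement noted in the comments is what GCC typically does without optimisation; it is an expectation to check with readelf -S or nm, not a guarantee. All names are made up for the example.

```c
/* Zero-initialised static storage: typically placed in .bss,
   so it takes up no bytes in the file. */
static int request_count;

/* Explicitly initialised and writable: typically placed in .data. */
static int retry_limit = 3;

/* Read-only constant: typically placed in .rodata. */
static const char banner[] = "Hello Linux";

/* Function bodies are machine code: typically placed in .text. */
int next_request_id(void)
{
    request_count++;
    return request_count;
}

int get_retry_limit(void)
{
    return retry_limit;
}

const char *get_banner(void)
{
    return banner;
}
```

Compiling this with gcc -c and running nm on the object file should show the symbols tagged b, d, r and t respectively, matching .bss, .data, .rodata and .text.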
3. Building an ELF Executable
3.1 Compilation pipeline
Pre‑processing : gcc -E test.c -o test.i expands #include directives and macros.
Compilation & assembly : gcc -c test.i -o test.o produces an ELF relocatable object file.
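Linking : gcc test.o -o a.out resolves symbols, applies relocations and creates the final ELF executable. Invoking ld directly is possible, but it also needs the C runtime start files (crt1.o, crti.o, crtn.o), libc and the dynamic linker path, all of which gcc supplies automatically.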
The resulting file can be inspected with readelf -h a.out and executed with ./a.out.
3.2 Linking types
Static linking copies all required code into the executable, producing a larger but self-contained binary. Dynamic linking leaves external references (e.g., printf) unresolved and records the needed libraries in the .dynamic section. At runtime the dynamic linker (on x86-64 typically /lib64/ld-linux-x86-64.so.2) loads those libraries and resolves symbols via the PLT/GOT mechanism, often using lazy binding.
4. Loading and Execution
4.1 execve and kernel validation
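When ./a.out is entered, the shell forks a child process, and the child calls execve(). The kernel reads the ELF header, verifies the magic number and machine type, and takes the entry point from the header's e_entry field. It then reads the program header table to find the loadable segments and, for dynamically linked programs, the interpreter path (PT_INTERP).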
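What the shell does at this point can be approximated in a few lines. This is a simplified sketch, not how any particular shell is implemented; run_program is a hypothetical helper, and the exit code 127 for a failed execve is a common shell convention adopted here as an assumption.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Runs `path` with the given argv in a child process.
   Returns the child's exit status, or -1 if the child could not be
   created or was terminated by a signal. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask the kernel to replace this process image
           with the program stored at `path`. */
        execve(path, argv, environ);
        /* execve returns only on failure, for example when the
           file is missing or the kernel rejects its header. */
        _exit(127);
    }

    /* Parent: wait for the child, as an interactive shell does. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing "./a.out" with an argv of { "a.out", NULL } would run the program built earlier and return its exit status.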
4.2 Segment mapping
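LOAD segment 1 : contains .text and .rodata, mapped with permissions r‑x. (Newer linkers often place .rodata in its own read‑only segment, r‑‑.)
LOAD segment 2 : contains .data and .bss, mapped with permissions rw‑.
The .bss section has no bytes in the file; the kernel allocates zero‑filled pages for it, which is why the segment's memory size (p_memsz) is larger than its file size (p_filesz).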
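The two per-segment decisions described above, which protection bits to use and how much memory to zero-fill, reduce to simple arithmetic on the program header. The sketch below shows that arithmetic only; it is not the kernel's loader code, and segment_prot and segment_zero_fill are illustrative names.

```c
#include <elf.h>
#include <sys/mman.h>

/* Translates ELF segment flags (PF_*) into mmap protection bits (PROT_*). */
int segment_prot(Elf64_Word p_flags)
{
    int prot = PROT_NONE;
    if (p_flags & PF_R)
        prot |= PROT_READ;
    if (p_flags & PF_W)
        prot |= PROT_WRITE;
    if (p_flags & PF_X)
        prot |= PROT_EXEC;
    return prot;
}

/* Number of bytes that exist only in memory (the .bss part):
   everything between p_filesz and p_memsz must be zero-filled. */
Elf64_Xword segment_zero_fill(const Elf64_Phdr *ph)
{
    return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}
```

For a data segment with p_filesz = 0x200 and p_memsz = 0x260, segment_zero_fill reports 0x60 bytes, which corresponds to the size of .bss in that segment.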
4.3 Dynamic linking (if applicable)
Parse the .dynamic section to obtain needed libraries (e.g., libc.so.6).
Load each library and map its loadable segments.
Resolve symbols via the PLT/GOT; the first call to a function triggers lazy binding.
Use ldd a.out to list required shared libraries.
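The first step in the list above, reading the needed libraries from .dynamic, can be reproduced on the file itself. The sketch below finds the PT_DYNAMIC segment, looks up the string table through DT_STRTAB and prints each DT_NEEDED name. It assumes a 64-bit ELF file in host byte order and does only minimal validation; print_needed_libraries and read_at are hypothetical names.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads `len` bytes at file offset `off`; returns 0 on success, -1 otherwise. */
static int read_at(FILE *fp, long off, void *buf, size_t len)
{
    if (fseek(fp, off, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, len, fp) == len ? 0 : -1;
}

/* Prints the DT_NEEDED entries of an ELF64 file.
   Returns their count, or -1 on error or if there is no dynamic segment. */
int print_needed_libraries(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Phdr *ph = NULL;
    Elf64_Dyn *dyn = NULL;
    const Elf64_Phdr *dynseg = NULL;
    Elf64_Addr strtab_vaddr = 0;
    size_t ndyn = 0;
    long strtab_off = -1;
    int have_strtab = 0;
    int result = -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    if (read_at(fp, 0, &eh, sizeof(eh)) != 0 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phnum == 0 || eh.e_phentsize != sizeof(Elf64_Phdr))
        goto done;

    ph = malloc(eh.e_phnum * sizeof(*ph));
    if (ph == NULL ||
        read_at(fp, (long)eh.e_phoff, ph, eh.e_phnum * sizeof(*ph)) != 0)
        goto done;

    /* PT_DYNAMIC covers the .dynamic section. */
    for (unsigned i = 0; i < eh.e_phnum; i++)
        if (ph[i].p_type == PT_DYNAMIC)
            dynseg = &ph[i];
    if (dynseg == NULL)
        goto done; /* e.g. a statically linked executable */

    ndyn = dynseg->p_filesz / sizeof(Elf64_Dyn);
    if (ndyn == 0)
        goto done;
    dyn = malloc(ndyn * sizeof(*dyn));
    if (dyn == NULL ||
        read_at(fp, (long)dynseg->p_offset, dyn, ndyn * sizeof(*dyn)) != 0)
        goto done;

    /* DT_STRTAB holds the virtual address of the dynamic string table. */
    for (size_t i = 0; i < ndyn && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_STRTAB) {
            strtab_vaddr = dyn[i].d_un.d_ptr;
            have_strtab = 1;
        }
    }

    /* Translate that address to a file offset via the PT_LOAD containing it. */
    for (unsigned i = 0; have_strtab && i < eh.e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD &&
            strtab_vaddr >= ph[i].p_vaddr &&
            strtab_vaddr < ph[i].p_vaddr + ph[i].p_filesz)
            strtab_off = (long)(strtab_vaddr - ph[i].p_vaddr + ph[i].p_offset);
    }
    if (strtab_off < 0)
        goto done;

    result = 0;
    for (size_t i = 0; i < ndyn && dyn[i].d_tag != DT_NULL; i++) {
        char name[256];
        size_t n;
        if (dyn[i].d_tag != DT_NEEDED)
            continue;
        /* d_val is a byte offset into the string table. */
        if (fseek(fp, strtab_off + (long)dyn[i].d_un.d_val, SEEK_SET) != 0)
            break;
        n = fread(name, 1, sizeof(name) - 1, fp);
        name[n] = '\0';
        printf("NEEDED %s\n", name);
        result++;
    }

done:
    free(dyn);
    free(ph);
    fclose(fp);
    return result;
}
```

Unlike ldd, this lists only the direct dependencies recorded in the file and does not resolve them to paths; readelf -d a.out shows the same NEEDED entries.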
4.4 Entry point to main
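The real entry point is the symbol _start, supplied by the C runtime start files (crt1.o). The kernel has already built the initial stack with argc, argv and the environment; _start passes these to __libc_start_main(), which runs initialisation code and then calls the user's main. When main returns, its return value is passed to exit(), which terminates the process. For a dynamically linked executable, the dynamic linker runs first and then jumps to _start.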
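One way to observe that main is neither the first nor the last code to run is to register hooks on both sides of it. The sketch below uses the constructor attribute, a GCC/Clang extension that the article does not mention and that is introduced here purely for illustration, together with the standard atexit(). All function names are made up.

```c
#include <stdio.h>
#include <stdlib.h>

static int startup_ran;

/* GCC/Clang extension: the C runtime calls this during start-up,
   before it calls main. */
__attribute__((constructor))
static void before_main(void)
{
    startup_ran = 1;
}

/* Runs from exit(), that is, after main has returned. */
static void after_main(void)
{
    puts("after main: exit() is running its handlers");
}

/* Reports whether the start-up hook has already run. */
int startup_hook_ran(void)
{
    return startup_ran;
}

/* Registers after_main with atexit(); returns 0 on success. */
int register_exit_hook(void)
{
    return atexit(after_main);
}
```

A main that calls register_exit_hook() and checks startup_hook_ran() should find the flag already set on its first line, and the message from after_main appears only once main has returned.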
5. Inspecting ELF Files
5.1 readelf
Common commands:
readelf -h <filename> : file header.
readelf -S <filename> : section header table.
readelf -l <filename> : program header table.
readelf -a <filename> : all available information.
These commands reveal magic number, entry address, segment permissions and the list of sections such as .debug_info or .symtab.
5.2 objdump
Disassembly examples:
objdump -d a.out : disassemble executable sections (typically .text).
objdump -D -M intel a.out : disassemble all sections, using Intel syntax (x86 only).
Disassembly shows the machine instructions that the CPU executes once the kernel has mapped the segments into memory.
5.3 Debugging with GDB
Compile with -g to embed debugging symbols. Typical workflow:
# Compile with debug info
gcc -g crash.c -o crash_demo
ulimit -c unlimited # enable core dumps
./crash_demo # cause a crash
# Analyse core file
gdb ./crash_demo core
bt # backtrace
list # show source around the fault
GDB uses the ELF symbol table and .debug_* sections to map addresses back to source lines, allowing precise fault localisation.
6. ELF‑Based Debugging Techniques
6.1 Common methods
Set breakpoints, step through instructions, inspect registers and memory, and switch between threads in multithreaded programs.
6.2 Leveraging ELF information
The symbol table (.symtab) and debugging sections (.debug_info, .debug_line) let debuggers translate raw addresses into function names and source lines, which is essential for diagnosing crashes, memory leaks or mismatched library versions.
