
What Happens Inside an ELF File? From Source Code to a Running Process

This article explains how a C/C++ program is compiled into an ELF executable, details the ELF file structure, describes the kernel's validation and segment mapping, covers dynamic linking, and shows how the process starts execution from the _start entry point, providing practical commands and debugging tips.

Deepin Linux

1. ELF Overview

ELF (Executable and Linkable Format) is the standard binary format on Linux for executables, relocatable object files (.o) and shared libraries (.so). It is the kernel's native executable format; other formats, such as #! scripts, are handled by separate binfmt handlers.

1.1 Simple program

#include <stdio.h>

int main(void) {
    printf("Hello Linux\n");
    return 0;
}

Save the program as test.c and compile it with gcc test.c -o a.out. The resulting a.out is an ELF file. Running file a.out reports its class (32‑ or 64‑bit), type, architecture, interpreter and other metadata.

2. ELF File Structure

2.1 File Header

The ELF header sits at offset 0 of the file. It contains:

the magic number (0x7F 'E' 'L' 'F');

the class (32‑ or 64‑bit) and byte order;

the file type, machine architecture and version;

the entry point address;

the offsets of the program and section header tables.

The kernel uses this information to validate the file before loading it.

2.2 Program Header Table

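The Program Header Table describes the segments used at run time. The loadable segments (type PT_LOAD) are the ones the kernel maps into the process address space. Each entry specifies:

the segment's file offset and virtual address;

its size in the file (p_filesz) and its size in memory (p_memsz);

its permission flags (r‑x, rw‑, etc.).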

2.3 Section Header Table

The Section Header Table lists the static sections used by linkers and analysis tools: .text (machine code), .rodata (read‑only constants), .data (initialized globals), .bss (zero‑filled uninitialized globals), .dynamic, .symtab, .debug_*, etc. The kernel ignores this table when loading a program and works only with segments.

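.text : executable code, placed in a segment mapped r‑x.

.rodata : constant data such as string literals, placed in a read‑only segment. That segment is r‑‑ on current toolchains; older ones put it in the r‑x code segment.

.data : initialized global and static variables, mapped rw‑.

.bss : uninitialized (or zero‑initialized) global and static variables. It occupies no space in the file; the kernel provides zero‑filled memory for it.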

3. Building an ELF Executable

3.1 Compilation pipeline

Pre‑processing : gcc -E test.c -o test.i expands #include directives and macros.

Compilation & assembly : gcc -c test.i -o test.o produces an ELF relocatable object file.

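Linking : gcc test.o -o a.out resolves symbols, applies relocations and creates the final ELF executable (gcc test.c -o a.out runs all three steps at once). gcc invokes the linker ld together with the C runtime start files (such as crt1.o, crti.o and crtn.o) and libc. Calling ld directly requires passing those files and the dynamic linker path by hand, so a bare ld test.o -lc -o a.out normally does not produce a runnable program.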

The resulting file can be inspected with readelf -h a.out and executed with ./a.out.

3.2 Linking types

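Static linking copies all required library code into the executable, producing a larger but self‑contained binary.

Dynamic linking leaves external references (e.g., printf) unresolved and records the needed libraries as DT_NEEDED entries in the .dynamic section. The path of the dynamic linker goes in the PT_INTERP program header.

At runtime the dynamic linker (/lib64/ld-linux-x86-64.so.2 on x86‑64 glibc systems) loads those libraries and resolves symbols through the PLT/GOT mechanism. By default glibc binds functions lazily, on their first call, unless the binary was linked with -z now or LD_BIND_NOW is set.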

4. Loading and Execution

4.1 execve and kernel validation

When you type ./a.out, the shell forks a child process, and the child calls execve(). The kernel then:

reads the ELF header and verifies the magic number, class and machine type;

takes the entry point from e_entry;

reads the program header table to find the loadable (PT_LOAD) segments and, for dynamically linked programs, the interpreter (PT_INTERP).

4.2 Segment mapping

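A typical dynamically linked executable built by an older toolchain has two LOAD segments:

LOAD segment 1 : contains .text and .rodata, mapped with permissions r‑x.

LOAD segment 2 : contains .data and .bss, mapped with permissions rw‑.

Newer GNU ld releases on x86‑64 default to -z separate-code. This option gives code (r‑x) and read‑only data such as .rodata (r‑‑) separate LOAD segments; readelf -l a.out shows the actual layout.

The .bss section has no bytes in the file. It lies at the end of the rw‑ segment, whose size in memory (p_memsz) exceeds its size in the file (p_filesz). The kernel backs the difference with zero‑filled pages.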

4.3 Dynamic linking (if applicable)

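If the executable contains a PT_INTERP segment, the kernel also maps the dynamic linker named there and transfers control to it rather than to the program. The dynamic linker then:

Parses the .dynamic section to obtain the needed libraries (DT_NEEDED entries, e.g. libc.so.6).

Loads each library and maps its loadable segments.

Applies relocations and resolves symbols through the PLT/GOT. With lazy binding, a function's address is resolved on its first call.

Finally, it jumps to the program's entry point.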

Use ldd a.out to list required shared libraries.

4.4 Entry point to main

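The real entry point is the symbol _start, supplied by the C runtime start file (crt1.o, or Scrt1.o for position‑independent executables). The startup sequence is:

The kernel places argc, argv, envp and the auxiliary vector on the initial stack.

_start picks them up, aligns the stack and calls __libc_start_main().

__libc_start_main() runs initialisation code such as constructors and then calls the user's main.

When main returns, __libc_start_main passes its return value to exit().

exit() runs atexit handlers, flushes stdio buffers and terminates the process.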

5. Inspecting ELF Files

5.1 readelf

Common commands:

readelf -h <filename>: file header.

readelf -S <filename>: section header table.

readelf -l <filename>: program header table.

readelf -a <filename>: all available information.

These commands reveal the magic number, the entry address, segment permissions and the list of sections, such as .debug_info or .symtab.

5.2 objdump

Disassembly examples:

objdump -d a.out: disassemble the executable sections (typically .init, .plt and .text).

objdump -D -M intel a.out: disassemble all sections, including data sections, in Intel syntax.

Disassembly shows the exact machine instructions that the CPU executes once the kernel has mapped the code segment.

5.3 Debugging with GDB

Compile with -g to embed debugging symbols. Typical workflow:

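# Compile with debug info
gcc -g crash.c -o crash_demo
ulimit -c unlimited   # enable core dumps in this shell
./crash_demo          # cause a crash (e.g. SIGSEGV)
# Analyse the core file; its name and location depend on
# /proc/sys/kernel/core_pattern (with systemd-coredump, use coredumpctl gdb)
gdb ./crash_demo core
# Then, at the (gdb) prompt:
#   bt      backtrace
#   list    show source around the fault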

GDB uses the ELF symbol table and .debug_* sections to map addresses back to source lines, allowing precise fault localisation.

6. ELF‑Based Debugging Techniques

6.1 Common methods

Set breakpoints, step through instructions, inspect registers and memory, and switch between threads in multithreaded programs.

6.2 Leveraging ELF information

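The symbol table (.symtab) and debugging sections (.debug_info, .debug_line) let debuggers translate raw addresses into function names and source lines. This is essential for diagnosing crashes, memory leaks or mismatched library versions. strip removes .symtab and the .debug_* sections, leaving only the dynamic symbol table (.dynsym) that the dynamic linker needs.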

[Figure: ELF loading diagram]