Unlocking Linux: How ELF Files Transform Into Running Processes
This article explains the ELF file format: its file types, internal structure, and the compilation and linking steps that produce it. It then follows how the Linux kernel loads an ELF binary, from process creation with fork and exec through dynamic linking, relocation, and construction of the process address space, giving developers and system engineers a closer look at how Linux executes programs.
ELF Files: The Soul Container of Linux
ELF (Executable and Linkable Format) is the standard binary format on Linux, playing the role that the PE format (.exe, .dll) plays on Windows. It covers executable files, relocatable object files, shared objects (.so), and core dump files.
What Is an ELF File?
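The type of an ELF file is recorded in the e_type field of its header, and each type serves a different purpose:
Relocatable object file (ET_REL) : Compiler output (.o) that the linker combines into an executable or shared library.
Executable (ET_EXEC, or ET_DYN for position-independent executables) : A program the kernel can load and run.
Shared object (ET_DYN) : A library (.so) mapped into a process at load time or run time.
Core dump (ET_CORE) : A snapshot of a process's memory and registers, written when it crashes, for use in debugging.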
ELF "Identity Card"
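The first four bytes of an ELF file are the magic number 0x7f 0x45 0x4c 0x46, that is, the byte 0x7f followed by the ASCII characters "ELF". The kernel checks these bytes to verify the file format before doing anything else with the file.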
Internal Structure of an ELF File
The ELF file consists of several key components:
File Header : Contains magic number, file type, target architecture, entry point address, and offsets to program and section header tables.
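Program Header Table : Describes the segments used at run time. Loadable segments (type PT_LOAD) carry a file offset, virtual address, file and memory sizes, and permissions; other entries such as PT_INTERP and PT_DYNAMIC point to the interpreter path and the dynamic linking information.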
Section Header Table : Provides information for linkers and debuggers about each section, such as .text, .data, .bss, .rodata, .symtab, and .strtab.
Sections are the units stored in the file, while segments are the runtime memory units created from one or more related sections.
From ELF File to Linux Process
Compilation and Linking
Source code is first compiled into object files (.o) using a compiler like GCC, then linked either statically or dynamically to produce an ELF executable.
#include <stdio.h>
int main(void) {
    printf("Hello, World!\n");
    return 0;
}
Static linking copies all required code into the final executable, while dynamic linking records references to shared libraries that are loaded at runtime.
Process Creation
Linux creates a new process with the fork() system call, which duplicates the parent process. The child then calls an exec family function (e.g., execve) to replace its memory image with the ELF binary.
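#include <stdio.h>
#include <unistd.h>
int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* fork() returns 0 in the child */
        printf("Child PID %d\n", (int)getpid());
    } else {
        /* the parent receives the child's PID */
        printf("Parent PID %d, child %d\n", (int)getpid(), (int)pid);
    }
    return 0;
}
Loading the ELF File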
When execve is invoked, the kernel reads the ELF header, verifies the magic number, and parses the program header table. For each PT_LOAD segment, the kernel maps the segment's file contents into the process's virtual address space (abbreviated here as do_mmap), applying the permissions given by the segment flags (read, write, execute). If a segment's memory size (p_memsz) exceeds its file size (p_filesz), the extra memory is zero-initialized; this is how .bss gets its storage without occupying space in the file.
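/* Simplified pseudocode; the real loader, load_elf_binary() in fs/binfmt_elf.c, also page-aligns offsets and addresses. */
for (i = 0; i < phnum; i++) {
    if (phdr[i].p_type == PT_LOAD) {
        /* translate ELF segment flags into mmap protection bits */
        prot = 0;
        if (phdr[i].p_flags & PF_R) prot |= PROT_READ;
        if (phdr[i].p_flags & PF_W) prot |= PROT_WRITE;
        if (phdr[i].p_flags & PF_X) prot |= PROT_EXEC;
        /* map p_filesz bytes of the file at the segment's virtual address */
        do_mmap(file, phdr[i].p_offset, phdr[i].p_filesz, prot, MAP_PRIVATE, phdr[i].p_vaddr);
        /* bytes between p_filesz and p_memsz are zero-filled afterwards */
    }
}
Dynamic Linking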
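If the ELF file is dynamically linked, the kernel loads the interpreter named by the PT_INTERP segment, which holds the contents of the .interp section (e.g., /lib64/ld-linux-x86-64.so.2). The dynamic linker then resolves DT_NEEDED dependencies, loads the required shared libraries, and applies relocations. Function calls into shared libraries go through the Procedure Linkage Table (PLT) and Global Offset Table (GOT), which allows their symbols to be bound lazily.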
Process Address Space Layout
The loaded ELF defines the process’s memory layout: a read‑only executable code segment (.text), a writable data segment (.data), a zero‑filled BSS segment, a heap for dynamic allocation, a stack for function calls, and mapped regions for shared libraries.
Kernel Page Tables and MMU
The MMU translates the process’s virtual addresses to physical memory using page tables. Each page table entry maps a virtual page to a physical frame with appropriate access rights, enabling the CPU to fetch instructions and data from the correct locations.
Full Kernel Loading Flow
Initial ELF Inspection
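The kernel opens the ELF file and reads its first BINPRM_BUF_SIZE bytes (128 in older kernels, 256 in recent ones) into bprm->buf. That buffer holds the ELF header, from which the loader takes the entry point and the offset and entry count of the program header table. The header also records the section header offset, but the section header table is not needed to run the program.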
struct linux_binprm *bprm;
kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
Mapping Segments
Using the program header table, the kernel maps each PT_LOAD segment into memory with do_mmap, setting protections based on PF_R, PF_W, and PF_X flags.
Loading the Dynamic Interpreter
The PT_INTERP segment (the contents of the .interp section) provides the path to the dynamic linker. The kernel opens that file and maps its PT_LOAD segments much as it does for the main executable.
if (phdr[i].p_type == PT_INTERP) {
    // load dynamic linker
}
Relocation and Symbol Resolution
The dynamic linker parses the .dynamic section, loads needed shared libraries, and applies relocations. Lazy binding defers symbol resolution until the first call.
Program Initialization and Start
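For a dynamically linked program, the kernel transfers control to the dynamic linker's entry point once the segments are mapped. The dynamic linker loads and relocates the shared libraries in user space and then jumps to the program's own entry point (e_entry, usually the _start symbol). For a statically linked program, the kernel jumps to e_entry directly. In both cases the startup code at the entry point then calls main, and the program begins execution.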
Practical Example
A simple C program that sums two numbers is compiled with gcc -o sum sum.c. Running readelf -h, readelf -l, and readelf -S on the result shows the ELF header, program headers, and section headers, confirming the layout described above. Tracing execution with strace -f -o sum_trace.txt ./sum shows the execve() call that starts the program, typically followed by openat() and mmap() calls that load libc.so.6, illustrating the runtime behavior of ELF loading. A fork() call does not normally appear in the trace of sum itself, because strace creates the traced process before tracing begins.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
