How Does Linux’s Init Process Switch from Kernel to User Mode?
This article explains how the Linux init process (pid 1) transitions from kernel mode to user mode, detailing the role of kernel_thread, run_init_process, kernel_execve, the int 0x80 system call, register changes, and a practical experiment confirming the __USER_CS value.
Background: the init process in Linux
The first user‑space process in a Linux system is the init process (PID 1). The kernel creates it with kernel_thread; the new thread starts in kernel mode and then executes the user‑space program /sbin/init. All other user processes are descendants of this init process.
How the kernel launches a user program
There are two kernel helpers that can start a user‑space program from kernel space:
call_usermodehelper
kernel_execve
Both eventually issue an int $0x80 software interrupt, which invokes the sys_execve system call. The question is how the CPU switches from the kernel code segment (__KERNEL_CS) to the user code segment (__USER_CS) during this transition.
Key functions involved
During boot, start_kernel calls rest_init, which creates the first kernel thread that runs kernel_init. Inside kernel_init the function run_init_process is called with paths such as "/sbin/init", "/etc/init", etc. The core of run_init_process is a call to kernel_execve:
int kernel_execve(const char *filename,
                  const char *const argv[],
                  const char *const envp[])
{
    long __res;
    asm volatile ("int $0x80"
                  : "=a" (__res)
                  : "0" (__NR_execve), "b" (filename),
                    "c" (argv), "d" (envp)
                  : "memory");
    return __res;
}
The inline assembly triggers the int $0x80 interrupt, placing the system‑call number (__NR_execve) in eax, the filename in ebx, argv in ecx and envp in edx. The return value is also delivered in eax.
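The same "number first, then arguments" convention can be observed from user space through the C library's generic syscall() wrapper. The sketch below is only a user-space analogy, not kernel code: it assumes Linux with glibc or a compatible libc, the helper name raw_getpid is hypothetical, and it uses getpid instead of execve purely so that the call returns and its result can be checked.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/*
 * Hypothetical helper: issue a system call by number, much as
 * kernel_execve passes __NR_execve in eax. The libc wrapper loads
 * the number and arguments into the registers the ABI expects.
 */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as the ordinary getpid() wrapper, since both end up in the same kernel handler.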
System‑call entry point (assembly)
The handler installed for interrupt vector 0x80, system_call, is defined in arch/x86/kernel/entry_32.S. A simplified excerpt shows the handling of the system‑call number and the eventual jump through the system‑call table:
ENTRY(system_call)
    RING0_INT_FRAME
    pushl %eax                      # save original eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
    testl $_TIF_WORK_SYSCALL_ENTRY, TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $nr_syscalls, %eax
    jae syscall_badsys
    call *sys_call_table(,%eax,4)
    movl %eax, PT_EAX(%esp)         # store return value
    ...
    iret
The crucial line call *sys_call_table(,%eax,4) dispatches to the function whose index is the system‑call number. For execve (__NR_execve = 11) the table entry points to sys_execve.
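The bounds check and indexed call can be modelled in plain C as an array of function pointers. This is a minimal sketch under stated assumptions: the table, its two handlers, the slot numbers and all demo_ names are made up for illustration and have nothing to do with real system‑call numbers.

```c
#include <errno.h>   /* ENOSYS */

/* Hypothetical handler type: one argument only, for brevity. */
typedef long (*syscall_fn)(long arg);

static long demo_double(long arg) { return arg * 2; }
static long demo_negate(long arg) { return -arg; }

/* Toy table; slot numbers are invented for this example. */
static const syscall_fn demo_call_table[] = {
    demo_double,   /* slot 0 */
    demo_negate,   /* slot 1 */
};

#define DEMO_NR_SYSCALLS \
    (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

/*
 * Models: cmpl $nr_syscalls, %eax
 *         jae  syscall_badsys
 *         call *sys_call_table(,%eax,4)
 */
static long demo_dispatch(unsigned long nr, long arg)
{
    if (nr >= DEMO_NR_SYSCALLS)   /* unsigned compare, like jae */
        return -ENOSYS;
    return demo_call_table[nr](arg);
}
```

Because the comparison is unsigned, a negative number placed in the register is rejected as well, which is why the assembly uses jae rather than a signed branch.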
Implementation of sys_execve
long sys_execve(const char __user *name,
                const char __user *const argv[],
                const char __user *const envp[],
                struct pt_regs *regs)
{
    long error;
    char *filename;

    filename = getname(name);
    error = PTR_ERR(filename);
    if (IS_ERR(filename))
        return error;
    error = do_execve(filename, argv, envp, regs);
#ifdef CONFIG_X86_32
    if (error == 0) {
        /* make sure we don't return using sysenter */
        set_thread_flag(TIF_IRET);
    }
#endif
    putname(filename);
    return error;
}
The function copies the filename in from the caller with getname, checks for errors, and then calls do_execve. In the 32‑bit case it sets the TIF_IRET flag so that the return path uses iret rather than the sysexit fast path; iret pops the saved cs and eip from the stack.
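The getname / IS_ERR / PTR_ERR sequence relies on the kernel's error‑pointer idiom, where a small negative errno is encoded in the pointer value itself. The following user‑space model is a sketch only: the demo_ names are hypothetical, and the one detail taken from the kernel is the 4095 upper bound on errno values.

```c
#include <errno.h>    /* ENOENT */
#include <stdint.h>   /* intptr_t, uintptr_t */

#define DEMO_MAX_ERRNO 4095   /* bound the kernel uses for error pointers */

/* Encode a negative errno as a pointer (models ERR_PTR). */
static void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno from such a pointer (models PTR_ERR). */
static long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top 4095 addresses (models IS_ERR). */
static int demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

This is why sys_execve can compute error = PTR_ERR(filename) before it knows whether getname failed: the value is only used when IS_ERR(filename) is true.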
Transition to user mode
After do_execve loads the binary (for /sbin/init this is an ELF executable), it eventually calls start_thread:
void start_thread(struct pt_regs *regs,
                  unsigned long new_ip,
                  unsigned long new_sp)
{
    set_user_gs(regs, 0);
    regs->fs = __USER_DS;
    regs->ds = __USER_DS;
    regs->es = __USER_DS;
    regs->ss = __USER_DS;
    regs->cs = __USER_CS;   /* switch to user code segment */
    regs->ip = new_ip;      /* entry point of the new program */
    regs->sp = new_sp;      /* user stack */
    free_thread_xstate(current);
}
Here the kernel explicitly overwrites the saved regs->cs with __USER_CS and sets the saved instruction pointer to the program’s entry point (elf_entry). The value of __USER_CS is 0x33 on x86‑64; the 32‑bit code path shown here uses 0x73. Nothing changes in the live CPU registers at this point: only the register frame saved on the kernel stack is modified. When the iret instruction at the end of system_call executes, it pops that frame, and the CPU resumes execution in user mode at the new address.
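The effect of rewriting the saved frame can be modelled in user space. This is a simplified sketch, not the kernel's struct pt_regs: the struct layout and all demo_ names are hypothetical, 0x33 is the x86‑64 __USER_CS value quoted above, and 0x10 and 0x2b are assumed here as the x86‑64 values of __KERNEL_CS and __USER_DS.

```c
/* Hypothetical stand-in for the part of the frame that iret pops. */
struct demo_regs {
    unsigned long ip;
    unsigned long cs;
    unsigned long sp;
    unsigned long ss;
};

#define DEMO_KERNEL_CS 0x10UL   /* assumption: x86-64 __KERNEL_CS */
#define DEMO_USER_CS   0x33UL   /* x86-64 __USER_CS */
#define DEMO_USER_DS   0x2bUL   /* assumption: x86-64 __USER_DS */

/* Models start_thread: only the saved frame is edited. */
static void demo_start_thread(struct demo_regs *regs,
                              unsigned long new_ip,
                              unsigned long new_sp)
{
    regs->ss = DEMO_USER_DS;
    regs->cs = DEMO_USER_CS;
    regs->ip = new_ip;
    regs->sp = new_sp;
}

/* Ring the CPU would resume in: low two bits of the cs popped by iret. */
static unsigned demo_iret_ring(const struct demo_regs *regs)
{
    return (unsigned)(regs->cs & 3);
}

/* Frame saved by int $0x80 from kernel code, then rewritten by exec. */
static unsigned demo_ring_after_exec(unsigned long entry,
                                     unsigned long stack)
{
    struct demo_regs frame = { .cs = DEMO_KERNEL_CS };

    demo_start_thread(&frame, entry, stack);
    return demo_iret_ring(&frame);
}
```

A frame that still holds the kernel selector resumes in ring 0, while the same frame after demo_start_thread resumes in ring 3, which is the whole privilege switch in miniature.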
Verification experiment
A small user‑space program reads the cs register and logs it via syslog:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <syslog.h>

int main(void)
{
    unsigned short ucs;

    /* copy the current code-segment selector into ucs */
    asm ("movw %%cs, %0" : "=r" (ucs) :: "memory");
    syslog(LOG_INFO, "ucs = 0x%x\n", ucs);
    return 0;
}
The path of the compiled binary is written to /sys/kernel/uevent_helper, so the kernel launches it from kernel space when a hot‑plug event occurs (e.g., inserting a USB drive). The system log shows:
Mar 10 14:20:23 build-server main: ucs = 0x33
0x33 matches the __USER_CS value on the tested x86‑64 system, confirming that the transition from kernel to user mode occurs by overwriting cs and ip before the final iret.
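The logged value can be decoded by hand, because an x86 segment selector packs three fields: bits 0-1 hold the requested privilege level, bit 2 selects the GDT or LDT, and the remaining bits index the descriptor table. The helper below is an illustrative sketch with hypothetical names; the 0x10 used in the usage note is assumed to be the x86‑64 kernel code selector.

```c
/* Decoded fields of an x86 segment selector. */
struct demo_selector {
    unsigned index;   /* descriptor table index (bits 3-15) */
    unsigned ti;      /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    unsigned rpl;     /* requested privilege level (bits 0-1) */
};

static struct demo_selector demo_decode(unsigned short sel)
{
    struct demo_selector d;

    d.rpl = sel & 0x3;
    d.ti = (sel >> 2) & 0x1;
    d.index = sel >> 3;
    return d;
}
```

Decoding 0x33 gives index 6 in the GDT with privilege level 3, i.e. ring 3 user code. Decoding 0x10 gives index 2 with privilege level 0.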
Conclusion
The Linux init process starts as a kernel thread, calls run_init_process, which invokes kernel_execve. The int $0x80 system call eventually reaches sys_execve, which prepares the new program and, via start_thread, manually sets regs->cs = __USER_CS and the new instruction pointer. The subsequent iret returns control to user space, completing the privilege‑level switch.
ITPUB
