How Does Linux’s Init Process Switch from Kernel to User Mode?

This article explains how the Linux init process (pid 1) transitions from kernel mode to user mode, detailing the role of kernel_thread, run_init_process, kernel_execve, the int 0x80 system call, register changes, and a practical experiment confirming the __USER_CS value.

ITPUB

Background: the init process in Linux

The first user‑space process in a Linux system is the init process (PID 1). The kernel creates it with kernel_thread, so it begins life as a kernel thread running in kernel mode and only then executes the user‑space program /sbin/init. All other user processes are descendants of this init process.

How the kernel launches a user program

There are two kernel helpers that can start a user‑space program from kernel space:

call_usermodehelper
kernel_execve

On the 32‑bit x86 kernels discussed here, both eventually issue an int $0x80 software interrupt, which invokes the sys_execve system call. The question is how the CPU switches from the kernel code segment (__KERNEL_CS) to the user code segment (__USER_CS) during this transition.

Key functions involved

During boot, start_kernel calls rest_init, which creates the first kernel thread that runs kernel_init. Inside kernel_init, the function run_init_process is called in turn with candidate paths such as "/sbin/init" and "/etc/init" until one of them succeeds. The core of run_init_process is a call to kernel_execve:

int kernel_execve(const char *filename,
                 const char *const argv[],
                 const char *const envp[])
{
    long __res;
    asm volatile ("int $0x80"
                  : "=a" (__res)
                  : "0" (__NR_execve), "b" (filename),
                    "c" (argv), "d" (envp)
                  : "memory");
    return __res;
}

The inline assembly loads the system‑call number (__NR_execve) into eax, the filename into ebx, argv into ecx and envp into edx, and then triggers int $0x80. The return value comes back in eax.
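Because a successful exec never returns to its caller, trying several init paths amounts to a simple fall-through loop. The following is a minimal user-space sketch of that idea, not the kernel's code: the names first_startable, try_exec_fn and only_etc_init are hypothetical, and a predicate stands in for the real exec attempt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: returns nonzero if `path` could be started. */
typedef int (*try_exec_fn)(const char *path);

/*
 * Try each candidate in order and return the index of the first one the
 * predicate accepts, or -1 if none does. In the kernel the "predicate" is
 * the exec itself: reaching the next candidate means the previous one failed.
 */
static int first_startable(const char *const candidates[], size_t n,
                           try_exec_fn try_exec)
{
    for (size_t i = 0; i < n; i++) {
        if (try_exec(candidates[i]))
            return (int)i;
    }
    return -1;
}

/* Stand-in predicate for illustration: pretend only /etc/init exists. */
static int only_etc_init(const char *path)
{
    return strcmp(path, "/etc/init") == 0;
}
```

With the candidates "/sbin/init" and "/etc/init" and the stand-in predicate, first_startable returns 1, the index of the second path.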

System‑call entry point (assembly)

The handler for this interrupt, system_call, is defined in arch/x86/kernel/entry_32.S. A simplified excerpt shows the handling of the system‑call number and the eventual jump through the system‑call table:

ENTRY(system_call)
    RING0_INT_FRAME
    pushl %eax            # save original eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
    testl $_TIF_WORK_SYSCALL_ENTRY, TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $nr_syscalls, %eax
    jae syscall_badsys
    call *sys_call_table(,%eax,4)
    movl %eax, PT_EAX(%esp)   # store return value
    ...
    iret

The crucial line call *sys_call_table(,%eax,4) dispatches to the function whose index is the system‑call number; each table entry is a 4‑byte pointer, hence the scale factor of 4. For execve (__NR_execve = 11 on 32‑bit x86) the table entry points to sys_execve.
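The register convention and the table dispatch can be modelled in plain C. This is a toy sketch under stated assumptions: every toy_ name is hypothetical, the table holds a single entry, and the NULL check for empty slots is an addition for this sketch rather than something the excerpt above does.

```c
#include <errno.h>
#include <stddef.h>

/* Toy register frame: only the registers the int $0x80 convention uses. */
struct toy_regs {
    long eax;   /* in: system-call number, out: return value */
    long ebx;   /* first argument  */
    long ecx;   /* second argument */
    long edx;   /* third argument  */
};

typedef long (*toy_syscall_fn)(long, long, long);

#define TOY_NR_EXECVE   11   /* same number as __NR_execve on 32-bit x86 */
#define TOY_NR_SYSCALLS 12   /* hypothetical table size for this sketch */

static long toy_execve(long filename, long argv, long envp)
{
    (void)filename; (void)argv; (void)envp;
    return 0;   /* pretend the exec succeeded */
}

static const toy_syscall_fn toy_sys_call_table[TOY_NR_SYSCALLS] = {
    [TOY_NR_EXECVE] = toy_execve,
};

static void toy_system_call(struct toy_regs *regs)
{
    /* Unsigned compare, like jae: negative numbers are rejected too. */
    unsigned long nr = (unsigned long)regs->eax;

    /* cmpl $nr_syscalls, %eax ; jae syscall_badsys */
    if (nr >= TOY_NR_SYSCALLS || toy_sys_call_table[nr] == NULL) {
        regs->eax = -ENOSYS;
        return;
    }
    /* call *sys_call_table(,%eax,4) ; movl %eax, PT_EAX(%esp) */
    regs->eax = toy_sys_call_table[nr](regs->ebx, regs->ecx, regs->edx);
}
```

Calling toy_system_call with eax set to TOY_NR_EXECVE leaves 0 in eax, while an out-of-range number leaves -ENOSYS there.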

Implementation of sys_execve

long sys_execve(const char __user *name,
                const char __user *const argv[],
                const char __user *const envp[],
                struct pt_regs *regs)
{
    long error;
    char *filename;
    filename = getname(name);
    error = PTR_ERR(filename);
    if (IS_ERR(filename))
        return error;
    error = do_execve(filename, argv, envp, regs);
#ifdef CONFIG_X86_32
    if (error == 0) {
        /* make sure we do not return via sysenter */
        set_thread_flag(TIF_IRET);
    }
#endif
    putname(filename);
    return error;
}

The function copies the filename into kernel memory with getname, returns early if that fails, and then calls do_execve. In the 32‑bit case, a successful exec sets the TIF_IRET flag so that the return path uses iret rather than the sysexit fast path; iret pops the saved eip and cs from the stack.
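The getname / IS_ERR / PTR_ERR sequence relies on the kernel's convention of encoding a small negative errno in a pointer value. The sketch below re-creates that convention in user space for illustration; the toy_ names are hypothetical, the limit of 4095 follows the kernel's MAX_ERRNO, and the casts assume that long and pointers have the same size, as they do on Linux.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ERRNO 4095   /* same limit as the kernel's MAX_ERRNO */

/* Encode a negative errno value as a pointer. */
static inline void *toy_err_ptr(long error)
{
    return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
    return (long)ptr;
}

/* True if the pointer lies in the top 4095 bytes of the address space. */
static inline bool toy_is_err(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-TOY_MAX_ERRNO;
}
```

This is why sys_execve can compute error = PTR_ERR(filename) before it knows whether the call failed: the value is only used when IS_ERR(filename) is true.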

Transition to user mode

After do_execve loads the binary (for /sbin/init this is an ELF executable), it eventually calls start_thread:

void start_thread(struct pt_regs *regs,
                 unsigned long new_ip,
                 unsigned long new_sp)
{
    set_user_gs(regs, 0);
    regs->fs = __USER_DS;
    regs->ds = __USER_DS;
    regs->es = __USER_DS;
    regs->ss = __USER_DS;
    regs->cs = __USER_CS;   /* switch to user code segment */
    regs->ip = new_ip;     /* entry point of the new program */
    regs->sp = new_sp;     /* user stack */
    free_thread_xstate(current);
}

Here the kernel explicitly overwrites regs->cs with __USER_CS and sets the instruction pointer to the program’s entry point (elf_entry). The numeric value of __USER_CS depends on the architecture: 0x73 on 32‑bit x86 and 0x33 on x86‑64. When the iret instruction at the end of system_call executes, the CPU pops these saved values and resumes execution in user mode at that address.
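The effect of start_thread can be pictured as editing a saved frame that iret will later load. The following toy model is a sketch only: struct toy_frame, toy_start_thread and toy_cpl_after_iret are hypothetical names, the frame keeps just four fields, and the selector constants are the x86-64 values.

```c
/* Selector values on x86-64; the 32-bit values differ. */
#define TOY_USER_CS 0x33u
#define TOY_USER_DS 0x2bu

/* Toy version of the saved frame that iret pops. */
struct toy_frame {
    unsigned long ip;
    unsigned long cs;
    unsigned long sp;
    unsigned long ss;
};

/* Overwrite the saved frame so that the return lands in the new program. */
static void toy_start_thread(struct toy_frame *regs,
                             unsigned long new_ip, unsigned long new_sp)
{
    regs->ss = TOY_USER_DS;
    regs->cs = TOY_USER_CS;   /* user code segment */
    regs->ip = new_ip;        /* entry point of the new program */
    regs->sp = new_sp;        /* user stack */
}

/* Privilege level after iret loads this frame: the low two bits of cs. */
static unsigned toy_cpl_after_iret(const struct toy_frame *regs)
{
    return (unsigned)(regs->cs & 3u);
}
```

A frame whose cs is a kernel selector yields privilege level 0; after toy_start_thread the same frame yields 3, although no privilege change has happened yet. The switch itself takes place only when the frame is popped.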

Verification experiment

A small user‑space program reads the cs register and logs it via syslog:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <syslog.h>

int main(void)
{
    unsigned short ucs;
    asm ("movw %%cs, %0" : "=r" (ucs) :: "memory");
    syslog(LOG_INFO, "ucs = 0x%x\n", ucs);
    return 0;
}

The path to the compiled binary is written to /sys/kernel/uevent_helper, so the kernel launches it through call_usermodehelper when a hot‑plug event occurs (e.g., inserting a USB drive). The system log then shows:

Mar 10 14:20:23 build-server main: ucs = 0x33

The value 0x33 matches __USER_CS on the tested x86‑64 system, and its low two bits, which hold the requested privilege level, are 3 (ring 3). This is consistent with the transition from kernel to user mode being made by overwriting cs and ip before the final iret.
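To read a logged selector value, it helps to split it into its three architectural fields: the descriptor-table index (bits 15..3), the table indicator (bit 2) and the requested privilege level (bits 1..0). The helper below is an illustrative sketch; struct selector_fields and decode_selector are hypothetical names.

```c
/* The three fields of an x86 segment selector. */
struct selector_fields {
    unsigned index;  /* descriptor index within the table */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, 0..3   */
};

static struct selector_fields decode_selector(unsigned short sel)
{
    struct selector_fields f;

    f.rpl   = sel & 0x3u;
    f.ti    = (sel >> 2) & 0x1u;
    f.index = (unsigned)sel >> 3;
    return f;
}
```

Decoding 0x33 gives index 6, table indicator 0 (GDT) and privilege level 3. The 32-bit value 0x73 decodes to index 14 with the same privilege level.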

Conclusion

The Linux init process starts as a kernel thread, calls run_init_process, which invokes kernel_execve. The int $0x80 system call eventually reaches sys_execve, which prepares the new program and, via start_thread, manually sets regs->cs = __USER_CS and the new instruction pointer. The subsequent iret returns control to user space, completing the privilege‑level switch.
