Unveiling Linux System Call Mechanics: From syscall to sysret

This article provides a comprehensive, step‑by‑step walkthrough of how Linux handles system calls, covering the low‑level assembly entry point, register conventions, the sys_call_table registration process, struct pt_regs usage, and a practical write‑syscall example with a custom extension.

ITPUB

On x86‑64, Linux system calls rest on two instructions, syscall and sysret. When a user program executes syscall, the CPU switches to kernel mode, saves the return address in rcx and the flags in r11, and jumps to the address stored in the MSR_LSTAR register, which Linux points at entry_SYSCALL_64.

Dispatching a System Call

Inside entry_SYSCALL_64 and the C code it calls, the kernel takes the system‑call number from the rax register, uses it as an index into the sys_call_table array, and retrieves the corresponding function pointer. The arguments arrive in the registers rdi, rsi, rdx, r10, r8 and r9, in that order (r10 stands in for rcx, which the syscall instruction overwrites), and are made available to the target function. After execution, the return value is placed back into rax, and sysret returns to user mode.

Macro‑Based Definition of System Calls

Kernel source defines each system call with macros such as SYSCALL_DEFINE3, where the digit is the number of arguments. For the write call, the macro expands to three functions (in recent x86‑64 kernels these are __do_sys_write, __se_sys_write and __x64_sys_write). Only __x64_sys_write is visible outside its source file, and it is the one stored in sys_call_table.

How Functions Are Registered

Every slot of sys_call_table is first initialized to __x64_sys_ni_syscall, which returns -ENOSYS. Real implementations are then registered through __SYSCALL_COMMON entries in the generated header syscalls_64.h. That header does not exist in the source tree; a makefile produces it at build time by running syscalltbl.sh on the syscall_64.tbl template.

For example, the line __SYSCALL_COMMON(1, sys_write) assigns the function __x64_sys_write to slot 1 of sys_call_table.

Calling the Registered Function

The kernel function do_syscall_64 receives the syscall number and a pointer to a struct pt_regs instance. It looks up the function in sys_call_table and invokes it, passing the regs pointer as the only argument. The pt_regs fields are named after the hardware registers (di, si, dx, r10, r8 and r9 hold the six arguments), so the handler reads its arguments directly from these fields.

Assembly Details of entry_SYSCALL_64

The entry code pushes register values onto the stack to build a pt_regs object, then moves the syscall number from rax to rdi and the pt_regs pointer (the current stack pointer) to rsi, the first two argument registers of the C calling convention, before calling do_syscall_64. After the call, the saved registers are popped back off the stack, which leaves the handler's result in rax. Finally, sysret returns to user space.

User‑Space Example

An assembly program invokes the write syscall directly to output the string "Hi\n", then uses ret so that the syscall result left in rax becomes the program’s exit code. Because write returns the number of bytes written, the program prints "Hi" and exits with code 3.

The same effect can be achieved with the glibc write wrapper, which ultimately issues the same syscall instruction.

Adding a Custom System Call

To demonstrate extensibility, a new syscall is defined after write. The implementation simply adds 10 to its argument and returns the result. It is registered in syscall_64.tbl with number 442, and the kernel is rebuilt.

Running a test program on the rebuilt kernel, the new syscall returns 20, which corresponds to an input of 10. The output matches the expected value and demonstrates a complete end‑to‑end modification of the Linux system‑call infrastructure.
