Understanding Linux System Calls: Mechanism, Debugging, and Implementation
This article explains how Linux system calls serve as the primary interface between user‑space programs and the kernel, describes the transition from user mode to kernel mode, and provides step‑by‑step debugging examples and source‑code snippets for x86_64 architectures.
System calls are the main mechanism for interaction between user‑space programs and the Linux kernel. Because of their importance, the kernel provides a generic, efficient, and consistent implementation across architectures.
Linux restricts process permissions to prevent illegal device access; the kernel offers a set of interfaces that user programs invoke as system calls. This article demonstrates the workflow of a system call by tracing from a user program into the kernel.
1. System Calls
Unlike regular function calls, system calls execute code in the kernel and require special instructions (e.g., the syscall instruction on x86) to switch from user mode to privileged mode (ring 0). The kernel identifies the requested service by a system‑call number rather than a function address.
When a user program executes syscall, the processor switches to ring 0, saves the return address and flags (in rcx and r11 on x86_64), and jumps to a single predefined kernel entry point. The entry code then saves the rest of the user context (general‑purpose registers and the user stack pointer) on the kernel stack.
Inside the kernel, the system‑call table maps the system‑call number to a kernel function. After that function finishes, the kernel restores the saved context and returns to user space.
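Using numbers instead of addresses gives user space a stable interface: on a given architecture the same number selects the same service across kernel versions and distributions, without exposing kernel addresses or implementation details. The numbers themselves are architecture‑specific (write is 1 on x86_64 but 4 on 32‑bit x86).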
Thus, the system‑call mechanism involves a user‑to‑kernel mode switch, number identification, and kernel‑side handling logic.
2. User Space
Starting with a simple "Hello World" program written in assembly, the article shows how to invoke the write system call (number 1 on 64‑bit) and the exit system call (number 60).
    .section .data
msg:
    .ascii "Hello World!\n"
    len = . - msg

    .section .text
    .globl main
main:
    mov $1, %rdi            # fd 1 = stdout
    lea msg(%rip), %rsi     # buffer address (RIP-relative, so it also links as PIE)
    mov $len, %rdx          # byte count
    mov $1, %rax            # write syscall number
    syscall
    mov $0, %rdi            # exit status
    mov $60, %rax           # exit syscall number
    syscall
Compile and run:
$ gcc -o helloworld helloworld.s
$ ./helloworld
Hello World!
$ echo $?
0
3. Kernel Space
System calls are triggered by the syscall instruction, which transfers control to the kernel.
3.1 Kernel Debugging
Set a breakpoint at the write function and inspect the call stack.
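#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

/* Called when user space writes to the device; dumps the kernel call stack. */
static ssize_t my_write(struct file *file, const char __user *buf,
                        size_t len, loff_t *offset)
{
    /* set breakpoint here */
    dump_stack();
    return len;   /* report every byte as consumed */
}

static const struct file_operations fops = {
    .owner = THIS_MODULE,
    .write = my_write,
};

/* Assumption: the original listing never registers fops. A misc device with
 * the hypothetical name "my_write_demo" is one simple way to make it reachable. */
static struct miscdevice my_dev = {
    .minor = MISC_DYNAMIC_MINOR,
    .name  = "my_write_demo",
    .fops  = &fops,
};

static int __init my_init(void)
{
    return misc_register(&my_dev);
}

static void __exit my_exit(void)
{
    misc_deregister(&my_dev);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
When the breakpoint is hit, the call stack reveals the entry point entry_SYSCALL_64.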
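#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    pid_t pid = getpid();             /* libc wrapper */
    long raw = syscall(SYS_getpid);   /* same call by number (39 on x86_64); getpid takes no arguments */
    printf("%d %ld\n", (int)pid, raw);
    return 0;
}
Compile with gcc -o test test.c. To trace the transition from user to kernel space, attach GDB to the kernel itself (for example a kernel running under QEMU, or via KGDB); a GDB session attached only to the user process steps over the syscall instruction and never enters kernel code.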
3.2 System‑Call Entry
The 64‑bit entry point entry_SYSCALL_64 saves the user context, validates the system‑call number, looks up the handler in sys_call_table, and jumps to it.
ENTRY(entry_SYSCALL_64)
    subq $FRAME_SIZE, %rsp
    MOV_LDX(regs, %rsp)
    cmpl $(nr_syscalls),%eax
    jae badsys
    leaq sys_call_table(%rip),%r10
    movq (%r10,%rax,8), %r10
    ...
In this simplified listing, the code allocates space for pt_regs, saves the user stack pointer, rejects call numbers at or above nr_syscalls (the jump to badsys), and loads the handler address from sys_call_table, whose entries are 8 bytes wide.
3.3 do_syscall_64
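do_syscall_64 receives the call number and a pointer to the saved registers (pt_regs), masks and bounds‑checks the number, invokes the handler from sys_call_table, and stores the result in regs->ax so that it reaches user space in rax.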
#ifdef CONFIG_X86_64
__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs) {
    nr &= __SYSCALL_MASK;
    if (likely(nr < NR_syscalls)) {
        /* clamp the index so it cannot be misused under speculative execution */
        nr = array_index_nospec(nr, NR_syscalls);
        /* dispatch; the return value goes back to user space in rax */
        regs->ax = sys_call_table[nr](regs);
    }
    syscall_return_slowpath(regs);
}
#endif
3.4 System‑Call Table
The table syscall_64.tbl maps numbers to function names; a script generates syscalls_64.h, which is included to initialise sys_call_table. The mapping can be pictured with a small user‑space program:
#include <stdio.h>

static const char *syscall_names[] = {
    [0] = "sys_read",
    [1] = "sys_write",
    // ...
};

int main(void) {
    for (size_t i = 0; i < sizeof(syscall_names)/sizeof(syscall_names[0]); ++i)
        printf("Syscall number %zu: %s\n", i, syscall_names[i]);
    return 0;
}
Each entry in sys_call_table points to a wrapper generated by the SYSCALL_DEFINE* macros (e.g., SYSCALL_DEFINE3(write, ...) expands to the actual kernel implementation).
4. Defining a System Call
Example: the read() system call is defined with SYSCALL_DEFINE3(read, unsigned int fd, char __user *buf, size_t count). The macro expands to a series of wrappers (sys_read, SyS_read, etc.); the table entry generated from syscall_64.tbl then refers to the resulting function.
SYSCALL_DEFINE3(read, unsigned int fd, char __user *buf, size_t count)
{
    struct fd f = fdget(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos = file_pos_read(f.file);
        ret = vfs_read(f.file, buf, count, &pos);
        file_pos_write(f.file, pos);
        fdput(f);
    }
    return ret;
}
The macro chain creates metadata, the actual syscall wrapper, and an alias so that sys_read and SyS_read resolve to the same implementation.
5. x86_64 Syscall Invocation
On x86_64, the user program places the syscall number in rax, the arguments in rdi, rsi, rdx, r10, r8, r9, and executes syscall. The CPU loads the address from the MSR MSR_LSTAR (written during boot) and jumps to system_call, the name older kernels use for the entry point that section 3.2 shows as entry_SYSCALL_64. That code saves registers, looks up the handler in sys_call_table, and invokes it.
ENTRY(system_call)
    movq %rsp, PER_CPU_VAR(old_rsp)
    movq PER_CPU_VAR(kernel_stack), %rsp
    ENABLE_INTERRUPTS(CLBR_NONE)
    SAVE_ARGS 8,0
    movq %r10, %rcx
    call *sys_call_table(,%rax,8)
    movq %rax, RAX-ARGOFFSET(%rsp)
The SAVE_ARGS macro saves the argument registers on the kernel stack. Because the syscall instruction overwrites rcx with the return address, user space passes the fourth argument in r10, and movq %r10, %rcx moves it into the register the C calling convention expects before the asmlinkage handler is called.
During early boot, syscall_init() writes the address of system_call to MSR_LSTAR, enabling the syscall instruction to transfer control to the kernel.
Overall, the article walks through the complete lifecycle of a Linux system call, from user‑space invocation through kernel entry, table lookup, handler execution, and return, while providing concrete debugging commands and source‑code examples for the x86_64 platform.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.