Function Call Mechanics and Assembly Implementations of memcpy and memset in glibc
This article explains CPU registers, stack layout, and function calling conventions. It then walks through how memcpy and memset, as specified in glibc, are implemented both in C and in low-level x86 assembly, with code snippets and a look at how each routine handles the bytes left over after the four-byte copies.
The article introduces techniques for Go optimization and then focuses on low‑level concepts such as CPU registers, the function call process, and assembly language, aiming to improve program performance with minimal code changes.
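It describes the stack as a region of a process's virtual address space. Each thread has its own stack, tracked on 32-bit x86 by two registers: the base (frame) pointer %ebp, which marks the current function's frame, and the stack pointer %esp, which marks the top of the stack. The stack grows from high addresses toward low ones, and each function call adds a new stack frame.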
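When a function is called, the typical steps are: the caller pushes the arguments onto the stack; the call instruction pushes the return address and jumps to the callee; the callee saves the caller's %ebp and sets up a new stack frame; the function body executes; finally the callee tears down its frame, restores %ebp, and returns, after which the arguments are removed from the stack by whichever side the calling convention designates.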
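The call sequence above can be traced with a small C model of a downward-growing 32-bit stack. This is a sketch, not real CPU behavior: stack_mem, esp, ebp, push32, pop32, simulate_call and simulate_return are hypothetical names, and esp/ebp here are array indices standing in for the registers. It follows a cdecl call f(a, b) with one local variable.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit x86 stack. All names are hypothetical:
 * esp and ebp are array indices, not real registers. */
#define STACK_WORDS 64u

static uint32_t stack_mem[STACK_WORDS];
static uint32_t esp = STACK_WORDS;  /* empty stack: the top starts at the "high" end */
static uint32_t ebp = STACK_WORDS;

static void push32(uint32_t v) {
    assert(esp > 0);                /* the modeled stack would overflow */
    stack_mem[--esp] = v;           /* decrement first: the stack grows downward */
}

static uint32_t pop32(void) {
    assert(esp < STACK_WORDS);
    return stack_mem[esp++];
}

/* Caller side plus callee prologue for a cdecl call f(a, b). */
static void simulate_call(uint32_t a, uint32_t b, uint32_t ret_addr) {
    push32(b);                      /* 1. arguments pushed right to left */
    push32(a);
    push32(ret_addr);               /* 2. `call` pushes the return address */
    push32(ebp);                    /* 3. prologue: push %ebp */
    ebp = esp;                      /*    mov %esp, %ebp starts the new frame */
    push32(a + b);                  /* 4. body: one local (like sub $4, %esp plus a store) */
    /* Now stack_mem[ebp + 1] is the return address and stack_mem[ebp + 2],
       stack_mem[ebp + 3] are a and b, mirroring 4(%ebp), 8(%ebp), 12(%ebp). */
}

/* Callee epilogue plus the caller's cdecl cleanup; returns the popped return address. */
static uint32_t simulate_return(void) {
    esp = ebp;                      /* 5. mov %ebp, %esp discards the locals */
    ebp = pop32();                  /*    pop %ebp restores the caller's frame */
    uint32_t ret = pop32();         /*    `ret` pops the return address */
    esp += 2;                       /* 6. cdecl: the caller runs add $8, %esp */
    return ret;
}
```

After simulate_call(10, 20, 0x1234), the arguments sit above the saved frame pointer and the local sits below it; simulate_return then leaves esp and ebp exactly where they started, which is the balance every well-formed call must restore.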
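The article outlines three common 32-bit x86 calling conventions: stdcall (arguments pushed right to left, callee cleans the stack), cdecl (arguments pushed right to left, caller cleans the stack), and fastcall (the first two integer-sized arguments passed in ECX and EDX, the rest pushed right to left, callee cleans the stack). A related GCC variant, regparm(3), passes the first three arguments in EAX, EDX and ECX and leaves stack cleanup to the caller; the register assignments in the assembly routines below match this scheme.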
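The conventions above can be summarized as a lookup model. This sketch covers only integer- or pointer-sized arguments; the enum and function names are hypothetical, and REGPARM3 stands for GCC's regparm(3) attribute rather than a separate named convention.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of argument placement for integer/pointer-sized arguments on
 * 32-bit x86. All names are hypothetical. REGPARM3 models GCC's regparm(3),
 * a register-passing variant of cdecl. */
typedef enum { CDECL, STDCALL, FASTCALL, REGPARM3 } convention;
typedef enum { ON_STACK, IN_EAX, IN_ECX, IN_EDX } location;

/* Where does argument i (0-based, counted left to right) travel? */
static location arg_location(convention c, int i) {
    static const location fastcall_regs[] = { IN_ECX, IN_EDX };
    static const location regparm_regs[]  = { IN_EAX, IN_EDX, IN_ECX };
    switch (c) {
    case FASTCALL: return i < 2 ? fastcall_regs[i] : ON_STACK;
    case REGPARM3: return i < 3 ? regparm_regs[i] : ON_STACK;
    case CDECL:
    case STDCALL:
    default:       return ON_STACK;   /* stack arguments are pushed right to left */
    }
}

/* Does the callee remove the stack arguments (e.g. with `ret $imm16`)? */
static bool callee_cleans_stack(convention c) {
    return c == STDCALL || c == FASTCALL;
}
```

Under REGPARM3, a call memcpy(dest, src, n) places dest in EAX, src in EDX and n in ECX, which is the layout the boot-code routines read from %ax, %dx and %cx.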
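It then presents the glibc memcpy prototype, void *memcpy(void *dest, const void *src, size_t n), which copies n bytes from src to dest and returns dest. It performs no bounds checking, and the two regions must not overlap (memmove handles overlapping copies).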
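As a baseline for the contract just described, here is a minimal byte-at-a-time copy in C. It is an illustrative sketch with a hypothetical name (simple_memcpy), not glibc's actual code, which is heavily optimized per architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time copy with memcpy's contract (hypothetical name, not glibc's code).
 * Like memcpy, it does no bounds checking and assumes the regions do not overlap. */
static void *simple_memcpy(void *dest, const void *src, size_t n) {
    assert(n == 0 || (dest != NULL && src != NULL));
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;   /* memcpy returns its dest argument */
}
```

Every byte costs one loop iteration here; the assembly version discussed next cuts that roughly by four by moving 32-bit words.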
The assembly implementation of memcpy comes from arch/x86/boot/copy.S in the Linux kernel's real-mode boot code. It uses %si as the source index and %di as the destination index (the low 16 bits of %esi and %edi). Arguments arrive in registers, as with GCC's regparm(3) convention (which the article calls fastcall): dest in %ax, src in %dx, and n in %cx. The routine saves n, shifts %cx right by two (shrw $2, %cx) and copies four bytes at a time with rep movsl; it then restores n, masks it with andw $3, %cx, and copies the remaining zero to three bytes with rep movsb.
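The dword-then-byte strategy can be rendered in portable C. This is a sketch under stated assumptions, not the kernel's code: dword_then_byte_memcpy is a hypothetical name, and the fixed-size 4-byte memcpy calls stand in for movsl because they stay safe for unaligned and differently typed buffers while GCC and Clang typically compile them to a single 32-bit move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C rendering of the copy.S memcpy strategy (hypothetical name):
 * copy n/4 dwords, then the n%4 leftover bytes. Assumes no overlap. */
static void *dword_then_byte_memcpy(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (size_t words = n >> 2; words > 0; words--) {   /* shrw $2, %cx ; rep movsl */
        uint32_t w;
        memcpy(&w, s, sizeof w);    /* 4-byte load, alignment-safe */
        memcpy(d, &w, sizeof w);    /* 4-byte store */
        s += 4;
        d += 4;
    }
    for (size_t rest = n & 3; rest > 0; rest--)          /* andw $3, %cx ; rep movsb */
        *d++ = *s++;
    return dest;
}
```

With n = 11, the word loop runs twice (8 bytes) and the byte loop copies the remaining 3, exactly the split that shrw $2 and andw $3 produce.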
Next, the glibc memset prototype, void *memset(void *s, int c, size_t n), is introduced; it fills the first n bytes at address s with the low-order byte of the integer c (c converted to unsigned char) and returns s.
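A minimal C version makes the "low-order byte" rule concrete. This is an illustrative sketch with a hypothetical name (simple_memset), not glibc's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time fill with memset's contract (hypothetical name, not glibc's code).
 * Only the low-order byte of c is stored: c is converted to unsigned char. */
static void *simple_memset(void *s, int c, size_t n) {
    unsigned char *p = s;
    unsigned char byte = (unsigned char)c;
    while (n--)
        *p++ = byte;
    return s;   /* memset returns its s argument */
}
```

Passing c = 0x1AB, for instance, fills the buffer with 0xAB because the conversion to unsigned char discards the higher bits.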
The assembly version of memset also resides in arch/x86/boot/copy.S. The arguments are passed as: s in %ax, c in %dx (only the low byte %dl is used), and n in %cx. The routine zero-extends %dl into %eax (movzbl %dl, %eax) and multiplies it by 0x01010101 (imull $0x01010101, %eax), which replicates the byte into all four bytes of %eax, so 0xAB becomes 0xABABABAB. It then writes four-byte blocks with rep stosl using n shifted right by two. The leftover bytes are handled by masking the saved count with andw $3, %cx and writing them one at a time with rep stosb.
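The byte-replication trick and the two-phase fill can be sketched in C. The name pattern_memset is hypothetical, and the 4-byte memcpy store stands in for stosl; since all four bytes of the pattern are equal, the result does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C rendering of the copy.S memset strategy (hypothetical name). */
static void *pattern_memset(void *s, int c, size_t n) {
    unsigned char *p = s;
    /* movzbl %dl, %eax ; imull $0x01010101, %eax  ->  0xAB becomes 0xABABABAB.
       The product never exceeds 0xFF * 0x01010101 = 0xFFFFFFFF, so it cannot overflow. */
    uint32_t pattern = (uint32_t)(unsigned char)c * 0x01010101u;

    for (size_t words = n >> 2; words > 0; words--) {   /* rep stosl on n/4 dwords */
        memcpy(p, &pattern, sizeof pattern);
        p += 4;
    }
    for (size_t rest = n & 3; rest > 0; rest--)          /* rep stosb on n%4 bytes */
        *p++ = (unsigned char)c;
    return s;
}
```

The multiplication works because 0x01010101 has a 1 in the lowest bit of each byte lane, so multiplying by it sums four copies of the byte shifted by 0, 8, 16 and 24 bits without any carries.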
The article concludes by inviting readers to point out any omissions and encourages further discussion.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.