Comprehensive Guide to x86 Assembly Language and GNU Syntax

This guide introduces x86 assembly language: GNU (AT&T) syntax, CPU architecture, registers, instruction formats, data types, the memory model, and practical examples with NASM and GNU as. It aims to help readers write efficient low-level code and deepen their understanding of computer systems.

Deepin Linux

Assembly language shows how a computer actually executes a program. x86 assembly sits at the core of a widely used processor architecture and is worth learning for any programmer who wants to understand the machine, while the GNU assembler and its AT&T syntax offer a flexible, widely used way to work directly with the hardware.

This guide walks step by step through x86 assembly, covering the basic syntax used by the GNU assembler, common instructions, and the principles behind them. Whether you are a beginner or an experienced developer, the examples and practical tips are meant to show how assembly fits together with high-level languages and where it can improve program performance.

1. Introduction to x86 Assembly Language

Working with x86 assembly requires understanding the bus and register structure, the data types, the basic instructions, and the function calling conventions.

The bus in x86 consists of address, data, and control buses, determining the CPU's addressing capability, data transfer volume, and control over other system components.

Regarding registers, 32-bit x86 provides eight general-purpose registers: EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI. EAX is typically used for arithmetic and holds function return values; ECX serves as a loop counter; ESP points to the top of the stack; EBP points to the base of the current stack frame; EBX is a base address register; EDX holds the remainder after integer division; ESI and EDI are the source and destination index registers used in string operations. The low 16 bits of every register have their own name (AX, BX, CX, DX, SP, BP, SI, DI), and EAX, EBX, ECX, and EDX also expose their two low bytes (for example AL and AH). In 64-bit mode each register is extended to 64 bits (RAX, RBX, and so on).

Segment registers (CS, SS, DS, ES, FS, GS) locate memory segments. The flags register (EFLAGS) contains bits that the CPU sets or clears, such as the status flags ZF, CF, and SF and the trap flag TF.

The instruction pointer (EIP) holds the address of the next instruction to execute.

1.1 Memory

A program's memory is divided into four sections: stack, heap, code, and data, used for local variables, dynamic allocation, executable instructions, and global/static values respectively.

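Stack: Used for local variables, parameters, and control flow (return addresses). The ESP register points to the top of the stack; PUSH decrements ESP and POP increments it, so the stack grows toward lower addresses. When EBP is used as a frame pointer, it stays constant for the duration of a function and gives fixed offsets to locals and parameters.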

Heap: Provides dynamic memory for allocating and freeing values during program execution.

Code: Contains the CPU instructions that are executed.

Data: Holds static values that may be accessed globally.

1.2 Buses

Address bus: Width determines the CPU's addressing capability; e.g., a 20‑bit address bus on the 8086 allows 1 MiB of addressable memory.

Data bus: Width determines how many bits can be transferred per operation; a wider bus increases data throughput.

Control bus: Width determines how many distinct control signals the CPU can issue to other devices.

2. Detailed Register Overview

2.1 General‑Purpose Registers

The x86-64 CPU contains 16 general-purpose registers that store 64-bit values, used for integers and pointers. The original eight 16-bit registers (AX, BX, CX, DX, SI, DI, BP, SP) were extended to 32 bits (EAX through ESP) and then to 64 bits (RAX through RSP), and x86-64 added eight more registers, R8 through R15.

Commonly used registers include:

EAX: Default for arithmetic, also holds function return values; can be accessed as AX, AH, AL.

EBX: Base address register for memory addressing.

ECX: Counter for loops and string operations.

EDX: Holds the remainder after integer division; also holds the high half of the dividend (EDX:EAX) and of a wide multiplication result.

ESP: Stack pointer; adjusts on PUSH/POP.

EBP: Base pointer for stack frames.

ESI/EDI: Source/target index registers for string instructions.

2.2 Flag Register (EFLAGS/RFLAGS)

EFLAGS contains status, control, and system flags. In 64‑bit mode it is extended to RFLAGS, where the upper 32 bits are reserved.

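The status flags are CF (carry), PF (parity), AF (auxiliary carry), ZF (zero), SF (sign), and OF (overflow). DF (direction) is a control flag that sets the direction of string instructions. The remaining bits are system flags: TF (trap), IF (interrupt enable), IOPL (I/O privilege level), NT (nested task), RF (resume), VM (virtual-8086 mode), AC (alignment check), VIF/VIP (virtual interrupt flag and virtual interrupt pending), and ID (identification, which indicates CPUID support).

Status flags record the outcome of arithmetic and logic instructions and drive conditional branches, while TF enables single-stepping for debuggers.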

2.3 Segment Registers

x86‑64 has six 16‑bit segment registers (CS, DS, SS, ES, FS, GS) that hold segment selectors for code, data, and stack segments.

2.4 Control Registers

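x86 processors expose the control registers CR0, CR2, CR3, and CR4, plus CR8 in 64-bit mode (CR1 is reserved). They select processor operating modes, enable architectural extensions, hold the page-table base address (CR3), and record the faulting address of a page fault (CR2).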

2.5 Instruction Pointer Registers

RIP/EIP hold the offset of the next instruction to execute within the current code segment.

2.6 Model‑Specific Registers (MSR)

MSRs offer performance monitoring, tracing, and other CPU-specific features. Access is performed via the privileged RDMSR/WRMSR instructions, using ECX as the index and EDX:EAX to hold the 64-bit value (EDX carries the high 32 bits, EAX the low 32 bits).

3. Data Representation

In x86/x64, data is categorized as fundamental (byte, word, doubleword, quadword) and numeric (integer, floating‑point, BCD, SIMD).

Fundamental types define the width of data an instruction can process in one step.

Numeric types include signed/unsigned integers, IEEE‑754 floating‑point numbers, packed BCD, and SIMD vectors for parallel processing.

4. Basic Instruction Format

An encoded instruction consists of optional prefixes, an opcode, and optional ModR/M, SIB, displacement, and immediate fields. In source code the general syntax is [label:] mnemonic [operands] ; comment (GNU as on x86 uses # instead of ; to start a comment). Operands can be registers, immediates, or memory references.

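x86 stores multi-byte values in little-endian order, with the least significant byte at the lowest address. This is the opposite of the big-endian order used for network protocols.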

5. Example Cases

5.1 Defining Data

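Common data directives in Intel-syntax assemblers such as NASM and MASM include the following (GNU as uses .byte, .short, .long, and .quad for the first four sizes):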

DB – define byte

DW – define word (16‑bit)

DD – define doubleword (32‑bit)

DQ – define quadword (64‑bit)

DT – define ten‑byte (80‑bit) value

EQU – define a constant

TEXTEQU – define a text macro (MASM only; NASM uses %define instead)

5.2 Assemblers

NASM: Intel syntax, cross‑platform (Linux, Windows). Supports multiple output formats and a powerful macro processor.

GNU as: Default AT&T syntax, but can switch to Intel syntax with .intel_syntax directives. Supports various target architectures via command‑line options.

5.3 x86 Assembly Hello World Example

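Using NASM (32-bit Linux system calls):

section .data
    hello db "Hello world!",10      ; message followed by a newline
    hello_len equ $-hello           ; length = current address minus start

section .text
    global _start

_start:
    mov eax,4                       ; sys_write in the int 0x80 ABI
    mov ebx,1                       ; file descriptor 1 = stdout
    mov ecx,hello                   ; buffer address
    mov edx,hello_len               ; number of bytes to write
    int 0x80                        ; enter the kernel
    mov eax,1                       ; sys_exit
    xor ebx,ebx                     ; exit status 0
    int 0x80

Compile and link. The code uses the 32-bit int 0x80 system-call interface, so build it as a 32-bit ELF program:

nasm -f elf32 hello_nasm.asm -o hello_nasm.o
ld -m elf_i386 hello_nasm.o -o hello_nasm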

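Using GNU as (AT&T syntax, 64-bit Linux). The program prints the string with the write system call, so it needs no C library:

.section .data
hello_str:
    .ascii "Hello world!\n"             # message followed by a newline
    .set hello_len, . - hello_str       # length of the message

.section .text
    .globl _start
_start:
    mov $1, %eax                        # sys_write in the 64-bit syscall ABI
    mov $1, %edi                        # file descriptor 1 = stdout
    lea hello_str(%rip), %rsi           # buffer address
    mov $hello_len, %edx                # number of bytes to write
    syscall
    mov $60, %eax                       # sys_exit
    xor %edi, %edi                      # exit status 0
    syscall

Save as hello_gnu.asm, then assemble with as hello_gnu.asm -o hello_gnu.o and link with ld hello_gnu.o -o hello_gnu. To write the program in Intel syntax instead, add .intel_syntax noprefix at the start of the file and use Intel operand order (destination first) without the % and $ prefixes.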
