PCIe Device Enumeration and Memory Access in x86 Systems
This article explains how PCIe devices are discovered and accessed in an x86 system, detailing the hierarchical bus topology, depth‑first enumeration steps, configuration space handling, Linux lspci inspection, and NVMe command transmission through PCIe memory transactions.
Hard drives have evolved from HDDs to SSDs and from the SATA interface to NVMe, with PCIe now serving as the front end for NVMe SSDs; this article introduces how PCIe devices are discovered and accessed within an x86 system.
PCIe topology supports up to 256 buses, each with up to 32 devices, and each device with up to 8 functions, so every node is identified by a BDF (Bus, Device, Function) triple: 8 bits of bus, 5 of device, and 3 of function.
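A BDF therefore fits in 16 bits, which is the routing-ID format carried in configuration transactions. A minimal sketch (helper names are illustrative, not from the article):

```python
# Hypothetical helpers: pack/unpack a BDF into its 16-bit form
# (bus 8 bits, device 5 bits, function 3 bits).

def bdf_pack(bus: int, dev: int, fn: int) -> int:
    assert 0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8
    return (bus << 8) | (dev << 3) | fn

def bdf_unpack(bdf: int):
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x07

# The STAR1000 discussed later in the article sits at 3C:00.0:
print(hex(bdf_pack(0x3C, 0, 0)))  # 0x3c00
```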
The architecture consists of a root complex, switches, and endpoints; the host walks this hierarchy depth-first in a process called enumeration, issuing configuration read/write transactions to probe for downstream devices.
Step 1: The host bridge scans Bus 0, ignores embedded endpoints, discovers Bridge 1, assigns Bus 1 downstream, sets Primary Bus Number = 0 and Secondary Bus Number = 1, and temporarily marks Subordinate Bus Number as 0xFF.
Step 2: Scanning Bus 1 reveals Bridge 3 (a switch); the host creates Bus 2 downstream and sets Primary = 1, Secondary = 2, Subordinate = 0xFF.
Step 3: Scanning Bus 2 discovers Bridge 4, behind which sits an NVMe SSD endpoint; Bus 3 is assigned downstream with Primary = 2 and Secondary = 3, and because Bus 3 contains only the leaf endpoint (no further bridges), the Subordinate Bus Number is finalized as 3.
Step 4: Returning from Bus 3 to Bus 2, the host discovers Bridge 5, behind which sits a NIC endpoint; Bus 4 is created with Primary = 2, Secondary = 4, Subordinate = 4.
Step 5: With no further bridges below Bridge 4 or Bridge 5, the host finalizes Bridge 3’s Subordinate Bus Number as 4 and, returning further up, sets Bridge 1’s Subordinate Bus Number to 4 as well.
Step 6: Back on Bus 0, the host discovers Bridge 2, behind which sits a graphics card endpoint; Bus 5 is assigned with Primary = 0, Secondary = 5, Subordinate = 5, completing enumeration of all PCIe devices.
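The six steps above can be sketched as a small simulation of the depth-first walk; the bridge names and topology mirror the article's example, and the Primary/Secondary/Subordinate values it produces match Steps 1–6:

```python
# Minimal model of depth-first PCIe enumeration. Only bridges matter for
# bus-number assignment; endpoints (NVMe SSD, NIC, graphics card) are
# leaves and consume no bus numbers of their own.

class Bridge:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.primary = self.secondary = self.subordinate = None

def enumerate_bus(bridges, primary, next_bus):
    """Depth-first scan of one bus; returns the next free bus number."""
    for br in bridges:
        br.primary = primary
        br.secondary = next_bus          # new bus created behind this bridge
        br.subordinate = 0xFF            # temporary, fixed up on the way back
        next_bus = enumerate_bus(br.children, br.secondary, next_bus + 1)
        br.subordinate = next_bus - 1    # highest bus number seen downstream
    return next_bus

bridge4 = Bridge("Bridge 4")                      # NVMe SSD behind it
bridge5 = Bridge("Bridge 5")                      # NIC behind it
bridge3 = Bridge("Bridge 3", [bridge4, bridge5])  # the switch
bridge1 = Bridge("Bridge 1", [bridge3])
bridge2 = Bridge("Bridge 2")                      # graphics card behind it

enumerate_bus([bridge1, bridge2], primary=0, next_bus=1)
for br in (bridge1, bridge3, bridge4, bridge5, bridge2):
    print(br.name, br.primary, br.secondary, br.subordinate)
```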
In Linux, the command lspci -v -t displays the enumerated hierarchy as a tree; the STAR1000 NVMe SSD from Beijing Starblaze appears with BDF 3C:00.0 and upstream port 00:1d.0.
Detailed configuration space can be dumped with lspci -xxx -s 3C:00.0, revealing the Vendor ID, Device ID, class code 0x010802 (NVMe storage controller), and the capability list (e.g., power management, MSI, link control). The link capability at offset 0x43 shows a x4 Gen3 link (8 GT/s per lane).
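The capabilities form a linked list inside configuration space: offset 0x34 holds the pointer to the first entry, and each entry stores its capability ID followed by the offset of the next. A sketch of the walk over a synthetic 256-byte dump (the bytes below are illustrative, not the real STAR1000 contents):

```python
# Synthetic config-space dump, laid out like `lspci -xxx` output.
cfg = bytearray(256)
cfg[0x0B], cfg[0x0A], cfg[0x09] = 0x01, 0x08, 0x02   # class 0x010802: NVMe
cfg[0x34] = 0x40                                     # capabilities pointer
cfg[0x40], cfg[0x41] = 0x01, 0x50                    # PM cap   -> next 0x50
cfg[0x50], cfg[0x51] = 0x05, 0x60                    # MSI cap  -> next 0x60
cfg[0x60], cfg[0x61] = 0x10, 0x00                    # PCIe cap, end of list

# Class code occupies offsets 0x0B (base), 0x0A (sub), 0x09 (interface).
class_code = (cfg[0x0B] << 16) | (cfg[0x0A] << 8) | cfg[0x09]
print(hex(class_code))                               # 0x10802

ptr, caps = cfg[0x34], []
while ptr:
    caps.append(cfg[ptr])        # capability ID byte
    ptr = cfg[ptr + 1]           # next-capability pointer
print([hex(c) for c in caps])    # ['0x1', '0x5', '0x10']
```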
Each PCIe device is allocated windows in the CPU's address space; the STAR1000 exposes a 1 MiB and a 256 KiB region, whose base addresses are programmed by the host into the device's Base Address Registers (BARs), enabling memory-mapped access to NVMe control and status structures.
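The size of each region is discovered through its BAR: the host writes all-1s to the register and reads it back, and the low address bits the device keeps hard-wired to zero reveal the region size. A simulation of the readback arithmetic for the two regions mentioned (readback values are illustrative):

```python
# BAR sizing: after writing 0xFFFFFFFF, a memory BAR reads back with the
# low (size-1) address bits forced to zero; the size is the two's
# complement of the masked value.

def bar_size(readback: int) -> int:
    masked = readback & ~0xF                 # strip the low 4 flag bits
    return (~masked + 1) & 0xFFFFFFFF        # two's complement, 32-bit

print(bar_size(0xFFF00000))  # 1048576  -> the 1 MiB region
print(bar_size(0xFFFC0000))  # 262144   -> the 256 KiB region
```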
NVMe command submission begins with a doorbell write (a PCIe Memory Write) from the host to the SSD, followed by the SSD issuing a PCIe Memory Read request to fetch the command from the submission queue in host memory, illustrating the bidirectional memory-transaction flow.
The reverse read (SSD to host) is answered with a Completion with Data (CplD) packet carrying the command, completing the fetch; other NVMe operations (queue management, data transfer) follow the same memory-access pattern.
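A toy model of this flow, with illustrative structures standing in for real NVMe submission-queue entries and MMIO doorbell registers:

```python
# Toy model: the host places a command in a submission queue in host RAM,
# rings the doorbell (a PCIe Memory Write into the SSD's BAR space), and
# the SSD fetches the entry with a Memory Read answered by a CplD.
# All names and layouts here are illustrative.

host_ram = {}                     # host memory, keyed by address
SQ_BASE, ENTRY_SIZE = 0x1000, 64  # assumed queue base and entry size

class ToySSD:
    def __init__(self):
        self.sq_tail = 0          # SSD's view of the submission queue tail
        self.fetched = []
    def doorbell_write(self, new_tail):          # host -> SSD: MemWr
        while self.sq_tail != new_tail:
            addr = SQ_BASE + self.sq_tail * ENTRY_SIZE
            self.fetched.append(host_ram[addr])  # SSD -> host: MemRd + CplD
            self.sq_tail += 1

ssd = ToySSD()
host_ram[SQ_BASE] = {"opcode": "READ", "lba": 0, "nlb": 8}  # SQ entry 0
ssd.doorbell_write(1)                            # ring the doorbell
print(ssd.fetched[0]["opcode"])                  # READ
```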
The article concludes that understanding PCIe enumeration and memory access lays the groundwork for deeper topics such as protocol layers, link training, and power management, noting that PCIe Gen4 (16 GT/s per lane) is already in production and Gen5 (32 GT/s per lane) is under development.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.