Deep Dive into NVMe over Fabrics: Design, Protocol Mechanics, and IO Transmission
This article provides an in‑depth exploration of the NVMe over Fabrics protocol, detailing its design goals, data encapsulation scheme, IO transmission process, command extensions such as Connect and Property Get/Set, and the associated Linux reference implementation for RDMA and Fibre Channel transports.
NVMe over Fabrics (NVMe-oF) was created to carry the NVMe interface across network fabrics. To do so, it has to solve several problems: encapsulating messages and data transparently, mapping NVMe interfaces onto network transports, and handling node discovery and multipathing.
The protocol defines a complete encapsulation scheme that adapts the NVMe command model to network environments. Classic NVMe uses an asynchronous model in which software issues commands and hardware executes them. A command carries only descriptors, while the data itself and any additional SGL segments remain in host memory for the controller to fetch via DMA.
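To make the descriptor idea concrete for the fabric case, here is a minimal Python sketch of a command capsule: a 64-byte submission queue entry (SQE) whose SGL1 field either points at data carried inside the capsule or, through a keyed descriptor, at host memory that the target reaches remotely. The byte offsets and identifier values (0x01 for an in-capsule Data Block with the Offset subtype, 0x40 for a Keyed SGL Data Block, PSDT = 01b) reflect my reading of the NVMe/NVMe-oF specifications and the Linux host driver. Treat them as assumptions to verify, and treat the helper names (`inline_sgl`, `keyed_sgl`, `build_rw_sqe`, `build_capsule`) as hypothetical.

```python
import struct

SQE_SIZE = 64  # every NVMe submission queue entry is 64 bytes

# SGL identifier byte: descriptor type in bits 7:4, subtype in bits 3:0 (assumed values).
SGL_DATA_BLOCK_OFFSET = 0x01  # Data Block, Offset subtype: data travels in the capsule
SGL_KEYED_DATA_BLOCK = 0x40   # Keyed SGL Data Block: data stays in host memory


def inline_sgl(offset: int, length: int) -> bytes:
    """16-byte descriptor for data carried inside the command capsule."""
    # Address (8 bytes, here an offset into the capsule data), Length (4 bytes),
    # 3 reserved bytes, then the identifier byte.
    return struct.pack("<QI3xB", offset, length, SGL_DATA_BLOCK_OFFSET)


def keyed_sgl(addr: int, length: int, key: int) -> bytes:
    """16-byte descriptor for host memory the target accesses remotely."""
    if not 0 <= length < 1 << 24:
        raise ValueError("keyed SGL length is a 24-bit field")
    # Address (8 bytes), Length (3 bytes), Key (4 bytes), identifier (1 byte).
    return (struct.pack("<Q", addr) + length.to_bytes(3, "little")
            + struct.pack("<IB", key, SGL_KEYED_DATA_BLOCK))


def build_rw_sqe(opcode: int, cid: int, nsid: int, slba: int, nlb: int,
                 sgl1: bytes) -> bytes:
    """Pack a minimal read/write SQE whose data pointer (bytes 24..39) is SGL1."""
    flags = 0x40  # PSDT (bits 7:6) = 01b: the data pointer is an SGL (assumed)
    dw0_1 = struct.pack("<BBHI", opcode, flags, cid, nsid)  # bytes 0..7
    reserved = bytes(16)                                     # CDW2-3 and MPTR
    dw10_15 = struct.pack("<QI", slba, nlb - 1) + bytes(12)  # SLBA, 0-based block count
    sqe = dw0_1 + reserved + sgl1 + dw10_15
    assert len(sgl1) == 16 and len(sqe) == SQE_SIZE
    return sqe


def build_capsule(sqe: bytes, data: bytes = b"") -> bytes:
    """A command capsule is the SQE, optionally followed by in-capsule data."""
    return sqe + data


if __name__ == "__main__":
    payload = bytes(4096)  # hypothetical 4 KiB write (8 x 512-byte blocks)
    sqe = build_rw_sqe(0x01, cid=7, nsid=1, slba=0, nlb=8,  # 0x01 = NVM Write
                       sgl1=inline_sgl(0, len(payload)))
    capsule = build_capsule(sqe, payload)
    print(len(capsule))  # 64-byte SQE plus 4096 bytes of in-capsule data
```

Swapping `inline_sgl` for `keyed_sgl` is the only change needed to leave the data in host memory instead; the SQE layout stays the same.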
Over PCIe, these extra DMA fetches are cheap because PCIe DMA latency is very low (about 1 µs). Across a network, each additional round trip costs far more, so NVMe-oF is designed to avoid unnecessary ones. Requests can carry data inline or carry SGL descriptors, and completions can also carry data, which reduces the traffic between initiator and target.
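The inline-versus-descriptor trade-off can be sketched as a simple decision. The sketch assumes the controller advertises IOCCSZ (I/O queue command capsule supported size, reported in 16-byte units and including the 64-byte SQE) through Identify Controller. The round-trip counts are a simplified model of an RDMA-style transport, not measurements, and the function names are hypothetical.

```python
SQE_SIZE = 64  # bytes


def in_capsule_capacity(ioccsz: int) -> int:
    """Bytes of write data that fit in an I/O command capsule.

    IOCCSZ is in 16-byte units and covers the 64-byte SQE as well as the data.
    """
    return max(ioccsz * 16 - SQE_SIZE, 0)


def plan_write(length: int, ioccsz: int) -> dict:
    """Choose how a write's data reaches the target and count round trips."""
    if length <= in_capsule_capacity(ioccsz):
        # Data rides in the command capsule: one command/response round trip.
        return {"mode": "in-capsule", "round_trips": 1}
    # Otherwise the capsule carries only a descriptor, and the target must fetch
    # the data from host memory (e.g. an RDMA Read) before it can respond.
    return {"mode": "descriptor", "round_trips": 2}


if __name__ == "__main__":
    # Hypothetical controller advertising 8 KiB of in-capsule data.
    ioccsz = (8192 + SQE_SIZE) // 16
    for size in (4096, 8192, 65536):
        print(size, plan_write(size, ioccsz))
```

Small writes thus complete in a single exchange, which is the reason the protocol allows data in the capsule at all.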
The article outlines a typical IO transmission sequence: (1) the initiator driver packages the request and hands it to hardware; (2) the initiator hardware places the request on the target’s submission queue; (3) the target controller processes the request and generates a completion; (4) the target hardware posts the completion to the initiator’s completion queue.
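The four-step sequence can be modeled with two in-memory queues. This is a toy simulation, not a transport: the hypothetical `Host` and `Target` classes stand in for the initiator driver and the target controller, and fabric delivery in steps 2 and 4 is reduced to appending to a queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Command:
    cid: int      # command identifier, echoed back in the completion
    opcode: str


@dataclass
class Completion:
    cid: int
    status: int   # 0 = success


class Target:
    def __init__(self) -> None:
        self.sq: deque = deque()  # target-side submission queue

    def process(self) -> list:
        """Step 3: the controller executes queued commands, producing completions."""
        completions = []
        while self.sq:
            cmd = self.sq.popleft()
            completions.append(Completion(cmd.cid, status=0))
        return completions


class Host:
    def __init__(self, target: Target) -> None:
        self.target = target
        self.cq: deque = deque()  # initiator-side completion queue
        self.next_cid = 0

    def submit(self, opcode: str) -> int:
        cmd = Command(self.next_cid, opcode)  # step 1: driver builds the request
        self.next_cid += 1
        self.target.sq.append(cmd)            # step 2: delivered to the target's SQ
        return cmd.cid

    def receive(self, completions: list) -> None:
        self.cq.extend(completions)           # step 4: completions posted to the CQ


if __name__ == "__main__":
    target = Target()
    host = Host(target)
    host.submit("read")
    host.submit("write")
    host.receive(target.process())
    print([(c.cid, c.status) for c in host.cq])  # prints [(0, 0), (1, 0)]
```

Echoing the command identifier in each completion is what lets the initiator match completions to outstanding requests, even when the target finishes them out of order.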
Because requests and completions can embed data, the protocol can often skip a separate data-transfer step. When no data is embedded, the target moves the data directly to or from the initiator's memory (on RDMA fabrics, with an RDMA Read for writes and an RDMA Write for reads), as illustrated in the accompanying diagrams.
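For an RDMA transport, these data-movement choices can be listed as the operations each side issues for a single I/O. This is a simplified sketch, assuming RDMA SEND carries the capsules and one RDMA Read or Write moves the data; `rdma_io_sequence` is a hypothetical helper, and other transports such as Fibre Channel use their own mechanisms.

```python
def rdma_io_sequence(direction: str, in_capsule: bool = False) -> list:
    """Fabric operations for one I/O on an RDMA transport (simplified model)."""
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    if direction == "read" and in_capsule:
        raise ValueError("in-capsule data is modeled only for writes here")
    if direction == "write" and in_capsule:
        # Data travels with the command, so nothing else needs to move.
        return ["host SEND: command capsule + data",
                "target SEND: response capsule"]
    if direction == "write":
        middle = "target RDMA READ: pull write data from host memory"
    else:
        middle = "target RDMA WRITE: push read data into host memory"
    return ["host SEND: command capsule (keyed SGL)",
            middle,
            "target SEND: response capsule"]


if __name__ == "__main__":
    for args in (("write", True), ("write", False), ("read", False)):
        print(args, rdma_io_sequence(*args))
```

In every case the target, not the initiator, drives the data transfer, so the initiator never has to wait for a separate "ready to receive" message before handing data over.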
NVMe-oF extends the standard NVMe command set with five fabric-specific commands: Connect, Property Get, Property Set, Authentication Send, and Authentication Receive (the authentication commands follow SPC-4). The Connect command creates a submission/completion queue pair. Its data carries the Host NQN, the Subsystem NQN, and the Host Identifier, and it can target either a static or a dynamic controller.
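All five fabrics commands share one opcode and are distinguished by a Fabrics Command Type field. Connect additionally sends a 1024-byte data block, which the sketch below packs. It assumes the following layout: Host Identifier at bytes 0..15, Controller ID at 16..17, Subsystem NQN at 256..511, and Host NQN at 512..767, with controller ID 0xFFFF requesting a dynamic controller and a specific ID selecting a static one. The opcode and type values reflect my understanding of the specification, the helper names are hypothetical, and the NQNs in the example are made up.

```python
import struct
import uuid

FABRICS_OPCODE = 0x7F  # shared by all fabrics commands (assumed value)
FCTYPE = {             # Fabrics Command Type values (assumed)
    "property_set": 0x00,
    "connect": 0x01,
    "property_get": 0x04,
    "authentication_send": 0x05,
    "authentication_receive": 0x06,
}

CONNECT_DATA_SIZE = 1024
DYNAMIC_CNTLID = 0xFFFF  # ask the subsystem to allocate a controller dynamically


def _nqn_field(nqn: str) -> bytes:
    """Encode an NQN into a 256-byte, null-padded field."""
    raw = nqn.encode("utf-8")
    if len(raw) >= 256:
        raise ValueError("NQN does not fit a 256-byte null-terminated field")
    return raw.ljust(256, b"\0")


def build_connect_data(hostid: uuid.UUID, subnqn: str, hostnqn: str,
                       cntlid: int = DYNAMIC_CNTLID) -> bytes:
    buf = bytearray(CONNECT_DATA_SIZE)
    buf[0:16] = hostid.bytes                 # Host Identifier
    struct.pack_into("<H", buf, 16, cntlid)  # Controller ID
    buf[256:512] = _nqn_field(subnqn)        # Subsystem NQN
    buf[512:768] = _nqn_field(hostnqn)       # Host NQN
    return bytes(buf)                        # bytes 768..1023 stay reserved


if __name__ == "__main__":
    data = build_connect_data(uuid.uuid4(),
                              "nqn.2014-08.com.example:subsys1",  # hypothetical
                              "nqn.2014-08.com.example:host1")    # hypothetical
    print(len(data), data[16:18].hex())
```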
A single host may establish multiple connections to the same subsystem using different NQNs or fabric ports, so it can choose between shared and dedicated transport channels. Because a fabric has no PCIe BAR-style memory-mapped address space, the protocol also defines Property Get and Property Set to read and write controller registers.
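Property Get/Set can be pictured as offset-addressed reads and writes against the registers that a PCIe controller would expose through its BAR. The sketch assumes the standard offsets for CAP (0x00, 8 bytes), VS (0x08), CC (0x14), and CSTS (0x1C), and a size attribute of 0 for 4-byte and 1 for 8-byte accesses. The `PropertySpace` class and its reset values are hypothetical, and the instant CC.EN-to-CSTS.RDY reflection simplifies the real enable handshake.

```python
# Register offsets and sizes in the controller property space (assumed values).
CAP, VS, CC, CSTS = 0x00, 0x08, 0x14, 0x1C
PROPERTY_SIZE = {CAP: 8, VS: 4, CC: 4, CSTS: 4}
WRITABLE = {CC}  # only CC is writable in this toy model


def attrib_for(size: int) -> int:
    """Size encoding for Property Get/Set: 0 for 4 bytes, 1 for 8 bytes."""
    return {4: 0, 8: 1}[size]


class PropertySpace:
    """Toy target-side register file served over Property Get/Set."""

    def __init__(self) -> None:
        # Hypothetical reset values; VS 0x00010300 encodes version 1.3.0.
        self.regs = {CAP: 0, VS: 0x00010300, CC: 0, CSTS: 0}

    def get(self, offset: int, size: int) -> int:
        if PROPERTY_SIZE.get(offset) != size:
            raise ValueError(f"bad offset/size {offset:#x}/{size}")
        return self.regs[offset]

    def set(self, offset: int, size: int, value: int) -> None:
        if offset not in WRITABLE or PROPERTY_SIZE[offset] != size:
            raise ValueError(f"property {offset:#x} not writable as {size} bytes")
        self.regs[offset] = value & ((1 << (8 * size)) - 1)
        if offset == CC:
            # Toy behaviour: reflect CC.EN (bit 0) into CSTS.RDY (bit 0) at once;
            # a real controller sets RDY asynchronously while the host polls it.
            self.regs[CSTS] = self.regs[CC] & 1


if __name__ == "__main__":
    ctrl = PropertySpace()
    ctrl.set(CC, 4, 1)            # host sets CC.EN
    print(ctrl.get(CSTS, 4) & 1)  # host polls CSTS.RDY
```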
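Discovery services let initiators locate NVM subsystems and the fabric addresses through which each can be reached, which is also what enables multipathing. Initiators query a discovery controller's Discovery Log Page to obtain these entries and, after connecting to a subsystem, enumerate its namespaces.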
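Finding multipath candidates in the Discovery Log Page amounts to grouping entries by subsystem NQN. This sketch assumes each entry exposes a subtype (2 for an NVM subsystem, 1 for a referral to another discovery service), a transport type, a transport address, and a service ID. The `LogEntry` class and helper functions are hypothetical, and the example addresses come from the 192.0.2.0/24 documentation range.

```python
from collections import defaultdict
from dataclasses import dataclass

NVM_SUBSYSTEM = 2  # SUBTYPE of an entry describing an NVM subsystem (assumed)


@dataclass(frozen=True)
class LogEntry:
    subtype: int  # 1 = referral to a discovery service, 2 = NVM subsystem
    trtype: str   # transport type, e.g. "rdma" or "fc"
    traddr: str   # transport address
    trsvcid: str  # transport service identifier, e.g. a port number
    subnqn: str   # NQN of the subsystem reachable at this address


def paths_by_subsystem(entries):
    """Group Discovery Log Page entries into the fabric paths of each subsystem."""
    paths = defaultdict(list)
    for e in entries:
        if e.subtype != NVM_SUBSYSTEM:
            continue  # referrals point at other discovery services, not data paths
        paths[e.subnqn].append((e.trtype, e.traddr, e.trsvcid))
    return dict(paths)


def multipath_candidates(entries):
    """Subsystems reachable through more than one fabric path."""
    return {nqn: p for nqn, p in paths_by_subsystem(entries).items() if len(p) > 1}


if __name__ == "__main__":
    log = [  # hypothetical log page contents
        LogEntry(2, "rdma", "192.0.2.10", "4420", "nqn.2014-08.com.example:subsys1"),
        LogEntry(2, "rdma", "192.0.2.11", "4420", "nqn.2014-08.com.example:subsys1"),
        LogEntry(2, "rdma", "192.0.2.10", "4420", "nqn.2014-08.com.example:subsys2"),
        LogEntry(1, "rdma", "192.0.2.20", "4420", "nqn.2014-08.org.nvmexpress.discovery"),
    ]
    print(multipath_candidates(log))
```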
The article notes that a reference implementation for Linux, supporting both RDMA and Fibre Channel transports, is available. It includes driver code, CLI tools, and OS integration, offering a solid starting point for the NVMe-oF ecosystem.
Author: Lu Xiangfeng, CTO of Memblaze, originally published on the "Crystalwit" public account. The article is part of a series that analyzes NVMe-oF in full, from its origins to its technical details.
