How NVMe over Fabrics Transforms Storage Networking: Design and Data Flow
This article explores the internal design of NVMe over Fabrics (NVMe‑oF): its encapsulation scheme, the mapping of NVMe commands onto fabric transports, the discovery mechanism, and the extended Connect and Property commands. The protocol has three goals: to provide a transparent encapsulation format for commands and data, to map NVMe operations onto network interfaces, and to address challenges that PCIe-attached NVMe never faced, such as node discovery and multipathing.
The protocol defines a complete encapsulation scheme that differs from local NVMe. PCIe NVMe uses an asynchronous model in which a 64-byte command carries only pointers (PRP entries or an SGL descriptor): the data and any further SGL descriptors stay in host memory, and the controller fetches them by DMA. Because a fabric adds far more latency than PCIe, NVMe‑oF lets a command capsule carry data or SGL descriptors inline, and lets a response capsule carry returned data, which removes unnecessary round trips.
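To make the capsule idea concrete, here is a minimal sketch under a simplified layout: it packs only the opcode, command identifier (CID), and namespace ID into an otherwise zeroed 64-byte submission queue entry and appends in-capsule data behind it. A real command capsule would also carry an SGL descriptor in the SQE pointing at that data; the helper names `build_command_capsule` and `parse_command_capsule` are hypothetical.

```python
import struct

SQE_SIZE = 64  # an NVMe submission queue entry is always 64 bytes


def build_command_capsule(opcode: int, cid: int, nsid: int, data: bytes = b"") -> bytes:
    """Pack a minimal SQE and append optional in-capsule data.

    Only CDW0 (opcode, flags, command identifier) and NSID are filled in;
    every other SQE field, including the SGL descriptor, is left zero here.
    """
    # Byte 0: opcode, byte 1: flags (zero), bytes 2-3: CID, bytes 4-7: NSID.
    header = struct.pack("<BBHI", opcode, 0, cid, nsid)
    sqe = header + bytes(SQE_SIZE - len(header))
    # In-capsule data simply follows the SQE inside the same capsule.
    return sqe + data


def parse_command_capsule(capsule: bytes):
    """Split a capsule back into (opcode, cid, nsid, in_capsule_data)."""
    opcode, _flags, cid, nsid = struct.unpack_from("<BBHI", capsule, 0)
    return opcode, cid, nsid, capsule[SQE_SIZE:]
```

With opcode 0x01 (NVM Write) and five bytes of data, the capsule is 69 bytes long, so the target can begin the write as soon as the capsule arrives instead of going back to the host for the payload.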
Unlike PCIe NVMe, NVMe‑oF does not flow-control the completion queue, so the host must allocate enough receive space for the completions of all outstanding commands. An I/O proceeds through the following steps:
The Initiator driver packages the request into a command capsule and hands it to the hardware.
The Initiator hardware posts the request to the Target’s submission queue.
The Target controller processes the I/O and prepares a completion entry.
The Target hardware posts the completion to the Initiator’s receive queue.
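The four steps, together with the rule that completions are not flow-controlled, can be modelled with a toy host/target pair. This is a sketch under stated assumptions, not a transport implementation: `HostQueuePair` and `Target` are hypothetical names, the I/O itself is elided, and the host simply refuses to issue a command when it has no receive buffer left for that command's completion.

```python
from collections import deque


class Target:
    """Toy target: a submission queue drained one command at a time."""

    def __init__(self):
        self.sq = deque()

    def process_one(self):
        # Step 3: "execute" the I/O and build a completion (status 0 = success).
        host, cid = self.sq.popleft()
        # Step 4: post the completion into one of the host's pre-posted receive buffers.
        host.on_completion(cid, status=0)


class HostQueuePair:
    """Host side of one queue pair."""

    def __init__(self, recv_buffers: int):
        # No CQ flow control: the host pre-posts one receive buffer per command
        # it allows to be outstanding, so every completion has somewhere to land.
        self.recv_buffers = recv_buffers
        self.outstanding = set()
        self.completed = []
        self.next_cid = 0

    def submit(self, target):
        # Steps 1-2: package the request and post it to the target's submission queue.
        if len(self.outstanding) >= self.recv_buffers:
            raise RuntimeError("would overrun the host's completion buffers")
        cid = self.next_cid
        self.next_cid += 1
        self.outstanding.add(cid)
        target.sq.append((self, cid))
        return cid

    def on_completion(self, cid, status):
        # The command retires, freeing its receive buffer for the next submission.
        self.outstanding.remove(cid)
        self.completed.append((cid, status))
```

With two receive buffers, a third submission fails until the target returns one completion; that is the bookkeeping a real NVMe‑oF host performs in place of CQ flow control.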
If a request does not carry its data in the capsule, the Target can still fetch that data directly from the Initiator's memory (for example, with an RDMA Read), as illustrated in the accompanying diagram.
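A sketch of that decision on the Target side, assuming an RDMA-style transport: if the write data arrived in the capsule it is used directly; otherwise the Target reads it from host memory using the address, length, and key in a keyed SGL descriptor, as an RDMA Read would. `KeyedSgl`, `HostMemory`, and `target_fetch_write_data` are hypothetical, and host memory is just a dictionary of registered buffers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyedSgl:
    """Simplified keyed SGL descriptor: where the data lives in host memory."""
    addr: int
    length: int
    key: int  # remote key the host registered for this buffer


@dataclass
class WriteCommand:
    in_capsule_data: Optional[bytes] = None
    sgl: Optional[KeyedSgl] = None


class HostMemory:
    """Stands in for host buffers registered with the fabric adapter."""

    def __init__(self):
        self.regions = {}  # key -> (base address, buffer contents)

    def register(self, key, base, data):
        self.regions[key] = (base, bytes(data))

    def remote_read(self, sgl):
        # Models an RDMA Read: copy bytes out of a registered host buffer.
        base, buf = self.regions[sgl.key]
        start = sgl.addr - base
        if start < 0 or start + sgl.length > len(buf):
            raise ValueError("access outside registered region")
        return buf[start:start + sgl.length]


def target_fetch_write_data(cmd, host_mem):
    # Fast path: the data rode along in the command capsule, no extra round trip.
    if cmd.in_capsule_data is not None:
        return cmd.in_capsule_data
    # Otherwise the target pulls the data from the initiator using the descriptor.
    return host_mem.remote_read(cmd.sgl)
```

The first branch costs nothing beyond the capsule itself; the second costs one extra fabric round trip, which is why small writes benefit from in-capsule data.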
NVMe‑oF extends the standard NVMe command set with five fabric-specific commands: Connect, Property Get, Property Set, Authentication Send, and Authentication Receive. This article concentrates on Connect and the Property commands.
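All Fabrics commands share a single opcode (0x7F) and are distinguished by a Fabrics Command Type (FCTYPE) byte in the SQE. The values below follow the NVMe‑oF specification; check them against the revision you target, since later revisions add further types. `fabrics_header` is a hypothetical helper that packs only the leading bytes of the SQE.

```python
import struct
from enum import IntEnum

FABRICS_OPCODE = 0x7F  # every Fabrics command uses this opcode


class FabricsCommandType(IntEnum):
    """FCTYPE values that distinguish the five Fabrics commands."""
    PROPERTY_SET = 0x00
    CONNECT = 0x01
    PROPERTY_GET = 0x04
    AUTHENTICATION_SEND = 0x05
    AUTHENTICATION_RECEIVE = 0x06


def fabrics_header(fctype: FabricsCommandType, cid: int) -> bytes:
    """First 5 bytes of a Fabrics SQE: opcode, flags byte (zero), CID, FCTYPE."""
    return struct.pack("<BBHB", FABRICS_OPCODE, 0, cid, fctype)
```

A target dispatches on the opcode first and, only when it sees 0x7F, on the FCTYPE byte at offset 4.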
The Connect command creates a submission/completion queue pair and associates the host with a controller; it carries the Host NQN, the NVM Subsystem NQN, and the Host Identifier. It can target either a specific controller (static controller model) or ask the subsystem to allocate one (dynamic controller model). A host may establish multiple connections to the same subsystem, using different Host NQNs or fabric ports, and so can either share the underlying fabric channels or use them exclusively.
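The identifiers named above travel in the Connect command's data block. The sketch below lays out that 1024-byte block as the specification defines it (Host Identifier at byte 0, Controller ID at byte 16, Subsystem NQN at byte 256, Host NQN at byte 512), with 0xFFFF as the Controller ID that requests a dynamic controller. The queue ID and queue size travel in the Connect SQE itself and are omitted; the function names and example NQNs are hypothetical.

```python
import uuid

CONNECT_DATA_SIZE = 1024
DYNAMIC_CNTLID = 0xFFFF  # "allocate any controller" under the dynamic model
NQN_MAX_BYTES = 223      # NQN length limit; the on-wire field is 256 bytes


def _nqn_field(nqn: str) -> bytes:
    raw = nqn.encode("utf-8")
    if len(raw) > NQN_MAX_BYTES:
        raise ValueError("NQN too long")
    return raw.ljust(256, b"\x00")  # NUL-padded fixed-width field


def build_connect_data(host_id: uuid.UUID, subsys_nqn: str, host_nqn: str,
                       cntlid: int = DYNAMIC_CNTLID) -> bytes:
    """Lay out the 1024-byte Connect data block (reserved bytes stay zero)."""
    buf = bytearray(CONNECT_DATA_SIZE)
    buf[0:16] = host_id.bytes                  # Host Identifier
    buf[16:18] = cntlid.to_bytes(2, "little")  # Controller ID
    buf[256:512] = _nqn_field(subsys_nqn)      # NVM Subsystem NQN
    buf[512:768] = _nqn_field(host_nqn)        # Host NQN
    return bytes(buf)
```

Passing a concrete controller ID instead of the default targets a specific controller under the static model.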
The Property Get/Set commands replace the BAR0/BAR1 register access used over PCIe. Because a fabric has no memory-mapped register space, these commands give the host read and write access to the controller's properties (CAP, VS, CC, CSTS, and so on).
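A toy model of what a Target does with these commands, assuming a property space with the same offsets as the PCIe register map (CAP at 0x00, VS at 0x08, CC at 0x14, CSTS at 0x1C). `ControllerProperties` is hypothetical, only CC is writable here, and CSTS.RDY follows CC.EN immediately rather than after a delay as on a real controller.

```python
# Property offsets, mirroring the PCIe controller register layout.
CAP, VS, CC, CSTS = 0x00, 0x08, 0x14, 0x1C
PROPERTY_SIZE = {CAP: 8, VS: 4, CC: 4, CSTS: 4}


class ControllerProperties:
    """Target-side property space answering Property Get/Set (toy model)."""

    def __init__(self):
        # VS = 0x00010300 encodes version 1.3 (major in bits 31:16, minor in 15:8).
        self.values = {CAP: 0, VS: 0x00010300, CC: 0, CSTS: 0}

    def property_get(self, offset: int, size: int) -> int:
        if PROPERTY_SIZE.get(offset) != size:
            raise ValueError("unsupported offset/size")
        return self.values[offset]

    def property_set(self, offset: int, size: int, value: int) -> None:
        if offset != CC or size != 4:
            raise ValueError("only CC is writable in this sketch")
        self.values[CC] = value
        # Mirror CC.EN (bit 0) into CSTS.RDY (bit 0) right away.
        self.values[CSTS] = (self.values[CSTS] & ~1) | (value & 1)
```

A host enables the controller just as it would over PCIe, except that each register access becomes a Property Set or Property Get capsule on the fabric.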
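To support discovery and multipathing, NVMe‑oF defines a discovery service. Initiators query a specially configured NVMe Subsystem (a discovery subsystem) for its Discovery Log Page, which lists the NVM subsystems the host may access and the transport addresses through which each one can be reached; a subsystem that appears behind several addresses offers several paths. Namespaces are enumerated afterwards, once the host has connected to a subsystem.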
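Because each Discovery Log Page entry pairs one subsystem with one transport address, a host can find its paths by grouping entries on the subsystem NQN. The sketch keeps only four entry fields; `DiscoveryEntry`, `paths_by_subsystem`, and the sample addresses are hypothetical. The discovery subsystem itself is reached through the well-known NQN `nqn.2014-08.org.nvmexpress.discovery`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryEntry:
    """The few Discovery Log Page entry fields this sketch needs."""
    subnqn: str
    trtype: str   # e.g. "rdma" or "fc"
    traddr: str
    trsvcid: str


def paths_by_subsystem(entries):
    """Map each subsystem NQN to every (trtype, traddr, trsvcid) advertised for it."""
    paths = defaultdict(list)
    for e in entries:
        paths[e.subnqn].append((e.trtype, e.traddr, e.trsvcid))
    return dict(paths)
```

A subsystem with two entries yields two paths, and the host can connect over both to gain redundancy or bandwidth.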
A reference implementation for Linux is provided, offering RDMA and Fibre Channel transport layers for both initiator and target roles. The code includes driver logic, command‑line utilities, and integration with the Linux OS, serving as a solid foundation for the NVMe‑oF ecosystem.
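For the command-line side, here is a minimal sketch that assumes the utilities are the `nvme-cli` tool, whose `discover` and `connect` subcommands take the transport (`-t`), address (`-a`), and service ID (`-s`), plus the subsystem NQN (`-n`) for `connect`. The Python wrappers are hypothetical and only assemble the argument lists; 4420 is the conventional NVMe‑oF port and is used here as a default.

```python
import shlex
import subprocess


def nvme_discover_cmd(transport: str, traddr: str, trsvcid: str = "4420"):
    # Ask a discovery controller for its Discovery Log Page.
    return ["nvme", "discover", "-t", transport, "-a", traddr, "-s", trsvcid]


def nvme_connect_cmd(transport: str, nqn: str, traddr: str, trsvcid: str = "4420"):
    # Connect to one NVM subsystem over one path.
    return ["nvme", "connect", "-t", transport, "-n", nqn, "-a", traddr, "-s", trsvcid]


def run(argv, dry_run: bool = True):
    # Dry run prints the shell-quoted command instead of executing it.
    if dry_run:
        print(shlex.join(argv))
        return None
    return subprocess.run(argv, check=True)
```

Running `nvme_connect_cmd` once per discovered path is how a host would bring up several paths to the same subsystem.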
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
