Why NVMe‑oF Is Redefining High‑Performance Storage Networks
This article explains how the shift from HDD to ultra‑fast SSDs and NVMe changes storage networking, compares NVMe with legacy SCSI, details NVMe‑oF transport options (FC, TCP, RDMA), examines RDMA variants, and outlines the network requirements and trade‑offs for deploying NVMe‑oF in modern data centers.
What is NVMe?
NVMe is a storage protocol designed specifically for solid‑state drives. It attaches SSDs directly to the PCIe bus and drives them through command queues held in host memory, supports up to 65,535 I/O queues with 65,535 commands per queue, and provides NUMA‑aware, low‑latency access without the host bus adapter overhead of legacy SCSI.
NVMe over Fabrics (NVMe‑oF)
NVMe‑oF extends the NVMe protocol across a network, delivering latency comparable to direct‑attached storage. Three official transport bindings are defined:
NVMe/FC – Fibre Channel (or Fibre Channel over Ethernet) transport.
NVMe/TCP – NVMe commands encapsulated in standard TCP/IP.
NVMe/RDMA – Uses RDMA (InfiniBand, RoCE, or iWARP) for data movement.
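To make the transport choice concrete, here is a minimal sketch that builds an nvme-cli connect command line for the TCP and RDMA bindings. The flags (-t, -a, -s, -n) and port 4420 are standard nvme-cli and NVMe‑oF conventions; the Target class, the connect_argv helper, the IP address, and the NQN are hypothetical and exist only for illustration. The sketch builds the argument list and does not run it.

```python
from dataclasses import dataclass

# IANA-assigned port commonly used for NVMe-oF I/O over TCP and RDMA.
DEFAULT_TRSVCID = 4420


@dataclass(frozen=True)
class Target:
    """Hypothetical description of one NVMe-oF subsystem to connect to."""
    transport: str   # "tcp" or "rdma" in this sketch
    traddr: str      # target IP address
    nqn: str         # NVMe Qualified Name of the subsystem
    trsvcid: int = DEFAULT_TRSVCID


def connect_argv(target: Target) -> list[str]:
    """Build (but do not execute) an nvme-cli connect command line."""
    if target.transport not in ("tcp", "rdma"):
        # NVMe/FC uses a different address format and is left out here.
        raise ValueError(f"unsupported transport in this sketch: {target.transport}")
    return [
        "nvme", "connect",
        "-t", target.transport,
        "-a", target.traddr,
        "-s", str(target.trsvcid),
        "-n", target.nqn,
    ]


if __name__ == "__main__":
    # Documentation-range address and made-up NQN, for illustration only.
    demo = Target("tcp", "192.0.2.10", "nqn.2014-08.org.example:subsys1")
    print(" ".join(connect_argv(demo)))
```

Switching the same target from TCP to RDMA changes only the transport string, which reflects how the bindings share one command set and differ mainly in how commands and data are carried.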
RDMA‑based NVMe‑oF
RDMA enables zero‑copy, kernel‑bypass transfers between hosts: the network adapter moves data directly between application buffers, which largely removes the host CPU from the data path.
InfiniBand – Native RDMA support.
RoCE – RDMA over Converged Ethernet. RoCEv1 operates at Layer 2 only; RoCEv2 adds UDP/IP routing (default UDP port 4791).
iWARP – RDMA over TCP, providing congestion‑aware flow control and tolerance to packet loss.
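The difference between the two RoCE versions shows up in the per‑packet headers. The sketch below compares them, assuming untagged Ethernet, IPv4 without options, and no extended transport headers (such as RETH); the function names are made up for this example.

```python
# Header sizes in bytes. Assumes untagged Ethernet and IPv4 without options.
ETH_HEADER = 14
ETH_FCS = 4
IPV4_HEADER = 20
UDP_HEADER = 8
IB_GRH = 40    # InfiniBand Global Route Header, used by RoCEv1
IB_BTH = 12    # InfiniBand Base Transport Header, used by both versions
IB_ICRC = 4    # invariant CRC trailer, used by both versions

ROCEV2_UDP_PORT = 4791  # destination port that identifies RoCEv2 traffic


def overhead_bytes(version: int) -> int:
    """Per-packet framing overhead for a RoCE version (1 or 2)."""
    if version == 1:
        # Layer 2 only: Ethernet frame carrying GRH + BTH directly.
        return ETH_HEADER + IB_GRH + IB_BTH + IB_ICRC + ETH_FCS
    if version == 2:
        # Routable: GRH is replaced by IP + UDP headers.
        return ETH_HEADER + IPV4_HEADER + UDP_HEADER + IB_BTH + IB_ICRC + ETH_FCS
    raise ValueError("version must be 1 or 2")


def wire_efficiency(version: int, payload_bytes: int) -> float:
    """Fraction of on-wire bytes that are payload."""
    return payload_bytes / (payload_bytes + overhead_bytes(version))


if __name__ == "__main__":
    for v in (1, 2):
        print(f"RoCEv{v}: {overhead_bytes(v)} B overhead, "
              f"{wire_efficiency(v, 4096):.3f} efficiency at 4 KiB payload")
```

Under these assumptions the routable RoCEv2 framing is no larger than RoCEv1, so the practical cost of routability lies in network configuration rather than in header bytes.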
Network requirements for NVMe‑oF
NVMe/IB – Requires an InfiniBand fabric.
NVMe/FC – Requires Generation 5 or 6 Fibre Channel (16 or 32 Gbps).
NVMe/TCP – Runs over standard Ethernet; TCP/IP processing (checksums, data copies, interrupts) costs host CPU cycles and may add latency.
NVMe/RDMA (RoCE, iWARP) – RoCE prefers lossless Ethernet; iWARP can operate over ordinary, lossy Ethernet because it uses TCP.
Key design considerations:
Dedicated vs. shared fabrics – dedicated fabrics (InfiniBand, Fibre Channel) give predictable latency at higher cost.
End‑to‑end latency budget – NVMe‑oF is designed to add only about 10 microseconds on top of local NVMe access, so storage should be placed close to the host.
Loss tolerance – TCP‑based transports (NVMe/TCP, iWARP) handle packet loss; RoCE relies on lossless Ethernet or its NACK‑based recovery.
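The requirements and considerations above can be condensed into a small lookup table. The following is a rough sketch, not a sizing tool: the Transport class, the candidates function, and the two yes/no inputs are simplifications invented here, and InfiniBand and Fibre Channel are lumped together as "dedicated fabric".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transport:
    name: str
    fabric: str
    dedicated: bool          # needs its own fabric (InfiniBand or Fibre Channel)
    lossy_ethernet_ok: bool  # tolerates packet loss on ordinary Ethernet


# Properties as described in the article text.
TRANSPORTS = [
    Transport("NVMe/IB", "InfiniBand", dedicated=True, lossy_ethernet_ok=False),
    Transport("NVMe/FC", "Fibre Channel", dedicated=True, lossy_ethernet_ok=False),
    Transport("NVMe/TCP", "Ethernet", dedicated=False, lossy_ethernet_ok=True),
    Transport("NVMe/RoCE", "Ethernet", dedicated=False, lossy_ethernet_ok=False),
    Transport("NVMe/iWARP", "Ethernet", dedicated=False, lossy_ethernet_ok=True),
]


def candidates(*, dedicated_fabric_available: bool,
               ethernet_is_lossless: bool) -> list[str]:
    """Return transport names that fit the given (simplified) environment."""
    picks = []
    for t in TRANSPORTS:
        if t.dedicated:
            if dedicated_fabric_available:
                picks.append(t.name)
        elif t.lossy_ethernet_ok or ethernet_is_lossless:
            picks.append(t.name)
    return picks


if __name__ == "__main__":
    # Plain shared Ethernet without lossless configuration.
    print(candidates(dedicated_fabric_available=False, ethernet_is_lossless=False))
```

On plain shared Ethernet only the TCP‑based transports remain, which matches the loss‑tolerance point above; CPU overhead and cost still have to be weighed separately.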
Additional technical notes
NVMe replaces the SCSI block‑device model with a memory‑centric model, allowing many‑to‑many host‑target relationships and multi‑queue operation (up to 64 K queues, each with up to 64 K commands). This reduces I/O overhead and latency compared with legacy SCSI over SATA or SAS.
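A quick calculation shows the scale of the multi‑queue difference. This sketch uses the NVMe limits quoted earlier (65,535 I/O queues with 65,535 commands each) and, for comparison, the single 32‑command queue of AHCI, the usual SATA host interface; the helper name is made up for illustration.

```python
# Theoretical protocol limits, not what any real device exposes.
NVME_MAX_IO_QUEUES = 65_535
NVME_MAX_COMMANDS_PER_QUEUE = 65_535

# AHCI (SATA) offers one command queue with 32 slots.
AHCI_QUEUES = 1
AHCI_COMMANDS_PER_QUEUE = 32


def max_outstanding(queues: int, commands_per_queue: int) -> int:
    """Upper bound on commands in flight at once."""
    return queues * commands_per_queue


if __name__ == "__main__":
    nvme = max_outstanding(NVME_MAX_IO_QUEUES, NVME_MAX_COMMANDS_PER_QUEUE)
    ahci = max_outstanding(AHCI_QUEUES, AHCI_COMMANDS_PER_QUEUE)
    print(f"NVMe upper bound: {nvme:,} commands in flight")
    print(f"AHCI upper bound: {ahci:,} commands in flight")
```

Real drives expose far fewer queues than the protocol maximum, but even a few queues per CPU core let each core submit I/O without contending for a shared lock.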
When selecting a transport, consider CPU overhead (NVMe/TCP), required Ethernet characteristics (lossless for RoCE), and distance constraints – NVMe’s strict latency limits generally restrict storage to the same data‑center or rack.
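The distance constraint follows from propagation delay alone. As a back‑of‑the‑envelope sketch, assume light travels through fiber at roughly 2 × 10^8 m/s and ignore switch, NIC, and software latency entirely; the 10 µs budget used in the demo is an illustrative figure, and the function names are hypothetical.

```python
# Approximate signal speed in optical fiber (about two thirds of c).
FIBER_SPEED_M_PER_S = 2.0e8


def round_trip_us(distance_m: float) -> float:
    """Round-trip propagation delay in microseconds over a fiber run."""
    return 2 * distance_m / FIBER_SPEED_M_PER_S * 1e6


def fits_budget(distance_m: float, budget_us: float) -> bool:
    """True if propagation delay alone stays within the latency budget."""
    return round_trip_us(distance_m) <= budget_us


def max_distance_m(budget_us: float) -> float:
    """Longest one-way fiber run whose round trip fits the budget."""
    return budget_us * 1e-6 * FIBER_SPEED_M_PER_S / 2


if __name__ == "__main__":
    budget = 10.0  # microseconds, illustrative assumption
    for d in (10, 100, 1_000, 10_000):
        print(f"{d:>6} m: {round_trip_us(d):8.2f} us round trip, "
              f"fits {budget} us budget: {fits_budget(d, budget)}")
```

Since switches and protocol processing consume part of any real budget, the usable distance is shorter than this bound suggests.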
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.