Essential Q&A on NVIDIA Quantum‑2 InfiniBand: Compatibility, Cabling, and Performance
This article compiles detailed technical Q&A about NVIDIA's Quantum‑2 InfiniBand platform, covering compatibility of CX7 NDR ports, cabling options, switch connections, UFM deployment, PCIe bandwidth limits, and performance considerations for high‑performance computing clusters.
Technical Q&A for NVIDIA Quantum‑2 InfiniBand
Q: Is the CX7 NDR200 QSFP112 port compatible with HDR/EDR cables? Yes.
Q: How to connect a CX7 NDR network card to a Quantum‑2 QM97XX series switch? Use NVIDIA 400GBASE‑SR4 or 400GBASE‑DR4 optical modules on the CX7 NDR card, and 800GBASE‑SR8 (equivalent to 2×400GBASE‑SR4) or 800GBASE‑DR8 (equivalent to 2×400GBASE‑DR4) modules on the QM97XX switch, linked with 12‑core multimode APC fiber.
Q: Can a dual‑port 400G CX7 card reach 800G by bonding its ports, and why can two 200G ports be bonded to 400G? Overall throughput is bounded by PCIe bandwidth, NIC processing capability, and physical port bandwidth. The CX7 uses a PCIe 5.0 ×16 interface, which tops out at 512 Gbps, so a dual‑port 400G configuration cannot reach 800 Gbps. Two 200G ports total 400 Gbps, which fits under that ceiling.
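The bonding limit above comes down to taking the smaller of two numbers. A minimal sketch, assuming the raw PCIe 5.0 ×16 figure of 512 Gbps quoted in the answer and ignoring NIC processing limits; the function name `usable_gbps` is hypothetical and used only for illustration:

```python
def usable_gbps(port_speeds_gbps, pcie_gts_per_lane=32, lanes=16):
    """Return (sum of port rates, raw PCIe rate, upper bound on throughput)."""
    port_total = sum(port_speeds_gbps)
    # Raw link rate: one bit per transfer per lane, before any encoding overhead.
    pcie_raw = pcie_gts_per_lane * lanes
    # The host interface and the ports are in series, so the smaller one wins.
    return port_total, pcie_raw, min(port_total, pcie_raw)


for ports in ([200, 200], [400, 400]):
    total, pcie, bound = usable_gbps(ports)
    print(f"{ports}: ports={total} Gbps, PCIe raw={pcie} Gbps, bound={bound} Gbps")
```

With two 200G ports the ports themselves are the bottleneck (400 Gbps); with two 400G ports the PCIe link is (512 Gbps), which is why bonding cannot deliver 800 Gbps.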
Q: How should branch cables be connected? An 800G‑to‑2×400G branch cable should connect to two separate servers; it should not be attached to a single Ethernet NIC because GPU servers typically have multiple NICs.
Q: In an InfiniBand NDR scenario, how is a one‑to‑two cable connected? Two options exist: (1) split‑fiber modules that divide 400G into 2×200G (e.g., MMS4X00‑NS400 + MFP7E20‑NXXX + MMS4X00‑NS400) and (2) high‑speed branch cables that split 800G into 2×400G (e.g., MCP7Y00‑NXXX or MCP7Y10‑NXXX).
Q: In a Superpod network, should four NDR200 cards on a server use a single 1×4 cable to one switch or two 1×2 cables to different switches? Using a single 1×4 cable to one switch is not recommended; Superpod architecture requires two 1×2 cables connecting to different leaf switches for optimal NCCL/SHARP performance.
Q: If the Superpod network requires two dedicated IB switches with UFM software, can I omit the separate UFM switch and run UFM only on the management node? Deploying UFM on the management node is possible but it should not handle GPU compute workloads; the storage network operates independently and cannot replace the dedicated compute‑network UFM.
Q: What is the difference between enterprise UFM, SDN, telemetry, and Cyber‑AI? Is purchasing UFM necessary? Simple management and monitoring can be done with OpenSM from OFED, but UFM provides a richer GUI and additional features not available in OpenSM.
Q: How many nodes can the subnet managers on switches, in OFED, and in UFM handle? The subnet manager on a switch can manage up to about 2,000 nodes; OpenSM (from OFED) and UFM have no strict node limit, but their capacity depends on the management node's CPU and other hardware.
Q: Why does a 64‑port 400G switch have only 32 OSFP ports? A 2U chassis limits the number of slots to 32; each slot is designed for two 400G OSFP interfaces.
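The port count in the answer above is simple multiplication. A short sketch of that arithmetic, using only the figures stated there (32 OSFP cages, two 400G interfaces per cage); the constant and function names are hypothetical:

```python
OSFP_CAGES = 32         # physical OSFP cages on the front panel
PORTS_PER_CAGE = 2      # each cage carries two 400G interfaces
PORT_SPEED_GBPS = 400


def switch_capacity(cages=OSFP_CAGES, ports_per_cage=PORTS_PER_CAGE,
                    port_speed_gbps=PORT_SPEED_GBPS):
    """Return (logical port count, aggregate one-way bandwidth in Gbps)."""
    ports = cages * ports_per_cage
    return ports, ports * port_speed_gbps


ports, aggregate = switch_capacity()
print(f"{ports} x 400G ports, {aggregate} Gbps aggregate per direction")
```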
Q: Can a cable connect an OSFP port on a server to a QSFP112 port on a switch? Yes, provided both ends use the same 400G‑DR4 or 400G‑FR4 optical standard; OSFP and QSFP112 modules are electrically compatible.
Q: Can UFM monitor RoCE networks? No, UFM only supports InfiniBand networks.
Q: Do managed and unmanaged switches offer the same UFM functionality? Yes, the functionality is identical.
Q: What is the maximum transmission distance for InfiniBand cables? Optical modules with fiber jumpers reach about 500 m; passive high‑speed copper cables are limited to about 3 m, while active copper cables (ACC) can reach about 5 m.
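The reach figures above can be turned into a small lookup when planning rack layouts. This is a rough sketch, assuming the approximate limits quoted in the answer and preferring the shortest-reach option that fits; `pick_cable` and the category labels are hypothetical names, not NVIDIA terminology:

```python
# Approximate reach limits in metres, taken from the answer above,
# ordered from shortest to longest reach.
REACH_M = [
    ("passive copper cable", 3),
    ("active copper cable (ACC)", 5),
    ("optical modules with fiber jumper", 500),
]


def pick_cable(distance_m):
    """Return the first cable type whose reach covers distance_m, else None."""
    for kind, max_m in REACH_M:
        if distance_m <= max_m:
            return kind
    return None  # beyond the distances discussed in this Q&A


for d in (2, 4, 100, 600):
    print(d, "m ->", pick_cable(d))
```

Choosing the shortest-reach option first is a design assumption made here for illustration; actual selection also depends on port type, power, and cost.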
Q: Can a CX7 NIC connect to a 400G Ethernet switch that supports RDMA? A 400G Ethernet connection is possible and RoCE can operate, but performance is not guaranteed; NVIDIA Spectrum‑X (BF3 + Spectrum‑4) is recommended for optimal results.
Q: Are HDR and EDR cables compatible with NDR? Yes; typically OSFP‑to‑2×QSFP56 DAC/AOC cables are used to ensure compatibility.
Q: Should the OSFP module on a NIC be a flat module? NICs have heatsinks and can use thick modules; flat modules are mainly for liquid‑cooled switch ports.
Q: Does an InfiniBand NIC support RDMA over Ethernet? RoCE can be enabled, and NVIDIA Spectrum‑X is the suggested solution.
Q: Why are there no dedicated NDR cables? OSFP modules are large and heavy, making fibers more prone to damage; multi‑branch cables increase the number of large optical ports, raising breakage risk, especially for 30 m AOCs.
Q: Are the cables for 400G InfiniBand and 400G Ethernet the same? Yes, but they must be APC‑type with an 8° angle.
Q: What latency requirements does a CX7 NIC have in an optimized test environment? Latency depends on the test machine’s frequency, configuration, and tools such as perftest or MPI; acceptable values vary per setup.
Q: Are OSFP modules on the NIC flat? The term refers to modules with integrated heatsinks; flat modules are not required.
Q: What role does UFM play in a cluster solution? UFM runs as an independent node on a server, can be deployed in a two‑server high‑availability configuration, but should not handle compute workloads.
Q: What cluster size is recommended for deploying UFM? It is advisable to configure UFM for all InfiniBand networks, as it provides OpenSM and additional management capabilities.
Q: Does PCIe 5 support more than 512 Gbps? PCIe Gen5 offers up to 32 GT/s × 16, yielding a maximum of 512 Gbps; PCIe Gen4 provides up to 256 Gbps.
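The 512 Gbps and 256 Gbps figures above are raw signaling rates. As a hedged illustration, the sketch below reproduces them and also applies the 128b/130b line encoding that PCIe Gen4 and Gen5 use, which trims the usable rate slightly; packet and protocol overhead, not modeled here, reduce it further. The names `PCIE_GTS` and `pcie_bandwidth_gbps` are hypothetical:

```python
# Per-lane signaling rate in GT/s for the two generations named above.
PCIE_GTS = {4: 16, 5: 32}


def pcie_bandwidth_gbps(gen, lanes=16):
    """Return (raw Gbps, Gbps after 128b/130b encoding) for one direction."""
    raw = PCIE_GTS[gen] * lanes
    # 128 payload bits are carried in every 130 bits on the wire.
    encoded = raw * 128 / 130
    return raw, round(encoded, 1)


for gen in (4, 5):
    raw, encoded = pcie_bandwidth_gbps(gen)
    print(f"PCIe Gen{gen} x16: raw {raw} Gbps, about {encoded} Gbps after encoding")
```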
Q: Are InfiniBand network cards half‑duplex or full‑duplex? All InfiniBand NICs operate in full‑duplex mode; the half‑duplex concept does not apply because the transmit and receive paths are separate.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
