Understanding InfiniBand: Architecture, Protocols, and Performance
InfiniBand is a high‑performance network protocol that uses credit‑based flow control and switched fabric architecture to provide low latency, high bandwidth, and reliable data transfer, offering advantages over TCP/IP such as reduced packet loss, efficient RDMA, and support for various upper‑layer protocols.
Traditional TCP/IP networks suffer from packet loss and retransmission delays, which degrade performance, whereas InfiniBand (IB) employs a credit‑based flow‑control mechanism that virtually eliminates packet loss caused by receiver buffer overruns.
IB only transmits data when the receiver’s buffer has sufficient space; after receiving data, the receiver signals buffer availability, removing the need for retransmissions and improving overall efficiency.
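The credit exchange described above can be sketched as a small simulation. This is a simplified model, not the actual IB link‑layer protocol: the `Receiver` class, `send_all` function, and the per‑round drain rate are all hypothetical names and parameters chosen for illustration.

```python
from collections import deque


class Receiver:
    """Toy receiver with a fixed number of buffer slots (hypothetical model)."""

    def __init__(self, buffer_slots):
        self.free_slots = buffer_slots
        self.queue = deque()

    def advertise_credits(self):
        # One credit per free buffer slot.
        return self.free_slots

    def accept(self, packet):
        # The sender only transmits while it holds a credit, so space exists.
        assert self.free_slots > 0, "sender violated flow control"
        self.free_slots -= 1
        self.queue.append(packet)

    def drain(self, count):
        # The consumer removes packets, which frees buffer slots again.
        drained = 0
        while self.queue and drained < count:
            self.queue.popleft()
            self.free_slots += 1
            drained += 1
        return drained


def send_all(packets, receiver, drain_per_round=2):
    """Deliver every packet without overrunning the receiver.

    Returns (delivered, rounds). Nothing is dropped or retransmitted.
    """
    if drain_per_round < 1:
        raise ValueError("receiver must drain at least one packet per round")
    pending = deque(packets)
    delivered = []
    rounds = 0
    while pending:
        rounds += 1
        credits = receiver.advertise_credits()
        # Send at most as many packets as the receiver has advertised.
        for _ in range(min(credits, len(pending))):
            packet = pending.popleft()
            receiver.accept(packet)
            delivered.append(packet)
        receiver.drain(drain_per_round)
    return delivered, rounds
```

With ten packets, a four‑slot receiver, and two packets drained per round, the sender needs four rounds: it sends four packets first, then two per round as credits return. Every packet arrives in order and none is dropped.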
What Is an InfiniBand Network
InfiniBand is a network communication protocol built on a switched architecture, forming point‑to‑point bidirectional serial links between processor nodes and I/O nodes (e.g., disks or storage). Each link terminates in a device that controls transmission and reception at both ends.
InfiniBand creates private, protected channels between nodes via switches, enabling remote direct memory access (RDMA) without CPU involvement; the InfiniBand adapter, rather than the CPU, handles the data transfer.
The adapter connects to the CPU via PCI Express on one side and to the InfiniBand subnet on the other, offering higher bandwidth, lower latency, and better scalability compared to traditional protocols.
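To make the idea of adapter‑driven RDMA more concrete, here is a toy in‑process model. It assumes the usual pattern of registering a memory region and presenting a remote key for access. The class and method names (`ToyAdapter`, `register`, `rdma_write`) are hypothetical and do not correspond to a real verbs API.

```python
class MemoryRegion:
    """A registered buffer plus the key a remote peer must present (toy model)."""

    def __init__(self, size, rkey):
        self.buffer = bytearray(size)
        self.rkey = rkey


class ToyAdapter:
    """Stands in for the channel adapter on the target node (hypothetical)."""

    def __init__(self):
        self.regions = {}
        self.next_rkey = 1

    def register(self, size):
        # Registration exposes a buffer and returns the key for remote access.
        region = MemoryRegion(size, self.next_rkey)
        self.regions[region.rkey] = region
        self.next_rkey += 1
        return region

    def rdma_write(self, rkey, offset, data):
        # The adapter checks the key and bounds, then places the data
        # directly in the registered buffer. No target-side application
        # code runs as part of the write.
        region = self.regions.get(rkey)
        if region is None:
            raise PermissionError("unknown rkey")
        if offset < 0 or offset + len(data) > len(region.buffer):
            raise ValueError("write outside registered region")
        region.buffer[offset:offset + len(data)] = data
```

In this sketch the target application only registers the buffer. A peer that knows the key can then place bytes into it, while a write with an unknown key is rejected, which mirrors the "protected channel" idea in the text.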
What Is the InfiniBand Architecture
The InfiniBand Architecture (IBA) is hardware‑oriented, whereas TCP is software‑oriented; consequently, IB is a lighter transport service that does not require packet reordering because the link layer already delivers ordered packets.
Because IB uses credit‑based flow control, the transport layer does not need TCP‑like window algorithms to determine the optimal number of in‑flight packets, allowing delivery of 56 Gb/s or 100 Gb/s data rates with minimal latency and negligible CPU usage.
IB uses channel‑based, bidirectional serial links arranged in a switched‑fabric topology; IBA switches and repeaters extend the fabric when needed.
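Each IB subnet is addressed with 16‑bit local identifiers (LIDs), giving an address space of 65,536 values, of which roughly 48K are available as unicast addresses for nodes. To span multiple subnets, routers or gateways are required.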
Nodes connect to the subnet via channel adapters: a Host Channel Adapter (HCA) for CPUs and memory, or a Target Channel Adapter (TCA) for storage and I/O devices. The connection between two such endpoints is called a link.
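The 65,536 figure corresponds to the size of the 16‑bit LID address space. The sketch below breaks that space down, assuming the ranges as I understand them from the InfiniBand Architecture specification (unicast 0x0001 to 0xBFFF, multicast 0xC000 to 0xFFFE). The `classify_lid` helper is a hypothetical name.

```python
LID_BITS = 16

# Assumed ranges, per my reading of the IBA specification.
RESERVED_LID = 0x0000
UNICAST_LIDS = range(0x0001, 0xBFFF + 1)
MULTICAST_LIDS = range(0xC000, 0xFFFE + 1)
PERMISSIVE_LID = 0xFFFF


def classify_lid(lid):
    """Return which part of the 16-bit LID space a value falls in."""
    if not 0 <= lid < 2 ** LID_BITS:
        raise ValueError("LID must fit in 16 bits")
    if lid == RESERVED_LID:
        return "reserved"
    if lid == PERMISSIVE_LID:
        return "permissive"
    if lid in UNICAST_LIDS:
        return "unicast"
    return "multicast"


total_lids = 2 ** LID_BITS          # 65536
unicast_count = len(UNICAST_LIDS)   # 49151
```

Under these assumptions, the number of individually addressable ports in one subnet is 49,151 rather than the full 65,536, since the remainder is set aside for multicast and special values.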
InfiniBand Data Rate Evolution
InfiniBand serial links operate at various signaling rates; multiple lanes can be aggregated into one link (e.g., 4X) to achieve higher throughput. The signaling rate and the encoding scheme together determine the effective data rate, because encoding adds overhead (e.g., with 8b/10b encoding, 10 bits are transmitted for every 8 bits of data).
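The relationship between signaling rate, lane count, and encoding overhead is simple arithmetic, sketched below. The function name is hypothetical. The generation names, per‑lane signaling rates, and the 64b/66b encoding used by later generations are not taken from this article; they are the commonly published figures.

```python
def effective_rate_gbps(lane_signaling_gbps, lanes, data_bits, coded_bits):
    """Effective data rate = signaling rate x lanes x encoding efficiency."""
    return lane_signaling_gbps * lanes * data_bits / coded_bits


# (generation, per-lane signaling rate in Gb/s, data bits, coded bits)
GENERATIONS = [
    ("SDR", 2.5, 8, 10),
    ("DDR", 5.0, 8, 10),
    ("QDR", 10.0, 8, 10),
    ("FDR", 14.0625, 64, 66),
    ("EDR", 25.78125, 64, 66),
]

# Effective 4X data rate for each generation.
RATES_4X = {
    name: effective_rate_gbps(rate, 4, data_bits, coded_bits)
    for name, rate, data_bits, coded_bits in GENERATIONS
}
```

For example, a 4X SDR link signals at 10 Gb/s but carries 8 Gb/s of data because of the 8b/10b overhead, while a 4X EDR link signals at 103.125 Gb/s and carries 100 Gb/s with the lighter 64b/66b encoding.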
Typical implementations bundle four lanes (4X), and the resulting throughput depends on each generation's per‑lane signaling rate and encoding.
InfiniBand Upper‑Layer Protocols
InfiniBand defines messages for management functions and supports several upper‑layer protocols, including SDP, SRP, iSER, RDS, IPoIB, and uDAPL.
SDP (Sockets Direct Protocol): Defined by the InfiniBand Trade Association, it allows existing sockets‑based TCP/IP applications to run over high‑speed InfiniBand.
SRP (SCSI RDMA Protocol): Packages SCSI commands for RDMA transmission over InfiniBand, enabling storage sharing and RDMA communication.
iSER (iSCSI Extensions for RDMA): Transports iSCSI commands and data over RDMA; standardised by the IETF and used in IB SAN environments.
RDS (Reliable Datagram Sockets): Developed by Oracle, it offers a datagram socket interface similar to UDP, but with reliable delivery over InfiniBand.
IPoIB (IP over InfiniBand): Carries IP traffic over InfiniBand, allowing unmodified TCP/IP applications to use InfiniBand bandwidth.
uDAPL (User Direct Access Programming Library): A standard API that leverages RDMA to improve data‑center application performance, scalability, and reliability.
Beyond SRP and iSER, which carry SCSI traffic, NFS over RDMA (NFSoRDMA) also enables storage sharing via RDMA.
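As an illustration of what "unmodified TCP/IP applications" means for IPoIB, the sketch below is an ordinary TCP echo round trip using only the standard socket API. It runs on the loopback address here. The assumption is that on a host with IPoIB configured, the same code would work by passing the IP address assigned to the IPoIB interface (often named `ib0` on Linux) instead. The function names are hypothetical.

```python
import socket
import threading


def echo_once(bind_addr="127.0.0.1"):
    """Start a one-shot TCP echo server and return (thread, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_addr, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, port


def round_trip(message, addr="127.0.0.1"):
    """Send a message to the echo server and return the reply."""
    thread, port = echo_once(addr)
    with socket.create_connection((addr, port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)
    thread.join(timeout=5)
    return reply
```

Nothing in this code refers to InfiniBand. That is the point of IPoIB: the choice of fabric is made by the address the application binds or connects to, not by the application logic.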
InfiniBand Management Software
OpenSM is an InfiniBand subnet manager that runs on the Mellanox OFED stack, handling in‑band management of the fabric, including device discovery, monitoring, and health analysis.
OpenSM comprises a subnet manager, backbone manager, and performance manager, offering comprehensive management capabilities such as automatic device discovery, fabric visualization, intelligent analysis, and health monitoring.
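Device discovery by a subnet manager can be pictured as a sweep over the fabric topology that assigns a local identifier to each node it reaches. The sketch below is a simplified breadth‑first model and not OpenSM's actual algorithm; the function name, the topology dictionary, and the node names are hypothetical.

```python
from collections import deque


def sweep_fabric(links, sm_node, first_lid=1):
    """Breadth-first sweep from the subnet manager's node, assigning LIDs.

    `links` maps each node name to the neighbours it is cabled to.
    Returns {node: lid} for every node reachable from `sm_node`.
    """
    lids = {sm_node: first_lid}
    frontier = deque([sm_node])
    next_lid = first_lid + 1
    while frontier:
        node = frontier.popleft()
        # Sorting keeps the assignment deterministic for a given topology.
        for neighbour in sorted(links.get(node, ())):
            if neighbour not in lids:
                lids[neighbour] = next_lid
                next_lid += 1
                frontier.append(neighbour)
    return lids


# Hypothetical fabric: two switches, two hosts, one storage target,
# and one device that is not cabled to anything.
EXAMPLE_LINKS = {
    "hca0": ["sw1"],
    "sw1": ["hca0", "hca1", "sw2"],
    "sw2": ["sw1", "tca0"],
    "hca1": ["sw1"],
    "tca0": ["sw2"],
    "orphan": [],
}
```

Sweeping `EXAMPLE_LINKS` from `hca0` reaches five nodes and leaves `orphan` without a LID, which is the kind of result a health check could flag as an unreachable device.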
For more detailed InfiniBand technical information, refer to the e‑book “InfiniBand Architecture and Technical Practice Summary”.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
