
Understanding NVIDIA NVLink: Architecture, Features, and Applications

This article introduces NVIDIA's third‑generation NVLink technology: the high‑bandwidth GPU‑to‑GPU and GPU‑to‑CPU interconnect at the heart of the Ampere‑based A100 GPU, its place alongside innovations such as Multi‑Instance GPU and NVSwitch, and its impact on AI, HPC, and graphics workloads.

Architects' Tech Alliance

Several months ago, NVIDIA founder and CEO Jensen Huang unveiled the world’s first GPU based on the NVIDIA® Ampere architecture, the NVIDIA A100, which instantly attracted widespread attention.

The NVIDIA Ampere architecture introduces several key features:

Third‑generation Tensor Cores with TF32

Multi‑Instance GPU (MIG)

Third‑generation NVIDIA NVLink

Structured sparsity

One of the key innovations is the adoption of the third‑generation NVIDIA NVLink, which we will focus on in this article.

What is NVLink?

In simple terms, NVIDIA® NVLink® is a high‑bandwidth, low‑latency interconnect mechanism that enables fast direct communication between GPUs and between GPUs and CPUs.

As AI and other compute‑intensive applications increasingly rely on parallel architectures, multi‑GPU and multi‑CPU systems have become common. However, PCIe bandwidth often becomes a bottleneck; NVLink was introduced to address this limitation.
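To make the bottleneck concrete, here is a rough back‑of‑the‑envelope comparison of transfer time over PCIe versus NVLink. The 10 GB payload is a hypothetical figure chosen purely for illustration; the bandwidth numbers are the bidirectional figures cited later in this article, and latency, protocol overhead, and topology are all ignored:

```python
# Illustrative only: time to move a fixed payload over PCIe vs NVLink.
PAYLOAD_GB = 10          # hypothetical data exchange between two GPUs
PCIE_GEN3_X16 = 32       # GB/s, bidirectional
NVLINK3_PER_GPU = 600    # GB/s, bidirectional (A100, 12 links)

pcie_s = PAYLOAD_GB / PCIE_GEN3_X16      # ~0.31 s
nvlink_s = PAYLOAD_GB / NVLINK3_PER_GPU  # ~0.017 s

print(f"PCIe Gen3 x16: {pcie_s * 1000:.1f} ms")
print(f"NVLink 3.0:    {nvlink_s * 1000:.1f} ms")
```

Even under these idealized assumptions, the order‑of‑magnitude gap shows why inter‑GPU traffic, not compute, can dominate multi‑GPU step time.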

NVIDIA first announced NVLink at GTC 2014. In 2016, the P100 became the first product to feature NVLink, offering 160 GB/s per GPU, about five times the bandwidth of PCIe Gen3 x16. The V100, released at GTC 2017 with NVLink 2.0, raised this to 300 GB/s, roughly ten times PCIe Gen3. The A100 integrates the latest, third‑generation NVLink, supporting up to twelve NVLink connections per GPU for a total bandwidth of 600 GB/s, nearly ten times that of PCIe Gen4 x16.
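The per‑generation totals above follow directly from links per GPU times per‑link bidirectional bandwidth. A quick sanity check in Python, using the link counts and per‑link rates published for each generation:

```python
# Aggregate NVLink bandwidth per GPU = links per GPU x per-link
# bidirectional bandwidth (GB/s), for each NVLink generation.
generations = {
    "NVLink 1.0 (P100)": (4, 40),   # 4 links x 40 GB/s
    "NVLink 2.0 (V100)": (6, 50),   # 6 links x 50 GB/s
    "NVLink 3.0 (A100)": (12, 50),  # 12 links x 50 GB/s
}

PCIE_GEN3_X16 = 32  # GB/s, bidirectional
PCIE_GEN4_X16 = 64  # GB/s, bidirectional

for name, (links, per_link) in generations.items():
    total = links * per_link
    print(f"{name}: {total} GB/s "
          f"({total / PCIE_GEN3_X16:.1f}x PCIe Gen3 x16, "
          f"{total / PCIE_GEN4_X16:.1f}x PCIe Gen4 x16)")
```

The ratios reproduce the figures in the text: roughly 5x PCIe Gen3 for the P100, and just under 10x PCIe Gen4 for the A100.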

Application Scenarios

NVLink serves a broad range of use cases: it provides high‑speed GPU‑to‑GPU interconnects, as well as GPU‑to‑CPU and even CPU‑to‑CPU links, filling a role similar to PCIe or QPI. Any multi‑GPU parallel workload, from large supercomputing clusters to desktop SLI configurations, benefits from the increased communication bandwidth.

While many associate NVLink with HPC workloads that are sensitive to data‑exchange bandwidth, its value also extends to graphics scenarios. The higher inter‑GPU bandwidth can improve performance in multi‑card SLI and single‑card multi‑core graphics applications.

Using an NVLink bridge, two NVIDIA® Quadro® graphics cards can be connected, expanding memory and performance to meet demanding visual‑compute workloads.

NVSwitch™

NVIDIA NVSwitch™ aggregates multiple NVLink connections within a single node to enable many‑to‑many GPU communication at NVLink speeds, further enhancing interconnect performance. The combination of NVLink and NVSwitch allows NVIDIA to efficiently scale AI performance across multiple GPUs.

As deep‑learning workloads scale across more GPUs, PCIe bandwidth becomes a system‑level bottleneck, driving demand for faster, more scalable interconnects. NVSwitch, built on NVLink's communication capabilities, supports more GPUs per server and provides full‑bandwidth connections between all of them, with each A100 GPU offering twelve NVLink links into the switch fabric for high‑speed many‑to‑many communication.
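The scale of such a fabric can be sketched with the figures above. The 8‑GPU server size is an illustrative assumption (typical of DGX‑class systems); the per‑link and link‑count numbers are those cited earlier for third‑generation NVLink:

```python
# Hypothetical sketch: aggregate bandwidth of a full-bandwidth
# NVLink/NVSwitch fabric, using the per-generation figures above.
LINKS_PER_GPU = 12   # third-generation NVLink (A100)
GBPS_PER_LINK = 50   # bidirectional GB/s per link
GPUS = 8             # assumed server size, e.g. a DGX-class node

per_gpu = LINKS_PER_GPU * GBPS_PER_LINK  # 600 GB/s into the fabric per GPU
fabric_total = GPUS * per_gpu            # aggregate injection bandwidth

print(f"Per-GPU NVLink bandwidth: {per_gpu} GB/s")
print(f"Aggregate fabric bandwidth, {GPUS} GPUs: {fabric_total / 1000:.1f} TB/s")
```

Because NVSwitch gives every GPU its full 600 GB/s to every peer, collective operations such as all‑reduce are not throttled by which pair of GPUs happens to be communicating, which is what makes near‑linear multi‑GPU scaling practical.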

Disclaimer

This article is reproduced with permission; please credit the original author and source. If there are any copyright issues, please contact us for resolution.



Tags: Artificial Intelligence, High Performance Computing, Nvidia, GPU interconnect, NVLink
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
