From Volta to Blackwell: How NVIDIA GPUs Evolved for Deep Learning

This article traces the evolution of NVIDIA's GPU architectures—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell—highlighting key innovations such as mixed‑precision support, NVLink, and specialized Tensor Core designs that have dramatically boosted AI training and inference performance.

Architects' Tech Alliance

Since the Volta era, NVIDIA's GPU architectures have increasingly focused on deep‑learning optimizations. The 2017 Volta architecture introduced the first Tensor Cores, which perform fused matrix multiply‑accumulate operations on small matrix tiles (FP16 inputs, FP32 accumulation), delivering roughly a three‑fold performance boost over Pascal for training and inference.
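The key idea behind a Tensor Core op can be sketched in pure Python: inputs are rounded to FP16 (emulated here with `struct`'s half‑precision format), while products are accumulated at higher precision, standing in for the FP32 accumulator. This is an illustration of the numeric behavior, not the hardware path:

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (emulated via struct)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def tensor_core_mma(a, b, c):
    """Emulate one 4x4 Tensor Core operation: D = A @ B + C.
    A and B are rounded to FP16; the accumulator stays at full precision."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = c[i][j]  # accumulator seeded with C, kept in high precision
            for k in range(n):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d

# Identity @ M + 0 reproduces M (small integers are exact in FP16)
I = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
M = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
Z = [[0.0] * 4 for _ in range(4)]
```

Rounding only the multiply inputs while accumulating in FP32 is what lets mixed precision keep most of the accuracy of full FP32 training.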

In 2018, the Turing architecture extended Tensor Cores to INT8, INT4, and even binary (INT1) formats for low‑precision inference, retained FP16 mixed‑precision training support, and added dedicated RT Cores for ray tracing. These changes yielded up to a 32× performance increase compared to Pascal.
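Those integer formats matter because inference weights and activations can be mapped onto a small integer range with a scale factor. A minimal sketch of symmetric per‑tensor INT8 quantization (illustrative; production pipelines such as TensorRT's are more elaborate):

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: map floats onto the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.2, 0.03, 2.4, -2.4]
q, s = quantize_int8(weights)
approx = dequantize_int8(q, s)
```

The quantization error per value is bounded by half a step (`scale / 2`), which is why INT8 inference usually needs calibration but little or no retraining.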

The 2020 Ampere architecture further advanced AI workloads by adding the TF32 and BF16 data types, introducing 2:4 structured‑sparsity acceleration, and upgrading to third‑generation NVLink for faster GPU‑to‑GPU communication, thereby improving efficiency and reducing power consumption.
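Ampere's sparsity feature relies on a fixed 2:4 pattern: in every group of four weights, at most two are nonzero, which lets the hardware skip half the multiplies with a compact metadata index. A toy pruning routine that enforces the pattern (an illustration of the constraint, not NVIDIA's pruning tooling):

```python
def prune_2_of_4(weights):
    """Enforce Ampere-style 2:4 structured sparsity:
    in each group of 4 values, keep the 2 with largest magnitude, zero the rest."""
    assert len(weights) % 4 == 0
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

row = [0.9, -0.1, 0.05, -1.3, 0.2, 0.7, -0.6, 0.1]
pruned = prune_2_of_4(row)
```

Because the pattern is regular (exactly two survivors per group of four), the hardware can store the dense half plus 2‑bit indices, unlike unstructured sparsity, which is hard to accelerate.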

2022's Hopper architecture marked a shift toward AI‑centric design, featuring fourth‑generation Tensor Cores with FP8 support and a Transformer Engine that chooses between FP8 and FP16 on a per‑layer basis, while omitting RT Cores to allocate more silicon to deep‑learning computation.
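FP8's appeal is that it keeps useful dynamic range in just 8 bits. A rough pure‑Python emulation of rounding to E4M3, one of Hopper's two FP8 formats (4 exponent bits, 3 mantissa bits, bias 7, maximum normal value 448). The bit‑exact format, including its NaN encoding, is more involved; this sketch only models the value grid with saturation on overflow:

```python
import math

def to_fp8_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (sketch: saturates at +/-448,
    handles subnormals by clamping the exponent at -6)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # exponent of the value, clamped to E4M3's normal/subnormal range
    e = max(math.floor(math.log2(mag)), -6)
    step = 2.0 ** (e - 3)          # 3 mantissa bits => 8 steps per binade
    q = round(mag / step) * step   # round to the nearest representable value
    return sign * min(q, 448.0)    # saturate at the E4M3 maximum
```

With only eight mantissa steps per power of two, relative error is large, which is why FP8 training pairs the format with per‑tensor scaling as managed by the Transformer Engine.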

In 2024, NVIDIA unveiled the Blackwell architecture, delivering a generational leap for generative AI. The GB200 Superchip pairs two Blackwell GPUs, fabricated on TSMC's custom 4NP process, with a Grace CPU; each GPU carries 192 GB of HBM3e memory with roughly 8 TB/s of bandwidth. Blackwell introduces a second‑generation Transformer Engine, FP4/FP6 precision support, and fifth‑generation NVLink at 1.8 TB/s per GPU, with NVIDIA citing up to a 30× inference speedup and 25× better energy efficiency over the H100 at rack scale.
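FP4 E2M1 can represent only sixteen values, so practical FP4 schemes pair it with fine‑grained scale factors shared across small blocks of weights. A toy block‑scaled quantizer in that spirit (assumptions: the E2M1 value grid, one scale per block mapping the largest magnitude onto 6.0, nearest‑value rounding; a sketch, not NVIDIA's exact microscaling recipe):

```python
import math

# the 8 non-negative values representable in FP4 E2M1
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4_block(block):
    """Quantize a block of floats to FP4 E2M1 with one shared scale factor.
    Returns the dequantized values and the scale (illustrative only)."""
    max_abs = max(abs(v) for v in block)
    scale = max_abs / 6.0 if max_abs else 1.0  # largest value maps onto 6.0
    quantized = []
    for v in block:
        # snap the scaled magnitude to the nearest representable grid point
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(math.copysign(mag * scale, v) if v else 0.0)
    return quantized, scale
```

Keeping the scale per small block rather than per tensor is what makes 4‑bit formats viable: outliers only distort the handful of values that share their block.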

Looking ahead, the announced Rubin GPUs, named after astronomer Vera Rubin, target extreme inference workloads with roughly 50 petaflops of FP4 compute and 288 GB of HBM4 memory, forming the basis of the Vera Rubin NVL144 rack (72 Vera CPUs + 144 Rubin GPUs) that NVIDIA projects will deliver 3.6 exaflops of FP4 inference.

Overall, each architectural generation—Volta, Turing, Ampere, Hopper, and Blackwell—has introduced major innovations that decouple data movement from computation, expand mixed‑precision support, and enhance inter‑GPU connectivity, collectively pushing the boundaries of AI research and applications.

[Figure: GPU architecture overview]
[Figure: Volta Tensor Core]
[Figure: Ampere sparse matrix]
[Figure: Blackwell GPU diagram]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: deep learning, GPU Architecture, Tensor Core, AI hardware, NVLink
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
