From Volta to Blackwell: How NVIDIA GPUs Evolved for Deep Learning

This article traces the evolution of NVIDIA's GPU architectures—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell—highlighting key innovations such as mixed‑precision support, NVLink, and specialized Tensor Core designs that have dramatically boosted AI training and inference performance.

Architects' Tech Alliance

Since the Volta era, NVIDIA's GPU architectures have increasingly focused on deep‑learning optimizations. The 2017 Volta architecture introduced the first Tensor Cores, which perform fused matrix multiply‑accumulate operations on small matrix tiles (FP16 inputs, FP32 accumulation), delivering roughly a three‑fold performance boost over Pascal for training and inference.
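The key idea behind a Tensor Core op can be sketched in pure Python: inputs are rounded to FP16 (emulated here with `struct`'s half‑precision format), while products are accumulated at higher precision, standing in for the FP32 accumulator. This is an illustration of the numeric behavior, not the hardware path:

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (emulated via struct)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def tensor_core_mma(a, b, c):
    """Emulate one 4x4 Tensor Core operation: D = A @ B + C.
    A and B are rounded to FP16; the accumulator stays at full precision."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = c[i][j]  # accumulator seeded with C, kept in high precision
            for k in range(n):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d

# Identity @ M + 0 reproduces M (small integers are exact in FP16)
I = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
M = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
Z = [[0.0] * 4 for _ in range(4)]
```

Rounding only the multiply inputs while accumulating in FP32 is what lets mixed precision keep most of the accuracy of full FP32 training.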

In 2018, the Turing architecture extended Tensor Cores to INT8, INT4, and even binary (INT1) formats for low‑precision inference, retained FP16 mixed‑precision training support, and added dedicated RT Cores for ray tracing. These changes yielded up to a 32× performance increase compared to Pascal.
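Those integer formats matter because inference weights and activations can be mapped onto a small integer range with a scale factor. A minimal sketch of symmetric per‑tensor INT8 quantization (illustrative; production pipelines such as TensorRT's are more elaborate):

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: map floats onto the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.2, 0.03, 2.4, -2.4]
q, s = quantize_int8(weights)
approx = dequantize_int8(q, s)
```

The quantization error per value is bounded by half a step (`scale / 2`), which is why INT8 inference usually needs calibration but little or no retraining.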

The 2020 Ampere architecture further advanced AI workloads by adding the TF32 and BF16 data types, introducing 2:4 structured‑sparsity acceleration, and upgrading to third‑generation NVLink for faster GPU‑to‑GPU communication, thereby improving efficiency and reducing power consumption.
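Ampere's sparsity feature relies on a fixed 2:4 pattern: in every group of four weights, at most two are nonzero, which lets the hardware skip half the multiplies with a compact metadata index. A toy pruning routine that enforces the pattern (an illustration of the constraint, not NVIDIA's pruning tooling):

```python
def prune_2_of_4(weights):
    """Enforce Ampere-style 2:4 structured sparsity:
    in each group of 4 values, keep the 2 with largest magnitude, zero the rest."""
    assert len(weights) % 4 == 0
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

row = [0.9, -0.1, 0.05, -1.3, 0.2, 0.7, -0.6, 0.1]
pruned = prune_2_of_4(row)
```

Because the pattern is regular (exactly two survivors per group of four), the hardware can store the dense half plus 2‑bit indices, unlike unstructured sparsity, which is hard to accelerate.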

2022's Hopper architecture marked a shift toward AI‑centric design, featuring fourth‑generation Tensor Cores with FP8 support and a Transformer Engine that chooses between FP8 and FP16 on a per‑layer basis, while omitting RT Cores to allocate more silicon to deep‑learning computation.
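FP8's appeal is that it keeps useful dynamic range in just 8 bits. A rough pure‑Python emulation of rounding to E4M3, one of Hopper's two FP8 formats (4 exponent bits, 3 mantissa bits, bias 7, maximum normal value 448). The bit‑exact format, including its NaN encoding, is more involved; this sketch only models the value grid with saturation on overflow:

```python
import math

def to_fp8_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (sketch: saturates at +/-448,
    handles subnormals by clamping the exponent at -6)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # exponent of the value, clamped to E4M3's normal/subnormal range
    e = max(math.floor(math.log2(mag)), -6)
    step = 2.0 ** (e - 3)          # 3 mantissa bits => 8 steps per binade
    q = round(mag / step) * step   # round to the nearest representable value
    return sign * min(q, 448.0)    # saturate at the E4M3 maximum
```

With only eight mantissa steps per power of two, relative error is large, which is why FP8 training pairs the format with per‑tensor scaling as managed by the Transformer Engine.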

In 2024, NVIDIA unveiled the Blackwell architecture, delivering a generational leap for generative AI. The GB200 Superchip pairs two Blackwell GPUs, fabricated on TSMC's custom 4NP process, with a Grace CPU; each GPU carries 192 GB of HBM3e memory with roughly 8 TB/s of bandwidth. Blackwell introduces a second‑generation Transformer Engine, FP4/FP6 precision support, and fifth‑generation NVLink at 1.8 TB/s per GPU, with NVIDIA citing up to a 30× inference speedup and 25× better energy efficiency over the H100 at rack scale.
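FP4 E2M1 can represent only sixteen values, so practical FP4 schemes pair it with fine‑grained scale factors shared across small blocks of weights. A toy block‑scaled quantizer in that spirit (assumptions: the E2M1 value grid, one scale per block mapping the largest magnitude onto 6.0, nearest‑value rounding; a sketch, not NVIDIA's exact microscaling recipe):

```python
import math

# the 8 non-negative values representable in FP4 E2M1
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4_block(block):
    """Quantize a block of floats to FP4 E2M1 with one shared scale factor.
    Returns the dequantized values and the scale (illustrative only)."""
    max_abs = max(abs(v) for v in block)
    scale = max_abs / 6.0 if max_abs else 1.0  # largest value maps onto 6.0
    quantized = []
    for v in block:
        # snap the scaled magnitude to the nearest representable grid point
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(math.copysign(mag * scale, v) if v else 0.0)
    return quantized, scale
```

Keeping the scale per small block rather than per tensor is what makes 4‑bit formats viable: outliers only distort the handful of values that share their block.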

Looking ahead, the announced Rubin GPUs, named after astronomer Vera Rubin, target extreme inference workloads with roughly 50 petaflops of FP4 compute and 288 GB of HBM4 memory, forming the basis of the Vera Rubin NVL144 rack (72 Vera CPUs + 144 Rubin GPUs) that NVIDIA projects will deliver 3.6 exaflops of FP4 inference.

Overall, each architectural generation—Volta, Turing, Ampere, Hopper, and Blackwell—has introduced major innovations that decouple data movement from computation, expand mixed‑precision support, and enhance inter‑GPU connectivity, collectively pushing the boundaries of AI research and applications.

[Figure: GPU architecture overview]
[Figure: Volta Tensor Core]
[Figure: Ampere sparse matrix]
[Figure: Blackwell GPU diagram]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: deep learning, GPU Architecture, Tensor Core, AI hardware, NVLink
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
