Tagged articles

Tensor Core

14 articles

Oct 30, 2025 · Artificial Intelligence

How to Pick the Perfect Nvidia GPU for AI Servers – From Tesla to Hopper

This article traces the evolution of Nvidia’s GPU architectures, from the early Tesla series through Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere, and Hopper. It details their specifications and key features and offers a systematic decision-making guide that helps AI server designers select the optimal GPU based on workload, model size, precision, scalability, and total cost of ownership.

AI server · GPU Selection · GPU architecture

16 min read

Architects' Tech Alliance

Aug 10, 2025 · Artificial Intelligence

From Volta to Blackwell: How NVIDIA GPUs Evolved for Deep Learning

This article traces the evolution of NVIDIA's GPU architectures—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell—highlighting key innovations such as mixed‑precision support, NVLink, and specialized Tensor Core designs that have dramatically boosted AI training and inference performance.

AI hardware · GPU architecture · NVLink

10 min read

AntTech

May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing

6 min read

Architects' Tech Alliance

May 6, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for AI from Volta to Blackwell

The article reviews NVIDIA's GPU architecture progression, from Volta's pioneering Tensor Cores through Turing, Ampere, and Hopper to Blackwell and the announced Rubin designs, highlighting key innovations, performance gains for deep learning, and related resource updates for AI practitioners.

GPU architecture · High‑Performance Computing · NVIDIA

9 min read

Architects' Tech Alliance

Mar 28, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for Deep Learning: From Volta to Blackwell and Rubin

The article traces NVIDIA’s GPU architecture evolution from the Volta era’s pioneering Tensor Cores through Turing, Ampere, and Hopper to Blackwell and the announced Rubin designs, highlighting key innovations such as mixed-precision support, sparsity, and NVLink, and their impact on deep-learning performance.

AI hardware · GPU · NVIDIA

10 min read

AntTech

Nov 16, 2024 · Information Security

WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores Accepted at HPCA 2025

Ant Group’s Computing Systems Lab announced that its GPU‑accelerated fully homomorphic encryption framework WarpDrive, which exploits Tensor and CUDA cores for high‑throughput NTT operations and parallel kernel designs, has been accepted as a paper at the IEEE HPCA 2025 conference.

CUDA · Fully Homomorphic Encryption · GPU

4 min read

Architects' Tech Alliance

Oct 15, 2024 · Artificial Intelligence

What Are the Core Metrics Behind AI Chips? A Deep Dive into GPU, ASIC, and TPU

This article explains the fundamental performance indicators of AI chips (TOPS, TFLOPS, and precision formats such as FP16, FP32, and INT8) and compares GPU, ASIC, and TPU architectures, highlighting the advantages of Tensor Cores and the TPU's higher efficiency relative to CPUs and GPUs.

AI chip · ASIC · FP16

4 min read

Architects' Tech Alliance

Aug 8, 2024 · Artificial Intelligence

Fundamental Key Parameters of AI Chips: Compute Power, Precision Formats, and Architecture

This article explains the essential metrics of AI chips, including TOPS and TFLOPS compute throughput and precision formats such as FP16, FP32, and INT8, and describes the roles of GPUs, ASICs, and TPUs. It highlights how Tensor Cores boost deep-learning performance and compares the efficiency of TPUs with that of CPUs and GPUs.

AI chips · ASIC · FP16

4 min read

Architects' Tech Alliance

Aug 21, 2023 · Artificial Intelligence

AI Compute Landscape: GPU Architectures, Tensor Cores, NVLink, and Scaling Challenges

The article surveys the AI compute ecosystem, explaining why general-purpose CPUs are poorly suited to AI workloads and why heterogeneous CPU-plus-accelerator designs dominate. It also details the evolution of NVIDIA GPUs, Tensor Cores, memory technologies, and inter-GPU networking that enable large-scale model training.

AI compute · GPU clustering · NVLink

11 min read

Baidu Geek Talk

Dec 27, 2022 · Artificial Intelligence

How to Supercharge AI Model Training: Bottlenecks and Cutting‑Edge Acceleration Techniques

This article systematically examines the major performance bottlenecks in AI model training, explains the underlying hardware and software causes, and presents a comprehensive set of acceleration strategies—including data‑loading optimizations, compute‑side enhancements, communication tricks, and the AIAK‑Training suite—backed by real‑world case studies and quantitative results.

AI training · AIAK-Training · GPU Acceleration

33 min read

Baidu Intelligent Cloud Tech Hub

Dec 22, 2022 · Artificial Intelligence

How to Supercharge AI Model Training: Bottlenecks and Acceleration Techniques

This article systematically analyzes the main performance bottlenecks in AI model training, explains why acceleration is essential, and presents current hardware‑ and software‑based solutions—including data‑loading optimizations, operator fusion, mixed‑precision and Tensor Core usage, as well as distributed communication strategies—followed by real‑world case studies of Baidu's AIAK‑Training suite that demonstrate significant speed‑ups.

AI training · GPU Acceleration · Performance Optimization

31 min read

Architects' Tech Alliance

Jul 4, 2022 · Industry Insights

Inside NVIDIA Hopper H100: Architecture, Performance, and AI Breakthroughs

The article provides a detailed technical analysis of NVIDIA's Hopper-based H100 GPU, covering its 4 nm process, 80 billion transistors, GPC/TPC hierarchy, new FP8 Tensor Cores, Transformer Engine, and Tensor Memory Accelerator, as well as the resulting performance gain of up to six times over the previous-generation A100.

AI acceleration · FP8 · GPU architecture

8 min read

Architects' Tech Alliance

Mar 20, 2021 · Fundamentals

Evolution of NVIDIA GPU Architectures from Fermi to Ampere

This article outlines the progression of NVIDIA GPU architectures—from the early Fermi and Kepler designs through Maxwell, Pascal, Volta, Turing, and the latest Ampere—detailing compute capabilities, SM structures, FP64/FP32 ratios, Tensor Core introductions, and their impact on AI and high‑performance computing.

AI · CUDA · GPU architecture

19 min read

Architects' Tech Alliance

Mar 15, 2021 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures from Fermi to Ampere

This article provides a comprehensive overview of NVIDIA's GPU architecture evolution—covering Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere—detailing compute capabilities, SM structures, specialized units such as Tensor Cores, and their impact on AI and high‑performance computing workloads.

AI · CUDA · GPU

19 min read