Tagged articles
24 articles
Linux Kernel Journey
Dec 7, 2025 · Fundamentals

CUDA Optimization Basics: Understanding GPU Architecture and Warp Scheduling

This article explains the fundamentals of CUDA performance tuning, covering GPU architectures from Kepler to Volta, the role of SMX, warp schedulers, registers and memory hierarchies, and provides practical guidance on launch configuration, latency hiding, and thread‑block sizing to maximize throughput.

CUDA · GPU architecture · Performance Optimization
21 min read
Architects' Tech Alliance
Oct 30, 2025 · Artificial Intelligence

How to Pick the Perfect Nvidia GPU for AI Servers – From Tesla to Hopper

This article traces the evolution of Nvidia’s GPU architectures—from the early Tesla series through Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere, and the latest Hopper—detailing their specifications, key features, and offering a systematic decision‑making guide for AI server designers to select the optimal GPU based on workload, model size, precision, scalability, and total cost of ownership.

AI server · GPU architecture · GPU selection
16 min read
Tencent Cloud Developer
Sep 26, 2025 · Fundamentals

Why GPUs Really Matter: From Architecture Basics to CUDA Programming

This article explains why GPUs have become the preferred platform for high‑performance computing, covering Dennard scaling, GPU speed advantages, theoretical FLOPS calculations, CUDA programming examples like SAXPY, the SIMT execution model, instruction pipelines, and modern techniques for handling branch divergence and register bank conflicts.

CUDA programming · GPU architecture · GPU performance
38 min read
Architects' Tech Alliance
Sep 19, 2025 · Artificial Intelligence

Why Nvidia’s Rubin CPX GPU Could Revolutionize Long-Context AI Inference

Nvidia's Rubin CPX GPU, unveiled in September 2025, uses GDDR7 memory and a split‑stage architecture to dramatically boost tokens‑per‑second rates for long‑context inference, while its integration into third‑generation Oberon servers promises higher power density, improved ROI, and scalable data‑center deployments.

AI inference · Data center · GPU architecture
9 min read
Architects' Tech Alliance
Sep 14, 2025 · Artificial Intelligence

Why Nvidia’s Blackwell GPUs Are Redefining AI Performance

The article analyzes Nvidia's 2024 Blackwell GPU series and GB200 NVL72 architecture, detailing their advanced 3‑4 nm manufacturing, redesigned CUDA cores, next‑gen ray‑tracing and DLSS upgrades, massive compute and memory bandwidth gains, NVLink Gen5 improvements, and the diverse GB200 product configurations for high‑performance AI workloads.

AI acceleration · Blackwell GPU · GPU architecture
7 min read
Refining Core Development Skills
Aug 26, 2025 · Fundamentals

How NVIDIA’s Fermi Architecture Revolutionized GPU Computing: Key Improvements Explained

Fermi, NVIDIA’s 2010 GPU architecture, introduced major upgrades over the Tesla line—including a 40 nm process, vastly increased transistor count, GDDR5 memory, L2 cache, enhanced FP64 performance, ECC support, and unified CPU‑GPU addressing—making it the first truly complete GPU computing platform.

CUDA optimization · ECC Memory · FP64 performance
12 min read
Architects' Tech Alliance
Aug 18, 2025 · Artificial Intelligence

How Large Model Training Dominates Compute and What New Techniques Can Change It

This article explains why pre‑training large AI models consumes 90‑99% of total compute, describes the full training and inference pipelines, introduces resource‑saving strategies such as prefill/decode (PD) separation, and reviews market trends and infrastructure challenges shaping the next generation of AI systems.

AI Infrastructure · AI training · GPU architecture
13 min read
Architects' Tech Alliance
Aug 10, 2025 · Artificial Intelligence

From Volta to Blackwell: How NVIDIA GPUs Evolved for Deep Learning

This article traces the evolution of NVIDIA's GPU architectures—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell—highlighting key innovations such as mixed‑precision support, NVLink, and specialized Tensor Core designs that have dramatically boosted AI training and inference performance.

AI hardware · Deep Learning · GPU architecture
10 min read
Refining Core Development Skills
Aug 7, 2025 · Fundamentals

Why NVIDIA’s First Data‑Center GPU Revolutionized Computing: Inside the Tesla G80 Architecture

This article explains how NVIDIA transitioned from gaming graphics cards to general‑purpose GPUs with the first data‑center Tesla GPU, detailing the unified shader architecture, the internal components of TPCs and SMs, CUDA 1.0 programming basics, and performance calculations that illustrate the massive computational advantage over contemporary CPUs.

CUDA · GPGPU · GPU architecture
23 min read
Open Source Linux
Jul 5, 2025 · Fundamentals

Why Nvidia’s Blackwell GPU Beats AMD RDNA4: Deep Dive into GB202 Architecture

This article examines Nvidia's massive GB202 Blackwell GPU—its 750 mm² die, 92.2 billion transistors, 192 SMs, and extensive memory subsystem—while comparing its compute units, instruction caches, atomics, and bandwidth against AMD's RDNA4‑based RX 9070, highlighting architectural trade‑offs, performance metrics, and future GPU competition.

AMD RDNA4 · GB202 · GPU architecture
20 min read
Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU
6 min read
Architects' Tech Alliance
May 6, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for AI from Volta to Blackwell

The article reviews NVIDIA's GPU architecture progression—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell and Rubin designs—highlighting key innovations, performance gains for deep learning, and related resource updates for AI practitioners.

GPU architecture · High‑Performance Computing · Nvidia
9 min read
Infra Learning Club
Mar 18, 2025 · Fundamentals

Can You Direct a CUDA Kernel to a Specific SM?

The article explains CUDA’s architecture and SM basics, describes how the warp scheduler and dispatch units assign thread blocks to SMs, and concludes that external control cannot target a specific SM, while mentioning the NanoFlow intra‑device parallelism approach as a possible indirect optimization.

CUDA · GPU architecture · Kernel Scheduling
7 min read
Architects' Tech Alliance
Apr 8, 2024 · Fundamentals

Unlocking GPU Server Architecture: PCIe, NVLink, NVSwitch & HBM Explained

This article provides a comprehensive breakdown of high‑performance GPU server infrastructure, covering PCIe generations, NVLink evolution, NVSwitch and NVLink switches, HBM memory technologies, and bandwidth measurement units, helping readers understand the hardware connections and performance considerations essential for large‑scale model training.

GPU architecture · HBM · High‑performance computing
10 min read
Architects' Tech Alliance
Apr 2, 2024 · Artificial Intelligence

Evolution and Forecast of Nvidia NVLink, NVLink C2C, and B100/X100 GPU Architectures

The article analyses the historical evolution of Nvidia's NVLink and NVLink C2C interconnect technologies, compares them with PCIe, Ethernet and InfiniBand, and uses these trends to predict future AI‑chip architectures such as the B100 and X100 GPUs, highlighting design trade‑offs and packaging challenges.

AI Chip · B100 · GPU architecture
15 min read
Architects' Tech Alliance
Feb 22, 2023 · Industry Insights

RDNA 2 vs Nvidia Ampere: Architecture, Cache, and Game Performance

This article provides an in‑depth technical analysis of AMD’s RDNA 2 GPU architecture, comparing its compute units, cache hierarchy, latency and bandwidth characteristics with Nvidia’s Ampere, and evaluates real‑world game performance in titles such as Cyberpunk 2077, Titanic Honor & Glory, and Gunner HEAT PC.

AMD · GPU architecture · RDNA 2
30 min read
Architects' Tech Alliance
Mar 20, 2021 · Fundamentals

Evolution of NVIDIA GPU Architectures from Fermi to Ampere

This article outlines the progression of NVIDIA GPU architectures—from the early Fermi and Kepler designs through Maxwell, Pascal, Volta, Turing, and the latest Ampere—detailing compute capabilities, SM structures, FP64/FP32 ratios, Tensor Core introductions, and their impact on AI and high‑performance computing.

AI · CUDA · GPU architecture
19 min read
TAL Education Technology
May 14, 2020 · Artificial Intelligence

An Introduction to GPU Computing and CUDA Architecture

This article provides a concise overview of GPU computing fundamentals, covering GPU hardware components, memory hierarchy, parallel execution models, and the CUDA programming framework, illustrating how CPUs and GPUs cooperate in heterogeneous computing environments.

CUDA · CUDA programming · GPU
16 min read
Architects' Tech Alliance
Jul 2, 2017 · Fundamentals

Differences Between NVIDIA Tesla and GeForce GPUs: Architecture, Performance, and Use Cases

This article compares NVIDIA's Tesla and GeForce GPU families, detailing their target markets, design differences, core architectures, double‑precision performance, ECC support, memory bandwidth, interface options, software and OS compatibility, power efficiency, and management features to help readers choose the right GPU for HPC or gaming workloads.

GPU · GPU architecture · GeForce
11 min read