Tagged articles
52 articles
BirdNest Tech Talk
Oct 12, 2025 · Artificial Intelligence

What Happens When a Token Travels Through GPU Villages via RDMA and NVLink?

The article uses a whimsical journey to illustrate how token data is dispatched across GPU clusters—detailing functions like get_dispatch_layout, notify_dispatch, and combine_token, showing RDMA and NVLink pathways, performance experiments, and the final verification of token integrity.

AI · Distributed Systems · GPU
0 likes · 5 min read
Architects' Tech Alliance
Oct 11, 2025 · Artificial Intelligence

Why NVLink Beats PCIe for AI: Deep Dive into GPU Interconnect Technologies

This article examines the architectural differences between Scale‑Out and Scale‑Up networking, compares PCIe, NVLink, UALink, InfiniBand and RoCE, and explains why high‑bandwidth, low‑latency GPU interconnects like NVLink are essential for modern AI and HPC workloads.

AI acceleration · GPU interconnect · High‑performance computing
0 likes · 27 min read
Architects' Tech Alliance
Sep 29, 2025 · Artificial Intelligence

How NVLink and NVSwitch Power AI’s Next‑Gen High‑Performance Networks

This article, part of the 2025 AI Network Technology Whitepaper, classifies AI high‑performance networking into Scale‑Up, Scale‑Out, and frontier breakthroughs, then dives deep into NVLink’s evolution, technical features, NVSwitch’s full‑mesh architecture, and the newly opened NVLink Fusion ecosystem.

AI networking · GPU interconnect · High‑performance computing
0 likes · 8 min read
Architects' Tech Alliance
Sep 15, 2025 · Artificial Intelligence

Why NVLink Beats PCIe for AI Training: A Deep Dive into GPU Interconnects

This article examines the differences between Scale‑Out and Scale‑Up networking in AI compute clusters, comparing PCIe, Ethernet, InfiniBand, NVLink, UALink, and emerging standards like UB‑Mesh, and explains how each technology impacts bandwidth, latency, scalability, and cost for large‑scale model training.

AI training · GPU interconnect · NVLink
0 likes · 28 min read
Architects' Tech Alliance
Sep 14, 2025 · Artificial Intelligence

Why Nvidia’s Blackwell GPUs Are Redefining AI Performance

The article analyzes Nvidia's Blackwell GPU series and GB200 NVL72 architecture, detailing their advanced 3‑4nm manufacturing, redesigned CUDA cores, next‑gen ray‑tracing and DLSS upgrades, massive compute and memory bandwidth gains, NVLink Gen5 improvements, and the diverse GB200 product configurations for high‑performance AI workloads.

AI acceleration · Blackwell GPU · GPU architecture
0 likes · 7 min read
Architects' Tech Alliance
Aug 10, 2025 · Artificial Intelligence

From Volta to Blackwell: How NVIDIA GPUs Evolved for Deep Learning

This article traces the evolution of NVIDIA's GPU architectures—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell—highlighting key innovations such as mixed‑precision support, NVLink, and specialized Tensor Core designs that have dramatically boosted AI training and inference performance.

AI hardware · Deep Learning · GPU architecture
0 likes · 10 min read
Architects' Tech Alliance
Jul 19, 2025 · Artificial Intelligence

Best GPU Cluster Network for Large‑Scale AI: NVLink, InfiniBand, RoCE & DDC

This article compares the main networking technologies used in large‑scale AI GPU clusters—NVLink, InfiniBand, RoCE Ethernet, and the emerging DDC full‑schedule fabric—examining latency, lossless transmission, congestion control, cost, power and scalability to help engineers choose the optimal solution for training massive language models.

AI training · DDC · Data center
0 likes · 15 min read
Instant Consumer Technology Team
Jul 11, 2025 · Artificial Intelligence

Why NVLink Boosts Multi‑GPU Inference: Tensor Parallelism Explained

A recent migration of a multimodal image inference system from an internal network to a cloud environment revealed that NVLink bridges dramatically improve multi‑GPU inference speed by reducing inter‑GPU communication overhead, while tensor‑parallel and data‑parallel strategies each have distinct trade‑offs for model deployment.

AI Performance · Data Parallel · GPU inference
0 likes · 11 min read
Architects' Tech Alliance
May 26, 2025 · Artificial Intelligence

NVLink Fusion: NVIDIA’s High‑Bandwidth Interconnect for Heterogeneous AI Computing

NVLink Fusion, unveiled at Computex 2025, extends NVIDIA’s NVLink technology to enable high‑bandwidth, low‑latency connections between CPUs and GPUs or third‑party accelerators, offering up to 900 GB/s bandwidth, flexible heterogeneous configurations, ecosystem expansion, performance gains for AI training and inference, and potential cost reductions.

AI · CPU · Data center
0 likes · 12 min read
Architects' Tech Alliance
Apr 28, 2025 · Artificial Intelligence

NVLink High‑Speed Interconnect: Architecture, Evolution, and Performance

NVLink, NVIDIA's high‑bandwidth interconnect introduced with the P100 GPU, replaces PCIe by offering significantly higher data rates and lower latency for GPU‑GPU and GPU‑CPU communication, and has evolved through multiple generations to support modern AI and high‑performance computing workloads.

AI acceleration · GPU interconnect · NVLink
0 likes · 9 min read
Architects' Tech Alliance
Apr 8, 2025 · Artificial Intelligence

How NVSwitch Revolutionizes Multi‑GPU Interconnect for AI Workloads

This article examines NVIDIA's NVSwitch technology, explaining why it was needed, how it builds on NVLink to overcome PCIe bottlenecks, tracing its evolution from Pascal to the third‑generation design, and detailing its architectural features, scalability, full‑duplex bandwidth, non‑blocking communication, and optimized network topologies for high‑performance AI and HPC systems.

AI hardware · GPU interconnect · High‑performance computing
0 likes · 9 min read
Architects' Tech Alliance
Apr 6, 2025 · Fundamentals

PCIe vs NVLink: How Modern GPU Interconnects Power AI Training

As AI models grow to trillion‑parameter scales, training them demands massive GPU clusters whose performance is increasingly limited by network bandwidth; this article examines why traditional PCIe interconnects become bottlenecks and how NVIDIA's NVLink and NVSwitch technologies dramatically improve multi‑GPU communication and overall system efficiency.

AI training · GPU · High‑performance computing
0 likes · 12 min read
Architects' Tech Alliance
Apr 3, 2025 · Artificial Intelligence

Why NVLink and NVSwitch Are Essential for Training Massive AI Models

Training today's massive AI foundation models demands extensive GPU resources and sophisticated multi‑GPU communication, making technologies like NVLink and NVSwitch crucial for efficient distributed training, while data‑parallel and model‑parallel strategies together optimize performance across large‑scale hardware clusters.

AI · Distributed Training · GPU
0 likes · 8 min read
Baidu Intelligent Cloud Tech Hub
Mar 3, 2025 · Cloud Computing

How Baidu Cloud Optimizes GPU Servers for AI Workloads

This article explains the design and implementation of GPU cloud servers, covering data processing pipelines, hardware selection, topology, interconnect technologies, virtualization, multi‑GPU communication methods, and Baidu's practical solutions for both virtualized and bare‑metal instances to boost AI inference and training performance.

AI · GPU · NVLink
0 likes · 29 min read
AI Cyberspace
Feb 8, 2025 · Artificial Intelligence

Why 8‑GPU Servers Are Essential for LLM Training and Which Interconnect Wins

With modern large‑language‑model workloads demanding massive parallelism, 8‑GPU servers have become the norm; this article explains the roles of CPUs, compares GPU‑to‑GPU interconnect options—including PCIe direct, PCIe Switch, NVLink, and NVSwitch—detailing their architectures, bandwidths, topologies, and trade‑offs for AI training.

8-GPU server · AI training · GPU interconnect
0 likes · 14 min read
Linux Kernel Journey
Dec 22, 2024 · Artificial Intelligence

Understanding GPU Monitoring: Utilization Metrics and Failure Scenarios

This article systematically reviews GPU monitoring for large‑scale AI training, covering MFU/HFU definitions, key DCGM metrics, NVLink bandwidth, common failure codes such as Xid and SXid, experimental insights on T4 and H100 GPUs, and practical case studies for diagnosing and mitigating performance drops.

DCGM · GPU failures · GPU monitoring
0 likes · 26 min read
Architects' Tech Alliance
Dec 11, 2024 · Fundamentals

Unlocking GPU Computing: PCIe, NVLink, NVSwitch, and HBM Explained

This article breaks down the core components of high‑performance GPU servers—including PCIe switch chips, the evolution of NVLink from version 1.0 to 4.0, NVSwitch architecture, HBM memory tiers, and the nuances of bandwidth units—providing a comprehensive technical foundation for large‑scale model training.

GPU computing · HBM · High‑performance computing
0 likes · 10 min read
Architects' Tech Alliance
Sep 3, 2024 · Industry Insights

How NVIDIA Grace Hopper Superchip Redefines HPC and AI Performance

The article provides an in‑depth technical overview of NVIDIA's Grace Hopper superchip, detailing its heterogeneous CPU‑GPU architecture, high‑bandwidth NVLink‑C2C interconnect, unified memory model, programming support, and system‑level scaling features that together deliver unprecedented performance for high‑performance computing and large‑scale AI workloads.

AI · Grace Hopper · HPC
0 likes · 20 min read
Architects' Tech Alliance
Aug 29, 2024 · Industry Insights

How NVIDIA Builds 256‑GPU and 576‑GPU SuperPods with H100, GH200, and GB200 Interconnects

The article analyzes NVIDIA's DGX SuperPOD architectures across three GPU generations—H100, GH200, and GB200—detailing their NVLink/NVSwitch topologies, bandwidth calculations, scalability limits, and the practical challenges of constructing 256‑GPU and 576‑GPU supercomputing clusters.

Data center · GPU · High‑performance computing
0 likes · 11 min read
IT Services Circle
Jun 6, 2024 · Artificial Intelligence

Nvidia Unveils Blackwell GPU and AI Supercomputing Roadmap

Nvidia’s latest Blackwell GPU, presented by Jensen Huang, promises unprecedented performance and energy efficiency for large‑scale AI models, while the company also showcases accelerated computing, NVLink interconnects, AI‑optimized DGX servers, the NIM platform for rapid LLM deployment, and ambitious projects such as Earth‑2 digital twins and next‑generation embodied AI robots.

AI · Blackwell · GPU
0 likes · 18 min read
Architects' Tech Alliance
May 15, 2024 · Artificial Intelligence

Detailed Overview of GPU Server Architectures: A100/A800 and H100/H800 Nodes

This article provides a comprehensive technical overview of large‑scale GPU server architectures, detailing the component topology of 8‑GPU A100/A800 and H100/H800 nodes, explaining storage network cards, NVSwitch interconnects, bandwidth calculations, and the trade‑offs between RoCEv2 and InfiniBand for AI workloads.

GPU · High‑performance computing · NVLink
0 likes · 13 min read
Architects' Tech Alliance
May 14, 2024 · Fundamentals

Fundamentals of GPU Computing: PCIe, NVLink, NVSwitch, and HBM

This article provides a comprehensive overview of the core components and terminology of large‑scale GPU computing, covering GPU server architecture, PCIe interconnects, NVLink generations, NVSwitch, high‑bandwidth memory (HBM), and bandwidth unit considerations for AI and HPC workloads.

AI hardware · GPU computing · HBM
0 likes · 11 min read
Architects' Tech Alliance
May 11, 2024 · Industry Insights

Why Network Interconnects Are the New Bottleneck for Large‑Model AI Training

The rapid growth of AI large‑model training and inference is driving unprecedented demand for compute and high‑speed networking, prompting a shift from traditional GPU clusters to super‑pooled intelligent computing centers that must balance multiple intra‑ and inter‑node interconnect solutions such as NVLink, OAM/UBB, InfiniBand and RoCEv2.

AI · Data center · InfiniBand
0 likes · 6 min read
Architects' Tech Alliance
May 1, 2024 · Industry Insights

How NVIDIA’s Blackwell Platform Redefines AI Supercomputing Networks

The article examines NVIDIA’s Blackwell platform network architecture, detailing the fifth‑generation NVLink, sixth‑generation PCIe, 800 Gb/s InfiniBand and Ethernet adapters, the DGX B200 and GB200 configurations, new IB and Ethernet switches, and the implications of increased optical module demands for large‑scale AI clusters.

AI supercomputing · Blackwell · DGX
0 likes · 10 min read
Architects' Tech Alliance
Apr 16, 2024 · Industry Insights

Inside AI Servers: PCIe, NVLink, and NVSwitch Driving the Next‑Gen Compute

Based on TrendForce data, AI server shipments are projected to grow at a 12.2% CAGR through 2027, while advances in PCIe switching, retiming chips, and high‑speed GPU interconnects such as NVLink and NVSwitch are reshaping the architecture and performance of next‑generation AI compute platforms.

AI servers · GPU interconnect · High‑performance computing
0 likes · 11 min read
Architects' Tech Alliance
Apr 15, 2024 · Industry Insights

How NVIDIA NVLink is Transforming HPC and AI: Architecture, Switches, and Network Comparisons

This article provides an in‑depth technical analysis of NVIDIA NVLink, covering its evolution, the NVSwitch chip, NVLink‑enabled servers and switches, and a performance comparison with InfiniBand networks, highlighting its impact on high‑performance computing and artificial intelligence workloads.

GPU interconnect · HPC · NVLink
0 likes · 9 min read
Architects' Tech Alliance
Apr 8, 2024 · Fundamentals

Unlocking GPU Server Architecture: PCIe, NVLink, NVSwitch & HBM Explained

This article provides a comprehensive breakdown of high‑performance GPU server infrastructure, covering PCIe generations, NVLink evolution, NVSwitch and NVLink switches, HBM memory technologies, and bandwidth measurement units, helping readers understand the hardware connections and performance considerations essential for large‑scale model training.

GPU architecture · HBM · High‑performance computing
0 likes · 10 min read
Architects' Tech Alliance
Apr 2, 2024 · Artificial Intelligence

Evolution and Forecast of Nvidia NVLink, NVLink C2C, and B100/X100 GPU Architectures

The article analyses the historical evolution of Nvidia's NVLink and NVLink C2C interconnect technologies, compares them with PCIe, Ethernet and InfiniBand, and uses these trends to predict future AI‑chip architectures such as the B100 and X100 GPUs, highlighting design trade‑offs and packaging challenges.

AI Chip · B100 · GPU architecture
0 likes · 15 min read
Architects' Tech Alliance
Mar 31, 2024 · Industry Insights

How Many Optical Modules Do A100, H100, and GH200 AI Clusters Really Need?

This article analyzes the evolving data‑center network architectures for large AI clusters, detailing leaf‑spine and Fat‑Tree designs, NVLink interconnects, and calculating the precise optical‑module requirements for NVIDIA A100, H100, and GH200 deployments, while also comparing industry examples from Meta, AWS, and Google.

AI clusters · Fat-Tree · NVLink
0 likes · 12 min read
Architects' Tech Alliance
Mar 18, 2024 · Industry Insights

Why Nvidia’s NVLink C2C Is Redefining GPU‑CPU Interconnects

The article provides an in‑depth technical analysis of Nvidia’s NVLink C2C interconnect, comparing its latency, bandwidth, power efficiency, density and cost against traditional SerDes solutions and examining its role in building SuperChip architectures with Grace CPUs and Hopper GPUs.

GPU · NVLink · cost analysis
0 likes · 12 min read
Architects' Tech Alliance
Mar 12, 2024 · Industry Insights

What’s Nvidia’s 2024‑2025 AI Chip Roadmap? A Deep Dive into GPUs, CPUs, and Interconnects

The article analyzes Nvidia’s 2023 investor‑meeting roadmap, revealing an annual GPU release cadence with H200, B100 and X100 chips, a unified "One Architecture" strategy spanning x86 and ARM, accelerated interconnects like NVLink‑C2C, and competitive pressures shaping its AI ecosystem.

AI hardware · GPU roadmap · Industry analysis
0 likes · 20 min read
Architects' Tech Alliance
Dec 24, 2023 · Artificial Intelligence

Overview of Popular GPU/TPU Cluster Networking Technologies for LLM Training

This article examines the main GPU/TPU cluster networking options, including NVLink, InfiniBand, RoCE Ethernet Fabric, and DDC full‑schedule networks, explaining their latency, lossless transmission, congestion control, cost, scalability, and suitability for large‑scale LLM training workloads.

GPU networking · InfiniBand · LLM training
0 likes · 18 min read
Architects' Tech Alliance
Aug 21, 2023 · Artificial Intelligence

AI Compute Landscape: GPU Architectures, Tensor Cores, NVLink, and Scaling Challenges

The article surveys the AI compute ecosystem, explaining why CPUs are unsuitable for AI workloads, how heterogeneous CPU‑plus‑accelerator designs dominate, and detailing the evolution of NVIDIA GPUs, Tensor Cores, memory technologies, and inter‑GPU networking that enable large‑scale model training.

AI compute · GPU clustering · NVLink
0 likes · 11 min read
Architects' Tech Alliance
Dec 30, 2020 · Artificial Intelligence

Understanding GPUs, AI Accelerators, and Market Trends

The article explains GPU evolution, its integration with CPUs, interconnect technologies like PCIe and NVLink, market shares of NVIDIA, AMD and Intel, AI accelerator types (GPU, FPGA, ASIC), and the roles of training and inference in cloud AI, while also promoting a paid 182‑page PPT resource.

AI accelerator · GPU · HPC
0 likes · 7 min read
Architects' Tech Alliance
Oct 28, 2020 · Artificial Intelligence

Understanding NVIDIA NVLink: Architecture, Features, and Applications

The article introduces NVIDIA’s third‑generation NVLink technology, detailing its high‑bandwidth GPU‑GPU and GPU‑CPU interconnect, key architectural breakthroughs such as the Ampere‑based A100 GPU, multi‑instance GPU, and NVSwitch, and discusses its impact on AI, HPC, and graphics workloads.

GPU interconnect · High-performance computing · NVLink
0 likes · 7 min read
Architects' Tech Alliance
Feb 2, 2019 · Artificial Intelligence

An Overview of NVIDIA NVLink: Architecture, Topology, and Performance

This article explains NVIDIA's NVLink interconnect technology, covering its history, protocol layers, bandwidth advantages over PCIe, topologies such as the HGX-1/DGX-1 mesh, the NVSwitch extension, and performance gains for deep‑learning and high‑performance computing workloads.

AI acceleration · GPU interconnect · NVLink
0 likes · 7 min read
Architects' Tech Alliance
Feb 1, 2019 · Industry Insights

How GPUDirect P2P Boosts Multi‑GPU Performance and What Limits It in Virtualized Environments

This article explains the background of GPU communication, details NVIDIA's GPUDirect and its Peer‑to‑Peer features, discusses virtualization challenges, and presents performance measurements on an Alibaba Cloud GN5 instance showing latency reduction and near‑linear scaling for deep‑learning workloads.

Deep Learning · GPU communication · GPUDirect
0 likes · 6 min read