Tagged articles
149 articles
Architects' Tech Alliance
May 13, 2026 · Industry Insights

Inside Huawei Atlas 900A3 SuperPoD (CM384) Supernode Wiring Scheme

The article provides a detailed technical analysis of Huawei's Atlas 900A3 SuperPoD (CM384) supernode wiring architecture, covering cabinet composition, NPU/CPU counts, 400G optical interconnect design, cable installation practices, scalability to 24 Pods, and the performance benefits for AI workloads.

400G optical interconnect · Atlas 900A3 · Data center wiring
0 likes · 8 min read
Machine Heart
May 11, 2026 · Artificial Intelligence

Can We Outsmart AI by Uploading Our Minds? MIT Dropout’s Plan for Digital Humans

Isaak Freeman, a former MIT PhD student, argues that humanity must embrace AI‑driven brain emulation—estimating that tens of thousands of H100 GPUs could simulate a human brain within a decade, but highlighting massive data‑acquisition, memory‑wall, and connectivity challenges that demand a multi‑decade, multi‑billion‑dollar effort.

AI · High‑performance computing · brain emulation
0 likes · 8 min read
Data Party THU
May 2, 2026 · Artificial Intelligence

Training an 11.5 B‑parameter Universal Interatomic Potential in Hours on Exascale Supercomputers

A Chinese Academy of Sciences team introduced the MatRIS‑MoE model and the Janus training framework, enabling an 11.5 billion‑parameter universal machine‑learning interatomic potential to be trained on two exascale systems at 1.2 EFLOPS, compressing weeks‑long training into a few hours.

AI for Science · Exascale training · High‑performance computing
0 likes · 8 min read
Deepin Linux
Mar 6, 2026 · Backend Development

Unlocking Ultra‑Low Latency: How RDMA Transforms High‑Performance Networking

This article explains the fundamentals of Remote Direct Memory Access (RDMA), its low‑latency, zero‑copy and kernel‑bypass mechanisms, programming interfaces, and real‑world applications in data‑center networks, high‑performance computing, and distributed storage, providing developers with practical guidance and code examples.

High‑performance computing · Low latency · Network programming
0 likes · 31 min read
Alibaba Cloud Infrastructure
Dec 17, 2025 · Cloud Native

AI Training Revives Gang Scheduling in Kubernetes for Elastic Resource Orchestration

The article examines how the rise of large‑model AI training reintroduces the need for gang scheduling in Kubernetes, contrasting the rigid resource requirements of HPC‑style workloads with cloud‑native elasticity, and outlines the historical evolution, current implementations, and future directions for achieving more flexible, high‑throughput compute orchestration.

AI training · Cloud Native · Gang Scheduling
0 likes · 22 min read
AntTech
Dec 4, 2025 · Artificial Intelligence

How AState Reduces Trillion‑Parameter RL Weight Sync to 6 Seconds

AState is a general‑purpose state data management system for reinforcement‑learning tasks that tackles low IO efficiency, slow weight synchronization, and state‑recovery challenges, achieving sub‑10‑second weight sync for trillion‑parameter models through a three‑layer architecture, zero‑redundancy transfers, and hardware‑aware co‑design, with the code openly available on GitHub.

AState · High‑performance computing · Weight Synchronization
0 likes · 23 min read
AntTech
Nov 21, 2025 · Artificial Intelligence

How Awex Enables Sub‑Second TB‑Scale Weight Sync for Trillion‑Parameter RL Models

Awex is a high‑performance Python framework that synchronizes training and inference weights for trillion‑parameter reinforcement‑learning models in seconds, using unified conversion, metadata management, and NCCL/RDMA transfer plans, dramatically reducing RL training latency and supporting diverse parallel strategies.

Distributed Training · High‑performance computing · Python
0 likes · 17 min read
AI Cyberspace
Nov 19, 2025 · Artificial Intelligence

Why MPI and NCCL Are Critical for Scaling AI Models Across Thousands of GPUs

This article explains how AI model training has evolved from single‑GPU workloads to massive distributed training using MPI for CPU‑centric communication and NCCL for GPU‑centric communication, covering their histories, core concepts, programming interfaces, topology discovery, protocol choices, and performance testing on multi‑GPU clusters.

AI distributed training · GPU communication · High‑performance computing
0 likes · 71 min read
Kuaishou Tech
Nov 12, 2025 · Artificial Intelligence

How KaiFG Lets Python Feature Engineering Run at C++ Speed

KaiFG, Kuaishou's self‑built AI Feature Generator, unifies fragmented feature extraction frameworks, replaces slow C++ compilation cycles with Python‑level development, and achieves near‑C++ performance through Codon‑based compilation, reference‑counted memory management, and aggressive LLVM optimizations, dramatically shortening iteration time.

AI Infrastructure · High‑performance computing · feature engineering
0 likes · 14 min read
Deepin Linux
Nov 11, 2025 · Fundamentals

Why RDMA Is the Secret to Lightning‑Fast Data Transfer in Modern Data Centers

This article explains the fundamentals of Remote Direct Memory Access (RDMA), its low‑latency, zero‑copy architecture, core principles, programming interfaces, and how it transforms data‑center networking, high‑performance computing, and distributed storage by bypassing the CPU and kernel.

High‑performance computing · Kernel Bypass · Networking
0 likes · 30 min read
Architects' Tech Alliance
Nov 9, 2025 · Artificial Intelligence

Why Optical Interconnects Are the Next Bottleneck‑Breaker for Massive AI Clusters

This article systematically examines the demand, technology stack, and industry landscape of large‑scale AI compute clusters, highlighting the limitations of traditional copper interconnects and presenting device‑level and chip‑level optical interconnect solutions—including OCS, pluggable modules, silicon photonics, VCSEL, and micro‑LED—while outlining current challenges and future directions.

AI clusters · Data center · High‑performance computing
0 likes · 15 min read
Architects' Tech Alliance
Nov 6, 2025 · Artificial Intelligence

Inside scaleX640: How China’s First 640‑Card Supernode Redefines AI Compute

The scaleX640 supernode, unveiled at the Wuzhen World Internet Conference, packs 640 AI accelerators into a single rack, delivering unprecedented compute density, energy efficiency, open ecosystem compatibility, and reliability features that enable massive AI model training and inference at scale.

AI hardware · High‑performance computing · energy efficiency
0 likes · 4 min read
Open Source Linux
Nov 4, 2025 · Artificial Intelligence

Designing High‑Performance Networks for Large‑Scale AI Model Training

This article examines the challenges of building scalable, low‑latency, and cost‑effective network architectures—such as Clos/Fat‑Tree, Spine‑Leaf, Dragonfly, and Torus—for massive GPU clusters used in training trillion‑parameter AI models, comparing multi‑rail and single‑rail designs and highlighting real‑world implementations from Tencent and Alibaba.

AI training · CLOS · Dragonfly
0 likes · 8 min read
Huawei Cloud Developer Alliance
Oct 21, 2025 · Cloud Computing

Essential Reading List for Mastering Modern Computing Systems

This curated reading list presents must‑read books covering cloud, edge, distributed, high‑performance, parallel, heterogeneous, quantum, and AI computing, offering expert editorial insights, author backgrounds, and publication details to help readers grasp core concepts and advance their technical expertise.

Book Recommendations · High‑performance computing · Parallel Programming
0 likes · 20 min read
Architects' Tech Alliance
Oct 15, 2025 · Fundamentals

Comparative Analysis of Leading Exascale‑Class HPC Processors: A64FX, H100, MI250X, and Ponte Vecchio

This article compares four cutting‑edge high‑performance processors (Fujitsu A64FX, NVIDIA H100, AMD MI250X, and Intel Ponte Vecchio), examining their architectures, parallelism strategies, domain‑specific accelerators, supported data types, performance metrics, and power consumption to inform future exascale (E‑level) computing designs.

AMD MI250X · E-level computing · Fujitsu A64FX
0 likes · 10 min read
Programmer DD
Oct 12, 2025 · Backend Development

Boost Java Performance: Integrate CUDA GPU Acceleration via JNI

This guide explains why Java struggles with high‑performance or data‑intensive workloads, introduces GPU acceleration with CUDA, compares integration options such as JCuda, JNI, and JNA, walks through a practical encryption use case with performance benchmarks, and provides production‑grade best practices for memory, threading, testing, security, and deployment.

CUDA · GPU · High‑performance computing
0 likes · 23 min read
Architects' Tech Alliance
Oct 11, 2025 · Artificial Intelligence

Why NVLink Beats PCIe for AI: Deep Dive into GPU Interconnect Technologies

This article examines the architectural differences between Scale‑Out and Scale‑Up networking, compares PCIe, NVLink, UALink, InfiniBand and RoCE, and explains why high‑bandwidth, low‑latency GPU interconnects like NVLink are essential for modern AI and HPC workloads.

AI acceleration · GPU interconnect · High‑performance computing
0 likes · 27 min read
Architects' Tech Alliance
Oct 9, 2025 · Artificial Intelligence

Unlocking AI Scale‑Up: Inside SUE, OISA, ALS and ETH+ High‑Performance Interconnects

This article introduces four cutting‑edge AI networking technologies—SUE, OISA, ALS, and ETH+—detailing their backgrounds, architectural designs, and performance enhancements that enable ultra‑high bandwidth, low‑latency, and scalable interconnects for modern AI compute clusters.

AI networking · High‑performance computing · Scale‑Up
0 likes · 13 min read
Architects' Tech Alliance
Sep 29, 2025 · Artificial Intelligence

How NVLink and NVSwitch Power AI’s Next‑Gen High‑Performance Networks

This article, part of the 2025 AI Network Technology Whitepaper, classifies AI high‑performance networking into Scale‑Up, Scale‑Out, and frontier breakthroughs, then dives deep into NVLink’s evolution, technical features, NVSwitch’s full‑mesh architecture, and the newly opened NVLink Fusion ecosystem.

AI networking · GPU interconnect · High‑performance computing
0 likes · 8 min read
Alibaba Cloud Infrastructure
Sep 26, 2025 · Artificial Intelligence

How Alibaba’s UPN512 Redefines AI Scale‑Up Networking with Optical Interconnects

The UPN512 whitepaper details Alibaba Cloud's next‑generation AI infrastructure network, explaining the shift from dense to MoE models, the rise of train‑and‑inference integration, xPU scale‑up challenges, and how high‑radix Ethernet with LPO/NPO optical interconnects delivers ultra‑high bandwidth, low latency, cost‑effective, and reliable large‑scale AI compute clusters.

AI Infrastructure · High‑performance computing · UPN512
0 likes · 34 min read
Architects' Tech Alliance
Sep 22, 2025 · Artificial Intelligence

How Huawei’s New Atlas Supernodes Redefine AI Compute Power

At Huawei Connect 2025, Huawei unveiled the Atlas 950 and Atlas 960 SuperPoD supernodes. The article details their massive card counts and unprecedented compute, memory and bandwidth capabilities, and explains how their full‑stack hardware‑software design dramatically accelerates large‑model AI training and inference.

AI supernode · Atlas 950 · Atlas 960
0 likes · 8 min read
Architects' Tech Alliance
Sep 14, 2025 · Artificial Intelligence

Why Nvidia’s Blackwell GPUs Are Redefining AI Performance

The article analyzes Nvidia's 2024 Blackwell GPU series and GB200 NVL72 architecture, detailing their advanced 3‑4nm manufacturing, redesigned CUDA cores, next‑gen ray‑tracing and DLSS upgrades, massive compute and memory bandwidth gains, NVLink Gen5 improvements, and the diverse GB200 product configurations for high‑performance AI workloads.

AI acceleration · Blackwell GPU · GPU architecture
0 likes · 7 min read
Architects' Tech Alliance
Aug 15, 2025 · Artificial Intelligence

How AI Compute Centers Structure Their Networks for Maximum Performance

This article explains the logical and physical architecture of AI compute centers, detailing the division into access, security, network, management, out‑of‑band, AI compute cluster, and general compute zones, and describes the four network planes—parameter, sample, business, and management—required for high‑performance AI workloads.

AI · Compute cluster · High‑performance computing
0 likes · 7 min read
Architects' Tech Alliance
Jul 23, 2025 · Artificial Intelligence

Why Do AI Large‑Model Training Clusters Need Specialized Network Topologies?

The article explains how AI large‑model training demands massive GPU resources and how carefully designed network architectures—such as Clos/Fat‑Tree, Spine‑Leaf, multi‑rail versus single‑rail connections, Dragonfly, and Torus—impact performance, scalability, cost, and reliability, guiding the selection of optimal data‑center networks.

AI · Data center · GPU clusters
0 likes · 9 min read
AntTech
Jul 18, 2025 · Artificial Intelligence

Explore the 2025 CCF‑Ant Research Fund: 50 Cutting‑Edge Projects in AI, Security & Computing

The CCF‑Ant Research Fund 2025, now open for its first batch, invites global university and institute researchers to apply by August 25, 2025 for up to 50 projects spanning data security, hardware‑software co‑design, supercomputing, and artificial intelligence, with detailed topics, eligibility rules, and submission channels provided.

High‑performance computing · Research Funding · data security
0 likes · 11 min read
Architects' Tech Alliance
Jul 7, 2025 · Operations

Choosing the Right AI Data Center Network: InfiniBand vs RoCE

This article outlines the high‑performance networking requirements for AI data center training, compares InfiniBand and RoCE solutions, discusses their advantages in bandwidth, latency, scalability and cost, and provides design guidelines for building scalable, low‑latency, non‑blocking AI‑centric network architectures.

AI · Data center · High‑performance computing
0 likes · 10 min read
Architects' Tech Alliance
Jul 3, 2025 · Fundamentals

How Supercomputers Evolved: From Early SGI Systems to China’s Exascale Machines

This article traces the evolution of global supercomputing—from early US and Japanese initiatives to Europe’s coordinated investments—and details China’s rapid development of successive supercomputer generations, highlighting landmark systems such as SGI Power Challenge XL, Dawning‑2000, DeepComp series, the “元” platform and the “东方” machine, as well as homegrown high‑performance software like HPSEPS and HPLES.

China · Hardware · High‑performance computing
0 likes · 14 min read
php Courses
Jul 2, 2025 · Game Development

Why C++ Dominates Game Development, Systems, and High‑Performance Computing

From powering cutting‑edge AAA games and operating system kernels to accelerating scientific simulations, high‑frequency trading, and embedded IoT devices, C++ remains the go‑to language for high‑performance, low‑level control across diverse domains, thanks to its speed, portability, and fine‑grained memory management.

C++ · FinTech · Game Development
0 likes · 6 min read
Architects' Tech Alliance
Jun 29, 2025 · Artificial Intelligence

Scale-Up vs Scale-Out: Balancing Performance and Flexibility in AI Infrastructure

This article explains the technical definitions, core differences, and practical use cases of Scale‑Up and Scale‑Out networking in AI systems, highlighting how they impact latency, bandwidth, and cost, and illustrates their combined application through NVIDIA's NVL72 supernode case study.

AI Infrastructure · GPU networking · High‑performance computing
0 likes · 14 min read
Architects' Tech Alliance
Jun 13, 2025 · Artificial Intelligence

How Huawei’s CloudMatrix 384 Challenges Nvidia’s AI Supercomputers

Huawei’s CloudMatrix 384, built from 384 Ascend 910C chips in an all‑to‑all topology, delivers up to 300 PFLOPS of BF16 performance, nearly twice that of Nvidia’s GB200 NVL72. The article also covers its supply‑chain dependence on foreign fabs, its higher power consumption, and China’s rapid push to scale its domestic semiconductor capabilities.

AI accelerator · Ascend 910C · CloudMatrix 384
0 likes · 12 min read
Architects' Tech Alliance
Jun 10, 2025 · Fundamentals

Why RDMA Is Revolutionizing High‑Performance Computing and AI

This article explores how Remote Direct Memory Access (RDMA) technology transforms high‑performance computing, artificial intelligence, and cloud storage by eliminating data copies, bypassing the kernel, and offloading protocols to hardware, while reviewing key metrics, product ecosystems, real‑world use cases, challenges, and future trends.

DPU · Data Center Networking · High‑performance computing
0 likes · 11 min read
Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU
0 likes · 6 min read
Network Intelligence Research Center (NIRC)
Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.

CUDA · CUTLASS · GEMM
0 likes · 6 min read
Architects' Tech Alliance
Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

Hardware acceleration · High‑performance computing · Network Protocols
0 likes · 10 min read
Architects' Tech Alliance
May 25, 2025 · Fundamentals

Comprehensive Overview of Shenwei (申威) Chip Development, Technology, Roadmap, and Applications

This article provides an in‑depth overview of Shenwei chips, covering their development history, core technical advantages such as a self‑designed instruction set and high‑performance computing capabilities, the current product line‑up, and their applications in supercomputing, cloud data centers, security, and embedded systems.

CPU architecture · High‑performance computing · Server processors
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
May 23, 2025 · Artificial Intelligence

How Baidu’s Kunlun Supernode Redefines AI Compute Density and Performance

This article explains how Baidu’s Kunlun supernode, built on high‑density liquid‑cooled cabinets and a modular 1U 4‑card design, breaks traditional 8‑card limits, boosts compute density four‑fold, improves power and cooling efficiency, and provides a scalable foundation for large‑model AI training and inference.

AI Infrastructure · GPU cluster · High‑performance computing
0 likes · 13 min read
Architects' Tech Alliance
May 15, 2025 · Industry Insights

Why InfiniBand Still Beats Ethernet: Deep Dive into RDMA, Omni‑Path, and Protocol Layers

This article provides a comprehensive technical analysis of InfiniBand architecture, its protocol stack, comparison with Ethernet‑based RDMA solutions like RoCE and iWARP, and an overview of Omni‑Path, highlighting performance advantages, design trade‑offs, and practical limitations.

High‑performance computing · InfiniBand · Omni‑Path
0 likes · 19 min read
Baidu Geek Talk
May 14, 2025 · Industry Insights

How RapidFS Boosts AI Model Training with 10 TiB/s Throughput

The article explains how large‑scale AI model training and inference require massive data handling, describes the RapidFS storage acceleration cluster deployed on a 30,000‑card Kunlun chip system with hundreds of domestic CPU servers, and presents performance tests showing throughput that scales linearly to more than 1 TiB/s, demonstrating the impact of high‑performance storage on compute efficiency.

AI training · High‑performance computing · Performance Testing
0 likes · 5 min read
Architects' Tech Alliance
May 8, 2025 · Industry Insights

How AI Storage Is Redefining Data‑Compute Synergy: Trends, Tech, and Roadmap

This article analyses the emergence of AI‑focused storage, detailing its ultra‑high bandwidth, concurrency, scale and low‑latency characteristics, the architectural shift from layered to fused designs, the specific performance and data‑management demands of training and inference, and a three‑phase roadmap for future storage innovations.

AI storage · GPU Acceleration · High‑performance computing
0 likes · 12 min read
Architects' Tech Alliance
Apr 26, 2025 · Industry Insights

Can Huawei’s CloudMatrix 384 Outpace Nvidia’s GB200? A Deep Dive into China’s AI Supernode

The article provides a detailed technical analysis of Huawei's CloudMatrix 384 AI supernode, covering its 384 Ascend 910C chips, 300 PFLOPS of BF16 performance, massive memory and bandwidth, power consumption, and scale‑up and scale‑out optical networking, and compares it with Nvidia's GB200 NVL72 in architecture, cost, and energy efficiency.

AI hardware · CloudMatrix · GPU cluster
0 likes · 12 min read
Baidu Intelligent Cloud Tech Hub
Apr 25, 2025 · Operations

How RapidFS Accelerates AI Model Training with 10 TiB/s Storage Performance

The article explains how RapidFS, a near‑compute storage acceleration solution built on BOS object storage, delivers up to 10 TiB/s throughput for massive AI model training, detailing its architecture, deployment on a 30,000‑card Kunlun cluster, and performance test results that show linear scaling from 20 to 70 nodes.

AI training · High‑performance computing · Performance Testing
0 likes · 6 min read
AntTech
Apr 17, 2025 · Artificial Intelligence

Data+AI Forum at the 18th China Electronics Information Conference (2025) – Speaker Bios and Session Summaries

The 18th China Electronics Information Conference will be held in Chengdu from April 17‑21, 2025, featuring the DATA+AI forum that gathers leading academicians and industry experts to discuss data‑AI integration, with detailed speaker biographies, presentation titles, and abstracts covering topics such as large‑model inference, cloud‑edge ultrasound diagnostics, and the future of databases in the AI era.

@Data · AI · Big Data
0 likes · 12 min read
Architects' Tech Alliance
Apr 8, 2025 · Artificial Intelligence

How NVSwitch Revolutionizes Multi‑GPU Interconnect for AI Workloads

This article examines NVIDIA's NVSwitch technology, explaining why it was needed, how it builds on NVLink to overcome PCIe bottlenecks, tracing its evolution from Pascal to the third‑generation design, and detailing its architectural features, scalability, full‑duplex bandwidth, non‑blocking communication, and optimized network topologies for high‑performance AI and HPC systems.

AI hardware · GPU interconnect · High‑performance computing
0 likes · 9 min read
Architects' Tech Alliance
Apr 6, 2025 · Fundamentals

PCIe vs NVLink: How Modern GPU Interconnects Power AI Training

As AI models grow to trillion‑parameter scales, training them demands massive GPU clusters whose performance is increasingly limited by network bandwidth; this article examines why traditional PCIe interconnects become bottlenecks and how NVIDIA's NVLink and NVSwitch technologies dramatically improve multi‑GPU communication and overall system efficiency.

AI training · GPU · High‑performance computing
0 likes · 12 min read
21CTO
Mar 7, 2025 · Artificial Intelligence

Why France Named Its New Supercomputer After a Pioneering Female Engineer

France will christen its upcoming 2025 supercomputer after Alice Recoque, a trailblazing 1970s engineer, highlighting both the nation's high‑performance computing ambitions and a symbolic push for gender diversity in a traditionally male‑dominated field.

Alice Recoque · French technology · High‑performance computing
0 likes · 5 min read
DeWu Technology
Feb 26, 2025 · Backend Development

Migrating to Rust: A Case Study in High-Performance Computing

Migrating a Java computing layer to Rust yielded dramatic performance gains (30% lower CPU usage and 70% less memory) along with greater stability. The authors explain how Rust’s ownership, borrowing, lifetimes, and concurrency model, combined with optimized data handling, FFI integration, Tokio async, Docker deployment, and monitoring, deliver benefits that outweigh the steep learning curve and ecosystem gaps.

Backend Development · FFI · High‑performance computing
0 likes · 22 min read
Alibaba Cloud Infrastructure
Jan 20, 2025 · Cloud Computing

2024 Alibaba Cloud Infrastructure Network Team: AI‑Scale Network Innovations, Academic Achievements, Open‑Source Contributions and Industry Outreach

The 2024 report of Alibaba Cloud's Infrastructure Network team details AI‑driven network breakthroughs, high‑performance protocol stacks, large‑scale monitoring systems, numerous top‑conference paper acceptances, open‑source ecosystem initiatives, and extensive industry outreach, highlighting the evolving AI infra landscape.

AI Infrastructure · Conference Papers · Data Center Networking
0 likes · 19 min read
Deepin Linux
Dec 25, 2024 · Fundamentals

An Introduction to RDMA: Principles, Programming, and Applications

This article explains RDMA technology, covering its core principles, programming model with Verbs API, various communication modes, and its impact on data‑center networking, high‑performance computing, and distributed storage, highlighting its low‑latency, zero‑copy advantages over traditional TCP/IP.

Data center · High‑performance computing · Network programming
0 likes · 30 min read
Alibaba Cloud Infrastructure
Dec 19, 2024 · Industry Insights

How China’s First Cloud HPC Standard Is Shaping the Future of High‑Performance Computing

The article explains how the newly approved national cloud‑HPC standard, co‑created by Alibaba Cloud and the China Electronics Standardization Institute, addresses resource limits, reduces costs, and guides industry adoption across sectors such as automotive, semiconductor design, and weather forecasting.

Alibaba Cloud · China · HPC
0 likes · 4 min read
Architects' Tech Alliance
Dec 11, 2024 · Fundamentals

Unlocking GPU Computing: PCIe, NVLink, NVSwitch, and HBM Explained

This article breaks down the core components of high‑performance GPU servers—including PCIe switch chips, the evolution of NVLink from version 1.0 to 4.0, NVSwitch architecture, HBM memory tiers, and the nuances of bandwidth units—providing a comprehensive technical foundation for large‑scale model training.

GPU computing · HBM · High‑performance computing
0 likes · 10 min read
Alibaba Cloud Infrastructure
Nov 12, 2024 · Industry Insights

How Cloud HPC Is Redefining Data+AI: Insights from Alibaba Cloud’s VP

In a keynote at CCF HPC China 2024, Alibaba Cloud’s VP explains how diversified high‑performance computing workloads, elastic cloud resources, and the proprietary CIPU architecture are driving the shift to a data‑plus‑AI era across industries such as automotive, life‑science, and large‑model training.

AI · CIPU · High‑performance computing
0 likes · 9 min read
Architects' Tech Alliance
Nov 7, 2024 · Industry Insights

Why RDMA, InfiniBand, and RoCE Are Redefining High‑Performance Data Center Networks

This article examines the evolution from the OSI and TCP/IP models to RDMA‑based technologies, compares traditional three‑tier and leaf‑spine architectures, analyzes NVIDIA SuperPOD designs, and evaluates Ethernet, InfiniBand, and RoCE switches to guide high‑throughput, low‑latency data‑center networking decisions.

Data Center Networking · High‑performance computing · InfiniBand
0 likes · 13 min read
Tencent Advertising Technology
Oct 14, 2024 · Artificial Intelligence

Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising

This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: capturing user intent, aligning the model with the advertising domain, and designing a high‑performance platform under ROI constraints.

Generative Retrieval · High‑performance computing · Model Optimization
0 likes · 17 min read
Baidu Geek Talk
Oct 9, 2024 · Artificial Intelligence

How Baidu’s Baige 4.0 Architecture Redefines AI Compute Efficiency

This article analyzes Baidu's Baige 4.0 AI infrastructure, detailing its four‑layer architecture, XMAN 5.0 hardware, HPN network, BCCL communication library, and AIAK inference upgrades, and explains how these innovations address large‑model training and inference challenges while boosting performance, utilization, and cost efficiency.

AI Infrastructure · Cluster Management · GPU Acceleration
0 likes · 16 min read
Architects' Tech Alliance
Sep 2, 2024 · Industry Insights

Why Is the Global HPC Market Set to Surge to $437 Billion by 2028?

The report examines the 2023 global HPC market—covering on‑premise servers, cloud services, storage, compute engines, and interconnect technologies—showing total spending of $297 billion, forecasting growth to $437 billion by 2028, and highlighting key hardware trends, cloud adoption rates, and emerging AI‑driven workloads.

HPC · High‑performance computing · interconnect
0 likes · 8 min read
Architects' Tech Alliance
Aug 29, 2024 · Industry Insights

How NVIDIA Builds 256‑GPU and 576‑GPU SuperPods with H100, GH200, and GB200 Interconnects

The article analyzes NVIDIA's DGX SuperPOD architectures across three GPU generations—H100, GH200, and GB200—detailing their NVLink/NVSwitch topologies, bandwidth calculations, scalability limits, and the practical challenges of constructing 256‑GPU and 576‑GPU supercomputing clusters.

Data center · GPU · High‑performance computing
0 likes · 11 min read
ByteDance Data Platform
Aug 27, 2024 · Artificial Intelligence

AI-Driven BI: Achieving Zero-Barrier Data Access and Smart Insights

This article traces the evolution of business intelligence platforms from early report‑centric tools to modern AI‑enhanced, search‑driven solutions, detailing the architectural layers, high‑performance data analysis design, multi‑level aggregation, hot‑cold data tiering, and large‑model applications that enable zero‑threshold data consumption and intelligent insights.

Business Intelligence · Data Analytics · High‑performance computing
0 likes · 18 min read
Architects' Tech Alliance
Aug 13, 2024 · Fundamentals

Understanding High Bandwidth Memory (HBM): Architecture, Benefits, and Applications

High Bandwidth Memory (HBM) is a DRAM technology that uses stacked chips, TSV, and micro‑bump interconnects to deliver ultra‑high data rates, lower power consumption, and compact form factor, addressing the bandwidth, latency, power, space, thermal, and complexity challenges of traditional 2D memory in GPUs, AI, HPC, and data‑center workloads.

HBM · High‑performance computing · Memory Architecture
0 likes · 10 min read
Open Source Linux
Jul 23, 2024 · Fundamentals

Why Fat-Tree, Dragonfly, and Torus Topologies Dominate High‑Performance Computing Networks

High‑performance computing demands ultra‑low latency and massive scale, prompting a shift from traditional CLOS designs to alternative topologies such as Fat‑Tree, Dragonfly, and Torus, each offering distinct trade‑offs in bandwidth, scalability, routing complexity, and cost‑effectiveness for modern data‑center and HPC environments.

Dragonfly · Fat-Tree · High‑performance computing
0 likes · 10 min read
Architects' Tech Alliance
May 19, 2024 · Industry Insights

How to Build a 10,000‑GPU Supercluster: Core Design Principles and Architecture

This article analyzes the challenges and solutions for constructing a super‑large GPU training cluster, outlining five fundamental design principles, a four‑layer plus one‑domain architecture, and practical considerations for hardware, networking, and operational reliability in AI workloads.

AI training · GPU cluster · High‑performance computing
0 likes · 8 min read
Architects' Tech Alliance
May 19, 2024 · Industry Insights

InfiniBand vs RoCEv2: Which High‑Performance Network Wins AI Compute?

With AI models growing to billions of parameters, the choice of high‑performance interconnect—InfiniBand or RoCEv2—directly impacts training speed, scalability, latency, and operational complexity, and this article analyzes their architectures, performance metrics, vendor ecosystems, and suitability for large‑scale AI clusters.

AI · Distributed Training · High‑performance computing
0 likes · 13 min read
Architects' Tech Alliance
May 16, 2024 · Industry Insights

How to Build a Multi‑Petabyte AI Super‑Cluster: Scaling Beyond Ten‑Thousand GPUs

This article analyzes the architectural upgrades required for ultra‑large AI clusters, covering single‑GPU performance, super‑node scaling, DPU‑based heterogeneous computing, power‑efficiency, high‑throughput storage, and robust high‑speed networking to support trillion‑parameter model training and inference.

AI · DPU · GPU cluster
0 likes · 17 min read
Architects' Tech Alliance
May 15, 2024 · Artificial Intelligence

Detailed Overview of GPU Server Architectures: A100/A800 and H100/H800 Nodes

This article provides a comprehensive technical overview of large‑scale GPU server architectures, detailing the component topology of 8‑GPU A100/A800 and H100/H800 nodes, explaining storage network cards, NVSwitch interconnects, bandwidth calculations, and the trade‑offs between RoCEv2 and InfiniBand for AI workloads.

GPU · High‑performance computing · NVLink
0 likes · 13 min read
21CTO
May 15, 2024 · Fundamentals

Why China Is Quietly Withdrawing from the Top500 Supercomputer Race

The latest Top500 ranking shows the United States dominating with the two fastest supercomputers, while China, despite having powerful exascale systems, has stopped reporting its machines, reflecting a strategic shift amid the tech cold war between the two nations.

China · High‑performance computing · TOP500
0 likes · 3 min read
Architects' Tech Alliance
May 14, 2024 · Fundamentals

Fundamentals of GPU Computing: PCIe, NVLink, NVSwitch, and HBM

This article provides a comprehensive overview of the core components and terminology of large‑scale GPU computing, covering GPU server architecture, PCIe interconnects, NVLink generations, NVSwitch, high‑bandwidth memory (HBM), and bandwidth unit considerations for AI and HPC workloads.

AI hardware · GPU computing · HBM
0 likes · 11 min read
Architects' Tech Alliance
Apr 28, 2024 · Industry Insights

Why RoCE v2 Is Outpacing InfiniBand for Modern Data Centers

This article provides an in‑depth technical analysis of RoCE v2, covering its architecture, NIC requirements, and detailed comparisons with InfiniBand across physical layers, protocol stacks, switching, congestion handling, routing, and topology, while also highlighting the UEC alliance’s new transport protocol initiative.

High‑performance computing · InfiniBand · RDMA
0 likes · 12 min read
Architects' Tech Alliance
Apr 16, 2024 · Industry Insights

Inside AI Servers: PCIe, NVLink, and NVSwitch Driving the Next‑Gen Compute

Based on TrendForce data, AI server shipments are projected to grow at a 12.2% CAGR through 2027, while advances in PCIe switching, retiming chips, and high‑speed GPU interconnects such as NVLink and NVSwitch are reshaping the architecture and performance of next‑generation AI compute platforms.

AI servers · GPU interconnect · High‑performance computing
0 likes · 11 min read
Architects' Tech Alliance
Apr 10, 2024 · Industry Insights

Inside the GPU Server: Architecture of A100/A800 and H100/H800 Nodes

This article provides a detailed technical breakdown of modern multi‑GPU server nodes, covering component composition, storage network cards, NVSwitch interconnects, bandwidth calculations, and the architectural differences between NVIDIA A100/A800 and H100/H800 configurations for AI training workloads.

A100 · AI training · GPU
0 likes · 12 min read
Architects' Tech Alliance
Apr 8, 2024 · Fundamentals

Unlocking GPU Server Architecture: PCIe, NVLink, NVSwitch & HBM Explained

This article provides a comprehensive breakdown of high‑performance GPU server infrastructure, covering PCIe generations, NVLink evolution, NVSwitch and NVLink switches, HBM memory technologies, and bandwidth measurement units, helping readers understand the hardware connections and performance considerations essential for large‑scale model training.

GPU architecture · HBM · High‑performance computing
0 likes · 10 min read
Architects' Tech Alliance
Mar 25, 2024 · Industry Insights

Why Fat-Tree, Dragonfly, and Torus Topologies Matter in HPC Networks

The article examines the challenges of ultra‑large‑scale HPC networking, compares traditional CLOS with Fat‑Tree, Dragonfly, and Torus topologies, explains their bandwidth and latency characteristics, presents scalability formulas, and evaluates routing algorithms and practical trade‑offs for each design.

Data center · Dragonfly · High‑performance computing
0 likes · 14 min read
DataFunTalk
Feb 18, 2024 · Cloud Computing

Research on the Unified Storage Platform for the Supercomputing Internet

This article presents a comprehensive overview of the challenges, key technologies, and future applications of a unified storage platform built on Alluxio for China's national supercomputing internet, detailing its architecture, data flow strategies, deployment status, and industry use cases across multiple sectors.

Alluxio · Data Flow · High‑performance computing
0 likes · 13 min read
Architects' Tech Alliance
Feb 14, 2024 · Industry Insights

Why InfiniBand Is Outpacing Ethernet in High‑Performance Computing

This article provides a comprehensive overview of InfiniBand technology, covering its history, architecture, packet format, layer functions, switching mechanisms, and performance advantages over Ethernet, while highlighting its rapid growth and future prospects in HPC environments.

Comparison · High‑performance computing · InfiniBand
0 likes · 15 min read
NetEase Cloud Music Tech Team
Jan 25, 2024 · Backend Development

Cloud Music RTA Advertising and User Acquisition System: Architecture and Optimization Practices

NetEase Cloud Music’s RTA advertising system delivers real‑time, personalized ads at massive scale by using isolated Nginx clusters, layered decoupling, asynchronous Netty/Redis processing, and optimized storage with hash‑based key compression and Protostuff serialization, while supporting automated audience selection and in‑app attribution to boost user acquisition.

High‑performance computing · RTA advertising · System Architecture
0 likes · 12 min read
21CTO
Sep 18, 2023 · Operations

China’s 1.5 Exaflops Oceanlite Supercomputer Chases the Gordon Bell Prize

The ACM announced that a paper based on China’s 1.5 exaflops Oceanlite supercomputer has been shortlisted for the 2023 Gordon Bell Prize, highlighting its novel turbulent‑flow code, the SW26010 Pro processor architecture, other global contenders, and geopolitical implications voiced by Jack Dongarra.

Exascale · Gordon Bell Prize · High‑performance computing
0 likes · 15 min read
Architects' Tech Alliance
Jun 29, 2023 · Artificial Intelligence

Hyperion Research ISC23 HPC Market Update: Trends, Forecasts, and AI Impact

The Hyperion Research ISC23 HPC Market Update briefing highlights modest 2022‑2023 growth, forecasts global HPC spending reaching $33 billion in 2023 and $52 billion by 2026, outlines ten key 2023 predictions—including AI regulation, cloud‑driven HPC expansion, and emerging DPU/IPU markets—while emphasizing the continuing talent shortage and the strategic importance of AI across high‑performance computing.

AI · HPC · High‑performance computing
0 likes · 7 min read
Alibaba Cloud Infrastructure
Jun 16, 2023 · Cloud Computing

Predictable Network and High‑Performance Network Architecture for Large‑Scale AI Training

The article examines how Alibaba Cloud’s Predictable Network, InfiniBand versus Ethernet trade‑offs, and the HPN high‑performance network design together address the extreme bandwidth, latency, scalability and reliability requirements of modern large‑model AI training workloads in cloud data centers.

AI training · High‑performance computing · InfiniBand
0 likes · 24 min read
Open Source Linux
Jun 13, 2023 · Fundamentals

Why RDMA Outperforms Traditional Networking: A Deep Dive into DMA

This article explains the fundamentals of Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA), compares their data transfer mechanisms with traditional networking, and outlines RDMA's advantages, protocols, ecosystem, and real‑world adoption in high‑performance computing and data centers.

DMA · Hardware · High‑performance computing
0 likes · 13 min read
Efficient Ops
Jun 11, 2023 · Artificial Intelligence

Why Network Bandwidth Is the Real Bottleneck for AIGC and How DDC Solves It

The article explains how AIGC models demand massive GPU compute, why network bandwidth and latency become the critical limiting factors, and how the Distributed Disaggregated Chassis (DDC) architecture addresses these challenges with scalable, high‑throughput networking solutions.

AI Infrastructure · AIGC · DDC
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
May 19, 2023 · Cloud Computing

How DPU‑Powered Cloud IaaS Revolutionizes Compute, Networking, and Storage

Baidu Intelligent Cloud’s 2023 GTC presentation details how its DPU‑based IaaS architecture unifies high‑performance compute, networking, storage, and security, addressing rapid AI workload growth, reducing CPU bottlenecks, and delivering elastic, cost‑effective solutions across virtual machines, bare‑metal servers, and specialized RDMA instances.

DPU · High‑performance computing · IaaS
0 likes · 17 min read
Baidu Tech Salon
May 11, 2023 · Artificial Intelligence

Inside Baidu’s High‑Performance GPU Cluster: Powering the Next‑Gen AI Models

The article details Baidu's development of a massive high‑performance GPU/IB cluster, its architectural design, the challenges of training trillion‑parameter models, and how the integrated AI stack—spanning hardware, framework, and resource management—overcomes compute, memory, and communication bottlenecks to accelerate large‑model training.

AI Infrastructure · Baidu AI Base · Distributed Training
0 likes · 17 min read
Architects' Tech Alliance
Apr 17, 2023 · Fundamentals

Overview of High‑Performance Computing (HPC): Architecture, Metrics, Cluster Management, Job Scheduling, and Parallel Programming Models

This article provides a comprehensive overview of high‑performance computing, covering system architectures, hardware components, performance metrics, network topologies, common parallel file systems, cluster management functions, mainstream job‑scheduling systems, and MPI‑based parallel programming models.

Cluster · HPC · High‑performance computing
0 likes · 14 min read
Tencent Cloud Developer
Apr 14, 2023 · Artificial Intelligence

Tencent Cloud's Next-Generation HCC High-Performance Computing Cluster for Large Model Training

Tencent Cloud's new HCC high‑performance computing cluster delivers three times the performance of the previous generation. It combines Xingsha servers and NVIDIA H800 GPUs (up to 1979 TFLOPS) with 3.2 Tbps of per‑server interconnect bandwidth over the Xingmai 3.2 T Ethernet RDMA network, TB‑level storage throughput via COS + GooseFS, and multiple access forms (bare metal, cloud servers, containers, functions) to enable efficient large‑model training.

AI computing · GPU cluster · High‑performance computing
0 likes · 9 min read
Open Source Linux
Apr 14, 2023 · Fundamentals

Why InfiniBand Is the Fastest Growing High‑Speed Interconnect for HPC

This article provides a comprehensive overview of InfiniBand technology, covering its history, architecture, packet structure, layer hierarchy, switching mechanisms, and performance advantages over Ethernet, highlighting its role as a low‑latency, high‑bandwidth solution for high‑performance computing.

High‑performance computing · InfiniBand · RDMA
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Mar 27, 2023 · Artificial Intelligence

How uGrapher Boosts GNN Performance 3.5× with a Unified Graph Operator Abstraction

Alibaba Cloud's PAI platform and Shanghai Jiao Tong University’s team announced their ASPLOS 2023‑accepted paper uGrapher, which unifies graph operator computation for GNNs, achieving up to 3.5× speedup over existing frameworks and paving the way for industrial‑scale acceleration.

ASPLOS 2023 · Alibaba Cloud PAI · High‑performance computing
0 likes · 4 min read
Architects' Tech Alliance
Mar 26, 2023 · Fundamentals

Comprehensive Overview of InfiniBand Technology and Architecture

This article provides an in‑depth examination of InfiniBand, covering its rapid development as a high‑bandwidth, low‑latency interconnect technology, the InfiniBand Trade Association, detailed packet structures, layered architecture, switching mechanisms, and a comparative analysis with Ethernet, highlighting its advantages for high‑performance computing.

Data Transfer · HPC · High‑performance computing
0 likes · 14 min read
Architects' Tech Alliance
Mar 22, 2023 · Fundamentals

Overview of Huawei Kunpeng 920 Processor Architecture and Subsystems

The article provides a detailed technical overview of Huawei's Kunpeng 920 processor, describing its ARM‑based RISC architecture, chip organization, core and cache hierarchy, security features, IMU management, and the design of its I/O, interrupt, network, SAS, and PCIe subsystems.

ARM · High‑performance computing · Kunpeng
0 likes · 10 min read
Python Programming Learning Circle
Dec 17, 2022 · Fundamentals

Accelerating Python Code with Taichi: Prime Counting, LCS, and Reaction‑Diffusion Examples

This article demonstrates how importing the Taichi library into Python can dramatically accelerate compute‑intensive tasks, showcasing prime counting, longest common subsequence, and reaction‑diffusion simulations with speedups up to 120× and GPU support, while providing installation and usage guidance.

GPU · High‑performance computing · Python
0 likes · 6 min read
21CTO
Dec 11, 2022 · Fundamentals

How Jack Dongarra’s Linpack Revolutionized Supercomputing and Earned a Turing Award

Jack Dongarra, a pioneering computer scientist, created the Linpack library and benchmark that enabled software to scale from laptops to exaflop supercomputers, earning him the 2021 ACM A.M. Turing Award (announced in 2022) and shaping modern high‑performance and cloud computing.

High‑performance computing · Jack Dongarra · Linpack
0 likes · 11 min read
Alimama Tech
Nov 2, 2022 · Artificial Intelligence

Optimizing GPU Utilization for Multimedia AI Services with high_service

The article presents high_service, a high‑performance inference framework that boosts GPU utilization in multimedia AI services. It separates CPU‑heavy preprocessing from GPU inference and employs priority‑based auto‑scaling, multi‑tenant sharing, and TensorRT‑accelerated models; together these eliminate GIL bottlenecks, reduce waste, and adapt to fluctuating traffic. Future work targets automated bottleneck detection and further CPU‑GPU offloading.

Auto Scaling · GPU utilization · High‑performance computing
0 likes · 19 min read
Architects' Tech Alliance
Oct 31, 2022 · Industry Insights

What Drives Distributed Storage: Product Forms, Ecosystem, and Key Use Cases

Distributed storage encompasses integrated appliances and pure‑software solutions, each with distinct hardware strategies, and forms a multi‑dimensional industry ecosystem that spans commercial and open‑source software, specialized and generic hardware, serving critical scenarios such as virtualization/cloud, high‑performance computing, and big‑data analytics.

Big Data · High‑performance computing · Industry analysis
0 likes · 15 min read
phodal
Oct 24, 2022 · Industry Insights

Unlocking Ultra-Fast Systems: Key Patterns Behind Low‑Latency Architecture

This article provides a comprehensive overview of low‑latency architecture, covering network hardware, system‑level programming strategies, language choices, memory management techniques, event‑driven designs, high‑performance data structures, and visualization approaches for building ultra‑fast computing systems.

Event-Driven Architecture · High‑performance computing · Java performance
0 likes · 10 min read
Efficient Ops
Oct 7, 2022 · Fundamentals

What Is Computing Power? From ENIAC to the AI‑Driven Cloud Era

This article explains the concept of computing power, traces its evolution from early mechanical tools to modern cloud and AI accelerators, classifies its types, discusses measurement units, and examines current global trends and future challenges for this critical digital resource.

Hardware · High‑performance computing · computing power
0 likes · 13 min read
Alibaba Cloud Infrastructure
Sep 10, 2022 · Operations

Highlights of the 2022 ODCC Summit: Green Low‑Carbon Data Center Innovations and Technical Standards

The 2022 ODCC summit in Beijing showcased 55 innovative achievements across servers, networks, data‑center facilities, edge computing and intelligent monitoring, unveiling whitepapers, industry standards, OSSP specifications, liquid‑cooling guidelines, green data‑center practices, high‑performance networking and AI‑driven operations to promote low‑carbon, high‑efficiency computing infrastructure.

AI · High‑performance computing · cloud infrastructure
0 likes · 10 min read