Tagged articles

High-performance computing

153 articles · Page 1 of 2

Jun 29, 2026 · Industry Insights

How China's ParaStor F9000 Became the First Domestic Storage to Top Global IO500 Rankings

China’s Sugon ParaStor F9000 all‑flash distributed storage system has claimed the top spots on both the full‑node and 10‑node IO500 production benchmarks, delivering up to 247% bandwidth gains and powering large‑scale AI, scientific simulation, and autonomous‑driving workloads across a ten‑thousand‑card supercomputing cluster.

AI workloads · All-flash storage · Chinese hardware

0 likes · 9 min read

ITPUB

Jun 24, 2026 · Industry Insights

How a Zero‑GPU, All‑CPU Supercomputer Defied the GPU Trend to Claim World #1

The Ling Sheng supercomputer topped the TOP500 with a 2.19 EFLOPS all‑CPU design, eschewing GPUs, achieving 84.4% scaling efficiency across millions of cores, high energy efficiency, and demonstrating a full‑stack domestic alternative that reshapes the global high‑performance computing landscape.

AI acceleration · CPU architecture · High-performance computing

0 likes · 6 min read

Architects' Tech Alliance

May 31, 2026 · Industry Insights

Huawei AI Data Center Reference Design – Downloadable Blueprint

The Huawei AI Data Center Reference Design offers a standardized, integrated, high‑performance compute infrastructure for large‑model training and inference, built on GB/T 50174, featuring modular GPU/HBM servers, 20–50 kW per rack, leaf‑spine 100/200/400 Gbps networking, liquid cooling, redundant power, and intelligent management, with a downloadable package for replication.

AI · Data Center · GPU

0 likes · 4 min read

Architects' Tech Alliance

May 24, 2026 · Artificial Intelligence

AI Supernodes: How Hundreds of Chips Merge into a Single High‑Performance Compute Unit

The article explains what AI supernodes are, how they differ from traditional server clusters, and why their bus‑level interconnect, global memory pooling, peer‑to‑peer compute and integrated liquid‑cooled racks deliver up to 15× bandwidth gains, 4× inference concurrency, and significant cost reductions, while comparing the approaches of Nvidia, Huawei and other Chinese vendors and outlining future scaling challenges.

AI Supernode · High-performance computing · Huawei

0 likes · 9 min read

Architects' Tech Alliance

May 13, 2026 · Industry Insights

Inside Huawei Atlas 900A3 SuperPoD (CM384) Supernode Wiring Scheme

The article provides a detailed technical analysis of Huawei's Atlas 900A3 SuperPoD (CM384) supernode wiring architecture, covering cabinet composition, NPU/CPU counts, 400 G optical interconnect design, cable installation practices, scalability to 24 Pods, and the performance benefits for AI workloads.

400G optical interconnect · Atlas 900A3 · Data center wiring

0 likes · 8 min read

Machine Heart

May 11, 2026 · Artificial Intelligence

Can We Outsmart AI by Uploading Our Minds? MIT Dropout’s Plan for Digital Humans

Isaak Freeman, a former MIT PhD student, argues that humanity must embrace AI‑driven brain emulation—estimating that tens of thousands of H100 GPUs could simulate a human brain within a decade, but highlighting massive data‑acquisition, memory‑wall, and connectivity challenges that demand a multi‑decade, multi‑billion‑dollar effort.

AI · High-performance computing · Neuroscience

0 likes · 8 min read

Data Party THU

May 2, 2026 · Artificial Intelligence

Training an 11.5 B‑parameter Universal Interatomic Potential in Hours on Exascale Supercomputers

A Chinese Academy of Sciences team introduced the MatRIS‑MoE model and the Janus training framework, enabling an 11.5 billion‑parameter universal machine‑learning interatomic potential to be trained on two exascale systems at 1.2 EFLOPS, compressing weeks‑long training into a few hours.

AI for Science · Exascale training · High-performance computing

0 likes · 8 min read

Deepin Linux

Mar 6, 2026 · Backend Development

Unlocking Ultra‑Low Latency: How RDMA Transforms High‑Performance Networking

This article explains the fundamentals of Remote Direct Memory Access (RDMA), its low‑latency, zero‑copy and kernel‑bypass mechanisms, programming interfaces, and real‑world applications in data‑center networks, high‑performance computing, and distributed storage, providing developers with practical guidance and code examples.

Distributed storage · High-performance computing · Network Programming

0 likes · 31 min read

Alibaba Cloud Infrastructure

Dec 17, 2025 · Cloud Native

AI Training Revives Gang Scheduling in Kubernetes for Elastic Resource Orchestration

The article examines how the rise of large‑model AI training reintroduces the need for gang scheduling in Kubernetes, contrasting the rigid resource requirements of HPC‑style workloads with cloud‑native elasticity, and outlines the historical evolution, current implementations, and future directions for achieving more flexible, high‑throughput compute orchestration.

AI training · Cloud Native · Gang Scheduling

0 likes · 22 min read

AntTech

Dec 4, 2025 · Artificial Intelligence

How AState Reduces Trillion‑Parameter RL Weight Sync to 6 Seconds

AState is a general‑purpose state data management system for reinforcement‑learning tasks that tackles low IO efficiency, slow weight synchronization, and state‑recovery challenges, achieving sub‑10‑second weight sync for trillion‑parameter models through a three‑layer architecture, zero‑redundancy transfers, and hardware‑aware co‑design, with the code openly available on GitHub.

AState · High-performance computing · large models

0 likes · 23 min read

AntTech

Nov 21, 2025 · Artificial Intelligence

How Awex Enables Sub‑Second TB‑Scale Weight Sync for Trillion‑Parameter RL Models

Awex is a high‑performance Python framework that synchronizes training and inference weights for trillion‑parameter reinforcement‑learning models in seconds, using unified conversion, metadata management, and NCCL/RDMA transfer plans, dramatically reducing RL training latency and supporting diverse parallel strategies.

High-performance computing · Python · distributed training

0 likes · 17 min read

AI Cyberspace

Nov 19, 2025 · Artificial Intelligence

Why MPI and NCCL Are Critical for Scaling AI Models Across Thousands of GPUs

This article explains how AI model training has evolved from single‑GPU workloads to massive distributed training using MPI for CPU‑centric communication and NCCL for GPU‑centric communication, covering their histories, core concepts, programming interfaces, topology discovery, protocol choices, and performance testing on multi‑GPU clusters.

AI distributed training · GPU communication · High-performance computing

0 likes · 71 min read

Kuaishou Tech

Nov 12, 2025 · Artificial Intelligence

How KaiFG Lets Python Feature Engineering Run at C++ Speed

KaiFG, Kuaishou's self‑built AI Feature Generator, unifies fragmented feature extraction frameworks, replaces slow C++ compilation cycles with Python‑level development, and achieves near‑C++ performance through Codon‑based compilation, reference‑counted memory management, and aggressive LLVM optimizations, dramatically shortening iteration time.

AI Infrastructure · High-performance computing · feature engineering

0 likes · 14 min read

Deepin Linux

Nov 11, 2025 · Fundamentals

Why RDMA Is the Secret to Lightning‑Fast Data Transfer in Modern Data Centers

This article explains the fundamentals of Remote Direct Memory Access (RDMA), its low‑latency, zero‑copy architecture, core principles, programming interfaces, and how it transforms data‑center networking, high‑performance computing, and distributed storage by bypassing the CPU and kernel.

High-performance computing · Kernel Bypass · RDMA

0 likes · 30 min read

Architects' Tech Alliance

Nov 9, 2025 · Artificial Intelligence

Why Optical Interconnects Are the Next Bottleneck‑Breaker for Massive AI Clusters

This article systematically examines the demand, technology stack, and industry landscape of large‑scale AI compute clusters, highlighting the limitations of traditional copper interconnects and presenting device‑level and chip‑level optical interconnect solutions—including OCS, pluggable modules, silicon photonics, VCSEL, and micro‑LED—while outlining current challenges and future directions.

AI clusters · Data Center · High-performance computing

0 likes · 15 min read

Architects' Tech Alliance

Nov 6, 2025 · Artificial Intelligence

Inside scaleX640: How China’s First 640‑Card Supernode Redefines AI Compute

The scaleX640 supernode, unveiled at the Wuzhen World Internet Conference, packs 640 AI accelerators into a single rack, delivering unprecedented compute density, energy efficiency, open ecosystem compatibility, and reliability features that enable massive AI model training and inference at scale.

AI hardware · High-performance computing · energy efficiency

0 likes · 4 min read

Open Source Linux

Nov 4, 2025 · Artificial Intelligence

Designing High‑Performance Networks for Large‑Scale AI Model Training

This article examines the challenges of building scalable, low‑latency, and cost‑effective network architectures—such as Clos/Fat‑Tree, Spine‑Leaf, Dragonfly, and Torus—for massive GPU clusters used in training trillion‑parameter AI models, comparing multi‑rail and single‑rail designs and highlighting real‑world implementations from Tencent and Alibaba.

AI training · CLOS · Dragonfly

0 likes · 8 min read

Huawei Cloud Developer Alliance

Oct 21, 2025 · Cloud Computing

Essential Reading List for Mastering Modern Computing Systems

This curated reading list presents must‑read books covering cloud, edge, distributed, high‑performance, parallel, heterogeneous, quantum, and AI computing, offering expert editorial insights, author backgrounds, and publication details to help readers grasp core concepts and advance their technical expertise.

Book Recommendations · Cloud Computing · High-performance computing

0 likes · 20 min read

Architects' Tech Alliance

Oct 15, 2025 · Fundamentals

Understanding High‑Performance Computing: Principles, FLOPS, and Future Limits

This article explains the fundamentals of high‑performance computing (HPC), covering serial and parallel processing, the roles of CPUs and GPUs, system architectures, FLOPS metrics, current supercomputer capabilities, and the scale needed to reach the next exa‑FLOPS era.

CPU · FLOPS · GPU

0 likes · 7 min read

Architects' Tech Alliance

Oct 15, 2025 · Fundamentals

Comparative Analysis of Leading E‑Level HPC Processors: A64FX, H100, MI250X, and PonteVecchio

This article compares four cutting‑edge high‑performance processors—Fujitsu A64FX, NVIDIA H100, AMD MI250X, and Intel Ponte Vecchio—examining their architectures, parallelism strategies, domain‑specific accelerators, supported data types, performance metrics, and power consumption to inform future E‑level computing designs.

AMD MI250X · E-level computing · Fujitsu A64FX

0 likes · 10 min read

Programmer DD

Oct 12, 2025 · Backend Development

Boost Java Performance: Integrate CUDA GPU Acceleration via JNI

This guide explains why Java struggles with high‑performance or data‑intensive workloads, introduces GPU acceleration with CUDA, compares integration options such as JCuda, JNI, and JNA, walks through a practical encryption use case with performance benchmarks, and provides production‑grade best practices for memory, threading, testing, security, and deployment.

CUDA · GPU · High-performance computing

0 likes · 23 min read

Architects' Tech Alliance

Oct 11, 2025 · Artificial Intelligence

Why NVLink Beats PCIe for AI: Deep Dive into GPU Interconnect Technologies

This article examines the architectural differences between Scale‑Out and Scale‑Up networking, compares PCIe, NVLink, UALink, Infiniband and RoCE, and explains why high‑bandwidth, low‑latency GPU interconnects like NVLink are essential for modern AI and HPC workloads.

AI acceleration · GPU interconnect · High-performance computing

0 likes · 27 min read

Architects' Tech Alliance

Oct 9, 2025 · Artificial Intelligence

Unlocking AI Scale‑Up: Inside SUE, OISA, ALS and ETH+ High‑Performance Interconnects

This article introduces four cutting‑edge AI networking technologies—SUE, OISA, ALS, and ETH+—detailing their backgrounds, architectural designs, and performance enhancements that enable ultra‑high bandwidth, low‑latency, and scalable interconnects for modern AI compute clusters.

AI networking · Ethernet · High-performance computing

0 likes · 13 min read

Architects' Tech Alliance

Sep 29, 2025 · Artificial Intelligence

How NVLink and NVSwitch Power AI’s Next‑Gen High‑Performance Networks

This article, part of the 2025 AI Network Technology Whitepaper, classifies AI high‑performance networking into Scale‑Up, Scale‑Out, and frontier breakthroughs, then dives deep into NVLink’s evolution, technical features, NVSwitch’s full‑mesh architecture, and the newly opened NVLink Fusion ecosystem.

AI networking · GPU interconnect · High-performance computing

0 likes · 8 min read

Alibaba Cloud Infrastructure

Sep 26, 2025 · Artificial Intelligence

How Alibaba’s UPN512 Redefines AI Scale‑Up Networking with Optical Interconnects

The UPN512 whitepaper details Alibaba Cloud's next‑generation AI infrastructure network, explaining the shift from dense to MoE models, the rise of train‑and‑inference integration, xPU scale‑up challenges, and how high‑radix Ethernet with LPO/NPO optical interconnects delivers ultra‑high bandwidth, low latency, cost‑effective, and reliable large‑scale AI compute clusters.

AI Infrastructure · High-performance computing · UPN512

0 likes · 34 min read

Architects' Tech Alliance

Sep 22, 2025 · Artificial Intelligence

How Huawei’s New Atlas Supernodes Redefine AI Compute Power

Huawei Connect 2025 unveiled the Atlas 950 and Atlas 960 SuperPoD supernodes, detailing their massive card counts, unprecedented compute, memory and bandwidth capabilities, and explaining how their full‑stack hardware‑software design dramatically accelerates large‑model AI training and inference.

AI Supernode · Atlas 950 · Atlas 960

0 likes · 8 min read

Architects' Tech Alliance

Sep 14, 2025 · Artificial Intelligence

Why Nvidia’s Blackwell GPUs Are Redefining AI Performance

The article analyzes Nvidia's 2024 Blackwell GPU series and GB200 NVL72 architecture, detailing their advanced 3‑4 nm manufacturing, redesigned CUDA cores, next‑gen ray‑tracing and DLSS upgrades, massive compute and memory bandwidth gains, NVLink Gen5 improvements, and the diverse GB200 product configurations for high‑performance AI workloads.

AI acceleration · Blackwell GPU · GPU architecture

0 likes · 7 min read

Architects' Tech Alliance

Aug 15, 2025 · Artificial Intelligence

How AI Compute Centers Structure Their Networks for Maximum Performance

This article explains the logical and physical architecture of AI compute centers, detailing the division into access, security, network, management, out‑of‑band, AI compute cluster, and general compute zones, and describes the four network planes—parameter, sample, business, and management—required for high‑performance AI workloads.

AI · Compute cluster · High-performance computing

0 likes · 7 min read

Architects' Tech Alliance

Jul 23, 2025 · Artificial Intelligence

Why Do AI Large‑Model Training Clusters Need Specialized Network Topologies?

The article explains how AI large‑model training demands massive GPU resources and how carefully designed network architectures—such as Clos/Fat‑Tree, Spine‑Leaf, multi‑rail versus single‑rail connections, Dragonfly, and Torus—impact performance, scalability, cost, and reliability, guiding the selection of optimal data‑center networks.

AI · Data Center · GPU clusters

0 likes · 9 min read

AntTech

Jul 18, 2025 · Artificial Intelligence

Explore the 2025 CCF‑Ant Research Fund: 50 Cutting‑Edge Projects in AI, Security & Computing

The CCF‑Ant Research Fund 2025, now open for its first batch, invites global university and institute researchers to apply by August 25 2025 for up to 50 projects spanning data security, hardware‑software co‑design, supercomputing, and artificial intelligence, with detailed topics, eligibility rules, and submission channels provided.

Data Security · High-performance computing · Research Funding

0 likes · 11 min read

Architects' Tech Alliance

Jul 7, 2025 · Operations

Choosing the Right AI Data Center Network: InfiniBand vs RoCE

This article outlines the high‑performance networking requirements for AI data center training, compares InfiniBand and RoCE solutions, discusses their advantages in bandwidth, latency, scalability and cost, and provides design guidelines for building scalable, low‑latency, non‑blocking AI‑centric network architectures.

AI · Data Center · High-performance computing

0 likes · 10 min read

Architects' Tech Alliance

Jul 3, 2025 · Fundamentals

How Supercomputers Evolved: From Early SGI Systems to China’s Exascale Machines

This article traces the evolution of global supercomputing—from early US and Japanese initiatives to Europe’s coordinated investments—and details China’s rapid development of successive supercomputer generations, highlighting landmark systems such as SGI Power Challenge XL, Dawning‑2000, DeepComp series, the “元” platform and the “东方” machine, as well as homegrown high‑performance software like HPSEPS and HPLES.

China · Hardware · High-performance computing

0 likes · 14 min read

php Courses

Jul 2, 2025 · Game Development

Why C++ Dominates Game Development, Systems, and High‑Performance Computing

From powering cutting‑edge AAA games and operating system kernels to accelerating scientific simulations, high‑frequency trading, and embedded IoT devices, C++ remains the go‑to language for high‑performance, low‑level control across diverse domains, thanks to its speed, portability, and fine‑grained memory management.

C++ · FinTech · Game Development

0 likes · 6 min read

Architects' Tech Alliance

Jun 29, 2025 · Artificial Intelligence

Scale-Up vs Scale-Out: Balancing Performance and Flexibility in AI Infrastructure

This article explains the technical definitions, core differences, and practical use cases of Scale‑Up and Scale‑Out networking in AI systems, highlighting how they impact latency, bandwidth, and cost, and illustrates their combined application through NVIDIA's NVL72 supernode case study.

AI Infrastructure · GPU networking · High-performance computing

0 likes · 14 min read

Architects' Tech Alliance

Jun 13, 2025 · Artificial Intelligence

How Huawei’s CloudMatrix 384 Challenges Nvidia’s AI Supercomputers

Huawei’s CloudMatrix 384, built from 384 Ascend 910C chips and an all‑to‑all topology, delivers up to 300 PFLOPS of BF16 performance—nearly twice that of Nvidia’s GB200 NVL72—while exposing supply‑chain dependencies on foreign fabs, higher power consumption, and a rapid push to scale China’s domestic semiconductor capabilities.

AI accelerator · Ascend 910C · CloudMatrix 384

0 likes · 12 min read

Architects' Tech Alliance

Jun 10, 2025 · Fundamentals

Why RDMA Is Revolutionizing High‑Performance Computing and AI

This article explores how Remote Direct Memory Access (RDMA) technology transforms high‑performance computing, artificial intelligence, and cloud storage by eliminating data copies, bypassing the kernel, and offloading protocols to hardware, while reviewing key metrics, product ecosystems, real‑world use cases, challenges, and future trends.

DPU · Data Center Networking · High-performance computing

0 likes · 11 min read

Architects' Tech Alliance

Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.

CUDA · Cutlass · GEMM

0 likes · 6 min read

Volcano Engine Developer Services

Jun 4, 2025 · Cloud Computing

Unlock Cloud‑Level RDMA Performance with Volcengine’s vRDMA

Volcengine’s vRDMA brings high‑performance, low‑latency RDMA acceleration to cloud VPCs, combining self‑developed congestion control, elastic ENI integration, and compatibility with HPC, AI, and big‑data workloads to deliver up to 320 Gbps bandwidth and microsecond‑level latency.

AI · Cloud Networking · Distributed storage

0 likes · 10 min read

Architects' Tech Alliance

Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

High-performance computing · Network Protocols · RDMA

0 likes · 10 min read

Architects' Tech Alliance

May 25, 2025 · Fundamentals

Comprehensive Overview of Shenwei (申威) Chip Development, Technology, Roadmap, and Applications

This article provides an in‑depth overview of Shenwei chips, covering their development history, core technical advantages such as a self‑designed instruction set and high‑performance computing capabilities, the current product line‑up, and their applications in supercomputing, cloud data centers, security, and embedded systems.

CPU architecture · High-performance computing · Server processors

0 likes · 13 min read

Baidu Intelligent Cloud Tech Hub

May 23, 2025 · Artificial Intelligence

How Baidu’s Kunlun Supernode Redefines AI Compute Density and Performance

This article explains how Baidu’s Kunlun supernode, built on high‑density liquid‑cooled cabinets and a modular 1U 4‑card design, breaks traditional 8‑card limits, boosts compute density four‑fold, improves power and cooling efficiency, and provides a scalable foundation for large‑model AI training and inference.

AI Infrastructure · GPU Cluster · High-performance computing

0 likes · 13 min read

Architects' Tech Alliance

May 15, 2025 · Industry Insights

Why InfiniBand Still Beats Ethernet: Deep Dive into RDMA, Omni‑Path, and Protocol Layers

This article provides a comprehensive technical analysis of InfiniBand architecture, its protocol stack, comparison with Ethernet‑based RDMA solutions like RoCE and iWARP, and an overview of Omni‑Path, highlighting performance advantages, design trade‑offs, and practical limitations.

High-performance computing · InfiniBand · Network Architecture

0 likes · 19 min read

Baidu Geek Talk

May 14, 2025 · Industry Insights

How RapidFS Boosts AI Model Training with 10 TiB/s Throughput

The article explains how large‑scale AI model training and inference require massive data handling, describes the RapidFS storage acceleration cluster deployed on a 30,000‑card Kunlun chip system with hundreds of domestic CPU servers, and presents performance tests showing linear throughput scaling up to over 1 TiB/s, demonstrating the impact of high‑performance storage on compute efficiency.

AI training · High-performance computing · RapidFS

0 likes · 5 min read

Architects' Tech Alliance

May 8, 2025 · Industry Insights

How AI Storage Is Redefining Data‑Compute Synergy: Trends, Tech, and Roadmap

This article analyses the emergence of AI‑focused storage, detailing its ultra‑high bandwidth, concurrency, scale and low‑latency characteristics, the architectural shift from layered to fused designs, the specific performance and data‑management demands of training and inference, and a three‑phase roadmap for future storage innovations.

AI storage · GPU Acceleration · High-performance computing

0 likes · 12 min read

Architects' Tech Alliance

Apr 26, 2025 · Industry Insights

Can Huawei’s CloudMatrix 384 Outpace Nvidia’s GB200? A Deep Dive into China’s AI Supernode

The article provides a detailed technical analysis of Huawei's CloudMatrix 384 AI supernode—its 384 Ascend 910C chips, 300 PFLOPS of BF16 performance, massive memory and bandwidth, power consumption, scale‑up and scale‑out optical networking, and how it compares to Nvidia's GB200 NVL72 in architecture, cost, and energy efficiency.

AI hardware · CloudMatrix · GPU Cluster

0 likes · 12 min read

Baidu Intelligent Cloud Tech Hub

Apr 25, 2025 · Operations

How RapidFS Accelerates AI Model Training with 10 TiB/s Storage Performance

The article explains how RapidFS, a near‑compute storage acceleration solution built on BOS object storage, delivers up to 10 TiB/s throughput for massive AI model training, detailing its architecture, deployment on a 30,000‑card Kunlun cluster, and performance test results that show linear scaling from 20 to 70 nodes.

AI training · High-performance computing · RapidFS

0 likes · 6 min read

AntTech

Apr 17, 2025 · Artificial Intelligence

Data+AI Forum at the 18th China Electronics Information Conference (2025) – Speaker Bios and Session Summaries

The 18th China Electronics Information Conference will be held in Chengdu from April 17‑21, 2025, featuring the DATA+AI forum that gathers leading academicians and industry experts to discuss data‑AI integration, with detailed speaker biographies, presentation titles, and abstracts covering topics such as large‑model inference, cloud‑edge ultrasound diagnostics, and the future of databases in the AI era.

@Data · AI · Big Data

0 likes · 12 min read

Architects' Tech Alliance

Apr 8, 2025 · Artificial Intelligence

How NVSwitch Revolutionizes Multi‑GPU Interconnect for AI Workloads

This article examines NVIDIA's NVSwitch technology, explaining why it was needed, how it builds on NVLink to overcome PCIe bottlenecks, tracing its evolution from Pascal to the third‑generation design, and detailing its architectural features, scalability, full‑duplex bandwidth, non‑blocking communication, and optimized network topologies for high‑performance AI and HPC systems.

AI hardware · GPU interconnect · High-performance computing

0 likes · 9 min read

Architects' Tech Alliance

Apr 6, 2025 · Fundamentals

PCIe vs NVLink: How Modern GPU Interconnects Power AI Training

As AI models grow to trillion‑parameter scales, training them demands massive GPU clusters whose performance is increasingly limited by network bandwidth; this article examines why traditional PCIe interconnects become bottlenecks and how NVIDIA's NVLink and NVSwitch technologies dramatically improve multi‑GPU communication and overall system efficiency.

AI training · GPU · High-performance computing

0 likes · 12 min read

21CTO

Mar 7, 2025 · Artificial Intelligence

Why France Named Its New Supercomputer After a Pioneering Female Engineer

France will christen its upcoming 2025 supercomputer after Alice Recoque, a trailblazing 1970s engineer, highlighting both the nation's high‑performance computing ambitions and a symbolic push for gender diversity in a traditionally male‑dominated field.

Alice Recoque · French technology · High-performance computing

0 likes · 5 min read

DeWu Technology

Feb 26, 2025 · Backend Development

Migrating to Rust: A Case Study in High-Performance Computing

Migrating a Java computing layer to Rust yielded dramatic performance gains—30% lower CPU usage, 70% less memory—and greater stability, as the authors explain how Rust’s ownership, borrowing, lifetimes, and concurrency, combined with optimized data handling, FFI integration, Tokio async, Docker deployment, and monitoring, outweigh the steep learning curve and ecosystem gaps.

Backend Development · FFI · High-performance computing

0 likes · 22 min read

Alibaba Cloud Infrastructure

Jan 20, 2025 · Cloud Computing

2024 Alibaba Cloud Infrastructure Network Team: AI‑Scale Network Innovations, Academic Achievements, Open‑Source Contributions and Industry Outreach

The 2024 report of Alibaba Cloud's Infrastructure Network team details AI‑driven network breakthroughs, high‑performance protocol stacks, large‑scale monitoring systems, numerous top‑conference paper acceptances, open‑source ecosystem initiatives, and extensive industry outreach, highlighting the evolving AI infra landscape.

AI Infrastructure · Conference Papers · Data Center Networking

0 likes · 19 min read

Deepin Linux

Dec 25, 2024 · Fundamentals

An Introduction to RDMA: Principles, Programming, and Applications

This article explains RDMA technology, covering its core principles, its programming model based on the Verbs API, its communication modes, and its impact on data‑center networking, high‑performance computing, and distributed storage, and highlighting its low‑latency, zero‑copy advantages over traditional TCP/IP.

Data Center · High-performance computing · Network Programming

0 likes · 30 min read

Alibaba Cloud Infrastructure

Dec 19, 2024 · Industry Insights

How China’s First Cloud HPC Standard Is Shaping the Future of High‑Performance Computing

The article explains how the newly approved national cloud‑HPC standard, co‑created by Alibaba Cloud and the China Electronics Standardization Institute, addresses resource limits, reduces costs, and guides industry adoption across sectors such as automotive, semiconductor design, and weather forecasting.

Alibaba Cloud · China · Cloud Computing

0 likes · 4 min read

Architects' Tech Alliance

Dec 14, 2024 · Industry Insights

Inside the High‑Performance GPU Server: A Deep Dive into A100/A800 & H100 Topologies

This article provides a detailed technical analysis of multi‑GPU server architectures, covering component breakdowns, NVSwitch networking, bandwidth calculations, and the differences between NVIDIA A100, A800, and H100 configurations for large‑scale AI workloads.

AI hardware · GPU architecture · High-performance computing

0 likes · 12 min read

Architects' Tech Alliance

Dec 11, 2024 · Fundamentals

Unlocking GPU Computing: PCIe, NVLink, NVSwitch, and HBM Explained

This article breaks down the core components of high‑performance GPU servers—including PCIe switch chips, the evolution of NVLink from version 1.0 to 4.0, NVSwitch architecture, HBM memory tiers, and the nuances of bandwidth units—providing a comprehensive technical foundation for large‑scale model training.

GPU computing · HBM · High-performance computing

0 likes · 10 min read

Alibaba Cloud Infrastructure

Nov 12, 2024 · Industry Insights

How Cloud HPC Is Redefining Data+AI: Insights from Alibaba Cloud’s VP

In a keynote at CCF HPC China 2024, Alibaba Cloud’s VP explains how diversified high‑performance computing workloads, elastic cloud resources, and the proprietary CIPU architecture are driving the shift to a data‑plus‑AI era across industries such as automotive, life‑science, and large‑model training.

AI · CIPU · Cloud Computing

0 likes · 9 min read

Architects' Tech Alliance

Nov 7, 2024 · Industry Insights

Why RDMA, InfiniBand, and RoCE Are Redefining High‑Performance Data Center Networks

This article examines the evolution from the OSI and TCP/IP models to RDMA‑based technologies, compares traditional three‑tier and leaf‑spine architectures, analyzes NVIDIA SuperPOD designs, and evaluates Ethernet, InfiniBand, and RoCE switches to guide high‑throughput, low‑latency data‑center networking decisions.

Data Center Networking · High-performance computing · InfiniBand

0 likes · 13 min read

Tencent Advertising Technology

Oct 14, 2024 · Artificial Intelligence

Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising

This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: capturing user intent, aligning the model with the advertising domain, and designing a high-performance platform under ROI constraints.

Generative Retrieval · High-performance computing · Model Optimization

0 likes · 17 min read

Baidu Geek Talk

Oct 9, 2024 · Artificial Intelligence

How Baidu’s Baige 4.0 Architecture Redefines AI Compute Efficiency

This article analyzes Baidu's Baige 4.0 AI infrastructure, detailing its four‑layer architecture, XMAN 5.0 hardware, HPN network, BCCL communication library, and AIAK inference upgrades, and explains how these innovations address large‑model training and inference challenges while boosting performance, utilization, and cost efficiency.

AI Infrastructure · GPU Acceleration · High-performance computing

0 likes · 16 min read

Architects' Tech Alliance

Sep 2, 2024 · Industry Insights

Why Is the Global HPC Market Set to Surge to $437 Billion by 2028?

The report examines the 2023 global HPC market—covering on‑premise servers, cloud services, storage, compute engines, and interconnect technologies—showing total spending of $297 billion, forecasting growth to $437 billion by 2028, and highlighting key hardware trends, cloud adoption rates, and emerging AI‑driven workloads.

HPC · High-performance computing · Market Forecast

0 likes · 8 min read

Architects' Tech Alliance

Aug 29, 2024 · Industry Insights

How NVIDIA Builds 256‑GPU and 576‑GPU SuperPods with H100, GH200, and GB200 Interconnects

The article analyzes NVIDIA's DGX SuperPOD architectures across three GPU generations—H100, GH200, and GB200—detailing their NVLink/NVSwitch topologies, bandwidth calculations, scalability limits, and the practical challenges of constructing 256‑GPU and 576‑GPU supercomputing clusters.

Data Center · GPU · High-performance computing

0 likes · 11 min read

ByteDance Data Platform

Aug 27, 2024 · Artificial Intelligence

AI-Driven BI: Achieving Zero-Barrier Data Access and Smart Insights

This article traces the evolution of business intelligence platforms from early report‑centric tools to modern AI‑enhanced, search‑driven solutions, detailing the architectural layers, high‑performance data analysis design, multi‑level aggregation, hot‑cold data tiering, and large‑model applications that enable zero‑threshold data consumption and intelligent insights.

Business Intelligence · High-performance computing · artificial-intelligence

0 likes · 18 min read

Architects' Tech Alliance

Aug 13, 2024 · Fundamentals

Understanding High Bandwidth Memory (HBM): Architecture, Benefits, and Applications

High Bandwidth Memory (HBM) is a DRAM technology that uses stacked chips, through‑silicon vias (TSVs), and micro‑bump interconnects to deliver ultra‑high data rates, lower power consumption, and a compact form factor, addressing the bandwidth, latency, power, space, thermal, and complexity challenges of traditional 2D memory in GPUs, AI, HPC, and data‑center workloads.

HBM · High-performance computing · Memory Architecture

0 likes · 10 min read

Open Source Linux

Jul 23, 2024 · Fundamentals

Why Fat-Tree, Dragonfly, and Torus Topologies Dominate High‑Performance Computing Networks

High‑performance computing demands ultra‑low latency and massive scale, prompting a shift from traditional CLOS designs to alternative topologies such as Fat‑Tree, Dragonfly, and Torus, each offering distinct trade‑offs in bandwidth, scalability, routing complexity, and cost‑effectiveness for modern data‑center and HPC environments.

Dragonfly · Fat-Tree · High-performance computing

0 likes · 10 min read

Architects' Tech Alliance

Jul 1, 2024 · Industry Insights

China's Distributed Storage Market: Size, Structure, and Growth Trends

This article defines distributed storage, presents China's market size and growth rates from 2021 to 2022, breaks down the market into file, block, and object segments, highlights the top adopting industries, and outlines nine key application scenarios driving rapid expansion.

Big Data · China · Cloud Computing

0 likes · 7 min read

Architects' Tech Alliance

May 19, 2024 · Industry Insights

How to Build a 10,000‑GPU Supercluster: Core Design Principles and Architecture

This article analyzes the challenges and solutions for constructing a super‑large GPU training cluster, outlining five fundamental design principles, a four‑layer plus one‑domain architecture, and practical considerations for hardware, networking, and operational reliability in AI workloads.

AI training · GPU Cluster · High-performance computing

0 likes · 8 min read

Architects' Tech Alliance

May 19, 2024 · Industry Insights

InfiniBand vs RoCEv2: Which High‑Performance Network Wins AI Compute?

With AI models growing to billions of parameters, the choice of high‑performance interconnect—InfiniBand or RoCEv2—directly impacts training speed, scalability, latency, and operational complexity, and this article analyzes their architectures, performance metrics, vendor ecosystems, and suitability for large‑scale AI clusters.

AI · High-performance computing · InfiniBand

0 likes · 13 min read

Architects' Tech Alliance

May 16, 2024 · Industry Insights

How to Build a Multi‑Petabyte AI Super‑Cluster: Scaling Beyond Ten‑Thousand GPUs

This article analyzes the architectural upgrades required for ultra‑large AI clusters, covering single‑GPU performance, super‑node scaling, DPU‑based heterogeneous computing, power‑efficiency, high‑throughput storage, and robust high‑speed networking to support trillion‑parameter model training and inference.

AI · DPU · GPU Cluster

0 likes · 17 min read

Architects' Tech Alliance

May 15, 2024 · Artificial Intelligence

Detailed Overview of GPU Server Architectures: A100/A800 and H100/H800 Nodes

This article provides a comprehensive technical overview of large‑scale GPU server architectures, detailing the component topology of 8‑GPU A100/A800 and H100/H800 nodes, explaining storage network cards, NVSwitch interconnects, bandwidth calculations, and the trade‑offs between RoCEv2 and InfiniBand for AI workloads.

GPU · High-performance computing · NVLink

0 likes · 13 min read

21CTO

May 15, 2024 · Fundamentals

Why China Is Quietly Withdrawing from the Top500 Supercomputer Race

The latest Top500 ranking shows the United States dominating with the two fastest supercomputers, while China, despite having powerful exascale systems, has stopped reporting its machines, reflecting a strategic shift amid the tech cold war between the two nations.

China · High-performance computing · TOP500

0 likes · 3 min read

Architects' Tech Alliance

May 14, 2024 · Fundamentals

Fundamentals of GPU Computing: PCIe, NVLink, NVSwitch, and HBM

This article provides a comprehensive overview of the core components and terminology of large‑scale GPU computing, covering GPU server architecture, PCIe interconnects, NVLink generations, NVSwitch, high‑bandwidth memory (HBM), and bandwidth unit considerations for AI and HPC workloads.

AI hardware · GPU computing · HBM

0 likes · 11 min read

Architects' Tech Alliance

Apr 28, 2024 · Industry Insights

Why RoCE v2 Is Outpacing InfiniBand for Modern Data Centers

This article provides an in‑depth technical analysis of RoCE v2, covering its architecture, NIC requirements, and detailed comparisons with InfiniBand across physical layers, protocol stacks, switching, congestion handling, routing, and topology, while also highlighting the UEC alliance’s new transport protocol initiative.

High-performance computing · InfiniBand · RDMA

0 likes · 12 min read

Architects' Tech Alliance

Apr 16, 2024 · Industry Insights

Inside AI Servers: PCIe, NVLink, and NVSwitch Driving the Next‑Gen Compute

Based on TrendForce data, AI server shipments are projected to grow at a 12.2% CAGR through 2027, while advances in PCIe switching, retiming chips, and high‑speed GPU interconnects such as NVLink and NVSwitch are reshaping the architecture and performance of next‑generation AI compute platforms.

AI servers · GPU interconnect · High-performance computing

0 likes · 11 min read

Architects' Tech Alliance

Apr 10, 2024 · Industry Insights

Inside the GPU Server: Architecture of A100/A800 and H100/H800 Nodes

This article provides a detailed technical breakdown of modern multi‑GPU server nodes, covering component composition, storage network cards, NVSwitch interconnects, bandwidth calculations, and the architectural differences between NVIDIA A100/A800 and H100/H800 configurations for AI training workloads.

A100 · AI training · GPU

0 likes · 12 min read

Architects' Tech Alliance

Apr 8, 2024 · Fundamentals

Unlocking GPU Server Architecture: PCIe, NVLink, NVSwitch & HBM Explained

This article provides a comprehensive breakdown of high‑performance GPU server infrastructure, covering PCIe generations, NVLink evolution, NVSwitch and NVLink switches, HBM memory technologies, and bandwidth measurement units, helping readers understand the hardware connections and performance considerations essential for large‑scale model training.

GPU architecture · HBM · High-performance computing

0 likes · 10 min read

Architects' Tech Alliance

Mar 25, 2024 · Industry Insights

Why Fat-Tree, Dragonfly, and Torus Topologies Matter in HPC Networks

The article examines the challenges of ultra‑large‑scale HPC networking, compares traditional CLOS with Fat‑Tree, Dragonfly, and Torus topologies, explains their bandwidth and latency characteristics, presents scalability formulas, and evaluates routing algorithms and practical trade‑offs for each design.

Data Center · Dragonfly · High-performance computing

0 likes · 14 min read

Architects' Tech Alliance

Feb 29, 2024 · Industry Insights

Choosing the Right GPU Cluster Network: NVLink, InfiniBand, RoCE & DDC Explained

This article examines the key GPU/TPU cluster networking options (NVLink, InfiniBand, RoCE Ethernet, and emerging DDC full‑scheduling fabrics), detailing their latency, lossless transmission, congestion control, cost, power, and scalability considerations for large‑scale AI training deployments.

AI training · DDC fabric · GPU networking

0 likes · 18 min read

DataFunTalk

Feb 18, 2024 · Cloud Computing

Research on the Unified Storage Platform for the Supercomputing Internet

This article presents a comprehensive overview of the challenges, key technologies, and future applications of a unified storage platform built on Alluxio for China's national supercomputing internet, detailing its architecture, data flow strategies, deployment status, and industry use cases across multiple sectors.

Alluxio · Cloud Computing · Data Flow

0 likes · 13 min read

Architects' Tech Alliance

Feb 14, 2024 · Industry Insights

Why InfiniBand Is Outpacing Ethernet in High‑Performance Computing

This article provides a comprehensive overview of InfiniBand technology, covering its history, architecture, packet format, layer functions, switching mechanisms, and performance advantages over Ethernet, while highlighting its rapid growth and future prospects in HPC environments.

Comparison · High-performance computing · InfiniBand

0 likes · 15 min read

NetEase Cloud Music Tech Team

Jan 25, 2024 · Backend Development

Cloud Music RTA Advertising and User Acquisition System: Architecture and Optimization Practices

NetEase Cloud Music’s RTA advertising system delivers real‑time, personalized ads at massive scale by using isolated Nginx clusters, layered decoupling, asynchronous Netty/Redis processing, and optimized storage with hash‑based key compression and Protostuff serialization, while supporting automated audience selection and in‑app attribution to boost user acquisition.

High-performance computing · RTA advertising · advertising technology

0 likes · 12 min read

Architects' Tech Alliance

Sep 21, 2023 · Fundamentals

Overview of High‑Performance Computing (HPC) Market and Trends

The article defines high‑performance computing, presents China's dominant share in the Top500 supercomputer rankings, details recent global and Chinese HPC market growth rates, forecasts market size to 2026, and cites multiple industry reports as sources.

CAGR · China · HPC

0 likes · 4 min read

21CTO

Sep 18, 2023 · Operations

China’s 1.5 Exaflops Oceanlite Supercomputer Chases the Gordon Bell Prize

The ACM announced that a paper based on China’s 1.5 exaflops Oceanlite supercomputer has been shortlisted for the 2023 Gordon Bell Prize, highlighting its novel turbulent‑flow code, the SW26010 Pro processor architecture, other global contenders, and geopolitical implications voiced by Jack Dongarra.

Exascale · Gordon Bell Prize · High-performance computing

0 likes · 15 min read

Architects' Tech Alliance

Jun 29, 2023 · Artificial Intelligence

Hyperion Research ISC23 HPC Market Update: Trends, Forecasts, and AI Impact

The Hyperion Research ISC23 HPC Market Update briefing highlights modest 2022‑2023 growth, forecasts global HPC spending reaching $33 billion in 2023 and $52 billion by 2026, outlines ten key 2023 predictions—including AI regulation, cloud‑driven HPC expansion, and emerging DPU/IPU markets—while emphasizing the continuing talent shortage and the strategic importance of AI across high‑performance computing.

AI · Cloud Computing · HPC

0 likes · 7 min read

Alibaba Cloud Infrastructure

Jun 16, 2023 · Cloud Computing

Predictable Network and High‑Performance Network Architecture for Large‑Scale AI Training

The article examines how Alibaba Cloud’s Predictable Network, InfiniBand versus Ethernet trade‑offs, and the HPN high‑performance network design together address the extreme bandwidth, latency, scalability and reliability requirements of modern large‑model AI training workloads in cloud data centers.

AI training · Cloud Computing · Ethernet

0 likes · 24 min read

Open Source Linux

Jun 13, 2023 · Fundamentals

Why RDMA Outperforms Traditional Networking: A Deep Dive into DMA

This article explains the fundamentals of Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA), compares their data transfer mechanisms with traditional networking, and outlines RDMA's advantages, protocols, ecosystem, and real‑world adoption in high‑performance computing and data centers.

DMA · Hardware · High-performance computing

0 likes · 13 min read

Efficient Ops

Jun 11, 2023 · Artificial Intelligence

Why Network Bandwidth Is the Real Bottleneck for AIGC and How DDC Solves It

The article explains how AIGC models demand massive GPU compute, why network bandwidth and latency become the critical limiting factors, and how the Distributed Disaggregated Chassis (DDC) architecture addresses these challenges with scalable, high‑throughput networking solutions.

AI Infrastructure · AIGC · DDC

0 likes · 13 min read

Architects' Tech Alliance

Jun 10, 2023 · Fundamentals

Understanding RDMA: How Direct Memory Access Boosts Data Center Performance

This article explains the principles of DMA and RDMA, compares RDMA protocols such as InfiniBand, RoCE, and iWARP, outlines their performance advantages, and reviews the key standards bodies, open‑source communities, hardware vendors, and real‑world adoption in high‑performance data centers.

DMA · Data Center · High-performance computing

0 likes · 15 min read

Baidu Intelligent Cloud Tech Hub

May 19, 2023 · Cloud Computing

How DPU‑Powered Cloud IaaS Revolutionizes Compute, Networking, and Storage

Baidu Intelligent Cloud’s 2023 GTC presentation details how its DPU‑based IaaS architecture unifies high‑performance compute, networking, storage, and security, addressing rapid AI workload growth, reducing CPU bottlenecks, and delivering elastic, cost‑effective solutions across virtual machines, bare‑metal servers, and specialized RDMA instances.

DPU · High-performance computing · IaaS

0 likes · 17 min read

Baidu Tech Salon

May 11, 2023 · Artificial Intelligence

Inside Baidu’s High‑Performance GPU Cluster: Powering the Next‑Gen AI Models

The article details Baidu's development of a massive high‑performance GPU/IB cluster, its architectural design, the challenges of training trillion‑parameter models, and how the integrated AI stack—spanning hardware, framework, and resource management—overcomes compute, memory, and communication bottlenecks to accelerate large‑model training.

AI Infrastructure · Baidu AI Base · GPU Cluster

0 likes · 17 min read

Architects' Tech Alliance

Apr 17, 2023 · Fundamentals

Overview of High‑Performance Computing (HPC): Architecture, Metrics, Cluster Management, Job Scheduling, and Parallel Programming Models

This article provides a comprehensive overview of high‑performance computing, covering system architectures, hardware components, performance metrics, network topologies, common parallel file systems, cluster management functions, mainstream job‑scheduling systems, and MPI‑based parallel programming models.

HPC · High-performance computing · Job Scheduling

0 likes · 14 min read

Tencent Cloud Developer

Apr 14, 2023 · Artificial Intelligence

Tencent Cloud's Next-Generation HCC High-Performance Computing Cluster for Large Model Training

Tencent Cloud's new HCC high‑performance computing cluster triples the performance of the previous generation. It combines Xingsha servers and NVIDIA H800 GPUs delivering up to 1979 TFLOPS with 3.2 Tbps of server interconnect bandwidth over the Xingmai 3.2T Ethernet RDMA network, TB‑level storage via COS + GooseFS, and multiple access forms (bare metal, cloud servers, containers, functions) to enable efficient large‑model training.

AI computing · GPU Cluster · High-performance computing

0 likes · 9 min read

Open Source Linux

Apr 14, 2023 · Fundamentals

Why InfiniBand Is the Fastest Growing High‑Speed Interconnect for HPC

This article provides a comprehensive overview of InfiniBand technology, covering its history, architecture, packet structure, layer hierarchy, switching mechanisms, and performance advantages over Ethernet, highlighting its role as a low‑latency, high‑bandwidth solution for high‑performance computing.

High-performance computing · InfiniBand · Network Architecture

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Mar 27, 2023 · Artificial Intelligence

How uGrapher Boosts GNN Performance 3.5× with a Unified Graph Operator Abstraction

Alibaba Cloud's PAI platform and Shanghai Jiao Tong University’s team announced their ASPLOS 2023‑accepted paper uGrapher, which unifies graph operator computation for GNNs, achieving up to 3.5× speedup over existing frameworks and paving the way for industrial‑scale acceleration.

ASPLOS 2023 · Alibaba Cloud PAI · Graph Neural Networks

0 likes · 4 min read

Architects' Tech Alliance

Mar 26, 2023 · Fundamentals

Comprehensive Overview of InfiniBand Technology and Architecture

This article provides an in‑depth examination of InfiniBand, covering its rapid development as a high‑bandwidth, low‑latency interconnect technology, the InfiniBand Trade Association, detailed packet structures, layered architecture, switching mechanisms, and a comparative analysis with Ethernet, highlighting its advantages for high‑performance computing.

Data Transfer · HPC · High-performance computing

0 likes · 14 min read

Architects' Tech Alliance

Mar 22, 2023 · Fundamentals

Overview of Huawei Kunpeng 920 Processor Architecture and Subsystems

The article provides a detailed technical overview of Huawei's Kunpeng 920 processor, describing its ARM‑based RISC architecture, chip organization, core and cache hierarchy, security features, IMU management, and the design of its I/O, interrupt, network, SAS, and PCIe subsystems.

Arm · High-performance computing · Kunpeng

0 likes · 10 min read

Python Programming Learning Circle

Dec 17, 2022 · Fundamentals

Accelerating Python Code with Taichi: Prime Counting, LCS, and Reaction‑Diffusion Examples

This article demonstrates how importing the Taichi library into Python can dramatically accelerate compute‑intensive tasks, showcasing prime counting, longest common subsequence, and reaction‑diffusion simulations with speedups up to 120× and GPU support, while providing installation and usage guidance.

GPU · High-performance computing · Python

0 likes · 6 min read

21CTO

Dec 11, 2022 · Fundamentals

How Jack Dongarra’s Linpack Revolutionized Supercomputing and Earned a Turing Award

Jack Dongarra, a pioneering computer scientist, created the Linpack library and benchmark that enabled software to scale from laptops to exaflop supercomputers, earning him the 2021 ACM A.M. Turing Award (announced in 2022) and shaping modern high‑performance and cloud computing.

High-performance computing · Jack Dongarra · Linpack

0 likes · 11 min read

Alimama Tech

Nov 2, 2022 · Artificial Intelligence

Optimizing GPU Utilization for Multimedia AI Services with high_service

The article presents high_service, a high‑performance inference framework that boosts GPU utilization in multimedia AI services by separating CPU‑heavy preprocessing from GPU inference, employing priority‑based auto‑scaling, multi‑tenant sharing, and TensorRT‑accelerated models to eliminate GIL bottlenecks, reduce waste, and adapt to fluctuating traffic, with future work targeting automated bottleneck detection and further CPU‑GPU offloading.

Auto Scaling · GPU Utilization · High-performance computing

0 likes · 19 min read
