Tag

high‑performance computing

1 view collected around this technical thread.

Architects' Tech Alliance
Jun 13, 2025 · Artificial Intelligence

How Huawei’s CloudMatrix 384 Challenges Nvidia’s AI Supercomputers

Huawei’s CloudMatrix 384, built from 384 Ascend 910C chips and a multi‑to‑multi topology, delivers up to 300 PFLOPS of BF16 performance, nearly twice that of Nvidia’s GB200 NVL72. The system also exposes supply‑chain dependencies on foreign fabs and higher power consumption, and reflects a rapid push to scale China’s domestic semiconductor capabilities.

AI accelerator · Ascend 910C · CloudMatrix 384
0 likes · 12 min read
Architects' Tech Alliance
Jun 10, 2025 · Fundamentals

Why RDMA Is Revolutionizing High‑Performance Computing and AI

This article explores how Remote Direct Memory Access (RDMA) technology transforms high‑performance computing, artificial intelligence, and cloud storage by eliminating data copies, bypassing the kernel, and offloading protocols to hardware, while reviewing key metrics, product ecosystems, real‑world use cases, challenges, and future trends.

Artificial Intelligence · DPU · RDMA
0 likes · 11 min read
Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU
0 likes · 6 min read
Architects' Tech Alliance
Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

Artificial Intelligence · Hardware Acceleration · RDMA
0 likes · 10 min read
Architects' Tech Alliance
May 26, 2025 · Fundamentals

Understanding RDMA, InfiniBand, and RoCEv2 for High‑Performance Distributed Training

The article explains how distributed AI training performance depends on reducing inter‑card communication latency, introduces RDMA technology and its implementations (InfiniBand, RoCEv2, iWARP), compares their latency and scalability against traditional TCP/IP, and outlines the hardware components and trade‑offs of InfiniBand and RoCEv2 networks.

Distributed Training · InfiniBand · RDMA
0 likes · 12 min read
Architects' Tech Alliance
May 25, 2025 · Fundamentals

Comprehensive Overview of Shenwei (申威) Chip Development, Technology, Roadmap, and Applications

This article provides an in‑depth overview of Shenwei chips, covering their development history, core technical advantages such as a self‑designed instruction set and high‑performance computing capabilities, the current product line‑up, and their applications in supercomputing, cloud data centers, security, and embedded systems.

CPU architecture · Hardware fundamentals · Shenwei chip
0 likes · 13 min read
Architects' Tech Alliance
May 12, 2025 · Artificial Intelligence

Comparison of Fat-Tree, Dragonfly, and Torus Network Topologies for AI and High‑Performance Computing

The article reviews Fat‑Tree, Dragonfly, and Torus network topologies, analyzing their bandwidth, scalability, latency, routing algorithms, and cost trade‑offs for AI‑driven high‑performance computing clusters, and highlights each design's strengths and limitations in large‑scale deployments.

AI computing · Dragonfly · Fat Tree
0 likes · 12 min read
Architects' Tech Alliance
May 6, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for AI from Volta to Blackwell

The article reviews NVIDIA's GPU architecture progression—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell and Rubin designs—highlighting key innovations, performance gains for deep learning, and related resource updates for AI practitioners.

Artificial Intelligence · GPU architecture · Nvidia
0 likes · 9 min read
Architects' Tech Alliance
Apr 28, 2025 · Artificial Intelligence

NVLink High‑Speed Interconnect: Architecture, Evolution, and Performance

NVLink, NVIDIA's high‑bandwidth interconnect introduced with the P100 GPU, replaces PCIe by offering significantly higher data rates and lower latency for GPU‑GPU and GPU‑CPU communication, and has evolved through multiple generations to support modern AI and high‑performance computing workloads.

AI acceleration · GPU interconnect · NVLink
0 likes · 9 min read
Architects' Tech Alliance
Apr 21, 2025 · Artificial Intelligence

UALink 1.0: An Open High‑Speed Interconnect Challenging Nvidia’s AI Dominance

The UALink 1.0 specification, driven by AMD, Intel, Broadcom and other industry leaders, introduces an open, low‑latency, high‑bandwidth interconnect that can link up to 1,024 AI accelerators, offering a cost‑effective alternative to Nvidia’s NVLink and reshaping the AI‑HPC market.

AI interconnect · Data Center · Nvidia competition
0 likes · 11 min read
AntTech
Apr 17, 2025 · Artificial Intelligence

Data+AI Forum at the 18th China Electronics Information Conference (2025) – Speaker Bios and Session Summaries

The 18th China Electronics Information Conference will be held in Chengdu from April 17‑21, 2025, featuring the DATA+AI forum that gathers leading academicians and industry experts to discuss data‑AI integration, with detailed speaker biographies, presentation titles, and abstracts covering topics such as large‑model inference, cloud‑edge ultrasound diagnostics, and the future of databases in the AI era.

AI · Big Data · Conference
0 likes · 12 min read
AntTech
Mar 19, 2025 · Artificial Intelligence

Award-Winning HPCA 2025 Papers on Near‑DRAM Processing (UniNDP) and GPU‑Accelerated Fully Homomorphic Encryption (WarpDrive)

At HPCA 2025, two standout papers—UniNDP, a unified compilation and simulation tool for near‑DRAM processing architectures, and WarpDrive, a GPU‑based fully homomorphic encryption accelerator leveraging Tensor and CUDA cores—demonstrate significant performance gains for AI workloads and privacy‑preserving computation.

AI acceleration · Fully Homomorphic Encryption · GPU
0 likes · 5 min read
DeWu Technology
Feb 26, 2025 · Backend Development

Migrating to Rust: A Case Study in High-Performance Computing

Migrating a Java computing layer to Rust cut CPU usage by 30% and memory usage by 70% while improving stability. The authors explain how Rust’s ownership, borrowing, lifetimes, and concurrency model, combined with optimized data handling, FFI integration, Tokio async, Docker deployment, and monitoring, outweigh the steep learning curve and ecosystem gaps.

Backend Development · Concurrency · FFI
0 likes · 22 min read
Alibaba Cloud Infrastructure
Jan 20, 2025 · Cloud Computing

2024 Alibaba Cloud Infrastructure Network Team: AI‑Scale Network Innovations, Academic Achievements, Open‑Source Contributions and Industry Outreach

The 2024 report of Alibaba Cloud's Infrastructure Network team details AI‑driven network breakthroughs, high‑performance protocol stacks, large‑scale monitoring systems, numerous top‑conference paper acceptances, open‑source ecosystem initiatives, and extensive industry outreach, highlighting the evolving AI infra landscape.

AI infrastructure · Conference Papers · Open Source
0 likes · 19 min read
Architects' Tech Alliance
Jan 5, 2025 · Fundamentals

HadaFS: A New Burst Buffer File System for Scalable High‑Performance Computing

The article presents HadaFS, a novel burst‑buffer‑based distributed file system that combines the scalability of local burst buffers with the data‑sharing advantages of shared buffers, details its LTA architecture, metadata handling, the Hadash management tool, and extensive performance evaluations on the SNS supercomputer.

Burst Buffer · HPC Storage · Metadata Management
0 likes · 18 min read
Deepin Linux
Dec 25, 2024 · Fundamentals

An Introduction to RDMA: Principles, Programming, and Applications

This article explains RDMA technology, covering its core principles, programming model with Verbs API, various communication modes, and its impact on data‑center networking, high‑performance computing, and distributed storage, highlighting its low‑latency, zero‑copy advantages over traditional TCP/IP.

Data Center · Network Programming · RDMA
0 likes · 30 min read
AntTech
Nov 16, 2024 · Information Security

WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores Accepted at HPCA 2025

Ant Group’s Computing Systems Lab announced that its GPU‑accelerated fully homomorphic encryption framework WarpDrive, which exploits Tensor and CUDA cores for high‑throughput NTT operations and parallel kernel designs, has been accepted as a paper at the IEEE HPCA 2025 conference.

CUDA · Fully Homomorphic Encryption · GPU
0 likes · 4 min read
Alibaba Cloud Infrastructure
Oct 25, 2024 · Artificial Intelligence

Highlights of Chinese Enterprises at the 2024 OCP Global Summit: AI Network Architecture, High‑Performance Cooling, and WAN Innovations

The 2024 OCP Global Summit in San Jose showcased Chinese tech leaders like Alibaba Cloud and ByteDance presenting cutting‑edge AI network architectures, liquid‑cooling solutions, SRv6 deployments, high‑performance data‑center designs, and future WAN routing innovations, underscoring China's growing influence in AI infrastructure worldwide.

AI networking · Data Center · OCP Summit
0 likes · 8 min read
Tencent Advertising Technology
Oct 14, 2024 · Artificial Intelligence

Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising

This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: capturing user intent, aligning the model to the advertising domain, and designing a high-performance platform under ROI constraints.

Advertising Technology · Semantic Indexing · generative retrieval
0 likes · 17 min read