Tagged articles
174 articles
Page 1 of 2
Deepin Linux
Mar 6, 2026 · Backend Development

Unlocking Ultra‑Low Latency: How RDMA Transforms High‑Performance Networking

This article explains the fundamentals of Remote Direct Memory Access (RDMA), its low‑latency, zero‑copy and kernel‑bypass mechanisms, programming interfaces, and real‑world applications in data‑center networks, high‑performance computing, and distributed storage, providing developers with practical guidance and code examples.

High‑performance computing · Low latency · Network programming
0 likes · 31 min read
Deepin Linux
Feb 4, 2026 · Fundamentals

How Zero‑Copy and DMA Supercharge Data Transfer Performance

This article explains the fundamentals of zero‑copy, DMA, PageCache and RDMA, compares them with traditional I/O, describes Linux implementations such as sendfile, mmap+write, splice and Java NIO APIs, and shows practical use‑cases that dramatically reduce CPU load and latency in high‑throughput networking and file handling.

DMA · Java NIO · Linux
0 likes · 40 min read
Deepin Linux
Nov 11, 2025 · Fundamentals

Why RDMA Is the Secret to Lightning‑Fast Data Transfer in Modern Data Centers

This article explains the fundamentals of Remote Direct Memory Access (RDMA), its low‑latency, zero‑copy architecture, core principles, programming interfaces, and how it transforms data‑center networking, high‑performance computing, and distributed storage by bypassing the CPU and kernel.

High‑performance computing · Kernel Bypass · Networking
0 likes · 30 min read
Baidu Geek Talk
Nov 10, 2025 · Cloud Native

How Polar‑TCP Breaks Kernel Network Bottlenecks for Cloud‑Native High‑Performance Services

This article explains how traditional kernel network stacks struggle with high‑concurrency, low‑latency cloud data‑center workloads and introduces Baidu Intelligent Cloud’s Polar solution—Polar‑TCP and Polar‑RDMA—which combine user‑space DPDK drivers, a lightweight TCP stack, and an industrial RPC framework to achieve near‑RDMA performance while preserving compatibility with existing TCP ecosystems.

DPDK · Network Stack · Performance Optimization
0 likes · 23 min read
Baidu Intelligent Cloud Tech Hub
Nov 10, 2025 · Cloud Computing

How Polar‑TCP Breaks Kernel Network Bottlenecks for Million‑IOPS Cloud Services

This article explains how traditional kernel network stacks struggle with modern cloud data‑center workloads and introduces Baidu Intelligent Cloud's Polar solution—Polar‑TCP and Polar‑RDMA—which combine user‑space DPDK drivers, a lightweight TCP stack, and an industrial‑grade RPC framework to achieve near‑RDMA performance while preserving ecosystem compatibility.

DPDK · High‑Performance Networking · Network Stack
0 likes · 24 min read
Architects' Tech Alliance
Oct 12, 2025 · Artificial Intelligence

How InfiniBand Powers AI Training: Deep Dive into RDMA, RoCEv2, and High‑Speed Interconnects

This article explains how InfiniBand’s architecture, native RDMA, GPUDirect, and evolving bandwidth enable ultra‑low‑latency, high‑throughput communication for AI model training, compares it with Ethernet, and details the role of RoCEv2 and other high‑performance interconnect technologies.

AI training · GPU interconnect · High‑Performance Networking
0 likes · 9 min read
BirdNest Tech Talk
Oct 12, 2025 · Artificial Intelligence

What Happens When a Token Travels Through GPU Villages via RDMA and NVLink?

The article uses a whimsical journey to illustrate how token data is dispatched across GPU clusters—detailing functions like get_dispatch_layout, notify_dispatch, and combine_token, showing RDMA and NVLink pathways, performance experiments, and the final verification of token integrity.

AI · Distributed Systems · GPU
0 likes · 5 min read
Architects' Tech Alliance
Aug 15, 2025 · Artificial Intelligence

How AI Compute Centers Structure Their Networks for Maximum Performance

This article explains the logical and physical architecture of AI compute centers, detailing the division into access, security, network, management, out‑of‑band, AI compute cluster, and general compute zones, and describes the four network planes—parameter, sample, business, and management—required for high‑performance AI workloads.

AI · Compute cluster · High‑performance computing
0 likes · 7 min read
Architects' Tech Alliance
Jul 29, 2025 · Artificial Intelligence

Why NVIDIA Spectrum‑X and Quantum InfiniBand Are Redefining AI Data Center Networks

The article explains how AI‑driven data center networks must handle massive distributed workloads, why traditional Ethernet falls short, and how NVIDIA’s Spectrum‑X Ethernet and Quantum InfiniBand use lossless RDMA, dynamic routing, advanced congestion control, and hardware‑accelerated collective communication to deliver the bandwidth, latency, and scalability required for generative AI and large‑scale model training.

AI · InfiniBand · Nvidia
0 likes · 8 min read
Architects' Tech Alliance
Jul 24, 2025 · Artificial Intelligence

Inside Huawei’s CloudMatrix384: How a 384‑NPU AI Supernode Achieves Sub‑Microsecond Latency

The article details Huawei’s CloudMatrix384 AI supernode, describing its 384 Ascend 910C NPUs, 192 Kunpeng CPUs, ultra‑high‑bandwidth UB network, three complementary network planes (UB, RDMA, VPC), and the non‑blocking topology that enables sub‑microsecond inter‑node latency across a 16‑rack deployment.

AI hardware · Huawei · RDMA
0 likes · 9 min read
Open Source Tech Hub
Jul 19, 2025 · Fundamentals

How Zero‑Copy, DMA, and RDMA Supercharge Data Transfer in Linux

This article explains the performance bottlenecks of traditional I/O, introduces zero‑copy concepts and their relationship with DMA and PageCache, details RDMA architectures, and demonstrates practical zero‑copy implementations such as mmap+write, sendfile, splice, tee, and Java NIO APIs.

DMA · Java NIO · Linux
0 likes · 41 min read
Deepin Linux
Jul 19, 2025 · Fundamentals

How Zero‑Copy, DMA, and RDMA Supercharge Data Transfer Performance

This article explains the principles behind zero‑copy, DMA, and RDMA, compares traditional I/O copying with modern zero‑copy techniques, and shows practical implementations in Linux and Java that dramatically reduce CPU overhead and boost network and file‑transfer throughput.

DMA · Linux · Networking
0 likes · 42 min read
Kuaishou Tech
Jul 17, 2025 · Artificial Intelligence

How DHPS Boosted Online Inference Throughput by 270% with RDMA

This article details the design and evolution of DHPS, Kuaishou's load‑balanced, RDMA‑based high‑performance service architecture, explaining its network, storage, and traffic‑scheduling innovations that deliver over 270% query‑throughput improvement, lower latency, reduced CPU usage, and near‑five‑nine availability for large‑scale AI inference workloads.

Distributed Systems · RDMA · Storage Engine
0 likes · 17 min read
Architects' Tech Alliance
Jul 8, 2025 · Fundamentals

Why Modern Data Center Switches Are the Backbone of AI Scaling

This article explains how data‑center switches are classified, the key components and performance metrics of Ethernet switch chips, market growth trends, the shift from OEO to full‑optical OCS designs, and how RDMA technologies like InfiniBand and RoCEv2 enable the low‑latency networking essential for large‑scale AI training.

AI acceleration · Data Center Networking · RDMA
0 likes · 12 min read
Instant Consumer Technology Team
Jun 30, 2025 · Operations

How 3FS Revolutionizes AI Storage with High‑Throughput Distributed Filesystem

3FS, DeepSeek’s high‑performance parallel file system, is engineered for AI workloads, offering ultra‑low‑latency, high‑throughput storage via RDMA, CRAQ consistency, and seamless cloud‑native integration; the article covers its architecture, deployment steps, performance benchmarks, and cost‑saving strategies for large‑scale model training and inference.

AI storage · Distributed File System · High Throughput
0 likes · 28 min read
Architects' Tech Alliance
Jun 10, 2025 · Fundamentals

Why RDMA Is Revolutionizing High‑Performance Computing and AI

This article explores how Remote Direct Memory Access (RDMA) technology transforms high‑performance computing, artificial intelligence, and cloud storage by eliminating data copies, bypassing the kernel, and offloading protocols to hardware, while reviewing key metrics, product ecosystems, real‑world use cases, challenges, and future trends.

DPU · Data Center Networking · High‑performance computing
0 likes · 11 min read
Architects' Tech Alliance
Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

Hardware acceleration · High‑performance computing · Network Protocols
0 likes · 10 min read
Architects' Tech Alliance
May 26, 2025 · Fundamentals

Understanding RDMA, InfiniBand, and RoCEv2 for High‑Performance Distributed Training

The article explains how distributed AI training performance depends on reducing inter‑card communication latency, introduces RDMA technology and its implementations (InfiniBand, RoCEv2, iWARP), compares their latency and scalability against traditional TCP/IP, and outlines the hardware components and trade‑offs of InfiniBand and RoCEv2 networks.

Distributed Training · InfiniBand · RDMA
0 likes · 12 min read
Architects' Tech Alliance
May 23, 2025 · Artificial Intelligence

Why High‑Performance Networks Are Critical for Large‑Scale AI Model Training

The whitepaper explains that AI model training and inference rely on massive data computation, with model sizes reaching billions of parameters, demanding low‑latency, high‑bandwidth, stable, scalable, and manageable networks; it compares RDMA‑based InfiniBand and RoCE solutions and offers design recommendations for future AI compute clusters.

AI · High‑Performance Networking · InfiniBand
0 likes · 10 min read
Architects' Tech Alliance
May 15, 2025 · Industry Insights

Why InfiniBand Still Beats Ethernet: Deep Dive into RDMA, Omni‑Path, and Protocol Layers

This article provides a comprehensive technical analysis of InfiniBand architecture, its protocol stack, comparison with Ethernet‑based RDMA solutions like RoCE and iWARP, and an overview of Omni‑Path, highlighting performance advantages, design trade‑offs, and practical limitations.

High‑performance computing · InfiniBand · Omni‑Path
0 likes · 19 min read
Network Intelligence Research Center (NIRC)
Apr 30, 2025 · Industry Insights

Network Load Balancing: Emerging Techniques and Innovative Insights

This article surveys current network load‑balancing approaches—including CONGA, Hula, DRILL, Hermes, MP‑RDMA, ConWeave, Proteus, and CAVER—detailing their granularity, information exchange, signaling methods, and the performance gains they achieve in modern data‑center environments.

RDMA · datacenter networking · in-network reordering
0 likes · 13 min read
AntData
Mar 14, 2025 · Fundamentals

Analysis of DeepSeek 3FS Storage Service Architecture and Design

This article provides an in‑depth technical analysis of DeepSeek's open‑source 3FS distributed file system, focusing on the StorageService architecture, space pooling, allocation mechanisms, reference counting, fragmentation handling, and the RDMA‑based read/write data path.

RDMA · Zero Copy · allocation
0 likes · 15 min read
ByteDance Cloud Native
Mar 13, 2025 · Backend Development

Inside DeepSeek 3FS: Architecture of a High‑Performance Parallel File System

This article dissects DeepSeek's 3FS parallel file system, detailing its four‑component architecture, high‑throughput RDMA networking, metadata handling with FoundationDB, client access methods, chain replication (CRAQ), custom FFRecord format, and recovery mechanisms, offering a deep technical perspective for storage engineers.

Distributed File System · High-performance storage · RDMA
0 likes · 22 min read
Volcano Engine Developer Services
Mar 7, 2025 · Operations

Inside 3FS: How DeepSeek’s Parallel File System Powers AI Training

This article dives deep into DeepSeek's 3FS parallel file system, detailing its four-component architecture, RDMA‑based high‑speed networking, client options, metadata and storage services, replication protocols, dynamic stripe sizing, and recovery mechanisms that enable efficient AI model training and inference.

AI training · Distributed File System · RDMA
0 likes · 21 min read
Baidu Intelligent Cloud Tech Hub
Mar 3, 2025 · Cloud Computing

How Baidu Cloud Optimizes GPU Servers for AI Workloads

This article explains the design and implementation of GPU cloud servers, covering data processing pipelines, hardware selection, topology, interconnect technologies, virtualization, multi‑GPU communication methods, and Baidu's practical solutions for both virtualized and bare‑metal instances to boost AI inference and training performance.

AI · GPU · NVLink
0 likes · 29 min read
AI Cyberspace
Feb 24, 2025 · Cloud Computing

Scaling AI Training: Inside Large-Scale RDMA Networks and Modern Congestion Controls

This article explores the hardware and networking foundations for training massive AI models, detailing the challenges of large‑scale RDMA deployment, the evolution of congestion‑control algorithms like DCQCN, TIMELY, HPCC, and AWS's SRD, and how hardware offload and programmable switches enable scalable, low‑latency AI infrastructure.

AWS SRD · DCQCN · HPCC
0 likes · 14 min read
AI Cyberspace
Feb 22, 2025 · Cloud Computing

Why RoCEv2 Needs a Lossless Network and How to Achieve It

RDMA, originally built for InfiniBand, was adapted to Ethernet as RoCE; its second version, RoCEv2, uses IP/UDP headers to enable L3 routing but is highly sensitive to packet loss, requiring a lossless network and employing technologies such as PFC, ECN, DCQCN, and multi‑path transmission to maintain high RDMA performance.

DCQCN · ECN · PFC
0 likes · 17 min read
AI Cyberspace
Feb 17, 2025 · Fundamentals

Understanding DMA and RDMA: High‑Performance Direct Memory Access Explained

This article explains the principles of Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA), compares them with traditional TCP I/O, outlines RDMA’s features, protocol standards, communication pathways, queue mechanisms, and provides example code for setting up RDMA connections using RoCEv2.

C programming · DMA · RDMA
0 likes · 21 min read
AI Cyberspace
Feb 13, 2025 · Fundamentals

Understanding InfiniBand RDMA: Architecture, Advantages, and NVIDIA Quantum-2

InfiniBand RDMA, originally designed to extend the server bus over a network, offers high bandwidth and ultra‑low latency through zero‑copy, kernel‑bypass communication, with a layered architecture (L1‑L5) and hardware components like the Quantum‑2 switch, ConnectX‑7 RNIC, and SHARP acceleration, supported by the Verbs API and OFED stack.

InfiniBand · Quantum-2 · RDMA
0 likes · 25 min read
Deepin Linux
Dec 25, 2024 · Fundamentals

An Introduction to RDMA: Principles, Programming, and Applications

This article explains RDMA technology, covering its core principles, programming model with Verbs API, various communication modes, and its impact on data‑center networking, high‑performance computing, and distributed storage, highlighting its low‑latency, zero‑copy advantages over traditional TCP/IP.

Data center · High‑performance computing · Network programming
0 likes · 30 min read
Architects' Tech Alliance
Dec 8, 2024 · Industry Insights

Why InfiniBand Still Beats Ethernet: Deep Dive into RDMA, Omni‑Path, and iWARP

This article provides a comprehensive technical analysis of InfiniBand’s protocol layers, topology, and performance advantages, compares Omni‑Path’s architecture, explains RDMA fundamentals, and details Ethernet‑based RDMA protocols such as RoCE and iWARP, highlighting their trade‑offs and use cases.

High-Performance Computing · InfiniBand · Omni‑Path
0 likes · 18 min read
BirdNest Tech Talk
Dec 1, 2024 · Fundamentals

Step-by-Step Guide to RDMA Programming with the ibverbs API

This tutorial walks through the complete RDMA programming workflow using the ibverbs API, covering device initialization, memory registration, completion queue and queue pair creation, state transitions, send/receive operations, completion handling, and resource cleanup with concrete C code examples.

C · Low latency · Network programming
0 likes · 5 min read
BirdNest Tech Talk
Dec 1, 2024 · Fundamentals

How to Exchange RDMA Connection Parameters: Methods, Pros, and Pitfalls

Establishing an RDMA connection requires exchanging key parameters such as LID, QP number, and memory keys, and this article systematically outlines the essential information, compares six exchange methods—from static configuration to distributed services—and evaluates their advantages, drawbacks, and suitable scenarios.

Distributed Systems · InfiniBand · Networking
0 likes · 7 min read
BirdNest Tech Talk
Nov 20, 2024 · Backend Development

How to Build High‑Performance RDMA Applications in Go with rsocket

This article explains the fundamentals of RDMA, compares libibverbs and rdma_cm with the user‑space rsocket API, and walks through a complete Go implementation using the smallnest/rsocket library, including both server and client code examples and practical deployment tips.

Go · Network programming · RDMA
0 likes · 13 min read
Architects' Tech Alliance
Nov 7, 2024 · Industry Insights

Why RDMA, InfiniBand, and RoCE Are Redefining High‑Performance Data Center Networks

This article examines the evolution from the OSI and TCP/IP models to RDMA‑based technologies, compares traditional three‑tier and leaf‑spine architectures, analyzes NVIDIA SuperPOD designs, and evaluates Ethernet, InfiniBand, and RoCE switches to guide high‑throughput, low‑latency data‑center networking decisions.

Data Center Networking · High‑performance computing · InfiniBand
0 likes · 13 min read
Alibaba Cloud Native
Oct 19, 2024 · Cloud Native

How ApsaraMQ’s Serverless Architecture Powers AI with Event‑Driven Messaging

The talk outlines ApsaraMQ’s journey to a fully serverless, cloud‑native messaging platform, detailing its compute‑storage separation, stateless proxy functions, RDMA‑enhanced performance, elastic scaling mechanisms, and how its event‑driven architecture empowers real‑time AI applications through seamless data vectorization.

AI integration · Event-driven · Message Queue
0 likes · 19 min read
Architects' Tech Alliance
Aug 18, 2024 · Artificial Intelligence

RDMA, InfiniBand, RoCE, and iWARP: High‑Performance Networking for Large‑Scale Generative AI Model Training

The article explains how RDMA technologies—including InfiniBand, RoCE, and iWARP—provide high‑throughput, low‑latency, CPU‑free data transfer for massive generative AI model training, compares their architectures, and discusses modern network designs and load‑balancing strategies to optimize AI‑focused data‑center networks.

AI training · High‑Performance Computing · InfiniBand
0 likes · 11 min read
Architects' Tech Alliance
Aug 14, 2024 · Artificial Intelligence

Network Architecture and Performance Requirements for Training Large-Scale Generative AI Models

The article examines the ultra‑large‑scale, high‑bandwidth, low‑latency, and automated network infrastructure needed for training generative AI models, covering custom network designs, congestion control, deterministic RDMA, topology choices such as Fat‑Tree, and emerging deterministic networking technologies.

High Bandwidth · Low latency · RDMA
0 likes · 8 min read
Architects' Tech Alliance
Aug 1, 2024 · Industry Insights

Why RDMA and RoCE Are Becoming Critical Enablers for AI/ML Deployments

The article analyzes how the rapid shift of data‑center spending toward AI/ML has accelerated RDMA and RoCE adoption, outlines market forecasts through 2028, explains the technical advantages of direct memory access, and examines the evolving server, NIC, and backend‑network landscapes that will shape future AI workloads.

AI/ML · Data center · RDMA
0 likes · 12 min read
Open Source Linux
Jul 24, 2024 · Artificial Intelligence

Why RDMA Is the Secret Engine Powering AI/ML Data Center Growth

The article explains how RDMA and RoCE technologies, originally built for high‑performance computing, are rapidly expanding in AI/ML data centers, driving massive market growth, faster GPU communication, and lower job completion times as server designs evolve toward higher GPU counts and faster NICs.

AI/ML · Market Trends · RDMA
0 likes · 10 min read
Baidu Geek Talk
Jul 10, 2024 · Artificial Intelligence

Baidu HPN Network: Solving Hash Collision for 95% Physical Network Bandwidth Efficiency in Large Model Training

Baidu's HPN network solves hash‑collision bottlenecks in large‑model training by combining TOR‑affinity scheduling with Dynamic Load Balancing on self‑developed switches, boosting physical network bandwidth efficiency to about 95%, improving throughput by roughly 10% and adding a further 1.5% training‑speed gain via the BCCL library.

Baidu Cloud · DLB Dynamic Load Balancing · HPN Network
0 likes · 12 min read
Architects' Tech Alliance
Jul 6, 2024 · Industry Insights

Why Ethernet Struggles with AI Workloads and How Adaptive Routing Solves It

The article analyzes how AI‑driven elephant flows overload traditional Ethernet networks, causing long‑tail latency and victim‑flow congestion, and explains how adaptive routing, RDMA/RoCE features, advanced congestion‑control algorithms, and high‑capacity switch chips can mitigate these challenges.

AI computing · Adaptive routing · Elephant flow
0 likes · 7 min read
Baidu Intelligent Cloud Tech Hub
Jul 3, 2024 · Operations

How to Eliminate Network Hash Collisions in Large‑Model Training

This article examines the impact of GPU communication bottlenecks on large‑model training, analyzes hash‑collision issues in high‑performance networks, and presents three practical solutions—including increasing RDMA streams, affinity‑aware scheduling, and dynamic load balancing—to boost effective network bandwidth up to 95%.

Hash Collision · RDMA · dynamic load balancing
0 likes · 11 min read
Architects' Tech Alliance
May 19, 2024 · Industry Insights

InfiniBand vs RoCEv2: Which High‑Performance Network Wins AI Compute?

With AI models growing to billions of parameters, the choice of high‑performance interconnect—InfiniBand or RoCEv2—directly impacts training speed, scalability, latency, and operational complexity, and this article analyzes their architectures, performance metrics, vendor ecosystems, and suitability for large‑scale AI clusters.

AI · Distributed Training · High‑performance computing
0 likes · 13 min read
Architects' Tech Alliance
May 9, 2024 · Industry Insights

Why RoCE Is Reshaping High‑Performance Computing Networks

The article provides a detailed technical analysis of RoCE (RDMA over Converged Ethernet), its two protocol versions, packet overhead, congestion‑control mechanisms, Soft‑RoCE implementation, and the challenges and performance implications of deploying RoCE in modern HPC environments compared to InfiniBand and traditional Ethernet solutions.

HPC · InfiniBand · RDMA
0 likes · 17 min read
Architects' Tech Alliance
May 3, 2024 · Fundamentals

From OSI Model to RDMA: High‑Performance Networking, Leaf‑Spine Architecture, and Switch Selection

This article examines the evolution of network protocols from the OSI seven‑layer model and TCP/IP to RDMA technologies such as InfiniBand and RoCE, compares traditional three‑tier and leaf‑spine data‑center designs, and evaluates Ethernet, InfiniBand, and RoCE switches for high‑throughput, low‑latency HPC environments.

Data center architecture · InfiniBand · Leaf-Spine
0 likes · 13 min read
Architects' Tech Alliance
Apr 28, 2024 · Industry Insights

Why RoCE v2 Is Outpacing InfiniBand for Modern Data Centers

This article provides an in‑depth technical analysis of RoCE v2, covering its architecture, NIC requirements, and detailed comparisons with InfiniBand across physical layers, protocol stacks, switching, congestion handling, routing, and topology, while also highlighting the UEC alliance’s new transport protocol initiative.

High‑performance computing · InfiniBand · RDMA
0 likes · 12 min read
360 Smart Cloud
Apr 25, 2024 · Cloud Native

Building High‑Performance RoCE v2 and InfiniBand Networks in a Cloud‑Native Environment for Large‑Model Training

This article explains how to construct high‑performance RoCE v2 and InfiniBand networks within a cloud‑native Kubernetes environment, detailing the underlying technologies, required components, configuration steps, and performance test results that demonstrate significant communication speed improvements for large‑scale AI model training.

AI training · Cloud Native · High‑Performance Networking
0 likes · 12 min read
Architects' Tech Alliance
Apr 21, 2024 · Fundamentals

Understanding RDMA: InfiniBand, RoCE, and Their Role in High‑Performance AI Model Training

This article explains how Remote Direct Memory Access (RDMA) technologies such as InfiniBand and RoCE bypass OS kernels to achieve ultra‑low latency and high bandwidth, discusses their hardware implementations, cost considerations, and their critical impact on large‑scale AI model training and HPC network design.

AI · GPU · High‑Performance Computing
0 likes · 11 min read
Linux Code Review Hub
Apr 7, 2024 · Industry Insights

A Decade of RDMA: Lessons Learned from Protocol Evolution

The article reviews ten years of RDMA development, tracing its origins, the rise and pitfalls of RoCEv1/v2, alternative approaches like iWARP and Cisco usNIC, and recent modernizations such as AWS SRD, Google Falcon and Ultra Ethernet, highlighting why protocol design choices have repeatedly stalled industry progress.

AI Accelerators · Data Center Networking · RDMA
0 likes · 27 min read
Linux Code Review Hub
Feb 20, 2024 · Fundamentals

Why TCP Needs a Rethink: RDMA Insights and 800 Gbps Experiments

The talk examines the challenges of using standard Linux TCP for high‑performance data‑center workloads, explores how RDMA can provide zero‑copy and asynchronous kernel bypass, and presents experimental results from an FPGA‑based prototype that approaches 800 Gbps packet rates while highlighting congestion‑control and CPU‑utilization trade‑offs.

FPGA · High‑Performance Networking · Kernel Bypass
0 likes · 23 min read
Architects' Tech Alliance
Feb 14, 2024 · Industry Insights

Why InfiniBand Is Outpacing Ethernet in High‑Performance Computing

This article provides a comprehensive overview of InfiniBand technology, covering its history, architecture, packet format, layer functions, switching mechanisms, and performance advantages over Ethernet, while highlighting its rapid growth and future prospects in HPC environments.

Comparison · High‑performance computing · InfiniBand
0 likes · 15 min read
Architects' Tech Alliance
Feb 9, 2024 · Industry Insights

Why NVMe‑oF Is Redefining High‑Performance Storage Networks

This article explains how the shift from HDD to ultra‑fast SSDs and NVMe changes storage networking, compares NVMe with legacy SCSI, details NVMe‑oF transport options (FC, TCP, RDMA), examines RDMA variants, and outlines the network requirements and trade‑offs for deploying NVMe‑oF in modern data centers.

Data center · NVMe · NVMe-oF
0 likes · 17 min read
vivo Internet Technology
Dec 13, 2023 · Artificial Intelligence

Practice of Multi-NIC Container Network Acceleration for Offline Training

The talk explains how Vivo leverages a Kubernetes‑based solution that combines Calico and RoCEv2 to migrate offline training workloads from single‑NIC to multi‑NIC, integrating lossless RDMA, planning topology and IP allocation, and employing Volcano, SpiderPool, Macvlan, and Multus CNI for efficient container networking.

Cloud Native · Kubernetes · Multi-NIC
0 likes · 4 min read
Architects' Tech Alliance
Dec 6, 2023 · Artificial Intelligence

The Relationship Between Switches, Network Protocols, and AI in Modern Data Centers

This article explains how network protocols and switch architectures—including OSI layers, TCP/IP, RDMA, InfiniBand, RoCE, and leaf‑spine designs—support high‑throughput, low‑latency AI and HPC workloads, compares Ethernet and InfiniBand markets, and examines NVIDIA’s Spectrum‑X and SuperPOD solutions.

AI · Data Center Networking · InfiniBand
0 likes · 11 min read
Architects' Tech Alliance
Nov 8, 2023 · Product Management

Global Switch Market Size, Competitive Landscape, and Technology Trends (2023)

The article analyzes the 2022‑2027 global and Chinese network switch market sizes, outlines the competitive landscape dominated by Cisco and domestic players, and examines emerging technology trends such as lossless data‑center solutions, white‑box switches, silicon‑photonic optics, liquid‑cooling, TSN, and the coexistence of Ethernet and InfiniBand switches.

RDMA · White-box · liquid cooling
0 likes · 10 min read
Architects' Tech Alliance
Oct 21, 2023 · Operations

Understanding NVMe, NVMe‑oF, and RDMA for High‑Performance Storage

This article explains how the emergence of ultra‑fast SSDs and NVMe reshapes storage architecture, details the NVMe protocol and its extensions over fabrics, compares NVMe‑oF transport options such as FC, TCP, and RoCE, and discusses network requirements and performance trade‑offs for modern data‑center deployments.

Data center · NVMe · RDMA
0 likes · 17 min read
Architects' Tech Alliance
Aug 10, 2023 · Industry Insights

InfiniBand vs RoCEv2: Which Network Powers AI Model Training?

This article examines the architecture of AI compute clusters, explaining offline training and inference pipelines, the role of RDMA, and the technical differences between InfiniBand and RoCEv2—including latency, bandwidth, scalability, cost, and vendor considerations—to help engineers choose the optimal high‑performance network for large‑model training.

AI compute · Distributed Training · High‑Performance Networking
0 likes · 13 min read
ByteDance SYS Tech
Aug 1, 2023 · Cloud Native

How ByteFUSE Revolutionizes High‑Performance Cloud‑Native Storage with FUSE and RDMA

ByteFUSE, a user‑space FUSE‑based solution for ByteNAS, delivers low‑latency, high‑throughput, POSIX‑compatible storage across AI training, database backup, and search services by replacing NFS with a cloud‑native architecture that leverages CSI, RDMA, and kernel‑module hot‑upgrade techniques.

Distributed File System · FUSE · Kubernetes
0 likes · 19 min read
Architects' Tech Alliance
Jul 30, 2023 · Fundamentals

Understanding Network Protocols, Switches, and RDMA in AI‑Driven Data Centers

This article explains the fundamentals of network protocols and the OSI model, describes how high‑performance computing and AI workloads drive the transition from TCP/IP to RDMA technologies such as InfiniBand, RoCE and iWARP, and examines modern data‑center switch architectures, market trends, and NVIDIA’s AI‑focused networking solutions.

AI · Network Protocols · RDMA
0 likes · 12 min read
Architects' Tech Alliance
Jul 24, 2023 · Operations

NVIDIA Quantum‑2 InfiniBand Platform Overview and Technical Q&A

This article introduces NVIDIA's Quantum‑2 InfiniBand solution for high‑performance computing, explains its HDR 200 Gb/s architecture, and provides a comprehensive Q&A covering cable compatibility, SuperPOD networking, UFM management, PCIe bandwidth, and RDMA support for both IB and Ethernet environments.

InfiniBand · PCIe · RDMA
0 likes · 9 min read
Open Source Linux
Jun 13, 2023 · Fundamentals

Why RDMA Outperforms Traditional Networking: A Deep Dive into DMA

This article explains the fundamentals of Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA), compares their data transfer mechanisms with traditional networking, and outlines RDMA's advantages, protocols, ecosystem, and real‑world adoption in high‑performance computing and data centers.

DMA · Hardware · High‑performance computing
0 likes · 13 min read
Open Source Linux
Apr 14, 2023 · Fundamentals

Why InfiniBand Is the Fastest Growing High‑Speed Interconnect for HPC

This article provides a comprehensive overview of InfiniBand technology, covering its history, architecture, packet structure, layer hierarchy, switching mechanisms, and performance advantages over Ethernet, highlighting its role as a low‑latency, high‑bandwidth solution for high‑performance computing.

High‑performance computing · InfiniBand · RDMA
0 likes · 14 min read
Architects' Tech Alliance
Apr 12, 2023 · Fundamentals

Applying RoCE (RDMA over Converged Ethernet) to High‑Performance Computing: Benefits, Challenges, and Case Studies

This article examines the RoCE protocol—an RDMA‑enabled Ethernet technology—its evolution, technical details, congestion‑control mechanisms, performance comparisons with InfiniBand, practical deployment issues in HPC clusters, and real‑world case studies such as Slingshot and application benchmarks.

HPC · RDMA · RoCE
0 likes · 19 min read
Architects' Tech Alliance
Mar 26, 2023 · Fundamentals

Comprehensive Overview of InfiniBand Technology and Architecture

This article provides an in‑depth examination of InfiniBand, covering its rapid development as a high‑bandwidth, low‑latency interconnect technology, the InfiniBand Trade Association, detailed packet structures, layered architecture, switching mechanisms, and a comparative analysis with Ethernet, highlighting its advantages for high‑performance computing.

Data Transfer · HPC · High‑performance computing
0 likes · 14 min read
Tencent Cloud Developer
Mar 22, 2023 · Artificial Intelligence

Tencent Star Network: High‑Performance GPU Cluster Architecture for Large‑Scale AI Model Training

Tencent’s Star Network pairs a 1.6 Tbps Ethernet‑RDMA fabric and a fat‑tree topology supporting up to 4K GPUs with multi‑track traffic aggregation, adaptive heterogeneous links, and a custom TCCL library. Together these cut AllReduce overhead from 35% to 3.7% and speed AI training iterations by 32%. The platform also automates deployment and provides sub‑second self‑healing.

AI training · GPU clusters · RDMA
0 likes · 19 min read
Architects' Tech Alliance
Jan 13, 2023 · Fundamentals

2022 DPU Development Analysis Report and Related Network Technologies

The 2022 DPU Development Analysis Report outlines the evolution of Data Processing Units from CPU/NP and FPGA‑CPU architectures to ASIC‑CPU designs, discusses RDMA high‑speed networking, data‑plane forwarding techniques, network programmability, and the emerging open DPU software ecosystem, highlighting their performance, power, and cost implications for modern data centers.

ASIC · DPU · Data Plane
0 likes · 14 min read
Top Architect
Jan 4, 2023 · Cloud Computing

Diskless Architecture for Modern Data Centers: Challenges, Technologies, and Industry Practices

The article outlines the evolution of data‑center architectures, identifies capacity, efficiency, and performance challenges of traditional storage‑compute models, and presents the emerging Diskless architecture—leveraging DPU, CXL, RDMA, and high‑throughput networking—to achieve decoupled, pool‑based resources and improve overall data‑center utilization.

CXL · Compute · DPU
0 likes · 12 min read
Refining Core Development Skills
Oct 24, 2022 · Fundamentals

Low‑Latency Network Architecture for High‑Frequency Trading

This article explains how high‑frequency trading firms achieve ultra‑low network latency by combining proximity deployment, dedicated links, microwave transmission, InfiniBand, low‑latency switches, kernel bypass, RDMA, TCP offload engines and FPGA acceleration, and summarizes the impact of each technique on overall request latency.

FPGA · InfiniBand · Kernel Bypass
0 likes · 16 min read
Architects' Tech Alliance
Oct 10, 2022 · Fundamentals

All‑Flash Storage System Architecture and Key Functions (Dorado Flash Product Example)

The article explains the fully interconnected architecture of an all‑flash storage system, covering redundant FRU modules, RDMA‑based high‑speed networking, intelligent disk enclosures, SSD structure, wear‑leveling, bad‑block management, data redundancy, and the differences between SAS and NVMe protocols.

All-Flash Storage · NVMe · RDMA
0 likes · 12 min read
Architects' Tech Alliance
Oct 6, 2022 · Cloud Computing

Overview of NVIDIA BlueField DPU Features and Architecture

This article provides a comprehensive overview of NVIDIA BlueField DPU technology, detailing its interface and driver architecture, work modes, kernel representation model, multi‑host capabilities, virtual switch implementations, scalable functions, RDMA support, security accelerators, and related performance‑enhancing features for modern cloud and data‑center environments.

BlueField · DPDK · DPU
0 likes · 12 min read
Architects' Tech Alliance
Sep 10, 2022 · Fundamentals

Overview of NVIDIA DOCA and SmartNIC/DPU Technologies

This article provides a comprehensive overview of NVIDIA's DOCA framework, BlueField DPU architecture, SDK components, programming models, and related technologies such as RDMA, RoCE, and GPUDirect RDMA, highlighting their roles in modern data‑center acceleration and security.

DOCA · DPU · GPU
0 likes · 8 min read
Architects' Tech Alliance
Sep 4, 2022 · Fundamentals

Applying RoCE (RDMA over Converged Ethernet) to High‑Performance Computing: Benefits, Challenges, and Case Studies

This article examines the RoCE protocol and its use in high‑performance computing, describing its low‑latency advantages, congestion‑control mechanisms, performance comparisons with InfiniBand, practical deployment issues, and real‑world case studies such as Slingshot and CESM/GROMACS benchmarks.

HPC · RDMA · RoCE
0 likes · 18 min read