Tag

high-performance networking


Architects' Tech Alliance
Sep 8, 2024 · Artificial Intelligence

Design and Architecture of Multi‑Million GPU Clusters for Large‑Scale AI Model Training

This article surveys the network architectures and congestion-control techniques used in massive GPU clusters, such as ByteDance's MegaScale, Baidu HPN, Alibaba HPN 7.0, and Tencent Xingmai 2.0, highlighting how high-bandwidth, low-latency designs and advanced RDMA technologies enable training of trillion-parameter multimodal AI models.

AI infrastructure · GPU clusters · HPN
11 min read
Architects' Tech Alliance
May 23, 2024 · Cloud Computing

Design and Comparison of High‑Performance Cloud Data Center Networks for AI Computing

This article analyzes the limitations of traditional cloud data center networks for AI workloads and compares high-bandwidth, low-latency architectures, including two-layer and three-layer fat-tree designs, InfiniBand, and RoCE, providing best-practice recommendations for building scalable, non-blocking AI-Pool networks.

AI computing · Fat Tree · GPU clusters
12 min read
ByteDance SYS Tech
Apr 26, 2024 · Backend Development

How io_uring Integration Boosts Netpoll Throughput and Slashes Latency

This article examines the integration of Linux io_uring into ByteDance's high‑performance Netpoll NIO library, detailing architectural changes, receive/send workflows, benchmarking methodology, and results that show over 10% higher throughput and 20‑40% lower latency while eliminating system calls.

Go · Netpoll · benchmark
18 min read
360 Smart Cloud
Apr 25, 2024 · Cloud Native

Building High‑Performance RoCE v2 and InfiniBand Networks in a Cloud‑Native Environment for Large‑Model Training

This article explains how to construct high‑performance RoCE v2 and InfiniBand networks within a cloud‑native Kubernetes environment, detailing the underlying technologies, required components, configuration steps, and performance test results that demonstrate significant communication speed improvements for large‑scale AI model training.

AI training · InfiniBand · Kubernetes
12 min read
NetEase LeiHuo UX Big Data Technology
Jan 17, 2024 · Backend Development

Understanding DPDK: Background, Architecture, High‑Performance Techniques, and Real‑World Applications

This article explains the origins of DPDK, describes its modular architecture and performance‑enhancing mechanisms such as UIO, hugepages, and CPU affinity, and reviews popular user‑space networking frameworks like F‑Stack and Seastar that leverage DPDK for high‑throughput cloud services.

DPDK · F-Stack · Seastar
9 min read
Alibaba Cloud Infrastructure
Nov 11, 2023 · Cloud Computing

Alibaba Cloud Executive Discusses IPv6 Deployment, Global Collaboration, and AI‑Driven Network Evolution at the 2023 Wuzhen Internet Forum

In a detailed interview at the 2023 Wuzhen Internet Forum, Alibaba Cloud's infrastructure lead Cai Dezhong outlines the three-phase IPv6 rollout, highlights organizational and technical innovations, stresses the need for global cooperation, and explains how IPv6 underpins next-generation AI infrastructure and predictable high-performance networking.

AI infrastructure · Cloud Computing · Global Collaboration
9 min read
Alibaba Cloud Infrastructure
Sep 19, 2023 · Cloud Computing

AI‑Era Cloud Infrastructure: High Compute Density, Linear Scalability and Intelligent Operations – Highlights from the 2023 Open Data Center Conference

The 2023 Open Data Center Conference in Beijing showcased Alibaba Cloud's AI‑era infrastructure innovations—including high‑density compute clusters, predictable high‑performance networking, intelligent power‑simulation systems, battery diagnostics, liquid‑cooling solutions, and modular server standards—demonstrating how cloud platforms are being rebuilt to meet the demands of large AI models and sustainable operation.

AI · Intelligent Operations · Modular Servers
10 min read
Tencent Cloud Developer
Mar 22, 2023 · Artificial Intelligence

Tencent Star Network: High‑Performance GPU Cluster Architecture for Large‑Scale AI Model Training

Tencent's Star Network delivers a 1.6 Tbps Ethernet-RDMA fabric with a fat-tree topology supporting up to 4K GPUs, multi-track traffic aggregation, adaptive heterogeneous links, and a custom TCCL library, cutting AllReduce overhead from 35% to 3.7% and speeding AI training iterations by 32%, while automating deployment and providing sub-second self-healing.

AI training · GPU clusters · RDMA
19 min read
Tencent Cloud Developer
Dec 20, 2022 · Cloud Computing

HARP – Tencent Cloud's High‑Performance, Highly Available Network Transmission Protocol

HARP is Tencent Cloud's high-performance, highly available network transmission protocol; it reroutes around switch failures within 100 µs and offers zero packet loss, low latency, high bandwidth, scalable connections, and custom congestion control for storage, HPC, AI, and big data workloads.

Cloud Computing · HARP · congestion control
15 min read
Tencent Cloud Developer
Jun 6, 2022 · Cloud Computing

High‑Performance Network Solutions: RDMA, RoCE, iWARP and io_uring – Principles, Implementation and Benchmark Analysis

This article reviews high-performance networking options, namely RDMA (including RoCE v2 and iWARP) and Linux's io_uring, explaining their principles, hardware requirements, and benchmark results, and concludes that while RDMA delivers ultra-low latency for specialized workloads, io_uring offers only modest network benefits, leaving TCP as the default for most services.

Linux kernel · RDMA · benchmark
10 min read
Architects' Tech Alliance
May 19, 2022 · Fundamentals

An Introduction to RDMA: Concepts, Advantages, Protocols, and Programming Basics

This article explains the fundamentals of Remote Direct Memory Access (RDMA), comparing it with traditional networking and outlining its core advantages, suitable use cases, the three main RDMA protocols (InfiniBand, RoCE, iWARP), deployment requirements, the communication flow, and essential programming concepts.

InfiniBand · RDMA · RoCE
9 min read
Architects' Tech Alliance
Mar 7, 2021 · Fundamentals

Understanding RDMA: InfiniBand, iWARP, and RoCE Technologies and Their Differences

This article explains Remote Direct Memory Access (RDMA), its origins in InfiniBand, and the Ethernet-based variants iWARP and RoCE (including RoCEv1 and RoCEv2), comparing their architectures, performance characteristics, and deployment requirements for high-performance computing and data-center networks.

InfiniBand · RDMA · RoCE
11 min read
Architects' Tech Alliance
Nov 11, 2020 · Fundamentals

Understanding DPDK Memory Management: Large Pages, NUMA, DMA, and IOMMU

This article explains the core principles of DPDK memory management, covering huge pages, NUMA node binding, direct memory access, IOMMU and IOVA addressing, custom allocators, and memory pools, and shows how these mechanisms together enable high-performance packet processing on Linux systems.

DMA · DPDK · IOMMU
14 min read
Alibaba Cloud Infrastructure
Oct 9, 2019 · Cloud Computing

The Next Decade of Cloud Networking: Highlights from Alibaba Cloud Network Forum at Yunqi Conference 2019

The 2019 Yunqi Conference Cloud Network Forum gathered over two hundred network enthusiasts to review a decade of Alibaba data‑center networking evolution, explore emerging technologies such as AI, big data, and programmable chips, and outline the next ten years of high‑performance, data‑centric cloud networking.

Artificial Intelligence · Big Data · Cloud Networking
9 min read
Architects' Tech Alliance
Jul 5, 2019 · Backend Development

A Comprehensive Overview of DPDK and SPDK Technologies

This article provides an in‑depth technical overview of DPDK and SPDK, covering their background, the evolution of network I/O, Linux bottlenecks, user‑space I/O via UIO, poll‑mode drivers, performance‑optimizing techniques such as huge pages, SIMD, cache management, and the surrounding ecosystem and adoption.

DPDK · SPDK · User-space I/O
15 min read
Architects' Tech Alliance
Apr 8, 2019 · Fundamentals

Understanding RDMA: Principles, Advantages, and Implementation Details

This article explains how RDMA (Remote Direct Memory Access) technology, originating from InfiniBand and extended to Ethernet (RoCE) and TCP/IP (iWARP), provides ultra‑low latency, high throughput, and minimal CPU usage for high‑performance computing and big‑data applications by bypassing traditional OS and protocol stack processing.

InfiniBand · RDMA · RoCE
8 min read
Architects' Tech Alliance
Feb 14, 2019 · Fundamentals

Understanding RDMA (Remote Direct Memory Access): Background, Related Work, and Technical Details

This article provides a comprehensive overview of Remote Direct Memory Access (RDMA), covering its background, the limitations of traditional TCP/IP, related technologies such as TOE, U-Net, and VIA, and detailed explanations of RDMA concepts, hardware implementations, verbs, and communication workflows.

RDMA · Remote Direct Memory Access · high-performance networking
17 min read
Architects' Tech Alliance
Dec 4, 2018 · Fundamentals

Understanding RDMA High‑Performance Networks: Principles, Benefits, and Applications in Machine Learning

This article explains the background, architecture, and performance advantages of RDMA high-performance networking, compares it with traditional TCP/IP, describes its deployment at Baidu for machine-learning workloads, and outlines future use cases such as storage acceleration, GPU communication, and core services.

InfiniBand · RDMA · RoCE
12 min read