Tagged articles
5 articles
Architects' Tech Alliance
May 19, 2024 · Industry Insights

How to Build a 10,000‑GPU Supercluster: Core Design Principles and Architecture

This article analyzes the challenges and solutions for constructing a super‑large GPU training cluster, outlining five fundamental design principles, a four‑layer plus one‑domain architecture, and practical considerations for hardware, networking, and operational reliability in AI workloads.

AI training · GPU cluster · High‑performance computing
0 likes · 8 min read
Architects' Tech Alliance
May 11, 2024 · Industry Insights

Why Network Interconnects Are the New Bottleneck for Large‑Model AI Training

The rapid growth of large‑model AI training and inference is driving unprecedented demand for compute and high‑speed networking. This is prompting a shift from traditional GPU clusters to super‑pooled intelligent computing centers, which must balance multiple intra‑node and inter‑node interconnect solutions such as NVLink, OAM/UBB, InfiniBand, and RoCEv2.

AI · Data center · InfiniBand
0 likes · 6 min read
Alibaba Cloud Infrastructure
Feb 16, 2023 · Industry Insights

Why Alibaba’s DAC Strategy Revolutionized Data Center Networking

This article analyzes how Alibaba’s large‑scale deployment of Direct Attach Cables (DAC) transformed data‑center physical networking by cutting costs, reducing power consumption, improving reliability and latency, and driving architectural innovations that address past adoption barriers and future challenges.

AOC · Alibaba · DAC
0 likes · 19 min read
Architects' Tech Alliance
Nov 7, 2018 · Fundamentals

Survey of Network Types and Vendors in High‑Performance Computing (HPC) Environments

The 2016 Intersect360 survey of 474 HPC sites, covering 723 compute systems, 633 storage systems, and 638 LANs, finds that Ethernet and InfiniBand dominate system interconnect, storage, and LAN networks, with Mellanox and Cisco accounting for over half of installations. Newer technologies such as 10 GE, 40 G, 56 G InfiniBand, and Omni‑Path show evolving market shares driven by bandwidth and latency demands.

Cisco · HPC · InfiniBand
0 likes · 10 min read
Architects' Tech Alliance
Apr 24, 2016 · Operations

Comprehensive Overview of Data Center Active‑Active (Dual‑Active) Solutions

This article provides an in‑depth technical overview of data‑center active‑active architectures, covering network interconnects, storage SAN/Fibre Channel links, application clustering, arbitration mechanisms, gateway‑based designs, technical requirements, and practical limitations for achieving end‑to‑end high availability.

Active-Active · Data center · Network Interconnect
0 likes · 14 min read