AI Compute Infrastructure: Trends, Scaling Laws, and the Rise of Massive Clusters

The article analyzes the development of AI compute infrastructure, detailing the three‑level architecture from chip to cluster, the scaling law linking model parameters to compute demand, the rapid growth of massive “ten‑thousand‑card” clusters worldwide, and the emerging demand for inference workloads driving new deployment and scheduling strategies.

Architects' Tech Alliance

Nov 10, 2024

AI Compute Infrastructure Overview

Artificial intelligence compute (AI compute) infrastructure, also called "intelligent computing power," supports accelerated training and inference of AI models. Its deployment spans three layers: chip‑level (GPU, NPU, FPGA, ASIC), single‑server node level (heterogeneous "CPU+XPU" servers), and multi‑server cluster level (large‑scale clusters providing parallel computing).

Chip‑Level Landscape

Chinese companies such as Huawei, Iluvatar CoreX (TianShu), Hygon (HaiGuang), and Cambricon are actively developing AI accelerators. Huawei's Ascend series is positioned as a full-stack, high-performance foundation for intelligent computing.

Single‑Server Node Level

The dominant paradigm is heterogeneous computing with "CPU+XPU". Traditional server vendors like Inspur and Dell combine Intel CPUs with NVIDIA GPUs, while Huawei and partners launch AI servers based on domestically developed AI chips.

Multi‑Server Cluster Level

Large‑scale models drive the need for high‑performance, lossless networks and storage to support both node‑level and cluster‑level compute breakthroughs. Coordinated development of compute, network, and storage is essential for AI infrastructure.

Scaling Laws and the Rise of Massive "Ten‑Thousand‑Card" Clusters

According to OpenAI's scaling-law paper "Scaling Laws for Neural Language Models", the compute needed to train a model is approximately six times the product of its parameter count and the number of training tokens: Training Compute (FLOPs) ≈ 6 × Parameters × Dataset Size (tokens). As models grow from hundreds of billions to trillions of parameters, and are trained on correspondingly larger datasets, compute demand rises steeply.
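The formula above can be turned into a rough sizing estimate. The sketch below is only illustrative: the model size, token count, per-accelerator peak throughput, and utilization are hypothetical values chosen for the example, not figures from the report.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs: C ≈ 6 * N * D."""
    return 6.0 * params * tokens


def training_days(params: float, tokens: float, num_accelerators: int,
                  peak_flops: float, utilization: float) -> float:
    """Wall-clock days to train, given sustained cluster throughput.

    peak_flops is the per-accelerator peak in FLOP/s; utilization is the
    fraction of that peak actually sustained during training (0..1).
    """
    sustained = num_accelerators * peak_flops * utilization
    seconds = training_flops(params, tokens) / sustained
    return seconds / 86_400.0


# Hypothetical inputs (assumptions, not taken from the article):
# a 1-trillion-parameter model trained on 10 trillion tokens,
# on 10,000 accelerators with 1 PFLOP/s peak each at 40% utilization.
total = training_flops(1e12, 1e13)
days = training_days(1e12, 1e13, 10_000, 1e15, 0.4)
print(f"total compute: {total:.2e} FLOPs, about {days:.0f} days")
```

Under these assumed inputs, even a ten-thousand-card cluster needs roughly half a year for one training run, which is why cluster size scales together with model size.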

To meet this demand, "ten-thousand-card" (万卡) clusters, meaning systems comprising ten thousand or more AI accelerators (GPU, NPU, TPU, etc.), are becoming the standard for large-model training. The United States leads this effort, with examples such as Google's A3 supercomputer (up to about 26,000 NVIDIA H100 GPUs) and Meta's large-scale training clusters (24,576 NVIDIA H100 GPUs each).

Chinese tech giants and telecom operators are also building massive clusters. ByteDance operates a 12,288-card cluster of NVIDIA Ampere GPUs; iFLYTEK (Keda Xunfei) launched its "FeiXing One" ten-thousand-card platform in 2023; China Mobile's Hohhot AI compute center hosts about 2,500 servers delivering 6.7 EFLOPS, with two more ultra-large domestic clusters planned; and China Telecom's Tianyi Cloud domestic ten-thousand-card pool in Shanghai Lingang is now operational.
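The Hohhot figures can be related to the "ten-thousand-card" label with some simple arithmetic. In the sketch below, the server count and total throughput come from the text, while the number of accelerators per server is an assumption (eight is a common configuration, but the article does not state it), and the numeric precision behind the EFLOPS figure is not specified.

```python
EFLOPS = 1e18
PFLOPS = 1e15
TFLOPS = 1e12

servers = 2_500                 # from the article (approximate)
cluster_flops = 6.7 * EFLOPS    # from the article

# ASSUMPTION: 8 accelerator cards per server; not stated in the article.
cards_per_server = 8

per_server = cluster_flops / servers
total_cards = servers * cards_per_server
per_card = per_server / cards_per_server

print(f"per server: {per_server / PFLOPS:.2f} PFLOPS")
print(f"total cards (assumed 8 per server): {total_cards}")
print(f"per card: {per_card / TFLOPS:.0f} TFLOPS")
```

If the eight-card assumption holds, about 2,500 servers corresponds to roughly 20,000 cards, which is consistent with the ten-thousand-card scale described above.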

Inference Demand as the Next Growth Curve

As large-model applications move into production, inference compute demand is growing rapidly. Open-source models such as Llama accelerate the deployment of generative AI across industries, which in turn drives up inference compute needs.

Industry forecasts predict a 113% CAGR for global large‑model inference peak demand (2024‑2027), outpacing the 78% CAGR for training demand. IDC data shows the share of cloud inference versus training shifting from 41.5%/58.5% in 2022 to 62.2%/37.8% by 2026.
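To see what these growth rates imply, the CAGR figures can be compounded over the forecast window. This is a minimal sketch that assumes the 2024-2027 range means three compounding periods; the article does not say which year is the baseline.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years


# ASSUMPTION: 2024 -> 2027 is treated as three annual compounding steps.
YEARS = 3

inference_multiple = growth_multiple(1.13, YEARS)  # 113% CAGR
training_multiple = growth_multiple(0.78, YEARS)   # 78% CAGR

print(f"inference demand grows about {inference_multiple:.1f}x")
print(f"training demand grows about {training_multiple:.1f}x")
```

Under that assumption, peak inference demand would grow by close to a factor of ten over the period, compared with a factor of between five and six for training.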

Inference workloads require cost-effective, low-latency, and highly stable compute. Strategies include running inference on training-class accelerators, deploying integrated training-and-inference machines, and dynamically reallocating resources between peak and off-peak periods (peak shaving).
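The peak-shaving idea can be illustrated with a toy allocation loop over a shared pool of cards. This is a hypothetical sketch, not a description of any operator's scheduler: the function name, the headroom percentage, the training floor, and the demand series are all assumptions made for the example.

```python
def split_pool(total_cards: int, inference_need_by_hour: list[int],
               headroom_pct: int = 20,
               min_training_cards: int = 0) -> list[tuple[int, int]]:
    """Return (inference_cards, training_cards) for each hour.

    Inference gets its forecast need plus headroom, capped so that
    training always keeps at least `min_training_cards`. Whatever
    inference does not use in off-peak hours goes to training.
    """
    plan = []
    inference_cap = total_cards - min_training_cards
    for need in inference_need_by_hour:
        # Integer ceiling of need * (1 + headroom_pct / 100).
        wanted = -(-(need * (100 + headroom_pct)) // 100)
        inference = min(inference_cap, wanted)
        plan.append((inference, total_cards - inference))
    return plan


# Hypothetical hourly inference need, expressed in cards.
need = [100, 80, 300, 900, 1200, 400]
plan = split_pool(1000, need, headroom_pct=20, min_training_cards=100)
for hour, (inf_cards, train_cards) in enumerate(plan):
    print(f"hour {hour}: inference={inf_cards} training={train_cards}")
```

In off-peak hours most of the pool is handed to training jobs, while at peak the inference share is capped so that training is never starved completely. A production scheduler would also need to account for job preemption and checkpointing costs, which this sketch ignores.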

Regional Deployment and Cross‑Region Scheduling

China's AI compute infrastructure is funded by local governments, central state-owned enterprises, and AI cloud providers (Alibaba Cloud, Huawei Cloud, Baidu Cloud). As of mid-2024, 87 AI compute centers had been recorded as built or under construction.

Public‑cloud AI services dominate the market, with Huawei Cloud, Baidu Cloud, Alibaba Cloud, and Tencent Cloud holding nearly 94% of the domestic AI public‑cloud share. These providers offer IaaS, PaaS, and TaaS services, enabling flexible, on‑demand AI compute.

Telecom operators excel in cross-region scheduling. China Mobile's "BaiChuan" platform aggregates about 5 EFLOPS of third-party ("social") compute alongside its own resources, for a total capacity of more than 10 EFLOPS, and supports over 10 million scheduling operations per day. China Unicom's "XingLuo" platform provides one-click "training-then-inference" services across more than 200 backbone cloud pools, with millisecond-level latency between key regions. China Telecom's "XiRang" platform can deliver more than 2,000 instances per second per cluster and integrates services such as "YunXiao" and "HuiJu" for unified AI compute delivery.

Key Takeaways

AI compute infrastructure requires coordinated advances in compute chips, heterogeneous servers, high‑performance networking, and storage.

Under the scaling-law estimate, training compute grows in proportion to the product of parameter count and training dataset size, so larger models trained on more data drive steep increases in compute demand.

Massive ten‑thousand‑card clusters are becoming the de‑facto standard for large‑model training, with both international and domestic players investing heavily.

Inference workloads are set to become the dominant driver of compute demand, emphasizing cost efficiency and real‑time performance.

Regional AI compute centers and cross‑region scheduling platforms are essential for balancing supply and demand across China’s vast geography.

Source: China Academy of Information and Communications Technology (2024 "Intelligent Computing Infrastructure Development Research Report").


