How Alibaba Cloud Powers AI with Cutting‑Edge Heterogeneous Compute

This article explains how Alibaba Cloud builds high‑performance AI infrastructure by combining hardware such as Shenlong servers, GPUs, FPGAs, and NPUs with RDMA interconnects, virtualization, FPGA‑as‑a‑Service, AIACC, and resource pooling to deliver scalable, cost‑effective AI services.

Alibaba Cloud Developer

Apr 28, 2020

Computing Technology

On Alibaba Cloud's Shenlong hardware platform, the virtualization architecture has been upgraded to provide a lightweight VM that delivers near‑bare‑metal performance. The Shenlong server features direct hardware I/O, storage and network virtualization offloaded to MOC cards, and a trimmed‑down Linux OS with a lightweight hypervisor.

GPU

The GPU remains the most mature and widely used AI accelerator. Alibaba has performed extensive compiler optimizations for GPUs, releases cloud GPU products in step with the latest GPU generations, and leads in GPU virtualization, hot upgrade, and SR‑IOV‑based GPU migration.

Training chips continue to push performance through higher FP32 throughput, Tensor Core mixed‑precision arithmetic, NVLink high‑speed links, and GPUDirect RDMA. Alibaba Cloud also researches programmable topology splitting on NVSwitch for multi‑GPU communication.
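To make the Tensor Core point concrete, here is a minimal sketch that assumes it refers to mixed‑precision arithmetic: half‑precision (FP16) inputs multiplied together and summed in a single‑precision (FP32) accumulator. The helper names `to_fp16`, `to_fp32`, `dot_fp16_accumulate`, and `dot_mixed` are illustrative only, and the code emulates the rounding on the CPU with Python's `struct` module rather than running on a GPU.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product where products AND the running sum are kept in FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + prod)
    return acc


def dot_mixed(a, b):
    """Dot product with FP16 inputs but an FP32 accumulator (mixed precision)."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # product of two FP16 values fits in FP32
        acc = to_fp32(acc + prod)
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096
    print("FP16 accumulator:", dot_fp16_accumulate(ones, ones))
    print("FP32 accumulator:", dot_mixed(ones, ones))
```

Above 2048, adjacent FP16 values are 2 apart, so adding 1.0 to an FP16 running sum of 2048 rounds back to 2048 and the pure FP16 sum stalls. The wider accumulator avoids this, which is the usual motivation for keeping inputs narrow and accumulation wide.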

FPGA

FPGAs provide ASIC‑level performance with high flexibility but traditionally suffer from long development cycles and a steep learning curve. Alibaba Cloud and AIS co‑developed the industry's first dual‑chip Xilinx FPGA board, with both the board and the HDK designed in‑house.

Shuntian Platform: FPGA as a Service (FaaS)

FaaS provides a unified hardware platform and middleware in the cloud, dramatically lowering accelerator development and deployment costs. Third‑party ISV IP can be offered as a service without users needing to understand the underlying hardware. The platform supplies two development kits, HDK and SDK.

AliDNN

AliDNN is a full‑stack deep‑learning acceleration engine built on FPGA, comprising a custom instruction set, accelerator, SDK, and compiler. It allows TensorFlow, Caffe and other frameworks to call the engine directly, compiling models into accelerator instructions for high‑throughput, low‑latency inference.

NPU

AliNPU (including the “HanGuang 800”) is optimized for CNN workloads while also supporting RNN models, delivering a cost‑performance ratio that surpasses competing AI inference chips.

Interconnect Technology

RDMA

RDMA is a high‑performance network technology that reduces data transfer time and CPU overhead, crucial for AI, scientific computing, and distributed storage. Alibaba’s HAIL network architecture, combined with custom switches, powers the world’s largest RDMA deployment across dozens of data centers, supporting services such as ESSD, SCC, PAI, and POLARDB.

EXSPARCL Communication Library

EXSPARCL (Extremely Scalable and high Performance Alibaba Group Communication Library) offers generic collective communication, compatible with NVIDIA NCCL, and is optimized for large‑scale AI clusters with multi‑NIC support, topology‑aware routing, and congestion‑free algorithms.
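As an illustration of what a collective communication library provides, the sketch below simulates a ring all‑reduce, a widely used all‑reduce algorithm, in plain Python. It is not EXSPARCL's or NCCL's implementation and the source does not say which algorithms EXSPARCL uses. The function name `ring_allreduce` and the in‑memory "workers" are assumptions made for the example.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over n workers arranged in a ring.

    buffers[r] is the local buffer of worker r. Every worker ends up
    with the element-wise sum of all buffers.
    """
    n = len(buffers)
    size = len(buffers[0])
    if size % n != 0:
        raise ValueError("buffer length must be divisible by the worker count")
    chunk = size // n
    data = [list(b) for b in buffers]  # work on copies, leave inputs intact

    def span(c: int) -> range:
        """Index range covered by chunk c."""
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) mod n
    # to its right neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in span(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] += v

    # Worker r now holds the complete sum for chunk (r + 1) mod n.
    # Phase 2, all-gather: pass the completed chunks around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in span(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] = v

    return data


if __name__ == "__main__":
    grads = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
    for rank, buf in enumerate(ring_allreduce(grads)):
        print(rank, buf)
```

Each worker sends and receives only one chunk per step, so per‑link traffic stays roughly constant as workers are added. That is one reason ring‑style collectives are popular for gradient synchronization. Topology awareness and multi‑NIC support, as described above, are refinements beyond this toy model.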

AIACC (AI Acceleration Tool)

AIACC provides a unified framework that accelerates TensorFlow, PyTorch, MXNet, and Caffe across VPC and RDMA networks, delivering training speedups of up to 10× without requiring code changes.

Heterogeneous GPU Container Support (cGPU)

The cGPU solution integrates with Kubernetes and NVIDIA Docker, offering transparent GPU sharing and isolation without recompiling applications or replacing CUDA libraries. It supports both memory and compute isolation on NVIDIA devices.

Adapts to open‑source Kubernetes and NVIDIA Docker standards

Transparent to users; no need to recompile or replace CUDA libraries

Provides stable low‑level operations for NVIDIA devices

Supports both memory and compute isolation
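The memory‑isolation property listed above can be pictured with a small accounting model. This is a toy sketch only: cGPU enforces isolation below the application, and the class `SharedGpu`, its methods, and the MiB quotas here are hypothetical names chosen for illustration, not part of any cGPU interface.

```python
class QuotaExceeded(Exception):
    """Raised when a container asks for more GPU memory than its quota."""


class SharedGpu:
    """Toy model of one GPU whose memory is partitioned among containers."""

    def __init__(self, total_mib: int):
        self.total_mib = total_mib
        self.quota = {}  # container name -> memory quota in MiB
        self.used = {}   # container name -> memory currently allocated in MiB

    def add_container(self, name: str, quota_mib: int) -> None:
        # Refuse quotas that would oversubscribe the physical device.
        if sum(self.quota.values()) + quota_mib > self.total_mib:
            raise ValueError("quotas would exceed device memory")
        self.quota[name] = quota_mib
        self.used[name] = 0

    def alloc(self, name: str, mib: int) -> None:
        # A container is limited by its own quota, even if the device
        # as a whole still has free memory.
        if self.used[name] + mib > self.quota[name]:
            raise QuotaExceeded(f"{name}: {mib} MiB exceeds remaining quota")
        self.used[name] += mib

    def free(self, name: str, mib: int) -> None:
        self.used[name] = max(0, self.used[name] - mib)


if __name__ == "__main__":
    gpu = SharedGpu(total_mib=16384)
    gpu.add_container("train", 8192)
    gpu.add_container("infer", 4096)
    gpu.alloc("train", 8000)
    try:
        gpu.alloc("train", 500)
    except QuotaExceeded as err:
        print("rejected:", err)
```

The point of the model is that one container exhausting its share cannot starve its neighbours, which is what makes sharing a single device between unrelated workloads practical.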

Software Pooling (EAIS.EI)

EAIS decouples CPU cores from heterogeneous accelerators via software pooling, allowing CPU‑only ECS instances to dynamically attach or detach GPUs, FPGAs, NPUs, and other accelerators. Instances and their attached accelerators communicate over encrypted gRPC.
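The attach and detach lifecycle can be sketched as a simple pool that hands devices to instances and takes them back. This is an illustrative model under stated assumptions, not the EAIS API: the class `AcceleratorPool`, the device and instance identifiers, and the method names are all hypothetical, and the encrypted gRPC transport is left out.

```python
from collections import defaultdict


class AcceleratorPool:
    """Toy pool of accelerators that CPU-only instances can borrow."""

    def __init__(self, inventory):
        # inventory maps an accelerator kind to its device ids,
        # e.g. {"gpu": ["gpu-0", "gpu-1"], "fpga": ["fpga-0"]}.
        self.free = {kind: list(devs) for kind, devs in inventory.items()}
        self.attached = defaultdict(list)  # instance id -> [(kind, device id)]

    def attach(self, instance: str, kind: str) -> str:
        """Attach one free device of the requested kind to the instance."""
        if not self.free.get(kind):
            raise RuntimeError(f"no free {kind} in pool")
        dev = self.free[kind].pop(0)
        self.attached[instance].append((kind, dev))
        return dev

    def detach(self, instance: str, dev: str) -> None:
        """Return a device to the pool so another instance can use it."""
        for kind, d in self.attached[instance]:
            if d == dev:
                self.attached[instance].remove((kind, d))
                self.free[kind].append(d)
                return
        raise KeyError(f"{dev} is not attached to {instance}")


if __name__ == "__main__":
    pool = AcceleratorPool({"gpu": ["gpu-0"], "fpga": ["fpga-0"]})
    dev = pool.attach("ecs-a", "gpu")
    pool.detach("ecs-a", dev)
    print(pool.attach("ecs-b", "gpu"))
```

Because accelerators are not tied to a particular host's CPU cores, the same device can serve different instances over time instead of sitting idle next to an instance that no longer needs it.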

HARP: Runtime Layer Pooling

HARP virtualizes accelerator resources, initially focusing on GPUs, to enable transparent, dynamic allocation of local or remote accelerators across containers, VMs, or bare metal, improving utilization and supporting future expansion to NPUs and other chips.

Transparent to upper‑layer applications; works on physical machines, containers, and VMs

Supports both local and remote accelerators, with no performance loss for local devices

Lightweight API for fine‑grained memory and compute control

Hides hardware details and automates configuration

Provides profiling and trace‑replay capabilities
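One way to picture the local and remote support listed above is a placement policy that serves a request from a local device when possible and falls back to a remote one otherwise. The source does not describe HARP's actual policy, so the local‑first, best‑fit rule below is an assumption, as are the `Device` type and the `place` function.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    local: bool    # True if the accelerator sits in the requesting host
    free_mib: int  # memory still available on the device


def place(request_mib: int, devices: List[Device]) -> Optional[Device]:
    """Pick a device for a memory request, preferring local accelerators.

    Among devices with enough free memory, local ones win; ties are broken
    by best fit (smallest leftover). Returns None if nothing fits.
    """
    candidates = [d for d in devices if d.free_mib >= request_mib]
    if not candidates:
        return None
    best = min(candidates, key=lambda d: (not d.local, d.free_mib - request_mib))
    best.free_mib -= request_mib  # reserve the memory on the chosen device
    return best


if __name__ == "__main__":
    fleet = [Device("local-0", True, 4096), Device("remote-0", False, 16384)]
    for req in (2048, 4096, 20000):
        chosen = place(req, fleet)
        print(req, "->", chosen.name if chosen else "unplaced")
```

The caller only states how much memory it needs. Whether the device is local or remote is decided by the runtime layer, which is the transparency the list above describes.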

Hardware Layer Pooling

Hardware pooling combines general‑purpose CPUs with various accelerators via high‑speed interconnects, enabling flexible accelerator composition, higher reliability, and better SLA guarantees.
