
Alibaba Cloud Breaks MLPerf Inference Performance Records with Zhenduan Heterogeneous Computing Platform

Alibaba Cloud's Zhenduan heterogeneous computing acceleration platform achieved historic breakthroughs in the MLPerf inference benchmark, processing over 1.07 million images per second on 8 NVIDIA A100 GPUs, setting multiple first‑place records and dramatically improving e‑commerce recommendation speed and overall AI workload efficiency.


In the offline scenario of image classification, the platform processed 1.078 million images per second using eight NVIDIA A100 GPUs, surpassing the previous best of 1.039 million images per second achieved by 128 Google TPU v3 chips.
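To put the two submissions on a per-accelerator footing, a quick back-of-the-envelope calculation using only the throughput figures quoted above looks like this (illustrative arithmetic, not part of the official MLPerf results):

```python
# Per-accelerator throughput implied by the two headline submissions.
ALIBABA_TOTAL_IPS = 1_078_000   # 8x NVIDIA A100 (Alibaba Cloud Zhenduan)
GOOGLE_TOTAL_IPS = 1_039_000    # 128x Google TPU v3 (previous best)

per_a100 = ALIBABA_TOTAL_IPS / 8     # images/s per A100
per_tpu_v3 = GOOGLE_TOTAL_IPS / 128  # images/s per TPU v3

print(f"Per A100:   {per_a100:,.0f} images/s")
print(f"Per TPU v3: {per_tpu_v3:,.0f} images/s")
print(f"Per-chip ratio: {per_a100 / per_tpu_v3:.1f}x")
```

The accelerators differ in generation and design, so this ratio says as much about the hardware gap as about the software stack, but it shows why matching a 128-chip result with 8 GPUs is notable.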

This performance boost enhances image‑recognition and autonomous‑driving applications; in Alibaba's e‑commerce scenario, product‑recommendation speed increased fivefold while server count was reduced by 75%, delivering a smoother shopping experience.

MLPerf is the industry's leading AI benchmark suite, and its image‑classification test is one of the most hotly contested, attracting submissions from major hardware vendors, research institutions, and universities.

The Zhenduan platform supports diverse AI chips (GPUs and ASICs), provides optimized compilation, and integrates with frameworks such as TensorFlow, Caffe, and Alibaba's PAI; on the benchmark's standard ResNet‑50 v1.5 model, it delivers far greater efficiency than baseline implementations.

Beyond the 8‑GPU A100 result, Alibaba Cloud also secured first place on single‑card tests: 136,142 images per second (IPS) on an A100, 69,514 IPS on an A10, and 30,414 IPS on a T4, each markedly outperforming competing submissions.

The platform’s superior results stem from full‑stack hardware‑software optimization, including AutoML‑driven model design, Once‑For‑All neural‑architecture search guided by reinforcement learning, INT8 quantization, and deep retraining to preserve accuracy.
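The article does not publish Zhenduan's quantization code; as a generic illustration of the idea behind INT8 quantization, here is a minimal symmetric per-tensor sketch in NumPy (all function names are my own, not Alibaba's):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0                     # one scale per tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

x = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Round-to-nearest keeps the error within half a quantization step.
max_err = np.max(np.abs(x - x_hat))
```

Storing weights and activations as INT8 quarters the memory traffic versus FP32 and unlocks integer tensor-core throughput; the rounding error introduced here is exactly what the retraining step mentioned above exists to compensate for.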

Large‑scale operator fusion and automatic tuning further boost GPU utilization, enabling a single A10 GPU to achieve over 50% of the performance of an A100, whereas TensorRT reaches only about one‑third.
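Zhenduan's fusion happens inside a proprietary compiler, but one widely used fusion it alludes to can be shown concretely: folding a BatchNorm layer into the preceding convolution's weights so the two operators collapse into one at inference time (a standard technique sketched here in NumPy, with hypothetical names):

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into conv/linear weights (w, b).

    BN(Wx + b) == W'x + b' with per-output-channel rescaling, so at
    inference time two operators (and one memory round trip) become one.
    w has shape (out_channels, ...); BN statistics are per out-channel.
    """
    std = np.sqrt(var + eps)
    scale = gamma / std                                   # per-channel multiplier
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

Fusions like this, applied at scale and paired with automatic kernel tuning, are the kind of optimization that raises sustained GPU utilization without touching model accuracy.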

Alibaba Cloud now offers these capabilities through the Elastic Accelerated Instance Service (EAIS), providing flexible, high‑performance deep‑learning compute for visual processing, e‑commerce, smart city, and other AI workloads.

Tags: GPU acceleration, AI inference, Alibaba Cloud, heterogeneous computing, MLPerf, deep learning benchmark
Written by

Alibaba Cloud Infrastructure
