Kunpeng: A Scalable Distributed Machine Learning Platform for Billion‑Scale Data

Kunpeng is a distributed machine-learning platform that pairs a large-scale system architecture with parallel optimization algorithms. It provides fault-tolerant, high-performance training on billions of samples and features, and in the results summarized here it outperforms Spark, MPI, and XGBoost on real-world Alibaba applications.

Alibaba Cloud Developer

Aug 15, 2017

Research Background

In the era of big data, large platforms handle enormous volumes, with figures such as 500 million tweets per day, 50 million parcels, and 120,000 transactions per second. Alibaba’s internal models can reach billions of samples and hundreds of billions of features, which calls for a distributed machine learning platform that can train at this scale.

Existing Methods

Existing distributed frameworks such as Hadoop, Spark, GraphLab, and GraphX support parallel machine learning but struggle with algorithms at this scale. Spark and Hadoop expose only coarse-grained operators, while GraphLab and GraphX target graph computation and are a poor fit for general ML workloads. MPI offers low-level parallelism but no fault tolerance, which becomes a serious problem as the number of workers grows.

Motivation and Innovation of Kunpeng

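Kunpeng takes its name from Chinese mythology, in which the Kun is a giant fish that transforms into the Peng, a giant bird. In the platform, “Kun” stands for the massive distributed computing system and “Peng” for the large-scale distributed optimization algorithms that run on it. The name suggests that combining the two is what lets the platform take flight.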

System Innovations

Robust fault‑tolerance even in busy online clusters

Backup instances for straggler management

Support for DAG‑based scheduling and synchronization (BSP/SSP/ASP)

User‑friendly interfaces and programming APIs
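
The three synchronization modes in the list above (BSP, SSP, ASP) differ mainly in how far a fast worker may run ahead of the slowest one. The following is a minimal Python sketch of that idea, assuming the usual staleness-bound formulation. The function `may_proceed` and its parameters are hypothetical names chosen for illustration and are not part of Kunpeng's API.

```python
import math

# Hypothetical helper, not Kunpeng's API: decides whether a worker may start
# its next iteration, given how many iterations the slowest worker has done.
def may_proceed(completed_iters, slowest_completed_iters, staleness):
    """Return True if the worker may begin its next iteration.

    staleness = 0         -> BSP: all workers wait at every iteration barrier
    staleness = s > 0     -> SSP: a worker may lead the slowest by at most s
    staleness = math.inf  -> ASP: workers never wait for each other
    """
    return completed_iters - slowest_completed_iters <= staleness


BSP, SSP_2, ASP = 0, 2, math.inf  # example settings for the three modes
```

With these settings, a worker that has completed 5 iterations while the slowest has completed 3 is blocked under `BSP`, allowed under `SSP_2`, and always allowed under `ASP`.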

Algorithm Innovations

Kunpeng enables large‑scale implementations of many algorithms, including LR, FTRL, MART, FM, HashMF, DSSM, DNN, and LDA.

Kunpeng Architecture

The architecture builds on Alibaba’s internal Apsara platform and includes features such as Robust Failover, Backup Instance, and DAG-based scheduling and synchronization.
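
The benefit of a backup instance can be sketched with a little arithmetic: once a task is flagged as a straggler, a second copy of it starts elsewhere, and the task is done when either copy finishes. The Python model below is an assumption about the general technique (often called speculative execution), not a description of Kunpeng's actual detection rule. The function names and the `detect_after` parameter are hypothetical.

```python
def completion_time(primary_runtime, backup_runtime, detect_after):
    """Time until a task finishes when a backup copy may be launched.

    primary_runtime: seconds the original (possibly slow) instance needs
    backup_runtime:  seconds a fresh instance needs for the same workload
    detect_after:    seconds before a running task is flagged as a straggler
    """
    if primary_runtime <= detect_after:
        return primary_runtime  # finished before any backup was started
    # The backup starts at detect_after; whichever copy finishes first wins.
    return min(primary_runtime, detect_after + backup_runtime)


def iteration_time(runtimes, backup_runtime, detect_after):
    """Under a barrier, an iteration lasts as long as its slowest task."""
    return max(completion_time(r, backup_runtime, detect_after) for r in runtimes)
```

For example, with tasks needing 10, 12, and 60 seconds, a 10-second backup launched after 15 seconds bounds the iteration at 25 seconds instead of 60 in this model.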

Core modules:

Server nodes: shard model storage

Worker nodes: shard training data and compute

Coordinator: controls algorithm flow (initialization, iteration, termination)

ML Bridge: script‑based data preprocessing

PS‑Core: parameter server (servers, workers, coordinator)

Fuxi: monitors machine status and performs fault recovery
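
The division of labor among these modules can be illustrated with a small Python sketch: each server node owns a shard of the model's parameters, and each worker node owns a shard of the training data. The modulo key placement and round-robin row assignment below are assumptions made for illustration, since the source does not say how Kunpeng partitions either one. All function names are hypothetical.

```python
from collections import defaultdict


def server_for_key(feature_id, num_servers):
    """Route a parameter key to a server shard (assumed: modulo placement)."""
    return feature_id % num_servers


def shard_rows(num_rows, num_workers):
    """Assign training rows to workers round-robin (assumed scheme)."""
    shards = defaultdict(list)
    for row in range(num_rows):
        shards[row % num_workers].append(row)
    return dict(shards)


def group_keys_by_server(feature_ids, num_servers):
    """Group the keys a worker needs so each server gets a single request."""
    requests = defaultdict(list)
    for fid in feature_ids:
        requests[server_for_key(fid, num_servers)].append(fid)
    return dict(requests)
```

Grouping keys per server before pulling keeps the number of network requests proportional to the number of servers rather than the number of features in a batch.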

User Perspective

Users invoke an algorithm with a few script commands, for example:

ps_train -i demo_batch_input -o demo_batch_result -a xxAlgo -t

Termination and evaluation steps follow. The distributed execution stays hidden from the user.

Developer Perspective

Developers use simple APIs such as Worker.PullFrom(Server) and SyncBarrier() to handle communication and synchronization.
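
To make the pull, compute, push, and barrier pattern concrete, here is a single-process Python sketch of one worker training sparse logistic regression against a toy parameter server. Everything in it is hypothetical: `ToyServer`, `pull`, `push`, and `train_step` are illustrative names rather than Kunpeng's SDK, and the standard-library `threading.Barrier` stands in for `SyncBarrier()`.

```python
import math
import threading


class ToyServer:
    """In-memory stand-in for the model shard held by a server node."""

    def __init__(self):
        self.weights = {}  # feature id -> weight
        self.lock = threading.Lock()

    def pull(self, keys):
        # Return current weights for the requested keys (0.0 if unseen).
        with self.lock:
            return {k: self.weights.get(k, 0.0) for k in keys}

    def push(self, grads, lr):
        # Apply a plain SGD update for the pushed sparse gradient.
        with self.lock:
            for k, g in grads.items():
                self.weights[k] = self.weights.get(k, 0.0) - lr * g


def train_step(server, batch, lr, barrier):
    """One iteration: pull weights, compute the LR gradient, push, sync."""
    keys = {k for features, _ in batch for k in features}
    w = server.pull(keys)
    grads = dict.fromkeys(keys, 0.0)
    for features, label in batch:  # features: {id: value}, label: 0 or 1
        z = sum(w[k] * v for k, v in features.items())
        p = 1.0 / (1.0 + math.exp(-z))
        for k, v in features.items():
            grads[k] += (p - label) * v / len(batch)
    server.push(grads, lr)
    barrier.wait()  # BSP-style barrier shared by all workers


server = ToyServer()
barrier = threading.Barrier(1)  # a single worker in this sketch
batch = [({0: 1.0, 3: 1.0}, 1), ({0: 1.0, 7: 1.0}, 0)]
for _ in range(100):
    train_step(server, batch, lr=0.5, barrier=barrier)
```

The worker only ever touches the weights for features present in its batch, which is what keeps per-worker memory small when the full model has far more features than any single batch.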

Experimental Results

Comparison with Spark and MPI

On seven datasets, Kunpeng’s logistic regression (LR) trains faster and uses less memory than the Spark and MPI implementations. The gap is widest on high-dimensional data, where Spark and MPI fail to complete.

Kunpeng‑MART vs XGBoost

Across four datasets, Kunpeng-MART uses less memory and trains faster than XGBoost, which often fails on the larger datasets.

Impact of Worker Count

Increasing the number of workers accelerates training and reduces per-worker memory usage. With 25 workers, Kunpeng trains a sparse LR model on 70 billion features and 180 billion samples.

Conclusion

Kunpeng combines distributed computation with algorithm-level optimization, handling billions of features and trillions of parameters while remaining robust, flexible, scalable, and efficient in production. It boosted CTR by 21% and GMV by 10% during Alibaba’s 2015 Double-11 event, and improved fraud-detection recall from 91% to 98% in Alipay.
