
How HybridBackend Supercharged Ximalaya’s Recommendation Engine with GPU Acceleration

This article details how Ximalaya’s AI Cloud adopted the open‑source HybridBackend framework to overcome sparse data access and distributed training bottlenecks, achieving multi‑GPU utilization gains, faster model training, and significant cost reductions across its recommendation services.

Alibaba Cloud Big Data AI Platform

Mar 20, 2023


01 Business Introduction

Recommendation is a core capability of the Ximalaya app. It powers the Hot, "You May Like", Private FM, homepage feed, discovery page, and Daily Must‑Listen modules. All of these rely on Ximalaya AI Cloud, an end‑to‑end algorithm platform spanning data, features, models, and services.

02 Problem and Challenges

Transitioning training hardware from CPU to GPU revealed severe under‑utilization of compute resources. Two main causes were identified:

Sparse data access: Training samples were stored in OSS (Alibaba Cloud Object Storage Service) as libsvm‑style sparse strings. Reading them was network‑bound, and parsing the strings and performing embedding lookups added high CPU overhead.

Distributed training: Early attempts with Keras+Horovod suffered from unstable acceleration and degraded metrics. A custom parameter‑server framework improved efficiency but introduced IO bottlenecks and high maintenance cost for variable‑length embeddings.
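
To make the string-parsing cost in the first cause concrete, here is a minimal sketch of what decoding a single libsvm-style record involves. The function name and the sample record are hypothetical and are not taken from Ximalaya's pipeline; the point is only that every feature costs a split plus two string-to-number conversions on the CPU before anything reaches the GPU.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, List[int], List[float]]:
    """Parse one 'label idx:val idx:val ...' record (hypothetical helper)."""
    tokens = line.split()
    label = float(tokens[0])
    indices: List[int] = []
    values: List[float] = []
    for token in tokens[1:]:
        # Each feature needs a split and two conversions, all on the CPU.
        idx, val = token.split(":", 1)
        indices.append(int(idx))
        values.append(float(val))
    return label, indices, values


# Illustrative record, not real training data.
label, indices, values = parse_libsvm_line("1 3:1.0 17:0.5 4096:2.0")
print(label, indices, values)
```

With millions of records per file, this per-token work is repeated for every feature of every sample, which is consistent with the CPU overhead described above.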

03 HybridBackend

HybridBackend, an open‑source framework from Alibaba Cloud, optimizes sparse model training, data access, and distributed training. It supports TensorFlow, DeepRec, and other frameworks, and its design was published at ICDE 2022. The framework is available on GitHub and can be installed via pip.

04 Sparse Data Access Optimization

HybridBackend provides the hb.data.Dataset interface, which supports columnar formats such as Parquet. In the reported benchmarks, reading Parquet through HybridBackend is far faster than reading CSV or reading Parquet through TensorFlow I/O. For example, reading a 3.3 GB Parquet file with HybridBackend on a single thread takes ~398 ms, while reading the same file with TensorFlow I/O takes more than 100 s.
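
One intuition for why a columnar binary format can be read faster than text is that a column of numbers stored contiguously can be decoded in bulk, while text has to be split and converted value by value. The sketch below uses only Python's standard library and a hypothetical column of feature IDs. It does not reproduce the Parquet format, which adds encodings, compression, and row groups on top of this idea.

```python
import array

# Hypothetical column of 64-bit feature IDs.
ids = [101, 202, 303, 404]

# Text layout: every value is split out and converted one at a time.
text = ",".join(str(i) for i in ids)
parsed_from_text = [int(token) for token in text.split(",")]

# Binary columnar layout: values are packed contiguously,
# so the whole column is decoded with a single bulk copy.
packed = array.array("q", ids).tobytes()
decoded = array.array("q")
decoded.frombytes(packed)

print(parsed_from_text == decoded.tolist())
```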

Key features include:

Selective column parsing: Only required fields are read and automatically converted to SparseTensor or padded tensors.

Parallelism controls: num_parallel_reads and num_parallel_parser_calls let users tune file‑level and column‑level parallelism, fully utilizing CPU resources.
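
The two features above can be pictured with a small standard-library sketch. It does not call HybridBackend: the in-memory table, the helper names, and the `read_many` function are hypothetical stand-ins, and only the parameter name `num_parallel_reads` is borrowed from the article. The sketch reads only the requested columns, converts a variable-length column into either a sparse (indices, values, dense_shape) triple or a padded matrix, and reads several "files" concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical in-memory stand-in for a columnar file: one list per column.
TABLE: Dict[str, list] = {
    "user_id": [7, 8, 9],
    "clicked_items": [[11, 12], [], [13, 14, 15]],  # variable-length column
    "debug_info": ["a", "b", "c"],  # not needed for training
}


def read_columns(table: Dict[str, list], fields: Sequence[str]) -> Dict[str, list]:
    """Return only the requested columns; the others are never touched."""
    return {name: table[name] for name in fields}


def to_sparse(rows: Sequence[Sequence[Any]]) -> Tuple[List[Tuple[int, int]], List[Any], Tuple[int, int]]:
    """Convert ragged rows to a COO triple: (indices, values, dense_shape)."""
    indices, values = [], []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            indices.append((r, c))
            values.append(value)
    max_len = max((len(row) for row in rows), default=0)
    return indices, values, (len(rows), max_len)


def to_padded(rows: Sequence[Sequence[Any]], pad_value: Any = 0) -> List[List[Any]]:
    """Pad every row to the length of the longest row."""
    max_len = max((len(row) for row in rows), default=0)
    return [list(row) + [pad_value] * (max_len - len(row)) for row in rows]


def read_many(tables: Sequence[Dict[str, list]], fields: Sequence[str],
              num_parallel_reads: int = 2) -> List[Dict[str, list]]:
    """File-level parallelism: read several tables concurrently, keeping order."""
    with ThreadPoolExecutor(max_workers=num_parallel_reads) as pool:
        return list(pool.map(lambda table: read_columns(table, fields), tables))


batch = read_columns(TABLE, ["user_id", "clicked_items"])
sparse = to_sparse(batch["clicked_items"])
padded = to_padded(batch["clicked_items"])
```

Here `sparse` holds the non-empty positions of `clicked_items` and `padded` holds the same data as a rectangular matrix. Which form is preferable depends on how the downstream embedding layer consumes the column.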

After adopting HybridBackend, GPU utilization for single‑card training increased by more than 3×, and training cycles shortened significantly.

05 Distributed Training Optimization

HybridBackend introduces a hybrid parallel training mode where each GPU holds all dense parameters and a subset of sparse parameters, communicating via NCCL over NVLink instead of traditional RPC‑based parameter servers.
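
The hybrid layout can be simulated in plain Python to show the data movement. The sketch assumes a modulo sharding rule and an all-to-all style exchange for embedding rows, plus an averaging all-reduce for dense gradients. These are common choices for this kind of design and are assumptions here, not a description of HybridBackend's internals. All names and sizes are hypothetical, and no GPU or NCCL call is involved.

```python
from typing import Dict, List

WORLD_SIZE = 2  # hypothetical number of GPUs


def owner(feature_id: int) -> int:
    """Assumed sharding rule: a row lives on the worker feature_id % WORLD_SIZE."""
    return feature_id % WORLD_SIZE


def reference_row(feature_id: int) -> List[float]:
    """Deterministic stand-in for a trained embedding row."""
    return [float(feature_id), float(feature_id) * 0.5]


# Sparse parameters: each worker stores only the embedding rows it owns.
shards: List[Dict[int, List[float]]] = [
    {fid: reference_row(fid) for fid in range(8) if owner(fid) == rank}
    for rank in range(WORLD_SIZE)
]


def sharded_lookup(batches: List[List[int]]) -> List[List[List[float]]]:
    """batches[rank] holds the IDs worker `rank` needs for its local batch."""
    # Exchange 1: every worker sends each requested ID to the owning worker.
    requests = [[[] for _ in range(WORLD_SIZE)] for _ in range(WORLD_SIZE)]
    for src, ids in enumerate(batches):
        for fid in ids:
            requests[owner(fid)][src].append(fid)
    # Owners look up their local shard, then exchange 2 returns the rows.
    received: List[Dict[int, List[float]]] = [dict() for _ in range(WORLD_SIZE)]
    for dst in range(WORLD_SIZE):
        for src in range(WORLD_SIZE):
            for fid in requests[dst][src]:
                received[src][fid] = shards[dst][fid]
    return [[received[rank][fid] for fid in ids] for rank, ids in enumerate(batches)]


def all_reduce_mean(grads_per_worker: List[List[float]]) -> List[float]:
    """Dense parameters are replicated, so their gradients are averaged."""
    count = len(grads_per_worker)
    return [sum(column) / count for column in zip(*grads_per_worker)]


batches = [[1, 4, 6], [3, 4]]
rows = sharded_lookup(batches)
dense_grad = all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])
```

Each worker ends up with exactly the rows it would have read from a full, unsharded table, while storing only its share of the sparse parameters. The only cross-worker traffic is the two exchanges and the dense gradient reduction, which is the kind of collective communication that NCCL over NVLink is suited to.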

This approach improved training speed and GPU utilization, and integration with the Keras Model API enabled features such as model hot‑restart, reducing operational costs.

06 Overall Benefits

After the refactor, multi‑GPU training on a single machine achieved average GPU utilization gains of over 1.4× and reduced overall training time by more than 50%. The solution has been rolled out to all TensorFlow and DeepRec models.

07 Future Plans

Ximalaya AI Cloud will continue to expand HybridBackend usage across recommendation, advertising, and search scenarios, and explore collaborations for:

Operator optimizations to fuse embedding lookup kernels for faster inference.

PyTorch support for NLP‑driven recommendation pipelines.

Scaling to training jobs with billions of samples and features.

08 Acknowledgements

Special thanks to HybridBackend community members Chen Langshi and Yuan Man for their technical support, which accelerated the optimization of our deep‑learning training workflow.


