DeepRec-Based High-Dimensional Sparse Feature Support and Real-Time Model Training in Ximalaya AI Cloud
Ximalaya AI Cloud leverages DeepRec's Embedding Variable to manage high-dimensional sparse features elastically and with low collision rates. Combined with feature admission/eviction, multi-level storage, and minute-level incremental model updates, this boosted GPU utilization, halved training time, and improved recommendation CTR by 2-3% with no latency regression.
Ximalaya's app provides recommendation scenarios such as Daily Must-Listen, Today’s Hotspot, Private FM, Guess You Like, VIP feed, and Discovery page.
The Ximalaya AI Cloud is an end-to-end algorithm platform that unifies data, user-profile, feature, model, component, and application management in a single environment. Through a visual modeling interface and component-based design, users can drag and drop to build a complete pipeline: data → features → samples → model → service. The platform also supports rich parameterization of features and models, so most configuration is done in the UI without code changes, reducing development cost and improving algorithm-iteration efficiency.
A typical deep‑model training DAG is shown below:
The platform currently supports major business scenarios of Ximalaya apps, including recommendation, advertising, search recommendation, as well as customized development such as user profiling, data analysis, and BI data generation.
Cooperation background: As algorithm capabilities grew and the search, advertising, and recommendation businesses expanded, the recommendation stack migrated from traditional machine learning to deep learning, demanding larger sample sizes, higher feature dimensions, and more complex models. The stack uses Spark for data processing (Parquet storage), Kubernetes for GPU scheduling, and TensorFlow for model training. Two main pain points were identified:
High-dimensional sparse feature support: Hash collisions exceed 20% when IDs are mapped into a ten-million-slot space; reducing the collision rate below 5% requires roughly five-fold space expansion, negating the compression benefit. A custom multi-hash scheme cut collisions to 0.2‰ but tripled sequence length, hurting inference performance.
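The multi-hash trade-off above can be sketched in plain Python (a conceptual model, not Ximalaya's actual implementation; names and sizes are illustrative): instead of one hash into a ten-million-slot table, each ID maps to a tuple of slots from independent hash functions, and the whole tuple identifies the feature, so two IDs collide only if every component hash agrees.

```python
import hashlib

NUM_SLOTS = 10_000_000  # illustrative table size

def slot(feature_id: str, salt: str) -> int:
    """One hash function, derived from MD5 with a per-hash salt."""
    digest = hashlib.md5((salt + feature_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SLOTS

def multi_hash_key(feature_id: str, num_hashes: int = 2) -> tuple:
    """Map an ID to a tuple of slots; the tuple is the lookup key.

    Two IDs collide only if all component slots match, so the
    effective key space is NUM_SLOTS ** num_hashes.
    """
    return tuple(slot(feature_id, f"salt{i}") for i in range(num_hashes))

key = multi_hash_key("user:12345")
```

The cost is exactly what the text describes: each ID now needs `num_hashes` embedding lookups instead of one, so sequence length and inference cost grow with the number of hash functions.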
Feature admission/eviction and variable-length handling: With feature dimensions beyond a billion, proper admission and eviction configuration keeps model size under control and stabilizes offline and online metrics.
DeepRec addresses these issues with the Embedding Variable, a dynamically growing hash-map-like structure that lets sparse parameters expand elastically, eliminates hash collisions, and saves memory. It supports feature admission, feature eviction, and multi-level storage across media, scaling feature capacity while lowering storage cost.
For high‑dimensional sparse ID features (e.g., user ID, item ID), DeepRec’s EmbeddingVariable (EV) op tracks each ID’s update frequency and allows per‑feature configuration to balance collision rate and parameter count. Users can enable EV for high‑dimensional features and keep it disabled for dense features.
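Per the DeepRec documentation, enabling EV for a sparse feature is essentially a switch from `tf.get_variable` to `tf.get_embedding_variable`. The snippet below is a configuration sketch with an illustrative feature name and dimension; it requires the DeepRec build of TensorFlow and is not runnable on stock TensorFlow.

```python
import tensorflow as tf  # requires the DeepRec build of TensorFlow

# EmbeddingVariable: no static vocabulary size, hence no hash
# collisions; storage grows elastically as new IDs arrive.
user_embedding = tf.get_embedding_variable(
    "user_id_embedding",  # illustrative feature name
    embedding_dim=16,
    initializer=tf.truncated_normal_initializer(stddev=0.02))

user_ids = tf.placeholder(tf.int64, shape=[None])
user_vectors = tf.nn.embedding_lookup(user_embedding, user_ids)
```

Dense or low-cardinality features can keep using ordinary `tf.get_variable` tables, matching the per-feature enable/disable policy described above.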
Feature admission and eviction are implemented via a counter-based admission policy and a global-step-based eviction policy. In practice, EV starts disabled for all features; it is then enabled for high-dimensional sparse features, admission thresholds are set high for initial training, and eviction thresholds are tuned using the EV analysis tools.
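The two policies can be sketched in plain Python (a conceptual model, not DeepRec's implementation; the class and parameter names are invented for illustration): counter-based admission allocates a real embedding only after an ID has been seen `filter_freq` times, and global-step eviction drops IDs whose last update is older than `steps_to_live` steps.

```python
import random

class SketchEV:
    """Toy embedding variable with counter admission and step eviction."""

    def __init__(self, dim=4, filter_freq=3, steps_to_live=1000):
        self.dim = dim
        self.filter_freq = filter_freq      # admission threshold
        self.steps_to_live = steps_to_live  # eviction horizon
        self.counts = {}                    # id -> times seen
        self.table = {}                     # id -> (vector, last_step)

    def lookup(self, feature_id, global_step):
        self.counts[feature_id] = self.counts.get(feature_id, 0) + 1
        if feature_id in self.table:
            vec, _ = self.table[feature_id]
            self.table[feature_id] = (vec, global_step)  # refresh step
            return vec
        if self.counts[feature_id] >= self.filter_freq:
            # Admitted: allocate a persistent embedding.
            vec = [random.gauss(0, 0.02) for _ in range(self.dim)]
            self.table[feature_id] = (vec, global_step)
            return vec
        # Not yet admitted: transient random value, nothing stored.
        return [random.gauss(0, 0.02) for _ in range(self.dim)]

    def evict(self, global_step):
        stale = [fid for fid, (_, step) in self.table.items()
                 if global_step - step > self.steps_to_live]
        for fid in stale:
            del self.table[fid]
        return len(stale)
```

With `filter_freq=3`, an ID gets a stored embedding on its third lookup; `evict` then reclaims it once it has gone unseen for longer than `steps_to_live`.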
Unadmitted features share the same initialization logic as admitted ones: a small embedding table is created at training start, and IDs not yet admitted return a freshly initialized random value during training and a default value (0) during serving. Both admission and eviction thresholds are configurable.
DeepRec provides an EV analysis component that reports feature name, ID, vector, update frequency, and recent step after each training run, helping engineers fine‑tune parameters and data pipelines.
The platform also offers multi‑level storage for embedding parameters (HBM, DRAM, SSD). A cache policy automatically keeps hot features in fast storage while offloading cold features to slower, larger storage, reducing memory pressure without sacrificing training speed.
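The hot/cold placement can be modeled as a small capacity-bounded fast tier in front of a larger backing store (a conceptual sketch in plain Python, not DeepRec's HBM/DRAM/SSD implementation; the class name is invented): lookups promote an ID to the fast tier, and the least recently used ID is demoted when the fast tier overflows.

```python
from collections import OrderedDict

class TwoTierStore:
    """Toy two-tier embedding store: small fast tier + large slow tier."""

    def __init__(self, fast_capacity=2):
        self.fast = OrderedDict()  # hot features, kept in LRU order
        self.slow = {}             # cold features
        self.fast_capacity = fast_capacity

    def put(self, feature_id, vector):
        self.slow[feature_id] = vector

    def get(self, feature_id):
        if feature_id in self.fast:
            self.fast.move_to_end(feature_id)  # refresh recency
            return self.fast[feature_id]
        vector = self.slow.pop(feature_id)     # promote to fast tier
        self.fast[feature_id] = vector
        if len(self.fast) > self.fast_capacity:
            cold_id, cold_vec = self.fast.popitem(last=False)
            self.slow[cold_id] = cold_vec      # demote LRU feature
        return vector
```

The same access pattern drives the real system: frequently looked-up embeddings stay in fast memory, while the long tail lives in cheaper, larger storage.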
Real-time training: DeepRec can export only a model's incremental changes as a small checkpoint (KB scale), enabling minute-level model updates online. A re-engineered stream-batch unified training flow achieved 10-minute iteration cycles, as illustrated below:
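Incremental export can be sketched as diffing the parameter table against the previous export (a conceptual sketch, not DeepRec's checkpoint format; the function and variable names are invented): only entries updated since the last export step are written, which keeps the delta at KB scale when only a small fraction of embeddings change per minute.

```python
def incremental_checkpoint(table, last_export_step, step_of):
    """Return only the entries updated since the last export.

    table:            id -> vector (the full model, possibly GBs)
    last_export_step: global step of the previous export
    step_of:          id -> global step of the entry's last update
    """
    return {fid: vec for fid, vec in table.items()
            if step_of[fid] > last_export_step}

# Full table with per-ID last-update steps.
table = {"u1": [0.1], "u2": [0.2], "u3": [0.3]}
steps = {"u1": 100, "u2": 250, "u3": 260}

# Only u2 and u3 changed after the export at step 200.
delta = incremental_checkpoint(table, 200, steps)
```

The serving side then applies the delta on top of the last full checkpoint instead of reloading the whole model.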
Online inference: The provided libserving_processor.so library supports model auto-recognition, incremental updates, and SessionGroup. In addition, Alibaba Cloud PAI-EAS offers a managed online inference service with load testing, auto-scaling, debugging, performance monitoring, and logging.
Overall benefits: After the pipeline refactoring, GPU utilization on a single worker exceeds 40% and training time drops by more than 50%. In a major recommendation scenario, CTR and PTR improve by 2-3% while latency and timeout rates remain stable. Adding high-dimensional ID and cross features yields another 2-3% gain.
Future plans include:
SessionGroup for shared‑memory multi‑tenant inference, improving node resource utilization and QPS.
Model compression and quantization to remove EV metadata in production and reduce model size.
Multi‑model and GPU inference leveraging DeepRec’s CUDA Multi‑Stream and CUDA Graph capabilities.
Ximalaya Technology Team
Official account of Ximalaya's technology team, sharing distilled technical experience and insights to grow together.