DeepRec at Vivo: Dynamic Embedding and GPU Tricks That Raised CTR by 1.2%
Vivo's AI recommendation team adopted Alibaba's DeepRec engine, introducing dynamic Embedding Variables, feature admission and elimination, and Parquet datasets for training, along with CPU and GPU inference optimizations such as SessionGroup, device placement, CUDA multi-stream, and BladeDISC compilation. The result was measurable gains in model accuracy, inference latency, and resource efficiency.
Background
Vivo's AI recommendation team has been evaluating large-scale sparse training frameworks for search, advertising, and recommendation. Existing frameworks such as TensorNet, XDL, and TFRA (TensorFlow Recommenders Addons) extend TensorFlow's distributed and sparse capabilities, but they still fall short in generality, ease of use, and feature coverage.
DeepRec
DeepRec, provided by Alibaba Group, is a training/prediction engine for search, recommendation, and advertising scenarios. It deeply optimizes sparse models in distributed execution, graph optimization, operators, and runtime, offering rich high‑dimensional sparse feature support and delivering better business outcomes with clear performance gains.
Business Introduction
Vivo’s recommendation algorithm group covers information flow, video, music, advertising, and other search/advertising/recommendation services, essentially spanning all business types within the company.
Sparse Model Training
3.1 Sparse Features
3.1.1 Pain Points
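TensorFlow's native embedding layer uses a fixed-size table, which leads to out-of-vocabulary (OOV) problems when new feature IDs appear, hash collisions when IDs are bucketed, wasted memory for unused slots, and redundant storage for low-frequency features. These limits make it a poor fit for large-scale sparse scenarios.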
3.1.2 Embedding Variable
DeepRec’s EmbeddingVariable uses a hash table as internal storage, dynamically creating and releasing embedding vectors, supporting forward lookup and backward updates, thus solving the OOV and hash‑collision problems.
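The idea of hash-table-backed storage can be illustrated with a minimal pure-Python sketch. This is not DeepRec's implementation or API; the class name `DynamicEmbedding` and its methods are hypothetical, and the update rule is plain SGD chosen only for brevity.

```python
import random


class DynamicEmbedding:
    """Toy hash-table-backed embedding (hypothetical, not the DeepRec API)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.table = {}  # feature id -> embedding vector (list of floats)
        self._rng = random.Random(seed)

    def lookup(self, feature_id):
        # Forward pass: an unseen id gets its own fresh row on first access,
        # so there is no shared OOV bucket and no modulo-hash collision.
        if feature_id not in self.table:
            self.table[feature_id] = [
                self._rng.uniform(-0.05, 0.05) for _ in range(self.dim)
            ]
        return self.table[feature_id]

    def apply_gradient(self, feature_id, grad, lr=0.1):
        # Backward pass: plain SGD applied only to the row that was looked up.
        row = self.lookup(feature_id)
        for i, g in enumerate(grad):
            row[i] -= lr * g

    def release(self, feature_id):
        # Free the row so memory follows the set of live features.
        self.table.pop(feature_id, None)


emb = DynamicEmbedding(dim=4)
a = emb.lookup("user:123")
b = emb.lookup("user:9999999999")  # arbitrary ids, no fixed vocabulary size
emb.apply_gradient("user:123", [1.0, 0.0, 0.0, 0.0])
emb.release("user:9999999999")
```

Because rows exist only for ids that have actually been seen, the table grows and shrinks with the data rather than being sized up front.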
3.1.3 Feature Admission/Elimination
DeepRec provides Bloom-filter-based and counter-based admission to keep the number of sparse features from growing too quickly, and two eviction strategies, one based on global step and one based on L2 weight, to remove features that no longer contribute.
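A rough sketch of counter-based admission combined with global-step eviction follows. The class `AdmissionEvictionTable` and the parameters `min_count` and `max_idle_steps` are hypothetical names for illustration, not DeepRec options; the sketch keeps an exact count per id, whereas a Bloom-filter variant would trade exactness for lower memory.

```python
class AdmissionEvictionTable:
    """Toy admission/eviction bookkeeping (hypothetical, not the DeepRec API)."""

    def __init__(self, min_count, max_idle_steps):
        self.min_count = min_count            # sightings required for admission
        self.max_idle_steps = max_idle_steps  # idle steps allowed before eviction
        self.counts = {}      # id -> number of times seen so far
        self.last_seen = {}   # admitted id -> global step of last access

    def observe(self, feature_id, global_step):
        """Record one sighting; return True once the id is admitted."""
        self.counts[feature_id] = self.counts.get(feature_id, 0) + 1
        if self.counts[feature_id] < self.min_count:
            return False  # too rare so far: no embedding row is created
        self.last_seen[feature_id] = global_step
        return True

    def evict(self, global_step):
        """Drop admitted ids not accessed within max_idle_steps; return them."""
        stale = [
            f for f, step in self.last_seen.items()
            if global_step - step > self.max_idle_steps
        ]
        for f in stale:
            del self.last_seen[f]
            del self.counts[f]
        return stale


table = AdmissionEvictionTable(min_count=3, max_idle_steps=100)
admitted = [table.observe("item:1", step) for step in (1, 2, 3)]
evicted_early = table.evict(global_step=50)
evicted_late = table.evict(global_step=200)
```

Here `item:1` is admitted only on its third sighting and is evicted once it has gone more than 100 steps without being accessed.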
3.1.4 Benefits
Replacing static embedding with dynamic embedding improves offline AUC by 0.5%, online CTR by 1.2%, and reduces model size by 20%.
Using ID features with dynamic embedding raises offline AUC by 0.4% and online CTR by 1%; adding global‑step eviction adds another 0.2% AUC and 0.5% CTR improvement.
I/O Optimization
3.2 Parquet Dataset
Parquet is a columnar storage format that saves space and speeds up data reads. DeepRec’s Parquet Dataset reads Parquet files out‑of‑the‑box, requiring no extra dependencies.
Vivo’s internal tests showed a 30% training speed increase, 38% storage cost reduction, and easier data analysis with Hive queries.
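Why a columnar layout speeds up reads can be shown with a toy comparison. This is only an illustration of the layout idea, not the Parquet format or DeepRec's Parquet Dataset API; the field names and helper functions are made up, and real Parquet adds encoding and compression on top.

```python
rows = [
    {"user_id": 1, "item_id": 10, "dwell_ms": 1200, "label": 1},
    {"user_id": 2, "item_id": 11, "dwell_ms": 300, "label": 0},
    {"user_id": 3, "item_id": 10, "dwell_ms": 4500, "label": 1},
]


def to_columnar(records):
    # Pivot a list of records into one list of values per field.
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def scan_row_layout(records, wanted):
    # Row layout: every record is read in full even if one field is needed.
    out = {name: [] for name in wanted}
    touched = 0
    for record in records:
        touched += len(record)
        for name in wanted:
            out[name].append(record[name])
    return out, touched


def scan_column_layout(columns, wanted):
    # Column layout: only the requested fields are read.
    out = {name: columns[name] for name in wanted}
    touched = sum(len(values) for values in out.values())
    return out, touched


columns = to_columnar(rows)
row_result, row_touched = scan_row_layout(rows, ["label"])
col_result, col_touched = scan_column_layout(columns, ["label"])
```

Both scans return the same labels, but the row scan touches all 12 values while the column scan touches 3. Training jobs that read a subset of features benefit in the same way.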
High‑Performance Inference Framework
4.1 CPU Inference Optimization
SessionGroup runs a configurable group of Sessions that share variables while each keeps a private thread pool, achieving up to an 80% QPS increase and a 75% improvement in CPU utilization.
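The pattern of shared variables with private thread pools can be sketched with the standard library. `ToySession` and `ToySessionGroup` are hypothetical stand-ins rather than DeepRec classes, the "model" is a dot product, and round-robin dispatch is an assumption made for the sketch.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class ToySession:
    """One session: a reference to shared weights plus its own thread pool."""

    def __init__(self, shared_weights, num_threads):
        self.weights = shared_weights  # same object in every session, not copied
        self.pool = ThreadPoolExecutor(max_workers=num_threads)  # private pool

    def run(self, features):
        # Work runs on this session's pool, so sessions do not contend
        # for each other's threads.
        return self.pool.submit(
            lambda: sum(w * x for w, x in zip(self.weights, features))
        )


class ToySessionGroup:
    """Dispatches requests round-robin over sessions (single dispatcher thread)."""

    def __init__(self, shared_weights, num_sessions, threads_per_session):
        self.sessions = [
            ToySession(shared_weights, threads_per_session)
            for _ in range(num_sessions)
        ]
        self._next = itertools.cycle(self.sessions)

    def run(self, features):
        return next(self._next).run(features)

    def close(self):
        for session in self.sessions:
            session.pool.shutdown()


weights = (0.5, -1.0, 2.0)
group = ToySessionGroup(weights, num_sessions=2, threads_per_session=2)
futures = [group.run((1.0, 1.0, 1.0)) for _ in range(4)]
results = [f.result() for f in futures]
group.close()
```

Only one copy of the weights is held in memory regardless of how many sessions are configured, while each session schedules its work independently.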
4.2 GPU Inference Optimization
4.2.1 Device Placement
Placing the Embedding Layer on CPU eliminates costly GPU‑CPU data transfers, reducing P99 latency by 35%.
4.2.2 CUDA Multi‑Stream
Multi‑stream and MergeStream reduce kernel launch overhead and allow compute and copy to share streams, lowering P99 latency by 18% (multi‑stream) and an additional 11% (merge‑stream).
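The source does not say whether the additional 11% is measured against the original baseline or against the multi-stream result. The short calculation below shows both readings; the helper name `compound_reduction` is made up for illustration.

```python
def compound_reduction(*reductions):
    # Each reduction applies to whatever latency remains after the previous one.
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


# Reading 1: 11% is relative to the already-reduced multi-stream latency.
relative_total = compound_reduction(0.18, 0.11)

# Reading 2: both percentages are relative to the original baseline.
additive_total = 0.18 + 0.11
```

Under the first reading the combined reduction is about 27%, and under the second it is 29%, so the two interpretations differ by roughly two percentage points.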
4.2.3 BladeDISC Compilation
BladeDISC fuses memory‑intensive operators and optimizes GPU memory hierarchy, cutting P99 latency by 21% and overall GPU latency by 50% while keeping GPU utilization above 60%.
Future Plans
Vivo intends to shift from asynchronous CPU training to synchronous GPU training using SparseOperationKit and HybridBackend to accelerate complex model training.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
