Vivo’s DeepRec: Dynamic Embedding and GPU Tricks that Raised CTR by 1.2%

Vivo’s AI recommendation team adopted Alibaba’s DeepRec engine, introducing dynamic Embedding Variables, feature admission and elimination, Parquet datasets, and CPU/GPU inference optimizations such as SessionGroup, device placement, CUDA multi‑stream, and BladeDISC compilation. The result was higher model accuracy, lower latency, and better resource efficiency.

Alibaba Cloud Big Data AI Platform

Background

Vivo’s AI recommendation team is exploring large‑scale sparse training frameworks for search, advertising, and recommendation. Existing frameworks such as TensorNet, XDL, and TFRA (TensorFlow Recommenders Addons) extend TensorFlow’s distributed and sparse capabilities, but they still fall short in generality, ease of use, and feature coverage.

DeepRec

DeepRec, provided by Alibaba Group, is a training/prediction engine for search, recommendation, and advertising scenarios. It deeply optimizes sparse models in distributed execution, graph optimization, operators, and runtime, offering rich high‑dimensional sparse feature support and delivering better business outcomes with clear performance gains.

Business Introduction

Vivo’s recommendation algorithm group covers content feeds, video, music, advertising, and other search, advertising, and recommendation services, spanning essentially every business type within the company.

Sparse Model Training

3.1 Sparse Features

3.1.1 Pain Points

TensorFlow’s native Embedding Layer uses a fixed‑size table, which leads to out‑of‑vocabulary (OOV) problems, hash collisions, wasted memory, and redundant storage for low‑frequency features. This makes it a poor fit for large‑scale sparse scenarios.
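
The hash‑collision problem can be illustrated with a minimal sketch in plain Python. It assumes a static table that maps raw feature IDs to rows by taking the ID modulo the table size; the table size and the `static_row` helper are hypothetical and are not part of TensorFlow or DeepRec.

```python
NUM_BUCKETS = 1000  # hypothetical fixed size of a static embedding table


def static_row(feature_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a raw feature ID to a row index of a fixed-size table."""
    return feature_id % num_buckets


ids = [3, 1003, 42]
rows = [static_row(i) for i in ids]

# IDs 3 and 1003 land on the same row, so they are forced to share
# one embedding vector even though they are different features.
collided = len(set(rows)) < len(ids)
```

Growing the table reduces collisions but reserves memory for rows that may never be used, which is the memory‑waste side of the same trade‑off.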

3.1.2 Embedding Variable

DeepRec’s EmbeddingVariable uses a hash table as internal storage, dynamically creating and releasing embedding vectors, supporting forward lookup and backward updates, thus solving the OOV and hash‑collision problems.
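
The idea of a hash‑table‑backed embedding can be sketched as follows. This is an illustration in plain Python under simplifying assumptions (a dict as the hash table, plain SGD as the update rule); the `DynamicEmbedding` class and its method names are hypothetical and do not reproduce DeepRec’s actual API or implementation.

```python
import random


class DynamicEmbedding:
    """Toy embedding store keyed by raw feature ID (no fixed vocabulary)."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._table = {}  # feature ID -> embedding vector (list of floats)

    def lookup(self, key):
        """Forward lookup: create the vector on first sight of a key."""
        if key not in self._table:
            self._table[key] = [
                self._rng.uniform(-0.05, 0.05) for _ in range(self.dim)
            ]
        return self._table[key]

    def apply_gradient(self, key, grad, lr: float = 0.1):
        """Backward update: one plain SGD step on the key's vector."""
        vec = self.lookup(key)
        for i, g in enumerate(grad):
            vec[i] -= lr * g

    def release(self, key):
        """Free the vector of a key that is no longer needed."""
        self._table.pop(key, None)

    def __len__(self):
        return len(self._table)
```

Because each distinct key owns its own entry, two different IDs never share a vector, and memory grows only with the keys actually seen.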

3.1.3 Feature Admission/Elimination

DeepRec provides BloomFilter and Counter based admission to prevent rapid growth of sparse features, and two eviction strategies—global‑step based and L2‑weight based—to remove ineffective features.
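
A minimal sketch of counter‑based admission and the two eviction strategies is shown below. It is written in plain Python for illustration; the `FilteredEmbedding` class, its parameter names (`min_count`, `steps_to_live`), and the choice to return a zero vector for non‑admitted keys are assumptions, not DeepRec’s API or defaults.

```python
import math


class FilteredEmbedding:
    """Toy embedding table with counter admission and two eviction rules."""

    def __init__(self, dim: int, min_count: int = 2, steps_to_live: int = 100):
        self.dim = dim
        self.min_count = min_count          # sightings required for admission
        self.steps_to_live = steps_to_live  # max idle steps before eviction
        self.counts = {}     # key -> number of times seen
        self.vectors = {}    # key -> embedding vector (admitted keys only)
        self.last_seen = {}  # key -> global step of the last lookup

    def lookup(self, key, global_step: int):
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] < self.min_count:
            # Not admitted yet: return a default vector, allocate nothing.
            return [0.0] * self.dim
        if key not in self.vectors:
            self.vectors[key] = [0.01] * self.dim
        self.last_seen[key] = global_step
        return self.vectors[key]

    def _drop(self, keys):
        for k in keys:
            del self.vectors[k]
            del self.last_seen[k]
            self.counts.pop(k, None)
        return keys

    def evict_by_step(self, global_step: int):
        """Remove keys not looked up within the last `steps_to_live` steps."""
        stale = [k for k, s in self.last_seen.items()
                 if global_step - s > self.steps_to_live]
        return self._drop(stale)

    def evict_by_l2(self, threshold: float):
        """Remove keys whose vector L2 norm is below `threshold`."""
        weak = [k for k, v in self.vectors.items()
                if math.sqrt(sum(x * x for x in v)) < threshold]
        return self._drop(weak)
```

This sketch keeps an exact count per key. A Bloom‑filter‑based variant would replace the exact counter with an approximate one to save memory, at the cost of occasionally admitting a key early.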

3.1.4 Benefits

Replacing static embedding with dynamic embedding improves offline AUC by 0.5%, online CTR by 1.2%, and reduces model size by 20%.

Using ID features with dynamic embedding raises offline AUC by 0.4% and online CTR by 1%; adding global‑step eviction adds another 0.2% AUC and 0.5% CTR improvement.

I/O Optimization

3.2 Parquet Dataset

Parquet is a columnar storage format that saves space and speeds up data reads. DeepRec’s Parquet Dataset reads Parquet files out‑of‑the‑box, requiring no extra dependencies.
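
Why a columnar layout speeds up reads can be seen in a small plain‑Python sketch. It does not use Parquet or DeepRec’s Parquet Dataset; the records and the `to_columns` helper are made up purely for illustration.

```python
# Row-oriented layout: every record carries all of its fields together.
rows = [
    {"user_id": 1, "item_id": 10, "click": 0},
    {"user_id": 2, "item_id": 11, "click": 1},
    {"user_id": 3, "item_id": 10, "click": 0},
    {"user_id": 4, "item_id": 12, "click": 1},
]


def to_columns(records):
    """Pivot a list of records into one contiguous list per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A job that needs only the label reads one column and skips the rest,
# instead of scanning every field of every row.
clicks = columns["click"]
ctr = sum(clicks) / len(clicks)
```

Values of the same type stored together also tend to compress better, which is consistent with the storage savings a columnar format is expected to bring.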

Vivo’s internal tests showed a 30% training speed increase, 38% storage cost reduction, and easier data analysis with Hive queries.

High‑Performance Inference Framework

4.1 CPU Inference Optimization

SessionGroup allows a configurable group of Sessions sharing variables while keeping private thread pools, achieving up to 80% QPS increase and 75% CPU utilization improvement.
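
The structure described above, shared variables with a private thread pool per Session, can be sketched in plain Python. The `ToySession` and `ToySessionGroup` classes, the round‑robin dispatch, and the linear scoring function are all assumptions made for illustration; they do not reflect how DeepRec implements SessionGroup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


class ToySession:
    """One session: shares the weights, owns its thread pool."""

    def __init__(self, shared_weights, num_threads: int):
        self.weights = shared_weights  # same object in every session
        self.pool = ThreadPoolExecutor(max_workers=num_threads)  # private

    def _score(self, features):
        return sum(self.weights[name] * value
                   for name, value in features.items())

    def run(self, features):
        """Submit one request to this session's own pool."""
        return self.pool.submit(self._score, features)

    def close(self):
        self.pool.shutdown()


class ToySessionGroup:
    """Dispatches requests round-robin over a fixed set of sessions."""

    def __init__(self, shared_weights, num_sessions: int,
                 threads_per_session: int):
        self.sessions = [ToySession(shared_weights, threads_per_session)
                         for _ in range(num_sessions)]
        # Round-robin iterator; this sketch assumes a single dispatcher thread.
        self._next = cycle(self.sessions)

    def run(self, features):
        return next(self._next).run(features)

    def close(self):
        for session in self.sessions:
            session.close()


weights = {"age": 0.5, "clicks": 2.0}
group = ToySessionGroup(weights, num_sessions=2, threads_per_session=2)
futures = [group.run({"age": 2.0, "clicks": 1.0}) for _ in range(4)]
scores = [f.result() for f in futures]
group.close()
```

Sharing the weights keeps memory use close to that of a single session, while separate pools keep concurrent requests from contending for the same worker threads.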

4.2 GPU Inference Optimization

4.2.1 Device Placement

Placing the Embedding Layer on CPU eliminates costly GPU‑CPU data transfers, reducing P99 latency by 35%.

4.2.2 CUDA Multi‑Stream

Multi‑stream and MergeStream reduce kernel launch overhead and allow compute and copy to share streams, lowering P99 latency by 18% (multi‑stream) and an additional 11% (merge‑stream).

4.2.3 BladeDISC Compilation

BladeDISC fuses memory‑intensive operators and optimizes GPU memory hierarchy, cutting P99 latency by 21% and overall GPU latency by 50% while keeping GPU utilization above 60%.

Future Plans

Vivo intends to shift from asynchronous CPU training to synchronous GPU training using SparseOperationKit and HybridBackend to accelerate complex model training.
