Advertising Data Characteristics and Sparse Large‑Model Practices at iQIYI
iQIYI's ad ranking system replaced static, hash-based embeddings with TFRA dynamic embeddings to handle massive sparse ID features efficiently. The change eliminated hash collisions and I/O bottlenecks, and isolating memory during hot model swaps stabilized serving latency, enabling billion-parameter models that lifted revenue by 4.3%; adaptive embedding sizes are planned as a future improvement.
This document outlines the specific characteristics of advertising data and the challenges and solutions encountered when building sparse large‑scale models for iQIYI's ad ranking system.
1. Advertising Data Characteristics
Advertising data is mainly composed of high‑dimensional ID features (user ID, ad ID, interaction sequence IDs) rather than continuous signals such as images or audio. These ID features are sparse and massive.
Continuous features include static attributes (e.g., user age) and dynamic behavior counts (e.g., clicks on a certain industry). They generalize well but lack memorization ability and often require extensive feature engineering.
Discrete (categorical) features are fine‑grained, enumerable (e.g., gender, industry ID) or high‑dimensional (e.g., user ID, ad ID). They have strong memorization and discrimination power, which is essential for personalized ad estimation. Typical encoding methods are One‑hot Encoding and Embedding. One‑hot suffers from the “dimension disaster” for massive IDs, so embedding is preferred.
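The trade-off above can be sketched in a few lines. This is an illustrative example, not iQIYI's code: the vocabulary size and embedding dimension are assumptions chosen to show why one-hot vectors become infeasible for massive IDs while an embedding stays a small, fixed size.

```python
import random

VOCAB_SIZE = 100_000_000   # e.g. user IDs: a one-hot vector would need 100M dims
EMBED_DIM = 16             # an embedding compresses each ID to 16 floats

def one_hot(index: int, size: int) -> list[int]:
    """Dense one-hot vector: its length grows with the vocabulary
    (the "dimension disaster" for massive IDs)."""
    vec = [0] * size
    vec[index] = 1
    return vec

def embed(index: int, dim: int) -> list[float]:
    """Toy embedding lookup: a fixed-size dense vector per ID (random here;
    in practice learned by gradient descent)."""
    rng = random.Random(index)
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

small_one_hot = one_hot(2, 5)            # feasible only for small vocabularies
user_vector = embed(123_456_789, EMBED_DIM)
print(len(small_one_hot), len(user_vector))  # 5 16
```

A one-hot user-ID feature would cost `VOCAB_SIZE` dimensions per example; the embedding costs `EMBED_DIM` regardless of vocabulary size, which is why embedding is the standard encoding for high-dimensional IDs.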
2. iQIYI Ad Ranking Model Status
Since 2019, iQIYI has migrated from online-learning FM models to DNN models built on TensorFlow. TensorFlow stores all data as dense tensors, so embedding tables require a fixed shape [vocabulary_size, embedding_dimension]. High-dimensional IDs are therefore first hashed into a limited vocabulary, which leads to hash collisions and degraded offline performance.
Key problems when using high‑dimensional sparse IDs:
Feature collisions: Large vocabulary_size slows training and may cause OOM; small hash space leads to high collision rates and loss of feature information.
Inefficient I/O: Sparse ID updates affect only a tiny fraction of the dense embedding tensor, causing the whole tensor to be read/written each step, which is costly for large models.
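The collision problem is easy to demonstrate. The sketch below (hash space and ID count are illustrative, not iQIYI's actual configuration) hashes distinct raw IDs into a small fixed vocabulary and counts how many IDs are forced to share an embedding row:

```python
# Hashing massive raw IDs into a fixed vocabulary forces distinct IDs
# to collide into the same embedding row, blending their learned signals.

HASH_SPACE = 1_000  # small static vocabulary_size

def bucket(raw_id: int) -> int:
    return hash(raw_id) % HASH_SPACE

ids = range(10_000)                 # 10k distinct raw IDs
buckets = {bucket(i) for i in ids}  # distinct embedding rows actually used
collided = len(ids) - len(buckets)  # IDs that lost their own row
print(f"{len(buckets)} rows for {len(ids)} IDs -> {collided} collisions")
```

With 10,000 IDs squeezed into 1,000 rows, 9 out of every 10 IDs share a row with others, so their embeddings can no longer memorize per-ID behavior.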
3. Sparse Large‑Model Practice
In 2023 iQIYI adopted industry‑standard open‑source technologies for training and inference of sparse large models, selecting the TFRA (TensorFlow Recommenders Addons) dynamic embedding component because it:
Integrates seamlessly with the TensorFlow ecosystem, preserving existing optimizers and initializers.
Provides dynamic memory scaling, reducing resource consumption and avoiding hash collisions.
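Conceptually, a dynamic embedding table is keyed directly by the raw ID and allocates rows on first access. The sketch below shows the assumed behavior in plain Python, not the TFRA implementation (TFRA backs this with high-performance hash-table kernels inside TensorFlow):

```python
import random

class DynamicEmbedding:
    """Toy dynamic embedding table: rows are created on first lookup, so the
    table grows with the observed feature set and there is no hash-modulo
    step that could collide two IDs."""

    def __init__(self, dim: int, init_scale: float = 0.1):
        self.dim = dim
        self.init_scale = init_scale
        self.table: dict[int, list[float]] = {}  # raw ID -> embedding row

    def lookup(self, raw_id: int) -> list[float]:
        if raw_id not in self.table:             # allocate on first sight
            rng = random.Random(raw_id)
            self.table[raw_id] = [rng.uniform(-self.init_scale, self.init_scale)
                                  for _ in range(self.dim)]
        return self.table[raw_id]

emb = DynamicEmbedding(dim=8)
v1 = emb.lookup(9_876_543_210)   # a raw 64-bit-scale user ID, no hashing needed
v2 = emb.lookup(9_876_543_210)   # same ID -> same row
print(len(emb.table), v1 is v2)  # 1 True
```

Because rows exist only for IDs actually seen, memory scales with the live feature set rather than with a preallocated vocabulary, which is the "dynamic memory scaling" property described above.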
Using TensorFlow 2.6.0 and TFRA 0.6.0, the team performed three major upgrades:
Replaced static embeddings with dynamic embeddings, removing manual hash logic and ensuring lossless learning of all categorical features.
Re‑introduced high‑dimensional user ID and ad ID features, achieving positive offline and online gains.
Added composite sparse ID features (e.g., user ID + industry ID, user ID + app package) and leveraged feature admission to incorporate even rarer combinations.
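Feature admission is typically frequency-based: a composite ID only earns its own embedding row once it has been seen often enough. The sketch below illustrates that idea with a hypothetical threshold; the threshold value and key format are assumptions, not iQIYI's settings:

```python
from collections import Counter

ADMIT_THRESHOLD = 3  # illustrative; real systems tune this per feature

class AdmissionFilter:
    """Admit a composite key (e.g. user ID x industry ID) into the embedding
    table only after it has been observed `threshold` times, keeping very
    rare crosses from bloating the table."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts: Counter = Counter()

    def observe(self, key: tuple) -> bool:
        """Return True once the key qualifies for an embedding row."""
        self.counts[key] += 1
        return self.counts[key] >= self.threshold

f = AdmissionFilter(ADMIT_THRESHOLD)
key = (42, "industry_7")            # user ID x industry ID cross feature
results = [f.observe(key) for _ in range(4)]
print(results)  # [False, False, True, True]
```

Until admission, lookups for a rare key can fall back to a shared default row, so the model still produces a score while the table stays compact.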
4. Model Update and Hot‑Swap Issues
During sparse large‑model deployment, hot updates via TensorFlow Serving caused inference latency spikes due to memory allocation and release contention. Two memory‑intensive phases were identified:
Model restore/unload: allocation of variable tensors when loading a new model and deallocation when unloading the old one.
RPC inference: temporary tensors allocated for forward computation and released after the request.
When a new model is restored while RPC requests continue, the overlapping allocations lead to “memory thrashing” and latency spikes. The solution was to isolate memory pools for model parameters and RPC tensors, allocating them in separate address spaces, which eliminated the hot‑update latency jitter.
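The isolation idea can be illustrated with two independent arenas. This is a toy sketch of the concept, not TF Serving internals: model parameters and per-request RPC tensors each draw from a private buffer, so a hot swap never allocates from, or frees into, the pool the serving path is using.

```python
class Arena:
    """Minimal bump allocator over a private byte buffer."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.buf = bytearray(size)
        self.offset = 0

    def alloc(self, nbytes: int) -> memoryview:
        start = self.offset
        self.offset += nbytes
        if self.offset > len(self.buf):
            raise MemoryError(f"{self.name} arena exhausted")
        return memoryview(self.buf)[start:self.offset]

    def reset(self):
        self.offset = 0  # bulk-free, e.g. when a request finishes

param_arena = Arena("model_params", 1 << 20)  # touched only during restore/unload
rpc_arena = Arena("rpc_tensors", 1 << 16)     # touched only on the request path

weights = param_arena.alloc(4096)    # loading a new model version
scratch = rpc_arena.alloc(256)       # forward-pass temporaries
rpc_arena.reset()                    # request done; parameter arena untouched
print(param_arena.offset, rpc_arena.offset)  # 4096 0
```

Because the two address spaces never interleave, a restore cannot fragment or contend with the allocator serving live RPCs, which is what removed the hot-update latency jitter.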
Additional optimizations included model file sharding and intra‑datacenter P2P transfer to reduce storage and network pressure during frequent updates.
5. Overall Benefits
The platform now supports training, inference, and deployment of billion‑parameter models with stable latency. Three sparse large models have been fully launched in CVR and DCVR scenarios, delivering a 4.3 % revenue uplift for effect‑based advertising.
6. Future Outlook
Current practice assigns a uniform embedding dimension to all values of a feature, which is sub‑optimal for highly skewed ID distributions. Future work will explore adaptive embedding dimensions to balance over‑fitting of low‑frequency IDs and under‑fitting of high‑frequency IDs.
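One simple form of this future direction is frequency-bucketed dimensions. The sketch below is purely illustrative (the bucket boundaries and dimensions are assumptions, not a shipped design): head IDs get enough capacity to avoid under-fitting, while long-tail IDs get small vectors that limit over-fitting.

```python
def adaptive_dim(frequency: int) -> int:
    """Assign an embedding dimension by ID frequency (illustrative buckets)."""
    if frequency >= 100_000:
        return 64   # head IDs: high capacity
    if frequency >= 1_000:
        return 16   # mid-frequency IDs
    return 4        # long-tail IDs: small dims resist over-fitting

print(adaptive_dim(500_000), adaptive_dim(5_000), adaptive_dim(7))  # 64 16 4
```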
Another direction is incremental model export, loading only the changed parameters into TensorFlow Serving to achieve minute‑level update cycles and improve real‑time performance.
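Incremental export reduces to diffing two sparse-parameter snapshots and shipping only the changed or new rows. The sketch below shows the assumed workflow with toy data; the snapshot format is an assumption for illustration:

```python
def export_delta(prev: dict, curr: dict) -> dict:
    """Keep only rows that are new or whose values changed since `prev`."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

snapshot_t0 = {1: [0.1, 0.2], 2: [0.3, 0.4]}
snapshot_t1 = {1: [0.1, 0.2],    # unchanged -> skipped
               2: [0.35, 0.4],   # updated   -> exported
               3: [0.5, 0.6]}    # new       -> exported

delta = export_delta(snapshot_t0, snapshot_t1)
print(sorted(delta))  # [2, 3]
```

Loading only the delta into TensorFlow Serving keeps update payloads proportional to what actually changed between versions, which is what makes minute-level update cycles plausible.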
iQIYI Technical Product Team