
Advertising Data Characteristics and Sparse Large‑Model Practices at iQIYI

iQIYI's ad ranking system replaces static, hash‑based embeddings with TFRA dynamic embeddings to handle massive sparse ID features efficiently. This eliminates hash collisions and I/O bottlenecks and, combined with memory isolation during hot model swaps, enables billion‑parameter models that boosted revenue by 4.3 %. Adaptive embedding sizes are planned as a future improvement.

iQIYI Technical Product Team

This document outlines the specific characteristics of advertising data and the challenges and solutions encountered when building sparse large‑scale models for iQIYI's ad ranking system.

1. Advertising Data Characteristics

Advertising data is mainly composed of high‑dimensional ID features (user ID, ad ID, interaction sequence IDs) rather than continuous signals such as images or audio. These ID features are sparse and massive.

Continuous features include static attributes (e.g., user age) and dynamic behavior counts (e.g., clicks on a certain industry). They generalize well but lack memorization ability and often require extensive feature engineering.

Discrete (categorical) features are fine‑grained, enumerable (e.g., gender, industry ID) or high‑dimensional (e.g., user ID, ad ID). They have strong memorization and discrimination power, which is essential for personalized ad estimation. Typical encoding methods are One‑hot Encoding and Embedding. One‑hot suffers from the “dimension disaster” for massive IDs, so embedding is preferred.
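To make the "dimension disaster" concrete, here is a minimal back-of-envelope sketch contrasting the per-sample cost of one-hot encoding with an embedding lookup. The vocabulary size and embedding width are hypothetical, not iQIYI's actual figures.

```python
# Illustrative only: compare the per-sample storage cost of one-hot
# encoding vs. a dense embedding for a high-cardinality ID feature.
# All sizes are hypothetical.

vocab_size = 100_000_000  # e.g. distinct user IDs
embed_dim = 16            # width of one embedding vector

# One-hot: each sample carries a vector as wide as the whole vocabulary.
one_hot_floats_per_sample = vocab_size

# Embedding: each sample carries only a short learned vector; the table
# of shape [vocab_size, embed_dim] is shared across all samples.
embedding_floats_per_sample = embed_dim

ratio = one_hot_floats_per_sample // embedding_floats_per_sample
print(f"one-hot is {ratio:,}x wider per sample")  # 6,250,000x
```

The embedding table itself is large, but it is amortized across all samples, which is why embedding is the standard choice for massive ID features.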

2. iQIYI Ad Ranking Model Status

Since 2019, iQIYI has migrated from online‑learning FM models to DNN models built on TensorFlow. All data are stored as dense tensors; embedding tables require a fixed shape [vocabulary_size, embedding_dimension]. High‑dimensional IDs are first hashed into a limited vocabulary, which leads to hash collisions and degraded offline performance.

Key problems when using high‑dimensional sparse IDs:

Feature collisions: Large vocabulary_size slows training and may cause OOM; small hash space leads to high collision rates and loss of feature information.

Inefficient I/O: Sparse ID updates affect only a tiny fraction of the dense embedding tensor, causing the whole tensor to be read/written each step, which is costly for large models.
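The collision problem above can be demonstrated with a short stdlib-only sketch: hashing many distinct IDs into a fixed bucket space, as a static embedding table requires, forces a large fraction of IDs to share rows. The ID format and bucket count are made up for illustration.

```python
# Sketch of the hash-collision problem: distinct IDs hashed into a
# fixed vocabulary inevitably share embedding rows when the hash
# space is smaller than the ID space. Numbers are illustrative.
import hashlib

def bucket(feature_id: str, num_buckets: int) -> int:
    """Stable hash of an ID string into a fixed-size vocabulary."""
    digest = hashlib.md5(feature_id.encode()).hexdigest()
    return int(digest, 16) % num_buckets

num_ids = 100_000
num_buckets = 20_000  # hash space much smaller than the ID space

seen: set[int] = set()
collisions = 0
for i in range(num_ids):
    b = bucket(f"user_{i}", num_buckets)
    if b in seen:
        collisions += 1  # this ID now shares a row with another ID
    seen.add(b)

print(f"{collisions / num_ids:.0%} of IDs collided")
```

With 100k IDs in 20k buckets, roughly 80 % of IDs end up sharing a row with some other ID, so their feature information is irrecoverably mixed.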

3. Sparse Large‑Model Practice

In 2023 iQIYI adopted industry‑standard open‑source technologies for training and inference of sparse large models, selecting the TFRA (TensorFlow Recommenders Addons) dynamic embedding component because it:

Integrates seamlessly with the TensorFlow ecosystem, preserving existing optimizers and initializers.

Provides dynamic memory scaling, reducing resource consumption and avoiding hash collisions.
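Conceptually, a dynamic embedding table is a hash map from the raw ID to a lazily created row, so the table grows with the data and no two IDs ever share a row. The dict-based class below is a minimal sketch of that idea, not the TFRA API (TFRA provides this via its `dynamic_embedding` module with trainable variables and optimizer support).

```python
# A minimal, dict-based sketch of what a dynamic embedding table does:
# rows are keyed by the raw ID and created on first access, so there is
# no hashing into a fixed vocabulary and no collisions. This is NOT the
# TFRA API, only the idea behind it.
import random

class DynamicEmbedding:
    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: dict[int, list[float]] = {}  # grows with the data

    def lookup(self, feature_id: int) -> list[float]:
        # Lazily initialize a row the first time an ID is seen.
        if feature_id not in self.table:
            self.table[feature_id] = [self.rng.uniform(-0.05, 0.05)
                                      for _ in range(self.dim)]
        return self.table[feature_id]

emb = DynamicEmbedding(dim=8)
v1 = emb.lookup(12345678901)  # arbitrary high-cardinality raw ID
v2 = emb.lookup(12345678901)  # same ID -> same row, no collision
assert v1 is v2 and len(emb.table) == 1
```

Because only rows that were actually touched exist, sparse updates read and write a tiny working set instead of the whole dense tensor, which is also what resolves the I/O bottleneck described earlier.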

Using TensorFlow 2.6.0 and TFRA 0.6.0, the team performed three major upgrades:

Replaced static embedding with dynamic embedding, removing manual hash logic and ensuring lossless learning of all categorical features.

Re‑introduced high‑dimensional user ID and ad ID features, achieving positive offline and online gains.

Added composite sparse ID features (e.g., user ID + industry ID, user ID + app package) and leveraged feature admission to incorporate even rarer combinations.
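The third upgrade can be sketched in a few lines: cross two raw IDs into one composite key, and admit that key into the embedding table only after it has been observed often enough. The threshold, key format, and helper names below are illustrative, not iQIYI's actual configuration.

```python
# Sketch of composite sparse features with frequency-based admission.
# Thresholds and key formats are hypothetical.
from collections import Counter

ADMIT_THRESHOLD = 3  # hypothetical minimum frequency before admission

def cross(user_id: str, industry_id: str) -> str:
    """Compose two IDs into a single sparse feature key."""
    return f"{user_id}#{industry_id}"

counts: Counter = Counter()
admitted: set[str] = set()

def observe(user_id: str, industry_id: str) -> bool:
    """Count one occurrence; admit the crossed key once it is frequent
    enough. Returns True if the key is (now) in the embedding table."""
    key = cross(user_id, industry_id)
    counts[key] += 1
    if counts[key] >= ADMIT_THRESHOLD:
        admitted.add(key)
    return key in admitted

for _ in range(2):
    assert not observe("u42", "autos")  # below threshold: not admitted
assert observe("u42", "autos")          # third sighting: admitted
```

Admission keeps the table from being flooded by one-off crossed keys while still letting genuinely recurring combinations earn their own embedding rows.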

4. Model Update and Hot‑Swap Issues

During sparse large‑model deployment, hot updates via TensorFlow Serving caused inference latency spikes due to memory allocation and release contention. Two memory‑intensive phases were identified:

Model restore/unload: allocation of variable tensors when loading a new model and deallocation when unloading the old one.

RPC inference: temporary tensors allocated for forward computation and released after the request.

When a new model is restored while RPC requests continue, the overlapping allocations lead to “memory thrashing” and latency spikes. The solution was to isolate memory pools for model parameters and RPC tensors, allocating them in separate address spaces, which eliminated the hot‑update latency jitter.

Additional optimizations included model file sharding and intra‑datacenter P2P transfer to reduce storage and network pressure during frequent updates.

5. Overall Benefits

The platform now supports training, inference, and deployment of billion‑parameter models with stable latency. Three sparse large models have been fully launched in CVR and DCVR scenarios, delivering a 4.3 % revenue uplift for effect‑based advertising.

6. Future Outlook

Current practice assigns a uniform embedding dimension to all values of a feature, which is sub‑optimal for highly skewed ID distributions. Future work will explore adaptive embedding dimensions to balance over‑fitting of low‑frequency IDs and under‑fitting of high‑frequency IDs.
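One simple form of the adaptive-dimension idea is to bucket IDs by observed frequency and assign a wider vector to more frequent IDs. The bucket boundaries and widths below are illustrative placeholders, not a proposed production policy.

```python
# Sketch of frequency-aware embedding dimensions: rare IDs get short
# vectors (less over-fitting), frequent IDs get wider ones (more
# capacity). Boundaries and widths are hypothetical.

def adaptive_dim(frequency: int) -> int:
    """Map an ID's observed frequency to an embedding width."""
    if frequency < 10:
        return 4       # long-tail IDs: tiny vector
    if frequency < 10_000:
        return 16      # mid-frequency IDs
    return 64          # head IDs: full-width vector

assert adaptive_dim(3) == 4
assert adaptive_dim(500) == 16
assert adaptive_dim(1_000_000) == 64
```

Mixed widths do complicate the downstream network, which typically projects each width into a common dimension before the interaction layers.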

Another direction is incremental model export, loading only the changed parameters into TensorFlow Serving to achieve minute‑level update cycles and improve real‑time performance.
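The essence of incremental export is a diff between the newly trained parameters and the currently served ones, shipping only the rows that changed. The dict-of-rows layout below is a stand-in for a real checkpoint format.

```python
# Sketch of incremental export: compare the new checkpoint with the
# previously served one and ship only added or updated embedding rows.
# The dict-of-tuples layout is illustrative.

def incremental_export(old: dict, new: dict) -> dict:
    """Return only the rows that were added or changed since `old`."""
    return {k: v for k, v in new.items() if old.get(k) != v}

served = {1: (0.1, 0.2), 2: (0.3, 0.4)}
latest = {1: (0.1, 0.2), 2: (0.9, 0.4), 3: (0.5, 0.5)}  # row 2 updated, row 3 new

delta = incremental_export(served, latest)
assert delta == {2: (0.9, 0.4), 3: (0.5, 0.5)}
```

Loading only the delta into TensorFlow Serving avoids re-reading the full multi-gigabyte model on every update, which is what makes minute-level update cycles plausible.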

Tags: advertising, TensorFlow, AI recommendation, Dynamic Embedding, Large-Scale Models, sparse embedding