Kuaishou Content Cold-Start Recommendation: Challenges, Modeling Solutions, and Future Directions
This article presents Kuaishou's approach to the content cold‑start problem: it analyzes the problem's impact on video growth, details the challenges of sparse and biased training data, and describes a suite of graph‑neural‑network, I2U/U2I, TDM, and debiasing techniques that improve early video exposure and long‑term ecosystem health.
Kuaishou receives millions of new videos daily, and cold‑start recommendation aims to give these fresh items sufficient early exposure while maintaining overall traffic efficiency and user engagement.
Short‑term goals focus on delivering traffic to new videos; long‑term goals target a healthier content ecosystem by mitigating the Matthew effect and boosting DAU and watch time.
Key challenges include a sample space far smaller than the true solution space, extreme sparsity of cold‑start data leading to inaccurate learning, and difficulty modeling video growth value.
To address these, Kuaishou adopts several modeling strategies:
Graph Neural Networks (GNN) as a base model, incorporating user, author, and item nodes, with semantic self‑enhanced edges to reduce graph entropy and improve neighbor relevance.
I2U (item‑to‑user) retrieval services that dynamically find interest groups for each video, then invert the resulting item‑to‑user matches into per‑user item lists for recommendation.
Dual‑tower I2U models enhanced with self‑attention, action‑list features, and debiasing losses to mitigate user concentration and item‑id exposure bias.
TDM (Tree‑based Deep Model) with hierarchical retrieval, allowing richer user‑item interactions, DIN‑style mechanisms, and layered search to alleviate head‑user overload.
U2U interest expansion modules that quickly diffuse well‑performing cold‑start videos to similar users.
Heat‑bias disentanglement, generating separate embeddings for popularity and genuine interest, then fusing them online with biased and unbiased estimations.
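As an illustration of the I2U retrieval item in the list above, the following is a minimal sketch, not Kuaishou's implementation. It assumes that users and new videos already have embeddings in a shared space and that relevance is a plain dot product; the function names, the toy vectors, and the parameters `users_per_item` and `items_per_user` are hypothetical. It first retrieves the best‑matching users for each new video, then inverts those matches into per‑user item lists.

```python
import heapq
from collections import defaultdict


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def i2u_retrieve(item_vecs, user_vecs, users_per_item):
    """For each new item, keep the users whose embeddings score highest.

    Assumption: brute-force scoring; a production system would use an
    approximate nearest-neighbour index instead.
    """
    matches = {}
    for item_id, item_vec in item_vecs.items():
        scored = (
            (dot(item_vec, user_vec), user_id)
            for user_id, user_vec in user_vecs.items()
        )
        matches[item_id] = heapq.nlargest(users_per_item, scored)
    return matches


def invert_to_user_lists(matches, items_per_user):
    """Turn item -> [(score, user)] matches into user -> [item] lists."""
    per_user = defaultdict(list)
    for item_id, scored_users in matches.items():
        for score, user_id in scored_users:
            per_user[user_id].append((score, item_id))
    return {
        user_id: [item for _, item in heapq.nlargest(items_per_user, scored)]
        for user_id, scored in per_user.items()
    }


# Toy data (hypothetical embeddings).
user_vecs = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.8, 0.5]}
item_vecs = {"v_new_1": [0.9, 0.1], "v_new_2": [0.1, 0.9]}

matches = i2u_retrieve(item_vecs, user_vecs, users_per_item=2)
user_lists = invert_to_user_lists(matches, items_per_user=2)
```

Because retrieval starts from the item side, every new video is guaranteed a fixed number of candidate users, which a user‑side (U2I) search over a catalog dominated by established videos would not ensure.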
Additional techniques include beta‑distribution modeling of PCTR uncertainty for exploration, dual‑domain transfer learning between hot and cold video domains, and various debiasing and data‑augmentation methods.
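The beta‑distribution idea can be made concrete with a small sketch. The source does not say how the distribution is used, so this example assumes Thompson sampling: each video's predicted click‑through rate (PCTR) is treated as a Beta posterior over observed clicks and impressions, and ranking uses a random draw from that posterior rather than the point estimate. The prior parameters and function names are hypothetical.

```python
import random


def beta_posterior(clicks, impressions, prior_alpha=1.0, prior_beta=20.0):
    """Beta posterior over CTR after observing clicks out of impressions.

    Assumption: a Beta(1, 20) prior, i.e. a prior CTR of roughly 5%.
    """
    return prior_alpha + clicks, prior_beta + (impressions - clicks)


def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)


def posterior_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total * total * (total + 1))


def sample_pctr(clicks, impressions, rng):
    """Draw one plausible CTR value from the video's posterior."""
    alpha, beta = beta_posterior(clicks, impressions)
    return rng.betavariate(alpha, beta)


def rank_with_exploration(stats, rng):
    """Rank videos by a sampled CTR (Thompson sampling).

    stats maps video id -> (clicks, impressions).
    """
    draws = {vid: sample_pctr(c, n, rng) for vid, (c, n) in stats.items()}
    return sorted(draws, key=draws.get, reverse=True)


rng = random.Random(0)
stats = {"cold_video": (0, 0), "warm_video": (50, 1000)}
ranking = rank_with_exploration(stats, rng)
```

A video with few impressions has a wide posterior, so its sampled PCTR occasionally lands high enough to win exposure; as impressions accumulate, the variance shrinks and ranking converges toward the observed rate.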
Future work will explore finer‑grained crowd diffusion models, causal debiasing for exposure and popularity, selective weighting of high‑heat samples, long‑term video value modeling across the cold‑grow‑stable‑decay lifecycle, and further data‑enhancement strategies.
The presentation concludes with a Q&A covering I2U indexing, preventing head‑content over‑concentration, and calculating the “优普率” metric for popular pool videos.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.