iQIYI NLP Team: Research Topics, Progress, and Applications in Video Services
The iQIYI NLP team applies lexical analysis, knowledge‑graph construction, tag recommendation, query understanding, voice‑assistant semantics, sentiment mining, and box‑office/view‑count prediction—leveraging weakly labeled data, CRF/CNN‑CRF models and deep learning—to enhance video comprehension, recommendation, search and commercial services across the platform.
Author: Moment – Ph.D. from the Institute of Automation, Chinese Academy of Sciences, specializing in Natural Language Processing (NLP). Since 2016 he has worked at the iQIYI Technology Product Center (Search Advertising), leading NLP and commercial-system development. He has published more than 20 papers at top conferences such as EMNLP, COLING, INTERSPEECH, and ICASSP.
Abstract: Natural Language Processing (NLP) is a key branch of artificial intelligence that studies how machines can understand and communicate using human language. At iQIYI, NLP aims to enable machines to better comprehend entertainment‑related video and text content, thereby providing intelligent services for users.
The paper introduces the iQIYI NLP team’s focus areas and recent achievements, illustrated with concrete use‑cases.
1. Lexical Analysis & Knowledge Graph
Lexical analysis (segmentation, POS tagging, word weighting, new‑word discovery, and entity recognition/linking) is a foundational service handling billions of requests. Entity recognition is the major challenge, especially for entertainment‑domain entities such as movie titles, game names, and literary works.
Data preparation: 1 M weakly‑labeled video sentences generated by heuristic rules and tens of thousands of precisely annotated sentences.
Modeling: CRF, CNN, LSTM, and a two‑layer CNN+CRF model were compared; the CNN+CRF model (shown in Figure 2) performed best, with F‑scores of 82.1 % and 72.6 % on two drama‑name recognition test sets.
Applications: Entity recognition is used for content distribution in “bubble circles”, linking feed‑stream videos to movies, games, e‑commerce, comics, etc., enabling one‑click purchase or download.
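The CRF layer on top of the CNN selects the globally best BIO tag sequence via Viterbi decoding. Below is a minimal NumPy sketch of that decoding step; the emission and transition scores are hypothetical hand-picked numbers (the production model learns them), and the tag set is illustrative only:

```python
import numpy as np

def viterbi_decode(emissions, transitions, tags):
    """Find the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   (seq_len, n_tags) per-token tag scores (e.g. from a CNN).
    transitions: (n_tags, n_tags) score of moving from tag i to tag j.
    """
    n_steps, n_tags = emissions.shape
    score = emissions[0].copy()                  # best score ending in each tag
    backptr = np.zeros((n_steps, n_tags), dtype=int)
    for t in range(1, n_steps):
        # broadcast: score[i] + transitions[i, j] + emission[t, j] for all (i, j)
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        best.append(int(backptr[t][best[-1]]))
    return [tags[i] for i in reversed(best)]

tags = ["O", "B-TITLE", "I-TITLE"]
# hypothetical emission scores for a 4-token sentence containing a drama title
emissions = np.array([
    [2.0, 0.1, 0.0],   # token 1: likely O
    [0.2, 2.5, 0.0],   # token 2: likely B-TITLE
    [0.1, 0.3, 2.2],   # token 3: likely I-TITLE
    [1.8, 0.2, 0.4],   # token 4: likely O
])
# discourage I-TITLE directly after O (an I tag must follow B or I)
transitions = np.array([
    [0.5, 0.5, -5.0],
    [0.0, 0.0,  1.0],
    [0.0, 0.0,  1.0],
])
print(viterbi_decode(emissions, transitions, tags))
# → ['O', 'B-TITLE', 'I-TITLE', 'O']
```

The transition matrix is what distinguishes the CRF from per-token softmax classification: it lets the decoder reject locally plausible but structurally invalid sequences such as an `I-TITLE` with no preceding `B-TITLE`.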
2. Tag Recommendation
Tags are metadata extracted from video titles, descriptions, or content to improve personalized recommendation and content editing. Two types of tags are used: predefined type tags (e.g., “Entertainment”, “Star”, “Mainland”) and open‑domain content tags (e.g., “范爷” (a popular nickname for actress Fan Bingbing), “街拍” (street snaps), “减肥” (weight loss)).
Tag generation pipeline: (1) heuristic candidate generation; (2) scoring via unsupervised methods (TextRank, ExpandRank) or supervised methods (Maui, CeKE); (3) final selection. The task is reformulated as a sequence labeling problem using a CRF model, which allows extraction of arbitrarily long tag phrases without a separate candidate module.
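As a rough illustration of the unsupervised scoring step, here is a minimal pure-Python TextRank sketch over a word co-occurrence graph. The tokens, window size, and damping factor below are illustrative assumptions, not the production configuration:

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iters=50):
    """Score candidate tag words with TextRank over a co-occurrence graph."""
    # build an undirected graph: words co-occurring within `window` are linked
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    # iterate the PageRank-style update until (approximate) convergence
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {
            w: (1 - damping) + damping * sum(score[v] / len(neighbors[v])
                                             for v in neighbors[w])
            for w in neighbors
        }
    return sorted(score, key=score.get, reverse=True)

tokens = "star street snap star fashion star weight loss fashion".split()
print(textrank(tokens)[:3])
```

Well-connected, frequently co-occurring words accumulate score and surface as tag candidates; the supervised CRF reformulation mentioned above replaces this candidate-plus-scoring pipeline with direct sequence labeling.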
Tag services are already deployed in video recommendation, iQIYI Headlines, Bubble, and video editing.
3. Query Understanding
Query understanding covers personalized default search terms, query auto‑completion, query correction, and query classification. Personalized default terms are generated by matching user profiles against candidate queries, effectively acting as a recommendation system.
Query auto‑completion leverages token‑query similarity, click‑through rates, freshness, and other signals to suggest likely completions during user input.
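A toy sketch of how such signals might be blended into a single ranking score. The weights, the prefix-ratio similarity, and the candidate tuples are hypothetical stand-ins for the production features:

```python
def rank_completions(prefix, candidates, weights=(0.5, 0.3, 0.2)):
    """Rank completion candidates by a weighted blend of signals.

    candidates: list of (query, ctr, freshness) with ctr/freshness in [0, 1].
    Similarity here is just the shared-prefix length ratio, a toy stand-in
    for the production token-query similarity features.
    """
    w_sim, w_ctr, w_fresh = weights
    scored = []
    for query, ctr, freshness in candidates:
        if not query.startswith(prefix):
            continue  # only keep true completions of the typed prefix
        sim = len(prefix) / len(query)       # shorter completions rank higher
        scored.append((w_sim * sim + w_ctr * ctr + w_fresh * freshness, query))
    return [q for _, q in sorted(scored, reverse=True)]

candidates = [
    ("wolf warrior 2", 0.9, 0.8),
    ("wolf warrior 2 full movie", 0.6, 0.5),
    ("wolf totem", 0.4, 0.3),
    ("iron man", 0.9, 0.9),
]
print(rank_completions("wolf", candidates))
# → ['wolf warrior 2', 'wolf totem', 'wolf warrior 2 full movie']
```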
4. Voice Assistant
The voice assistant is deployed on iQIYI VR headsets and the iQIYI app. It supports over 40 interaction types, such as video playback/search, weather queries, device settings, VIP purchase, game download, and direct episode or movie playback.
Architecture: three main modules – speech recognition, recognition‑error correction, and semantic parsing. Semantic parsing consists of intent classification and slot filling. Improvements in lexical analysis (especially entertainment‑entity recognition) and query correction enhance robustness in the entertainment and gaming domains.
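A minimal rule-based sketch of the intent-classification and slot-filling step. The production system uses learned models; the intents, patterns, and slot names below are hypothetical:

```python
import re

# hypothetical intent patterns; production uses learned classifiers + CRF slots
INTENT_PATTERNS = [
    ("play_video",
     re.compile(r"play (?:episode (?P<episode>\d+) of )?(?P<title>.+)")),
    ("query_weather",
     re.compile(r"(?:what's|what is) the weather in (?P<city>.+)")),
    ("download_game",
     re.compile(r"download (?P<game>.+)")),
]

def parse(utterance):
    """Return (intent, slots) for an utterance, or ('unknown', {})."""
    for intent, pattern in INTENT_PATTERNS:
        m = pattern.fullmatch(utterance.lower())
        if m:
            # keep only the slots that actually matched
            slots = {k: v for k, v in m.groupdict().items() if v}
            return intent, slots
    return "unknown", {}

print(parse("Play episode 3 of Wolf Warrior 2"))
# → ('play_video', {'episode': '3', 'title': 'wolf warrior 2'})
```

Note how entity recognition feeds directly into slot quality: if the drama title inside the utterance is mis-segmented upstream, the `title` slot is wrong even when the intent is classified correctly.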
5. Sentiment (Public Opinion) Analysis
Sentiment analysis uses syntactic analysis to extract opinion targets, opinion words, and sentiment polarity from user‑generated content (comments, bullet screens, bubble circles), providing multi‑dimensional insights for content operators and marketers. Example: an analysis of the movie “Wolf Warrior 2” across visual effects, scenes, and actors (Figure 7).
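Once (target, opinion word, polarity) triples have been extracted, they can be aggregated per aspect to produce the multi-dimensional view described above. A minimal sketch with invented triples:

```python
from collections import Counter, defaultdict

# hypothetical (target, opinion word, polarity) triples, as would be
# extracted by syntactic analysis from comments and bullet screens
triples = [
    ("visual effects", "stunning",   +1),
    ("visual effects", "realistic",  +1),
    ("plot",           "thin",       -1),
    ("actors",         "convincing", +1),
    ("plot",           "rushed",     -1),
]

def aspect_report(triples):
    """Aggregate polarity counts per opinion target."""
    counts = defaultdict(Counter)
    for target, _, polarity in triples:
        counts[target]["pos" if polarity > 0 else "neg"] += 1
    return {t: dict(c) for t, c in counts.items()}

print(aspect_report(triples))
# → {'visual effects': {'pos': 2}, 'plot': {'neg': 2}, 'actors': {'pos': 1}}
```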
6. Box‑Office and Video‑View (VV) Prediction
Challenges include long lead times, many external factors, limited training samples (< 1000 titles), and data integration from multiple sources.
Features: > 100 dimensions covering time, genre, platform, index, IP, predecessor, trend, etc., with missing‑value imputation and feature transformations.
Models compared: linear models, SVM, Random Forest, GBDT, DNN, and stacking ensembles. The best model achieved an R² of 85 % on the latest 90 licensed dramas; for head‑tier dramas (VV > 1 billion) prediction error ≤ 30 % for 67 % of cases and ≤ 50 % for 100 % of cases (see Figure 8).
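The coverage-style error metric quoted above (the share of titles whose relative prediction error falls within a bound) can be computed as follows; the VV figures here are invented for illustration only:

```python
def error_coverage(actual, predicted, thresholds=(0.3, 0.5)):
    """Fraction of titles whose relative prediction error is within each bound."""
    rel_errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return {t: sum(e <= t for e in rel_errors) / len(rel_errors)
            for t in thresholds}

# hypothetical VV figures (in billions) for six head-tier dramas
actual    = [1.2, 2.0, 1.5, 3.1, 1.1, 2.4]
predicted = [1.4, 1.8, 2.1, 3.0, 1.6, 2.5]
print(error_coverage(actual, predicted))
```

Reporting coverage at fixed thresholds is more informative than a single average for head-tier titles, where one large miss would dominate a mean-error summary.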
Conclusion
By leveraging weakly‑labeled and precisely annotated data together with machine‑learning and deep‑learning NLP techniques, iQIYI improves video understanding, search, recommendation, and data mining, delivering a smarter professional video experience. Future work includes exploring more effective deep models, multimodal fusion of text and images, and transfer learning to further boost system performance.
iQIYI Technical Product Team