
iQIYI NLP Team: Research Topics, Progress, and Applications in Video Services

The iQIYI NLP team applies lexical analysis, knowledge‑graph construction, tag recommendation, query understanding, voice‑assistant semantics, sentiment mining, and box‑office/view‑count prediction—leveraging weakly labeled data, CRF/CNN‑CRF models and deep learning—to enhance video comprehension, recommendation, search and commercial services across the platform.

iQIYI Technical Product Team

Author: Moment – Ph.D. from the Institute of Automation, Chinese Academy of Sciences, specializing in Natural Language Processing (NLP). Since 2016 he has worked at the iQIYI Technology Product Center – Search Advertising, leading NLP and commercial system development. He has published over 20 papers at top conferences such as EMNLP, COLING, INTERSPEECH, and ICASSP.

Abstract: Natural Language Processing (NLP) is a key branch of artificial intelligence that studies how machines can understand and communicate using human language. At iQIYI, NLP aims to enable machines to better comprehend entertainment‑related video and text content, thereby providing intelligent services for users.

This article introduces the iQIYI NLP team's focus areas and recent achievements, illustrated with concrete use cases.

1. Lexical Analysis & Knowledge Graph

Lexical analysis (segmentation, POS tagging, word weighting, new-word discovery, and entity recognition/linking) serves as a foundational service handling billions of requests. Entity recognition is a major challenge, especially for entertainment-domain entities such as movie titles, game names, and literary works.

Data preparation: 1 M weakly‑labeled video sentences generated by heuristic rules and tens of thousands of precisely annotated sentences.

Modeling: Experiments with CRF, CNN, LSTM, and a two‑layer CNN+CRF model (shown in Figure 2) achieved the best performance, with F‑scores of 82.1 % and 72.6 % on two test sets for drama‑name recognition.
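The CRF layer on top of such a model decodes the most likely label sequence with the Viterbi algorithm. A minimal pure-Python sketch of that decoding step, using an illustrative BIO label set for drama-name recognition (the scores, labels, and function names below are assumptions, not the production model):

```python
# Viterbi decoding for a linear-chain CRF: given per-token emission
# scores and label-transition scores, find the highest-scoring label path.
# The BIO label set below is illustrative (drama-name recognition).

LABELS = ["O", "B-DRAMA", "I-DRAMA"]

def viterbi(emissions, transitions):
    """emissions: list of {label: score} per token;
    transitions: {(prev_label, cur_label): score}."""
    n = len(emissions)
    # dp[i][label] = (best path score ending at token i with label, backpointer)
    dp = [{lab: (emissions[0].get(lab, 0.0), None) for lab in LABELS}]
    for i in range(1, n):
        row = {}
        for cur in LABELS:
            best_prev, best_score = None, float("-inf")
            for prev in LABELS:
                s = dp[i - 1][prev][0] + transitions.get((prev, cur), 0.0)
                if s > best_score:
                    best_prev, best_score = prev, s
            row[cur] = (best_score + emissions[i].get(cur, 0.0), best_prev)
        dp.append(row)
    # Backtrack from the best final label.
    last = max(LABELS, key=lambda lab: dp[-1][lab][0])
    path = [last]
    for i in range(n - 1, 0, -1):
        last = dp[i][last][1]
        path.append(last)
    return list(reversed(path))
```

A strongly negative transition score for an impossible pair such as `("O", "I-DRAMA")` is how the CRF layer keeps the decoded sequence well-formed.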

Applications: Entity recognition is used for content distribution in “bubble circles”, linking feed‑stream videos to movies, games, e‑commerce, comics, etc., enabling one‑click purchase or download.

2. Tag Recommendation

Tags are metadata extracted from video titles, descriptions, or content to improve personalized recommendation and content editing. Two types of tags are used: predefined type tags (e.g., “Entertainment”, “Star”, “Mainland”) and open‑domain content tags (e.g., “范爷”, “街拍”, “减肥”).

Tag generation pipeline: (1) heuristic candidate generation; (2) scoring via unsupervised methods (TextRank, ExpandRank) or supervised methods (Maui, CeKE); (3) final selection. The task is reformulated as a sequence labeling problem using a CRF model, which allows extraction of arbitrarily long tag phrases without a separate candidate module.
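Once tag extraction is cast as sequence labeling, tag phrases of arbitrary length are simply read off the predicted label sequence. A small illustrative helper for that read-off step (the BIO tag names are assumptions; multi-character Chinese tags are joined without spaces):

```python
def extract_tags(tokens, labels):
    """Collect the phrases marked B-TAG / I-TAG in a BIO label sequence."""
    tags, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B-TAG":                 # a new tag phrase starts here
            if current:
                tags.append("".join(current))
            current = [tok]
        elif lab == "I-TAG" and current:   # continue the current phrase
            current.append(tok)
        else:                              # "O" closes any open phrase
            if current:
                tags.append("".join(current))
            current = []
    if current:
        tags.append("".join(current))
    return tags
```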

Tag services are already deployed in video recommendation, iQIYI Headlines, Bubble, and video editing.

3. Query Understanding

Includes personalized default search terms, query auto‑completion, query correction, and query classification. Personalized default terms are generated by matching user profiles with candidate queries, effectively acting as a recommendation system.

Query auto‑completion leverages token‑query similarity, click‑through rates, freshness, and other signals to suggest likely completions during user input.
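Combining several signals into one ranking score can be sketched as a weighted sum; the weights, field names, and `Candidate` type below are illustrative assumptions, not the deployed ranking function:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    query: str
    ctr: float        # historical click-through rate, in [0, 1]
    freshness: float  # recency signal, in [0, 1] (1 = newest)

def rank_completions(prefix, candidates, w_ctr=0.6, w_fresh=0.4, top_k=3):
    """Rank candidate queries extending the typed prefix by a weighted
    mix of CTR and freshness (weights here are illustrative)."""
    matched = [c for c in candidates if c.query.startswith(prefix)]
    matched.sort(key=lambda c: w_ctr * c.ctr + w_fresh * c.freshness,
                 reverse=True)
    return [c.query for c in matched[:top_k]]
```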

4. Voice Assistant

The voice assistant is deployed on iQIYI VR headsets and the iQIYI app. It supports over 40 interaction types, such as video playback/search, weather queries, device settings, VIP purchase, game download, and direct episode or movie playback.

Architecture: three main modules – speech recognition, speech recognition correction, and semantic parsing. Semantic parsing consists of intent classification and slot filling. Improvements in lexical analysis (especially entertainment‑entity recognition) and query correction enhance robustness for entertainment and gaming domains.
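To make the intent-classification-plus-slot-filling split concrete, here is a deliberately simplified rule-based parser; the real system uses trained models, and the intent names, slot names, and patterns below are illustrative assumptions:

```python
import re

# Illustrative rule-based semantic parser: each pattern encodes one intent,
# and its named groups become the filled slots.
INTENT_PATTERNS = [
    ("play_video",
     re.compile(r"play (?:episode (?P<episode>\d+) of )?(?P<title>.+)")),
    ("weather",
     re.compile(r"weather in (?P<city>.+)")),
]

def parse(utterance):
    """Return (intent, slots) for the first matching pattern, else ('unknown', {})."""
    for intent, pattern in INTENT_PATTERNS:
        m = pattern.fullmatch(utterance)
        if m:
            slots = {k: v for k, v in m.groupdict().items() if v is not None}
            return intent, slots
    return "unknown", {}
```

Robust entity recognition matters precisely at the slot boundaries: the `title` slot above is where entertainment-entity recognition would decide whether "wolf warrior" is a movie name or ordinary words.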

5. Sentiment (Public Opinion) Analysis

The team uses syntactic analysis to extract opinion targets, opinion words, and sentiment polarity from user-generated content (comments, bullet screens, bubble circles). This provides multi-dimensional insights for content operators and marketers. Example: analysis of the movie "Wolf Warrior 2" across visual effects, scenes, and actors (Figure 7).
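The production system pairs each opinion word with its target via dependency parsing; in the toy sketch below, nearest-token distance stands in for the syntactic link, and both lexicons are illustrative assumptions (multiword targets are assumed pre-merged into single tokens):

```python
# Simplified target–opinion pairing: the closest target in the token
# sequence receives the opinion word's polarity.
TARGETS = {"visual effects", "scenes", "actors"}
POLARITY = {"stunning": "+", "convincing": "+", "flat": "-"}

def pair_opinions(tokens):
    target_pos = [(i, t) for i, t in enumerate(tokens) if t in TARGETS]
    results = {}
    for i, tok in enumerate(tokens):
        if tok in POLARITY and target_pos:
            # attach the opinion word to the nearest target token
            _, nearest = min(target_pos, key=lambda p: abs(p[0] - i))
            results[nearest] = POLARITY[tok]
    return results
```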

6. Box‑Office and Video‑View (VV) Prediction

Challenges include long lead times, many external factors, limited training samples (< 1000 titles), and data integration from multiple sources.

Features: > 100 dimensions covering time, genre, platform, index, IP, predecessor, trend, etc., with missing‑value imputation and feature transformations.
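The imputation-and-transform step can be sketched as below; median imputation and a log transform for heavy-tailed counts are common choices, but the exact keys and transforms the team uses are not specified, so everything here is illustrative:

```python
import math

def preprocess(rows, numeric_keys):
    """Median-impute missing numeric features, then log-transform them
    (the log1p choice suits heavy-tailed count features such as view counts)."""
    medians = {}
    for k in numeric_keys:
        vals = sorted(r[k] for r in rows if r.get(k) is not None)
        medians[k] = vals[len(vals) // 2] if vals else 0.0
    out = []
    for r in rows:
        clean = dict(r)
        for k in numeric_keys:
            v = clean.get(k)
            if v is None:
                v = medians[k]          # fill missing value with the median
            clean[k] = math.log1p(v)    # compress the heavy tail
        out.append(clean)
    return out
```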

Models compared: linear models, SVM, Random Forest, GBDT, DNN, and stacking ensembles. The best model achieved an R² of 85 % on the latest 90 licensed dramas; for head‑tier dramas (VV > 1 billion) prediction error ≤ 30 % for 67 % of cases and ≤ 50 % for 100 % of cases (see Figure 8).
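The head-tier accuracy figures correspond to the share of titles whose relative prediction error falls under a threshold; a small helper computing that metric (function name is ours):

```python
def error_within(actual, predicted, threshold):
    """Fraction of titles whose relative prediction error is <= threshold."""
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(p - a) / a <= threshold
    )
    return hits / len(actual)
```

With this metric, "error ≤ 30 % for 67 % of cases" reads as `error_within(actual, predicted, 0.3) ≈ 0.67` over the head-tier dramas.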

Conclusion

By leveraging weakly‑labeled and precisely annotated data together with machine‑learning and deep‑learning NLP techniques, iQIYI improves video understanding, search, recommendation, and data mining, delivering a smarter professional video experience. Future work includes exploring more effective deep models, multimodal fusion of text and images, and transfer learning to further boost system performance.

Tags: machine learning, recommendation systems, NLP, video analytics, entity recognition, speech assistant