Multimodal Person Identification: Techniques, Datasets, and Applications by iQIYI
In a CSDN Tech Open Class Plus talk, iQIYI’s Dr. Lu Xiangju detailed multimodal person‑identification techniques that combine face, voice, pose and clothing cues, introduced the massive iQIYI‑VID dataset for real and cartoon subjects, described semi‑supervised training with Unknown Identity Rejection loss, and explained how these advances power iQIYI video services.
In a CSDN Tech Open Class Plus session, iQIYI scientist Dr. Lu Xiangju presented the latest multimodal person‑identification technologies and shared how these methods are applied in iQIYI video products.
Dr. Lu, head of the PersonAI team, focuses on person identification and related AI research. He organized the iQIYI Multimodal Video Person Identification Competition and released iQIYI‑VID, the world's first large‑scale video person‑identification dataset, which contains millions of real‑person entries and tens of thousands of cartoon characters.
1. Basics of multimodal technology – Person identification is far more than face recognition. In video scenarios (e.g., variety shows or action movies) a face may be occluded or absent, so additional cues such as body pose, clothing, voice, fingerprint, or iris are required to determine identity.
2. Virtual person identification – Virtual persons include cartoons, anime, and game characters. The rapid growth of such characters creates a strong demand for robust recognition methods, which the presented research addresses.
2.1 Real‑person recognition (IQFace) – The system combines face, voice, body pose, and clothing information. A custom distributed training framework was built to handle a face database of 5.5 M IDs (≈300 k celebrities). The model also predicts 27 facial attributes, including micro‑expressions, and can automatically generate micro‑expression packs for both real and cartoon faces.
The pipeline includes noise‑reduction, model quantization, pruning, and distillation to improve speed and resource usage.
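Of the compression steps listed above, knowledge distillation is the most algorithmic. A minimal NumPy sketch of a standard temperature-softened distillation term is shown below; the temperature value and the KL formulation are common defaults assumed for illustration, not iQIYI's actual settings:

```python
import numpy as np

def softened_softmax(logits, t=1.0):
    """Softmax over temperature-scaled logits (higher t = softer targets)."""
    z = logits / t
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, t=4.0):
    """Hypothetical distillation term: KL divergence between the
    temperature-softened teacher and student distributions, used to
    transfer a large face model's knowledge into a smaller, faster one."""
    p = softened_softmax(teacher_logits, t)
    q = softened_softmax(student_logits, t)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)
    # The t*t factor keeps gradient magnitudes comparable across temperatures.
    return float(np.mean(kl) * t * t)
```

In practice this term would be added to the student's ordinary classification loss; only the weighting between the two is tuned.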
2.2 Semi‑supervised learning with unlabeled data – An Unknown Identity Rejection (UIR) loss was designed for the open‑set nature of face recognition. Labeled samples are forced toward their class centroids, while unlabeled samples are pushed away from all centroids, effectively increasing inter‑class distance. The total loss is the sum of the standard labeled loss and the UIR loss.
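The combined objective described above can be sketched as follows. The exact form of the UIR term here (penalizing confident predictions on unlabeled samples so they sit away from every class centroid) and the weight `lam` are our reading of the description, not the paper's exact loss:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def labeled_ce_loss(logits, labels):
    """Standard cross-entropy on labeled samples: pulls each sample
    toward its own class centroid."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def uir_loss(logits_unlabeled):
    """Hypothetical UIR term: penalize the maximum class probability of
    each unlabeled sample, pushing it away from all known centroids."""
    p = softmax(logits_unlabeled)
    return np.mean(p.max(axis=1))

def total_loss(logits_labeled, labels, logits_unlabeled, lam=1.0):
    """Total loss = labeled loss + UIR loss, as stated in the talk."""
    return labeled_ce_loss(logits_labeled, labels) + lam * uir_loss(logits_unlabeled)
```

Note that `uir_loss` is minimized when the unlabeled sample's probabilities are uniform over the known classes, i.e., when it is rejected by all of them, which is the open-set behavior the section describes.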
Experimental results – Using the cleaned MS‑1M (MS1MV2) dataset (5 M images, 90 k identities) plus ~4.9 M unlabeled images, the method was evaluated on iQIYI‑VID, Trillion‑Pairs, and IJB‑C. Adding UIR loss consistently improved performance.
2.3 Virtual‑person recognition (iCartoonFace) – Approximately 40 k cartoon characters and 500 k training images were collected with 98 % annotation quality (bounding box, pose, gender, color, etc.). The model addresses the difficulty of distinguishing very similar characters and uses loss functions such as Softmax, SphereFace, CosFace, and ArcFace to tighten intra‑class distribution and enlarge inter‑class gaps. Fusion of real‑person and cartoon data further improves discrimination.
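Of the losses named above, ArcFace illustrates the tighten-intra-class / enlarge-inter-class idea most directly: it adds an angular margin to the target class before scaling. A minimal NumPy sketch follows; the scale `s=64` and margin `m=0.5` are common defaults assumed here, not values reported in the talk:

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, s=64.0, m=0.5):
    """ArcFace-style additive angular margin: adding m to the target-class
    angle tightens intra-class distribution and widens inter-class gaps."""
    # L2-normalize features and class weights so their dot products are cosines.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = e @ w.T                                   # (batch, num_classes)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    rows = np.arange(len(labels))
    # Penalize only the ground-truth class by adding the angular margin.
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta[rows, labels] + m)
    return s * logits                               # scaled logits for softmax CE
```

Because `cos(theta + m) < cos(theta)` for angles in the valid range, the model must pull samples closer to their class direction to keep the target logit high, which is exactly the tightening effect the section mentions.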
Multimodal database and algorithm – iQIYI has built the largest multimodal video person database with clear labels, covering face, head, body, and voice features. A multi‑model attention architecture fuses these four modalities for robust identification.
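A fusion step like the one described might look like the sketch below. The single learned scoring vector `w_attn` and the dot-product scoring are simplifying assumptions for illustration, not iQIYI's actual multi-model attention architecture:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_fuse(face, head, body, voice, w_attn):
    """Hypothetical attention fusion: score each modality embedding, then
    combine them with softmax weights so that unreliable cues (e.g., an
    occluded face) contribute less to the fused person representation."""
    feats = np.stack([face, head, body, voice])  # (4, d) modality embeddings
    scores = feats @ w_attn                      # (4,) one relevance score each
    weights = softmax(scores)                    # normalized attention weights
    return weights @ feats                       # (d,) weighted fusion
```

With all scores equal, the fusion degenerates to a plain average of the four embeddings; in a trained system the weights would shift toward whichever modalities are reliable in the current clip.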
Applications in iQIYI products – Features such as “Only Look at Them” and the AI Radar on TV rely on the multimodal database and algorithms. The system balances modality weights, matches micro‑expressions and textual captions, and continuously refines the model based on real‑world usage.
Relevant papers and resources:
• "Unknown Identity Rejection Loss: Utilizing Unlabeled Data for Face Recognition" (arXiv:1910.10896)
• "iCartoonFace: A Benchmark of Cartoon Person Recognition" (arXiv:1907.13394)
• "iQIYI‑VID: A Large Dataset for Multi‑modal Person Identification" (arXiv:1811.07548)
Dataset download: http://challenge.ai.iqiyi.com/detail?raceId=5c767dc41a6fa0ccf53922e7
iQIYI Technical Product Team