UIR Loss: Leveraging Unlabeled Data to Enhance Face Recognition

iQIYI introduces a semi‑supervised Unknown Identity Rejection (UIR) loss that uses massive unlabeled face images to push unknown samples away from known class centers, improving open‑set face‑recognition accuracy, feature sparsity, and out‑of‑library rejection rates across multiple benchmarks and products.

iQIYI Technical Product Team

Oct 31, 2019

iQIYI possesses a massive amount of high‑quality video resources. Structured analysis of these videos, especially identifying the people appearing in them, is crucial. Current iQIYI products such as “AI Radar” and “Only Watch TA” rely on face‑recognition technology to detect characters in video frames and across entire videos.

Training a high‑performance face‑recognition model with supervised learning requires a large amount of labeled face data. Public datasets such as MS‑Celeb‑1M contain about 100k identities and 10 million images, while iQIYI‑VID provides 640k video clips covering roughly 10k identities, including about 6 million face images (iQIYI‑VID‑FACE).

Collecting and labeling multiple images for each person is labor‑intensive, which limits how much labeled data is available and, in turn, how well the model can perform. Moreover, face recognition is an open‑set problem: the labeled identities represent only a tiny fraction of the billions of people worldwide, so models may generalize poorly to identities they have never seen.

To address these issues, iQIYI’s technical team proposes a semi‑supervised loss called Unknown Identity Rejection (UIR) Loss. In the open‑set setting, identities are divided into labeled set S and unlabeled set U with S ∩ U = ∅. For labeled samples, features are encouraged to approach their class‑center vectors; for unlabeled samples, the model must “reject” them by pushing their features far from all class centers.
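The two objectives can be pictured directly in feature space. Below is a minimal sketch, assuming features are compared with one center vector per labeled identity by cosine similarity; the helpers `labeled_objective` and `unlabeled_objective` are hypothetical names, not from the source. A labeled sample should score high against its own center, while an unlabeled sample should score low against every center.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def labeled_objective(feature, centers, label):
    """Hypothetical helper: similarity to the sample's own class center.
    Training on set S should push this value up."""
    return cosine(feature, centers[label])

def unlabeled_objective(feature, centers):
    """Hypothetical helper: highest similarity to ANY known center.
    Training on set U should push this value down."""
    return max(cosine(feature, c) for c in centers)

# Toy 2-D centers for two known identities (illustrative values only).
centers = [[1.0, 0.0], [0.0, 1.0]]
print(labeled_objective([0.9, 0.1], centers, 0))    # close to its own center
print(unlabeled_objective([-1.0, -1.0], centers))   # far from every center
```

The loss described in the source acts on softmax probabilities rather than raw similarities; this sketch only illustrates the geometric intent of "approach your own center" versus "stay away from all centers".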

In a CNN classifier, the softmax layer outputs probabilities p₁, p₂, …, pₙ over the n known classes. Because an unlabeled identity belongs to none of these classes, ideally every pᵢ should be small. Thresholding the largest probability then filters out such samples, which improves the out‑of‑library rejection rate. Making all pᵢ small at the same time is a multi‑objective minimization problem, which can be transformed into a single objective: the UIR loss.
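A minimal sketch of "make every pᵢ small, then reject by threshold" follows. The exact UIR formula is not reproduced in this text, so the code swaps in one common way to push all pᵢ down at once: cross-entropy against the uniform distribution, −(1/n)·Σ log pᵢ, which by Jensen's inequality is at least log n and reaches that minimum exactly when every pᵢ = 1/n. Treat `uir_loss_uniform` and the 0.5 threshold as illustrative assumptions, not iQIYI's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uir_loss_uniform(probs):
    """Stand-in UIR-style loss (assumption): cross-entropy to a uniform target.

    Its minimum, log(n), is reached when every p_i equals 1/n, i.e. when the
    unlabeled sample is not assigned to any known class.
    """
    n = len(probs)
    return -sum(math.log(p) for p in probs) / n

def is_rejected(probs, threshold=0.5):
    """Open-set decision: reject as unknown if no class is confident enough.
    The threshold value is a hypothetical choice."""
    return max(probs) < threshold

confident = softmax([8.0, 0.0, 0.0, 0.0])  # peaked: looks like a known identity
uncertain = softmax([0.1, 0.0, 0.0, 0.0])  # nearly flat: looks unknown
print(uir_loss_uniform(confident), is_rejected(confident))
print(uir_loss_uniform(uncertain), is_rejected(uncertain))
```

Minimizing such a term on unlabeled samples flattens their probability vectors, which is what lets a simple max-probability threshold separate them from known identities.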

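The total loss of the model is the sum of the conventional labeled loss and the UIR loss for unlabeled data:

$$\mathcal{L} = \mathcal{L}_{S} + \mathcal{L}_{\mathrm{UIR}}$$

where $\mathcal{L}_{S}$ is computed on samples from the labeled set S and $\mathcal{L}_{\mathrm{UIR}}$ on samples from the unlabeled set U.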

The overall architecture feeds both labeled and unlabeled data through a shared backbone network to obtain features, followed by a fully‑connected layer and softmax that produce class probabilities. The labeled loss is computed from the probabilities of the labeled samples, and the UIR loss from those of the unlabeled samples.
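The data flow can be sketched end to end in a few lines. Everything here is a toy assumption: the hypothetical `backbone` is an identity map on 2-D features, the fully-connected layer is a bias-free weight matrix with one row per known identity, and the unlabeled term uses a uniform-target cross-entropy as a stand-in for the exact UIR formula. The point is the structure: one shared backbone and classifier, two losses computed on separate subsets, then summed.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def backbone(image):
    """Hypothetical backbone: here it simply returns the input as the feature."""
    return list(image)

def fc(feature, weights):
    """Bias-free fully-connected layer: one logit per known identity."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def labeled_loss(probs, label):
    """Standard softmax cross-entropy for a labeled sample."""
    return -math.log(probs[label])

def uir_stand_in(probs):
    """Assumed UIR-style term: cross-entropy to the uniform distribution."""
    return -sum(math.log(p) for p in probs) / len(probs)

def total_loss(labeled_batch, unlabeled_batch, weights):
    """Shared backbone + FC for both streams; losses computed separately, then summed."""
    l_s = sum(labeled_loss(softmax(fc(backbone(x), weights)), y)
              for x, y in labeled_batch) / len(labeled_batch)
    l_u = sum(uir_stand_in(softmax(fc(backbone(x), weights)))
              for x in unlabeled_batch) / len(unlabeled_batch)
    return l_s + l_u, l_s, l_u

weights = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]  # 3 known identities (toy values)
labeled = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]     # (feature, label) pairs from S
unlabeled = [[0.0, -1.0]]                        # sample of an unknown identity from U
total, l_s, l_u = total_loss(labeled, unlabeled, weights)
print(total, l_s, l_u)
```

In a real framework the two subsets would typically share one forward pass over a mixed batch and be split only when computing the losses; the sketch keeps them separate for readability.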

**Experimental Results**

We used the cleaned MS1MV2 dataset (≈5M images of 90k identities) as labeled data and collected about 4.9M unlabeled face images from the web whose identities overlap little with the labeled set. The method was evaluated on three test sets (iQIYI‑VID, Trillion‑Pairs, and IJB‑C) with four backbone networks, and adding the UIR loss consistently improved performance. (Detailed numbers for ResNet‑100 on IJB‑C are omitted for brevity.)

**Further Analysis**

1. **UIR loss makes the feature distribution sparser** – We measured the average cosine distance between class‑center vectors; larger distances indicate a sparser distribution. The average distance grew with stronger backbones, and models trained with the UIR loss consistently achieved larger distances than the corresponding baselines.

2. **UIR loss improves the out‑of‑library rejection rate** – On a new set of unlabeled data, the maximum probability output by the model was lower with the UIR loss than without it, so more unknown identities fall below the rejection threshold.
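Both diagnostics are simple to compute. The sketch below assumes class centers are given as plain vectors and that "cosine distance" means 1 − cosine similarity, averaged over all pairs of centers; it also reports the fraction of unknown-identity samples whose maximum softmax probability falls below a threshold, i.e. the out-of-library rejection rate. The function names, the toy inputs, and the 0.5 threshold are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def mean_center_distance(centers):
    """Average pairwise cosine distance between class centers (larger = sparser)."""
    pairs = list(combinations(centers, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

def rejection_rate(prob_rows, threshold=0.5):
    """Fraction of unknown-identity samples whose max probability is below threshold."""
    rejected = sum(1 for probs in prob_rows if max(probs) < threshold)
    return rejected / len(prob_rows)

centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_center_distance(centers))

unknown_probs = [[0.4, 0.3, 0.3], [0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]
print(rejection_rate(unknown_probs))
```

Running both functions on a baseline model and a UIR-trained model over the same data gives the kind of side-by-side comparison described in the two points above.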

**Conclusion and Outlook**

The semi‑supervised UIR loss effectively leverages massive unlabeled face data to improve face‑recognition performance and generalization. The technique has already been deployed in several iQIYI products, such as “Only Watch TA”, “AI Radar”, and the “艺汇” (Yihui) app, improving user experience and production efficiency. Future work includes handling extreme cases (blurred faces, heavy occlusion, profile views, and views from behind) by incorporating multimodal cues, and further exploiting the large‑scale multimodal iQIYI‑VID dataset.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
