How Alibaba Detects ‘Disgusting’ Images on Taobao with AI

This article describes Alibaba's AI system for automatically filtering nauseating product images on Taobao, covering challenges such as cold‑start, class imbalance, and diverse visual features, and detailing solutions like semi‑supervised learning, active learning, OHEM‑cascade, attention mechanisms, and the resulting business impact.

Alibaba Cloud Developer

Abstract

Disgusting images—those that cause nausea—appear in three categories: animal‑related, human‑related, and object‑related. Detecting them among billions of product images faces three technical challenges: few initial samples (cold‑start), severe class imbalance, and highly diverse feature distribution.

Cold‑Start Solution

We expanded the training set by retrieving hundreds of disgusting images from Taobao’s content pool with a small‑sample retrieval platform, then applied semi‑supervised learning on a lightweight MobileNet‑V2 backbone. The network output is modeled as Y = f_W(X), where W denotes the network parameters and X the input image. In the mean‑teacher framework, the student network is updated by gradient descent, while the teacher network’s parameters track an exponential moving average of the student’s parameters.
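
The teacher update described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the production code: the toy weight lists, the decay value `alpha`, and the function names `ema_update` and `consistency_loss` are all assumptions made for the example. The consistency term shown is the mean squared difference commonly used with mean‑teacher training; the source does not state which consistency loss was used.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight a small step toward the student weight."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy weights standing in for W in Y = f_W(X) (hypothetical values).
student_w = [0.0, 1.0]
teacher_w = [0.0, 1.0]

# Pretend three gradient steps moved the student; the teacher follows slowly.
for new_student_w in ([0.5, 1.0], [1.0, 1.0], [1.0, 0.0]):
    student_w = new_student_w
    teacher_w = ema_update(teacher_w, student_w, alpha=0.9)

# On an unlabeled image, the student is penalised for disagreeing with the teacher.
unlabeled_gap = consistency_loss([0.8, 0.2], [0.6, 0.4])
```

Because the teacher averages over many student states, its predictions on unlabeled images tend to be more stable than the student's, which is what makes them usable as training targets when labels are scarce.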

Algorithm Iteration Process

Online data exhibits an extreme positive‑negative imbalance: positives make up far less than 0.1 % of images. To improve the model, we combined active learning, noise‑sample identification, and online hard example mining (OHEM) with cascade training.

Active learning selects samples whose prediction confidence falls between two thresholds (example images are shown in the original figure) and prioritises these hard examples for annotation.
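
As a rough sketch of this selection step: the thresholds `low` and `high`, the annotation `budget`, and the rule of ranking by distance from 0.5 are all assumptions for illustration, since the source does not give the actual values or the exact priority rule.

```python
def select_for_annotation(scores, low=0.3, high=0.7, budget=2):
    """Return indices of uncertain samples, most ambiguous first.

    scores: model confidence that each image is disgusting, in [0, 1].
    low, high: hypothetical confidence thresholds bounding the uncertain band.
    budget: how many samples the annotators can label in this round.
    """
    uncertain = [i for i, p in enumerate(scores) if low <= p <= high]
    # Samples closest to 0.5 are the ones the model is least sure about.
    uncertain.sort(key=lambda i: abs(scores[i] - 0.5))
    return uncertain[:budget]


scores = [0.02, 0.48, 0.95, 0.62, 0.35, 0.99]
picked = select_for_annotation(scores)
```

Confident predictions at either end of the range are skipped, so the limited annotation effort goes to the images most likely to change the decision boundary.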

The loss‑prediction (LP) module, built on top of the target‑prediction (TP) module, predicts relative loss for each sample pair, guiding the selection of difficult examples.
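
One common way to train a module on relative loss is a margin ranking loss over sample pairs, sketched below. The source does not give the exact formulation, so the `margin` value and the function names are assumptions; the sketch only illustrates the idea of learning which sample of a pair is harder, then ranking unlabeled samples by predicted loss.

```python
def pairwise_ranking_loss(pred_i, pred_j, true_i, true_j, margin=1.0):
    """Penalise the LP module when it orders a pair differently from the real losses.

    pred_i, pred_j: losses predicted by the LP module for two samples.
    true_i, true_j: losses actually measured on the TP module's output.
    """
    sign = 1.0 if true_i > true_j else -1.0
    return max(0.0, -sign * (pred_i - pred_j) + margin)


def rank_by_predicted_loss(predicted_losses, k):
    """Pick the k unlabeled samples the LP module expects to be hardest."""
    order = sorted(range(len(predicted_losses)),
                   key=lambda i: predicted_losses[i], reverse=True)
    return order[:k]


# Correct ordering with a wide enough gap costs nothing.
agree = pairwise_ranking_loss(pred_i=2.0, pred_j=0.5, true_i=1.2, true_j=0.3)
# Reversed ordering is penalised by the gap plus the margin.
disagree = pairwise_ranking_loss(pred_i=0.5, pred_j=2.0, true_i=1.2, true_j=0.3)

hardest = rank_by_predicted_loss([0.1, 0.9, 0.4, 0.7], k=2)
```

Only the ordering within a pair matters here, not the absolute loss value, which keeps the LP module useful even as the overall loss scale shrinks during training.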

The OHEM + cascade scheme replaces MobileNet‑V2 with DenseNet‑161 as the first‑stage classifier and feeds that stage’s highest‑loss samples to a second‑stage ResNet‑50 classifier; at inference, the predictions of both stages are combined.
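
The mining and combination steps might look roughly like this. The probabilities below are toy numbers rather than real DenseNet‑161 or ResNet‑50 outputs, and the weighted average in `cascade_predict` is an assumption: the source says only that the two predictions are combined, not how.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Per-sample loss for predicted probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mine_hard_examples(stage1_probs, labels, keep):
    """Return indices of the `keep` samples with the highest first-stage loss."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(stage1_probs, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:keep]


def cascade_predict(p_stage1, p_stage2, w=0.5):
    """Combine both stages at inference (hypothetical weighted average)."""
    return w * p_stage1 + (1.0 - w) * p_stage2


stage1_probs = [0.05, 0.60, 0.90, 0.30]   # toy first-stage outputs
labels = [0, 0, 1, 1]

# Samples 3 and 1 are the ones the first stage gets most wrong,
# so they would form the training set for the second stage.
hard = mine_hard_examples(stage1_probs, labels, keep=2)

combined = cascade_predict(0.6, 0.2)
```

Training the second stage only on the first stage's mistakes lets it specialise in the borderline cases instead of relearning the easy negatives that dominate the data.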

Attention Mechanism

Since disgusting cues are often localized, we embed a CBAM‑style attention block (channel + spatial) into the backbone. Grad‑CAM visualisations confirm that the network focuses on the nauseating regions.
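
To make the channel‑then‑spatial idea concrete, here is a deliberately simplified sketch in plain Python on a tiny feature map. It is not the block used in production: CBAM applies a learned shared MLP in the channel branch and a 7×7 convolution in the spatial branch, whereas this sketch substitutes an identity function (`identity_mlp`) and a fixed weighted sum (`w_avg`, `w_max`), all of which are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, mlp):
    """One weight per channel, from average- and max-pooled descriptors."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(row) for row in ch) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """One weight per pixel, from channel-wise average and max maps.

    A fixed weighted sum stands in for CBAM's learned 7x7 convolution.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    att = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [feat[k][y][x] for k in range(c)]
            row.append(sigmoid(w_avg * sum(vals) / c + w_max * max(vals)))
        att.append(row)
    return att


def cbam_like(feat, mlp):
    """Apply channel attention first, then spatial attention."""
    ca = channel_attention(feat, mlp)
    feat = [[[v * ca[k] for v in row] for row in ch] for k, ch in enumerate(feat)]
    sa = spatial_attention(feat)
    return [[[v * sa[y][x] for x, v in enumerate(row)]
             for y, row in enumerate(ch)] for ch in feat]


def identity_mlp(values):
    """Placeholder for the learned shared MLP."""
    return list(values)


# Two channels, 2x2 pixels; only the bottom-right pixel is active.
feat = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
out = cbam_like(feat, identity_mlp)
```

The output keeps the input shape; each activation is scaled by a channel weight and a pixel weight, both between 0 and 1, so regions and channels with strong responses are attenuated less than weak ones.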

Business Impact

Deployed in Taobao’s “Guess You Like” (首猜) recommendation, the model scans the entire product pool, achieving 95 % precision and 94 % recall. It filtered millions of low‑quality images during the Double‑11 shopping festival, reduced manual review workload by ~70 %, and supported a multi‑task platform for broader image‑quality detection.
