How Alibaba Detects ‘Disgusting’ Images on Taobao with AI
This article describes Alibaba's AI system for automatically filtering nauseating product images on Taobao, covering challenges such as cold‑start, class imbalance, and diverse visual features, and detailing solutions like semi‑supervised learning, active learning, OHEM‑cascade, attention mechanisms, and the resulting business impact.
Abstract
Disgusting images, meaning those that cause nausea, fall into three categories: animal‑related, human‑related, and object‑related. Detecting them among billions of product images poses three technical challenges: few initial samples (cold‑start), severe class imbalance, and a highly diverse feature distribution.
Cold‑Start Solution
We expanded the training set by retrieving hundreds of disgusting images from Taobao’s content pool with a small‑sample retrieval platform, then applied semi‑supervised learning with a lightweight MobileNet‑V2 backbone. The network output is modeled as Y = f_W(X), where W denotes the network parameters and X is the input image. In the mean‑teacher framework, the student network is trained by gradient descent, while the teacher network's parameters track an exponential moving average of the student's parameters. A consistency term penalises disagreement between the two networks' predictions on the same image, which lets unlabeled images contribute to training.
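The teacher update described above can be sketched in a few lines. This is a minimal illustration, not the production code: weights are plain Python lists, the "optimizer step" is simulated by adding a constant, and the function names `ema_update` and `consistency_loss` as well as the decay value `alpha` are assumptions made for the example.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight (EMA)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


# Toy run: the student changes at every step, the teacher only averages.
student = [0.0, 0.0]
teacher = [0.0, 0.0]
for step in range(3):
    student = [w + 1.0 for w in student]  # stand-in for a gradient step
    teacher = ema_update(teacher, student, alpha=0.5)

print(teacher)  # the teacher lags behind the student's value of 3.0
```

In practice the decay is usually close to 1, so the teacher changes slowly and gives more stable targets for the unlabeled images.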
Algorithm Iteration Process
Online data is extremely imbalanced: positives make up far less than 0.1 % of images. To improve the model, we combined active learning, noise‑sample identification, and online hard example mining (OHEM) with cascade training.
Active learning selects samples whose predicted confidence falls between two thresholds, that is, the images the model is least sure about (examples are shown in the original figure), and prioritises these hard examples for annotation.
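A minimal sketch of this selection step follows. The threshold values 0.3 and 0.7, the `budget` parameter, and the rule of ordering candidates by distance from 0.5 are all assumptions for illustration; the article does not state the actual thresholds or the ordering.

```python
def select_for_annotation(scores, low=0.3, high=0.7, budget=None):
    """Return ids of samples scored in [low, high], most uncertain first.

    scores maps a sample id to the model's positive-class probability.
    """
    uncertain = [
        (abs(p - 0.5), sid) for sid, p in scores.items() if low <= p <= high
    ]
    uncertain.sort()  # smallest distance from 0.5 first
    ids = [sid for _, sid in uncertain]
    return ids if budget is None else ids[:budget]


# Hypothetical scores: confident negatives and positives are skipped.
scores = {"img_a": 0.02, "img_b": 0.55, "img_c": 0.97, "img_d": 0.35, "img_e": 0.68}
queue = select_for_annotation(scores, budget=2)
print(queue)
```

Confidently classified images fall outside the band and are never sent to annotators, which keeps labeling effort focused on the borderline cases.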
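The loss‑prediction (LP) module is built on top of the target‑prediction (TP) module, which is the classifier itself. It is trained on pairs of samples to predict which of the two has the higher loss; samples with a high predicted loss are then selected as difficult examples.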
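One common way to train a module on relative loss is a pairwise margin ranking objective. The sketch below assumes that form; the article does not give the exact loss, so the hinge formulation, the `margin` value, and the pairing of consecutive samples are assumptions, and all function names are hypothetical.

```python
def pairwise_ranking_loss(pred_i, pred_j, true_i, true_j, margin=1.0):
    """Hinge penalty when the predicted loss ordering contradicts the true one."""
    sign = 1.0 if true_i > true_j else -1.0
    return max(0.0, -sign * (pred_i - pred_j) + margin)


def lp_batch_loss(pred_losses, true_losses, margin=1.0):
    """Average the pairwise penalty over consecutive pairs (0,1), (2,3), ..."""
    starts = range(0, len(pred_losses) - 1, 2)
    total = sum(
        pairwise_ranking_loss(
            pred_losses[k], pred_losses[k + 1],
            true_losses[k], true_losses[k + 1], margin,
        )
        for k in starts
    )
    return total / max(1, len(starts))


def rank_by_predicted_loss(pred_losses, top_n):
    """Indices of the samples the LP module expects to be hardest."""
    order = sorted(range(len(pred_losses)), key=lambda k: pred_losses[k], reverse=True)
    return order[:top_n]
```

Because only the ordering within each pair matters, the LP module does not need to match the scale of the true loss, which changes as the classifier improves.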
In the OHEM + cascade setup, DenseNet‑161 replaces MobileNet‑V2 as the first‑stage classifier, and the samples on which it incurs the highest loss are fed to a second‑stage ResNet‑50 classifier. At inference, the predictions of both stages are combined.
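The two ideas here, keeping only the highest‑loss samples and merging two stage outputs, can be sketched as follows. Binary cross‑entropy as the loss, the `keep` count, and the weighted average in `cascade_score` are assumptions for illustration; the article does not say how the two predictions are combined.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one sample, clipped for numerical safety."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def hardest_examples(probs, labels, keep):
    """Indices of the `keep` samples with the highest first-stage loss."""
    losses = [bce(p, y) for p, y in zip(probs, labels)]
    order = sorted(range(len(losses)), key=lambda k: losses[k], reverse=True)
    return order[:keep]


def cascade_score(stage1_prob, stage2_prob, w=0.5):
    """Assumed combination rule: weighted average of the two stage outputs."""
    return w * stage1_prob + (1.0 - w) * stage2_prob


# Hypothetical first-stage outputs: samples 1 and 3 are misclassified.
stage1_probs = [0.05, 0.60, 0.90, 0.30]
labels = [0, 0, 1, 1]
hard_ids = hardest_examples(stage1_probs, labels, keep=2)
print(hard_ids)
```

The second stage therefore sees a training set dominated by the first stage's mistakes, which is where the extreme class imbalance hurts most.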
Attention Mechanism
Since disgusting cues are often localized, we embed a CBAM‑style attention block (channel + spatial) into the backbone. Grad‑CAM visualisations confirm that the network focuses on the nauseating regions.
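To make the channel + spatial idea concrete, here is a simplified, dependency‑free sketch of a CBAM‑style block on a feature map stored as nested lists (C x H x W). It is not the deployed module: the weights `w1` and `w2` are hypothetical, and the spatial gate uses a per‑pixel linear combination of the channel‑wise mean and max as a stand‑in for the 7x7 convolution used in the original CBAM design.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """Scale each channel by a gate from its average- and max-pooled value.

    fmap: C x H x W nested lists; w1: R x C and w2: C x R shared-MLP weights.
    """
    def mlp(vec):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    flat = [[v for row in ch for v in row] for ch in fmap]
    avg = [sum(ch) / len(ch) for ch in flat]
    mx = [max(ch) for ch in flat]
    gates = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]


def spatial_attention(fmap, w_avg=1.0, w_max=1.0, bias=0.0):
    """Scale each pixel by a gate from the channel-wise mean and max.

    Simplification: a 1x1 linear combination replaces CBAM's 7x7 convolution.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            column = [fmap[k][i][j] for k in range(c)]
            gate = sigmoid(w_avg * sum(column) / c + w_max * max(column) + bias)
            for k in range(c):
                out[k][i][j] = gate * fmap[k][i][j]
    return out


def cbam(fmap, w1, w2):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(fmap, w1, w2))
```

Both gates lie strictly between 0 and 1, so the block can only damp activations; regions and channels the gates favour are damped least, which matches the localized focus seen in the Grad‑CAM visualisations.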
Business Impact
Deployed in Taobao’s “Guess You Like” (首猜) recommendation, the model scans the entire product pool, achieving 95 % precision and 94 % recall. It filtered millions of low‑quality images during the Double‑11 shopping festival, reduced manual review workload by ~70 %, and supported a multi‑task platform for broader image‑quality detection.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
