How a Simple Learning‑Rate Trick Detects 90% of Noisy Labels in Image Data

Noisy annotations in large‑scale, weakly labeled image data degrade the performance of deep neural networks trained on it. A simple algorithm that adjusts the learning rate during training can automatically identify up to 90% of the noisy samples, improving dataset cleanliness and model accuracy without manual intervention.

Alibaba Cloud Developer

Sep 12, 2019

Background

Obtaining high‑confidence annotations for massive datasets is a major challenge for supervised learning; noisy labels in the training set can severely reduce model accuracy. A simple, efficient noisy‑label detection algorithm proposed by the Alibaba Taobao technology team can reveal about 90% of noisy labels simply by adjusting the learning rate during training.

Solution Approach

We surveyed state‑of‑the‑art papers on noisy‑sample detection and robust training, including Influence Functions, CurriculumNet, and MentorNet. These works inspire a strategy that leverages the loss distribution of samples across different training phases to identify likely noisy instances.

Algorithm Design

The algorithm consists of three stages:

Stage 1: Train a model to convergence with a fixed learning rate, allowing the model to overfit.

Stage 2: Apply a cyclic learning‑rate schedule that repeatedly pushes the model between under‑fitting and over‑fitting. While the model is under‑fitting, noisy samples exhibit high loss and clean samples low loss; as it over‑fits, it memorizes the noisy samples as well and the gap narrows. By aggregating the mean and variance of each sample’s loss across cycles, the samples with the largest loss statistics are flagged as noisy.

Stage 3: Remove the identified noisy samples and retrain the model on the cleaned dataset.
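The core of Stage 2 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the team's implementation: the linear decay shape of the schedule, the per‑epoch centring of losses, ranking by mean loss alone, and the names cyclic_lr, loss_statistics, flag_noisy, and noise_fraction are all hypothetical choices made for the example.

```python
from statistics import mean, pvariance


def cyclic_lr(step, cycle_len, lr_max, lr_min):
    """Assumed schedule: decay linearly from lr_max towards lr_min within a
    cycle, then jump back to lr_max at the start of the next cycle."""
    pos = (step % cycle_len) / cycle_len  # position in [0, 1) inside the cycle
    return lr_max - (lr_max - lr_min) * pos


def loss_statistics(loss_history):
    """loss_history[e][i] is the loss of sample i recorded at epoch e.

    Each epoch is centred by its own mean loss (an assumption) so that
    high-learning-rate epochs, where every loss is large, do not dominate.
    Returns one (mean, variance) pair per sample.
    """
    epoch_means = [mean(epoch) for epoch in loss_history]
    stats = []
    for i in range(len(loss_history[0])):
        centred = [epoch[i] - m for epoch, m in zip(loss_history, epoch_means)]
        stats.append((mean(centred), pvariance(centred)))
    return stats


def flag_noisy(loss_history, noise_fraction):
    """Flag the top noise_fraction of samples, ranked by mean centred loss."""
    stats = loss_statistics(loss_history)
    k = round(noise_fraction * len(stats))
    ranked = sorted(range(len(stats)), key=lambda i: stats[i][0], reverse=True)
    return sorted(ranked[:k])


# Toy data: three epochs of per-sample losses for five samples.
history = [
    [0.2, 0.1, 2.0, 0.3, 0.1],
    [0.5, 0.4, 3.0, 0.6, 0.5],
    [0.1, 0.1, 1.5, 0.1, 0.2],
]
flagged = flag_noisy(history, noise_fraction=0.2)
```

In a real training loop the learning rate returned by cyclic_lr would be set on the optimizer at every step, and the per‑sample losses would be recorded once per epoch. The fraction of samples to flag has to be estimated or tuned for the dataset; the value used here is arbitrary.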

Algorithm Performance

Extensive experiments on datasets built from noisy‑label collections show that our method outperforms several recent approaches (e.g., Influence Functions, CurriculumNet, MentorNet) in both noisy‑label detection precision and downstream model accuracy. The following figures illustrate loss curves under cyclic learning rates and comparative performance tables.
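For readers who want to reproduce this kind of comparison on a dataset where the truly mislabeled samples are known, detection quality is usually summarized by precision and recall. The helper below is a generic sketch; the function name and the example index sets are hypothetical and do not come from the original experiments.

```python
def detection_precision_recall(flagged, truly_noisy):
    """Compare flagged sample indices against the known noisy indices.

    precision: share of flagged samples that really are noisy.
    recall:    share of noisy samples that were flagged.
    """
    flagged, truly_noisy = set(flagged), set(truly_noisy)
    hits = len(flagged & truly_noisy)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truly_noisy) if truly_noisy else 0.0
    return precision, recall


# Hypothetical example: four samples flagged, five samples actually noisy.
precision, recall = detection_precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```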

Application Scenario – Image Quality Service Platform (Waterdrop)

The noisy‑sample detection algorithm dramatically reduces manual labeling effort and improves the quality of image‑based services such as content‑library cover‑image moderation, multi‑object detection, and inappropriate content filtering. Deployed on the Waterdrop platform, it processes over 4 billion images weekly with >90% filtering precision, supporting Alibaba’s e‑commerce visual assets.

References

Classification in the Presence of Label Noise: A Survey

Co‑teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels

MentorNet: Learning Data‑Driven Curriculum for Very Deep Neural Networks on Corrupted Labels

Understanding Black‑box Predictions via Influence Functions

Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach

Learning from Massive Noisy Labeled Data for Image Classification

A Closer Look at Memorization in Deep Networks

Training Deep Neural‑Networks Using a Noise Adaptation Layer

CurriculumNet: Weakly Supervised Learning from Large‑Scale Web Images

CleanNet: Transfer Learning for Scalable Image Classifier Training With Label Noise

