
Weak Supervision Machine Learning in Ant Group Business Scenarios

This article surveys weak-supervision machine learning in Ant Group's business scenarios: an introduction to weak supervision, the challenges of modeling with scarce or noisy labels, methods for cross-domain causal effect estimation and multi-source noisy-label denoising, and real-world application examples.

DataFunSummit

Introduction

The talk focuses on the application of weak‑supervision machine learning in various Ant Group business scenarios.

Four main topics are covered: (1) an overview of weak supervision, (2) modeling under sample scarcity, (3) modeling under noisy labels, and (4) a brief introduction to practical use cases.

1. Weak Supervision Overview

Traditional fully supervised learning assumes abundant, accurate labeled data and distribution consistency between training and deployment. Many real‑world problems violate these assumptions, prompting the use of weak‑supervision techniques.

According to Zhi-Hua Zhou's 2018 survey, weak supervision can be grouped into three typical settings:

Incomplete Supervision – a small set of labeled samples and a large set of unlabeled samples (e.g., semi‑supervised and active learning).

Inaccurate Supervision – labels exist but are noisy (e.g., rule‑based labeling).

Inexact Supervision – each label corresponds to a group of instances rather than a single instance (e.g., multi‑instance learning).
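As a toy illustration of incomplete supervision, a self-training loop can grow a small labeled set by pseudo-labeling the unlabeled points it is most confident about. The sketch below uses a simple numpy centroid classifier; the `threshold` and `rounds` parameters are illustrative choices, not details from the talk:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, rounds=3):
    """Toy self-training: a centroid classifier labels its most confident
    unlabeled points, which are then added to the labeled pool."""
    X, y = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        centroids = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
        d = np.linalg.norm(X_unlab[:, None] - centroids[None], axis=-1)
        # softmax over negative distances as a crude confidence proxy
        p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
        conf, pred = p.max(axis=1), p.argmax(axis=1)
        keep = conf >= threshold
        if not keep.any():
            break
        X = np.vstack([X, X_unlab[keep]])
        y = np.concatenate([y, pred[keep]])
        X_unlab = X_unlab[~keep]
    return X, y
```

With well-separated clusters, one labeled point per class is enough to correctly absorb the nearby unlabeled points.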

2. Weak‑Supervision Problems in Ant Scenarios

Two representative cases are discussed:

Cross‑scene modeling where the target domain lacks sufficient labeled data, but related high‑risk or historical data from other domains are available.

Scenarios with expensive or noisy labeling (e.g., fraud detection), where expert rules or existing models provide imperfect labels that can still be leveraged.

3. Modeling with Sample Scarcity – Cross‑Domain Treatment Effect Estimation

The goal is to estimate the causal effect of an intervention (treatment) on an outcome, which differs from standard predictive modeling.

Two typical situations are considered:

Target domain has no labeled data.

Target domain has only a few labeled samples.

A “Direct Learning” framework is proposed: outcome models fitted on the source domain's control and treated groups first produce a pseudo-effect for each sample, and an effect model is then trained on these pseudo-effects. Distribution shift between domains is handled by density-ratio-based domain adaptation (importance weighting).
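A minimal sketch of this pipeline under strong simplifying assumptions: linear outcome models, and a logistic domain classifier to estimate the density ratio. All function names are illustrative, not from the paper:

```python
import numpy as np

def fit_linear(X, y, w=None):
    """(Weighted) least squares with an intercept and a tiny ridge term."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W = np.diag(w) if w is not None else np.eye(len(X))
    A = Xb.T @ W @ Xb + 1e-6 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ W @ y)

def predict(beta, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ beta

def density_ratio(X_src, X_tgt, steps=500, lr=0.1):
    """Logistic domain classifier; returns w(x) = p(tgt|x)/p(src|x) on source."""
    X = np.vstack([X_src, X_tgt])
    Xb = np.hstack([X, np.ones((len(X), 1))])
    y = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ theta))
        theta -= lr * Xb.T @ (p - y) / len(y)
    Xsb = np.hstack([X_src, np.ones((len(X_src), 1))])
    p_src = 1 / (1 + np.exp(-Xsb @ theta))
    return p_src / (1 - p_src + 1e-12)

def direct_learning(X_src, t_src, y_src, X_tgt):
    """Pseudo-effects from source outcome models; the effect model is
    re-weighted toward the target distribution via the density ratio."""
    mu0 = fit_linear(X_src[t_src == 0], y_src[t_src == 0])  # control model
    mu1 = fit_linear(X_src[t_src == 1], y_src[t_src == 1])  # treated model
    tau_pseudo = predict(mu1, X_src) - predict(mu0, X_src)  # pseudo-effects
    w = density_ratio(X_src, X_tgt)                         # shift weights
    beta_tau = fit_linear(X_src, tau_pseudo, w=w)
    return predict(beta_tau, X_tgt)
```

On noise-free linear data (e.g., effect τ(x) = 1 + x), this recovers the true effect on a shifted target domain, which is the point of the re-weighting step.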

Unreliable pseudo-effects are down-weighted via MC-dropout uncertainty estimation: each sample receives a reliability score that re-weights it when training the effect model.
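One plausible way to turn MC-dropout variance into a per-sample reliability weight is sketched below. The `1/(1 + std)` mapping and the stand-in dropout predictor are assumptions for illustration; the talk does not specify the exact formula:

```python
import numpy as np

def mc_dropout_reliability(predict_stochastic, X, n_passes=50, seed=0):
    """Run several stochastic forward passes (dropout kept active) and
    map the predictive std to a (0, 1] reliability score per sample."""
    rng = np.random.default_rng(seed)
    preds = np.stack([predict_stochastic(X, rng) for _ in range(n_passes)])
    return 1.0 / (1.0 + preds.std(axis=0))  # high variance -> low weight

# Stand-in for a dropout network: a linear model with input dropout.
W = np.array([1.0, 2.0])

def predict_stochastic(X, rng, p=0.5):
    mask = rng.random(X.shape) >= p          # inverted dropout
    return (X * mask / (1 - p)) @ W
```

Samples whose prediction is stable across passes get weight close to 1; samples with high predictive variance are down-weighted.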

Extensive experiments (published at CIKM under the title “Treatment Effect Estimation across Domains”) show the method’s robustness to significant distribution differences and comparable performance when distributions are similar.

4. Modeling with Noisy Labels – Multi‑Source Noisy‑Label Denoising

In many target scenarios, accurate labels are scarce, but multiple noisy sources (expert tags, rule‑based labels, legacy models) are available.

The proposed solution consists of two theoretical insights:

Models can identify instance‑wise noisy labels (e.g., higher loss indicates potential mislabeling).

Models can also discern annotator‑wise quality, distinguishing high‑quality from low‑quality label sources.
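Both insights can be sketched with the small-loss criterion: per-sample loss flags likely mislabeled instances, and a source's average loss reflects its quality. The threshold and the `exp(-loss)` quality mapping are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Per-sample cross-entropy against the model's class probabilities."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def flag_noisy(probs, labels, quantile=0.8):
    """Instance-wise: samples whose loss falls in the top (1 - q) tail
    are flagged as likely mislabeled (small-loss criterion)."""
    loss = cross_entropy(probs, labels)
    return loss > np.quantile(loss, quantile)

def source_quality(probs, labels_by_source):
    """Annotator-wise: a source's quality is its mean agreement with
    the model, via exp(-loss) in (0, 1]."""
    return {name: float(np.exp(-cross_entropy(probs, labels)).mean())
            for name, labels in labels_by_source.items()}
```

A clean source scores higher than one with flipped labels, and the flipped instances land in the high-loss tail.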

Based on these insights, a framework with three key modules is built:

Self-cognition: estimates the reliability of each sample and each label source.

Mutual-denoising: uses reliable sources to generate pseudo-labels for the other sources, blending each source's own label (weight w) with the pseudo-label (weight 1 − w), where w is that source's learned reliability.

Selective knowledge distillation: distills the reliability-weighted multi-source training signal into a lightweight model for deployment.
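The mutual-denoising step can be sketched as a reliability-weighted blend: each source keeps its own label with weight w and adopts the other sources' weighted vote with weight (1 − w). The aggregation rule below is an assumption for illustration:

```python
import numpy as np

def mutual_denoise(labels_by_source, w, n_classes):
    """For each source s, blend its one-hot labels (weight w[s]) with a
    reliability-weighted vote of the other sources (weight 1 - w[s])."""
    names = list(labels_by_source)
    onehot = {s: np.eye(n_classes)[labels_by_source[s]] for s in names}
    denoised = {}
    for s in names:
        vote = sum(w[o] * onehot[o] for o in names if o != s)
        vote = vote / sum(w[o] for o in names if o != s)
        denoised[s] = w[s] * onehot[s] + (1 - w[s]) * vote
    return denoised
```

A low-reliability source is effectively overruled: its denoised soft labels follow the consensus of the high-reliability sources.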

The approach, accepted at ICML under the title “Self‑cognitive Denoising in the Presence of Multiple Noisy Label Sources,” demonstrates consistent improvements across various datasets and ablation studies confirming the benefit of each module.

5. Application Scenarios

For sample‑scarce cases, cross‑scene data can be leveraged in interventions such as marketing coupon distribution.

For noisy‑label cases, multi‑source noisy tags can support user profiling, demand prediction, and other attribute classification tasks.

The presentation concludes with a thank‑you note.

Tags: machine learning, causal inference, weak supervision, cross-domain, noisy labels
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
