M2SD: Multiple Mixing Self-Distillation for Few-Shot Class-Incremental Learning

This paper introduces M2SD, a dual‑branch multiple‑mixing self‑distillation framework that expands feature space, mitigates overfitting and catastrophic forgetting, and achieves state‑of‑the‑art results on CIFAR‑100, CUB‑200 and miniImageNet for few‑shot class‑incremental learning.

Alibaba Cloud Big Data AI Platform

Mar 19, 2024

1. Introduction

Few‑shot class‑incremental learning (FSCIL) aims to recognize new classes from only a few samples each while preserving knowledge of previously learned classes, without retraining the entire model.

2. Motivation

The core challenges are overfitting, caused by scarce data, and catastrophic forgetting, which occurs when new classes are introduced. Existing regularization methods alleviate forgetting once new classes arrive. The FACT approach instead prepares a forward‑compatible feature space during the base session, before any new class is seen.

3. Proposed Method: Multiple Mixing Self‑Distillation (M2SD)

We design a dual‑branch architecture that expands the feature space for new categories. A feature‑enhancement mechanism feeds the enhanced features back to the backbone via self‑distillation, improving classification performance while keeping only the main network for inference.

The method combines multi‑scale feature extraction and fusion with Mixup and CutMix, which generate diverse virtual classes. A dual‑branch virtual‑class mixing distillation then uses KL divergence to align the distributions of the virtual classes.
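The mixing and alignment steps can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python lists stand in for image tensors, the mixing ratio `lam` is fixed instead of sampled, and the two logit vectors are hypothetical outputs that two branches might produce for the same virtual class. The function names are illustrative as well.

```python
import math
import random

def mixup(img_a, img_b, lam):
    """Blend two images pixel by pixel: lam * a + (1 - lam) * b."""
    return [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def cutmix(img_a, img_b, lam, rng):
    """Paste a rectangle of img_b into img_a.

    Returns the mixed image and the fraction of pixels still taken from img_a.
    """
    h, w = len(img_a), len(img_a[0])
    cut_h = int(h * math.sqrt(1.0 - lam))
    cut_w = int(w * math.sqrt(1.0 - lam))
    cy, cx = rng.randrange(h), rng.randrange(w)  # random box center
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    out = [row[:] for row in img_a]
    for y in range(top, bottom):
        out[y][left:right] = img_b[y][left:right]
    kept = 1.0 - (bottom - top) * (right - left) / (h * w)
    return out, kept

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

rng = random.Random(0)
img_a = [[1.0] * 8 for _ in range(8)]  # toy single-channel "image" of class A
img_b = [[0.0] * 8 for _ in range(8)]  # toy single-channel "image" of class B
lam = 0.7  # assumption: fixed here; commonly drawn from a Beta distribution

mixed = mixup(img_a, img_b, lam)
cut, kept = cutmix(img_a, img_b, lam, rng)

# Hypothetical logits from two branches for the same virtual class.
p_mixup = softmax([2.0, 0.5, -1.0])
p_cutmix = softmax([1.8, 0.7, -0.9])
align_loss = kl_divergence(p_mixup, p_cutmix)
```

In this toy setup, Mixup yields a uniformly blended image, while CutMix keeps every pixel intact and swaps only a region. The KL term is zero when both distributions agree and grows as they diverge.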

A self‑distillation module with attention enhancement refines the features: multi‑head self‑attention (MHSA) is applied to the first and fourth feature blocks, while coordinate attention (CA) is applied to the second and third. BiFPN fuses the multi‑scale features, and an attention‑transfer loss encourages consistency between the original and enhanced attention maps.
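The summary does not give the formula for the attention‑transfer loss. The sketch below therefore assumes a common activation‑based formulation: a spatial attention map is the channel‑wise sum of squared activations, L2‑normalized, and the loss is the L2 distance between two such maps. The function names and toy feature blocks are hypothetical, and the paper's actual loss may differ.

```python
import math

def attention_map(features):
    """Collapse a C x H x W feature block into a flattened, L2-normalized
    spatial attention map by summing squared activations over channels."""
    channels = len(features)
    h, w = len(features[0]), len(features[0][0])
    flat = [sum(features[c][y][x] ** 2 for c in range(channels))
            for y in range(h) for x in range(w)]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # guard all-zero maps
    return [v / norm for v in flat]

def attention_transfer_loss(original, enhanced):
    """L2 distance between the normalized attention maps of two feature blocks."""
    a, b = attention_map(original), attention_map(enhanced)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 2-channel, 2x2 feature blocks (values made up for illustration).
original = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.5, 0.0], [0.0, 0.5]]]
scaled = [[[2.0 * v for v in row] for row in ch] for ch in original]
shifted = [[[0.0, 1.0], [1.0, 0.0]],
           [[0.0, 0.5], [0.5, 0.0]]]

same_focus = attention_transfer_loss(original, scaled)    # same spatial pattern
other_focus = attention_transfer_loss(original, shifted)  # pattern moved
```

Because the maps are normalized, the loss ignores the overall activation scale and penalizes only differences in where the two feature blocks attend. Here `same_focus` is effectively zero, while `other_focus` is large.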

4. Experiments

We compare M2SD with state‑of‑the‑art methods on three benchmark datasets: CIFAR‑100, CUB‑200 and miniImageNet. M2SD consistently outperforms prior work, achieving average gains of over 2% on CIFAR‑100 and CUB‑200 and more than 3% on miniImageNet.

Feature‑space analysis shows a 27% reduction in intra‑class distance and a 22% increase in inter‑class distance, confirming the effectiveness of the expanded and compatible feature space.
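The summary does not say how the two distances are measured. As one plausible reading, the sketch below takes intra‑class distance to be the mean Euclidean distance from each sample to its class centroid, and inter‑class distance to be the mean distance between class centroids. The feature vectors are toy values chosen for easy arithmetic, not results from the paper.

```python
import math

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def mean_intra_class_distance(features_by_class):
    """Average distance from each sample to its own class centroid."""
    total, count = 0.0, 0
    for points in features_by_class.values():
        c = centroid(points)
        total += sum(math.dist(p, c) for p in points)
        count += len(points)
    return total / count

def mean_inter_class_distance(features_by_class):
    """Average distance between every pair of class centroids."""
    cents = [centroid(pts) for pts in features_by_class.values()]
    pairs = [(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def relative_change(before, after):
    """Signed fractional change, e.g. -0.27 for a 27% reduction."""
    return (after - before) / before

# Toy 2-D features for two classes, before and after a hypothetical improvement.
baseline = {"cat": [[0.0, 0.0], [2.0, 0.0]], "dog": [[4.0, 0.0], [6.0, 0.0]]}
improved = {"cat": [[0.0, 0.0], [1.0, 0.0]], "dog": [[5.0, 0.0], [6.0, 0.0]]}

intra_change = relative_change(mean_intra_class_distance(baseline),
                               mean_intra_class_distance(improved))
inter_change = relative_change(mean_inter_class_distance(baseline),
                               mean_inter_class_distance(improved))
```

A negative `intra_change` together with a positive `inter_change` matches the pattern reported above: tighter clusters that sit farther apart.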

Ablation studies demonstrate the contribution of each component, including the dual‑branch virtual‑class strategy, the feature‑enhancement module, and the attention‑transfer loss.

5. Conclusion and Outlook

M2SD provides a forward‑compatible feature space for FSCIL by combining dual‑branch virtual‑class distillation, multi‑scale feature enhancement, and self‑distillation with attention mechanisms, leading to superior performance on challenging few‑shot incremental benchmarks.
