AT-ADD Challenge: Pushing All‑Type Audio Deepfake Detection Forward

The AT‑ADD competition, organized for ACM MM 2026, invites researchers to develop audio deepfake detection models that remain robust across speech, environmental sounds, singing, and music. It provides diverse real‑world datasets, baseline code, clear evaluation metrics, and a two‑stage submission process to advance AI security.

AntTech

Background and Motivation

Rapid advances in Audio Language Models (ALMs) enable low‑cost generation of high‑fidelity audio, including speech, environmental sounds, singing, and music. While this fuels creative content production, it also raises severe security and trust concerns because high‑quality deepfake audio can be mass‑produced and disseminated, enabling identity impersonation, fraudulent voice commands, misinformation, and potential copyright violations.

Existing audio deepfake detection (ADD) research focuses primarily on clean speech and a limited set of spoofing methods. This does not reflect real‑world conditions, where recordings are captured on a variety of devices, degraded by noise, reverberation, and compression, and where attackers continuously evolve their generation techniques.

AT‑ADD Challenge Overview

The ACM Multimedia 2026 All‑Type Audio Deepfake Detection (AT‑ADD) challenge bridges the gap between idealized lab settings and practical multimedia forensics. It aims to stimulate the development of detection methods that are robust to complex acoustic environments and generalize across multiple audio types.

Tracks

Track 1 – Robust Speech Deepfake Detection: Evaluates models on real‑world noisy speech recordings and on unseen advanced speech synthesis methods.

Track 2 – All‑Type Audio Deepfake Detection: Extends detection to speech, environmental sounds, singing, and music, requiring a unified real/fake decision across unknown audio categories.

Datasets and Baselines

Both tracks provide large‑scale datasets assembled from public and internal sources:

Track 1 contains over 40 speech generation models, recordings in multiple languages and from diverse devices (mobile phones, car systems, wearables), and simulated degradations (noise, reverberation, compression, replay attacks); a sketch of one such degradation follows this list.

Track 2 includes more than 70 audio generation models covering the four audio categories, without additional signal‑level perturbations to focus on cross‑type generalization.
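
To make the Track 1 degradations concrete, here is a minimal sketch, not taken from the challenge toolkit, of one such perturbation: mixing additive noise into a clean recording at a target signal‑to‑noise ratio. The function name and the synthetic test signal are illustrative assumptions.

```python
# A minimal sketch of one Track 1-style degradation: additive noise at a
# target SNR. Function and variable names are illustrative assumptions,
# not part of the official data pipeline.
import numpy as np

def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` so the result sits at roughly `snr_db` dB SNR."""
    # Tile or trim the noise so it matches the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale the noise so that clean_power / scaled_noise_power = 10^(snr_db/10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Example: degrade a one-second 220 Hz tone with white noise at 10 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)
degraded = add_noise_at_snr(clean, np.random.randn(sr), snr_db=10.0)
```

Applying perturbations like this during training is one common way to close the gap between clean training corpora and the noisy conditions the Track 1 test set simulates.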

Six baseline models are released, covering traditional discriminative approaches, self‑supervised learning (SSL) methods, and fine‑tuned audio large language models (ALLMs). All baselines share a unified training and evaluation protocol, and their code is publicly available at https://github.com/xieyuankun/AT-ADD-Baseline.

Evaluation Process and Metrics

The competition consists of two phases:

Progress Evaluation Phase: Participants develop and tune models on a 20% held‑out progress set sampled from the final test set.

Final Evaluation Phase: The full test set is released; participants submit predictions via the Codabench platform (https://www.codabench.org/competitions/15477 for Track 1 and https://www.codabench.org/competitions/15481 for Track 2). Daily submission limits are enforced to prevent over‑fitting.
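
The exact submission layout is defined by the organizers on the competition pages; the sketch below only illustrates the general shape of a Codabench upload, and the file name, column layout, and label strings are hypothetical.

```python
# Hypothetical packaging of real/fake predictions for a Codabench upload.
# The file name, column layout, and label strings are assumptions; always
# follow the official instructions on the competition page for each track.
import csv
import zipfile

predictions = {"clip_0001.wav": "fake", "clip_0002.wav": "real"}  # placeholder results

with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for clip_name, label in predictions.items():
        writer.writerow([clip_name, label])

# Codabench submissions are commonly zip archives containing the result file.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("predictions.csv")
```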

Both tracks use Macro‑F1 as the primary metric. Track 1 computes a single Macro‑F1 over all data, while Track 2 computes Macro‑F1 per audio type and averages the per‑type scores; the individual per‑type F1 scores are also reported for detailed analysis.
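
As a concrete illustration of the two scoring schemes, the sketch below computes both variants with scikit‑learn; the label encoding, the audio‑type tags, and the toy data are our assumptions, not the official evaluation code.

```python
# A minimal sketch of the two scoring schemes with scikit-learn. The label
# encoding (1 = real, 0 = fake), the audio-type tags, and the toy data are
# assumptions for illustration, not the official evaluation code.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])  # ground-truth real/fake labels
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 1])  # model decisions
types = np.array(["speech", "speech", "music", "music",
                  "singing", "singing", "env", "env"])  # audio categories

# Track 1: a single Macro-F1 computed over all data.
track1 = f1_score(y_true, y_pred, average="macro")

# Track 2: Macro-F1 per audio type, then averaged across the types.
per_type = {t: f1_score(y_true[types == t], y_pred[types == t], average="macro")
            for t in np.unique(types)}
track2 = float(np.mean(list(per_type.values())))

print(f"Track 1 Macro-F1: {track1:.3f}")           # 0.750 on this toy data
print(f"Track 2 per-type: {per_type}")
print(f"Track 2 averaged Macro-F1: {track2:.3f}")  # 0.750 on this toy data
```

Averaging per‑type scores prevents a model from ranking well on Track 2 by excelling on one dominant audio category while failing on the others.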

Schedule

2026‑04‑10: Release of paper, baseline code, and datasets (tentative).

2026‑04‑11: Progress‑evaluation submissions open.

2026‑06‑11: Final‑evaluation submissions open.

2026‑06‑18: Final leaderboard frozen; rankings confirmed.

2026‑06‑25: Competition paper submission deadline.

2026‑07‑16: Notification of paper acceptance.

2026‑08‑01: Deadline for final paper versions.

The top three technical solutions will be included in the ACM MM 2026 main conference proceedings.

Organizers and Contact

Organizing institutions include Communication University of China, Ant Group, the Institute of Automation of the Chinese Academy of Sciences, Beijing Institute of Technology, and Shanghai Jiao Tong University. For inquiries, contact Haonan Cheng ([email protected]), Jiayi Zhou ([email protected]), Yuankun Xie ([email protected]), or Tao Wang ([email protected]).

Tags: machine learning, security, Multimedia, Challenge, AT-ADD, Audio Deepfake