Multi-Task Audio Source Separation (MTASS) and SpeechNAS: AutoML‑Driven Large‑Scale Speaker Recognition
This article presents two Kuaishou papers accepted at ASRU 2021. MTASS is a multi-task audio source separation framework that jointly separates a mixed audio signal into speech, music, and noise. SpeechNAS is an AutoML-based neural architecture search method that achieves state-of-the-art speaker recognition performance with significantly fewer parameters than competing models.