Kuaishou Audio Team Wins ICASSP 2022 AEC Challenge Runner‑up and Publishes Two Deep Noise Suppression Papers
The Kuaishou audio team finished as overall runner‑up in the ICASSP 2022 Acoustic Echo Cancellation Challenge and had two deep noise suppression papers accepted. Its entry is a hybrid of traditional signal processing and deep learning that performs well in echo removal, noise reduction, and speech recognition across multiple real‑time communication scenarios.
At the ICASSP 2022 Acoustic Echo Cancellation (AEC) Challenge, Kuaishou's audio‑video technology team placed second overall, ranking in the top three for near‑end single‑talk quality, far‑end single‑talk echo cancellation, dual‑talk quality and echo cancellation, and speech‑recognition accuracy.
The competition, co‑organized by ICASSP and Microsoft, evaluated systems using subjective Mean Opinion Scores (MOS/DMOS) and an objective speech‑recognition metric. Test data was sampled at 48 kHz (full band), which raised the bar for both model generalisation and computational efficiency.
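As a rough illustration of an objective speech‑recognition metric, the sketch below computes Word Error Rate (WER), the measure this article cites in its internal results. It is a minimal example, not the challenge's official scoring script: the function name `word_error_rate` and the whitespace tokenisation are assumptions made here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace and compared exactly;
    real scoring pipelines usually normalise case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("the") and one inserted word ("please") over 4 reference words.
print(word_error_rate("turn the volume up", "turn volume up please"))
```

In an AEC evaluation, the hypothesis would be the recogniser's transcript of the processed near‑end speech, so residual echo or speech distortion shows up as a higher WER.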
Kuaishou built a hybrid system that combines robust traditional signal‑processing algorithms with a deep‑learning network (CrossNet). A high‑low‑frequency fusion architecture lets it handle 48 kHz audio without significantly increasing computational load. The model was trained on roughly 1,000 hours of clean speech, more than 100 hours of diverse noise, and extensive real‑world echo recordings.
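The article does not describe how CrossNet fuses high and low frequencies, so the following is only one plausible reading of the idea, not Kuaishou's implementation. The sketch assumes the low band of each spectral frame receives fine‑grained per‑bin gains (where a network would spend most of its computation) while the high band receives a single coarse gain. The function name `apply_two_band_gains`, the 8 kHz cutoff, and the scalar high‑band gain are all hypothetical.

```python
def apply_two_band_gains(spectrum, low_gains, high_gain,
                         sample_rate=48000, cutoff_hz=8000):
    """Apply per-bin gains below cutoff_hz and one shared gain above it.

    spectrum  : one-sided FFT frame (bins 0 .. n_fft/2) as complex numbers
    low_gains : one gain per bin below the cutoff (hypothetical model output)
    high_gain : single gain for every bin at or above the cutoff (assumption)
    """
    n_fft = 2 * (len(spectrum) - 1)
    # Index of the first bin that belongs to the high band.
    cutoff_bin = cutoff_hz * n_fft // sample_rate
    if len(low_gains) != cutoff_bin:
        raise ValueError(f"expected {cutoff_bin} low-band gains, got {len(low_gains)}")

    low = [g * s for g, s in zip(low_gains, spectrum[:cutoff_bin])]
    high = [high_gain * s for s in spectrum[cutoff_bin:]]
    return low + high


# A 20 ms frame at 48 kHz: n_fft = 960, so 481 one-sided bins of 50 Hz each.
frame = [1 + 0j] * 481
enhanced = apply_two_band_gains(frame, low_gains=[0.5] * 160, high_gain=0.25)
print(len(enhanced), enhanced[0], enhanced[160])
```

Under these assumptions, only 160 of the 481 bins need per‑bin estimates, which shows how a band split can keep the cost of 48 kHz processing close to that of a narrower‑band model.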
In internal evaluations, the system improved objective speech‑quality scores (PESQ) by over 1 point, achieved Echo Return Loss Enhancement (ERLE) above 55 dB, and reduced Word Error Rate (WER) to below 15 % across near‑end, far‑end, and dual‑talk scenarios, securing the overall runner‑up position.
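For readers unfamiliar with ERLE, the sketch below shows the usual definition: the ratio, in decibels, of microphone signal power to residual power after echo cancellation, measured while only the far end is talking. It is a minimal example under that assumption, and the function name `erle_db` and the toy signals are illustrative rather than taken from Kuaishou's evaluation.

```python
import math


def erle_db(mic, residual, eps=1e-12):
    """Echo Return Loss Enhancement in dB.

    mic      : microphone samples containing only echo (far-end single talk)
    residual : the same segment after echo cancellation
    eps      : small constant that avoids division by zero on silent input
    """
    if len(mic) != len(residual) or not mic:
        raise ValueError("mic and residual must be non-empty and equal in length")
    mic_power = sum(x * x for x in mic) / len(mic)
    residual_power = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10((mic_power + eps) / (residual_power + eps))


# Toy signals: the canceller reduces echo amplitude by a factor of 100,
# that is, power by a factor of 10,000.
mic = [1.0, -1.0] * 100
residual = [0.01, -0.01] * 100
print(round(erle_db(mic, residual), 3))
```

By this definition, an ERLE above 55 dB means the residual echo power is more than 300,000 times lower than the echo power picked up by the microphone.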
The technology has already been deployed in Kuaishou’s K‑song app, enhancing user experience in karaoke recording, scoring, and live‑room features, and is slated for broader use in live‑streaming and conference systems.
In addition, Kuaishou’s audio team had two papers accepted at ICASSP 2022 in the Deep Noise Suppression (DNS) track: “A TWO‑STEP BACKWARD COMPATIBLE FULLBAND SPEECH ENHANCEMENT SYSTEM” and “L‑SPEX: LOCALIZED TARGET SPEAKER EXTRACTION,” both describing novel full‑band speech enhancement and target‑speaker extraction techniques that have been integrated into multiple Kuaishou applications.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.