Kuaishou Audio‑Video Technology: Winning Echo Cancellation Techniques and Deep AEC
This article describes Kuaishou's award-winning acoustic echo cancellation (AEC) solutions: it explains how echo forms, reviews traditional linear and nonlinear AEC methods, and introduces the Deep AEC approach, which combines a cross-network architecture with an improved loss function to perform well in both single-talk and double-talk scenarios.
In the Interspeech 2021 AEC Challenge, Kuaishou emerged as a dark horse, taking first place in double-talk echo cancellation, second place in single-talk, and third place overall.
The competition attracted major companies and research institutes such as Amazon, Alibaba, ByteDance, and the Chinese Academy of Sciences.
Echo arises when the far-end signal played by the local loudspeaker is picked up by the local microphone and sent back to the far end, so remote participants hear a delayed copy of their own voice; this loop degrades the user experience in online meetings.
Traditional AEC estimates the acoustic transfer function (the echo path) between the local loudspeaker and microphone. A linear adaptive filter (step 1) subtracts the estimated echo from the microphone signal, and nonlinear processing (step 2) then suppresses the residual echo that the linear stage leaves behind.
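The linear stage (step 1) can be illustrated with a normalized least-mean-squares (NLMS) filter, a common textbook choice for adaptive echo cancellation. The article does not say which adaptive algorithm Kuaishou uses, so this is only a minimal sketch under that assumption; the function names, tap count, step size, and the synthetic four-tap echo path are all hypothetical.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return (residual, weights): mic minus the NLMS echo estimate, and the learned echo path."""
    w = [0.0] * taps        # current estimate of the echo-path impulse response
    x_buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))   # estimated echo
        e = d - y                                       # signal after echo subtraction
        norm = sum(xi * xi for xi in x_buf) + eps       # input power, for normalization
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual, w

def simulate_echo(far_end, path):
    """Convolve the far-end signal with a (hypothetical) echo path."""
    return [
        sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far_end))
    ]

random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]   # synthetic far-end signal
true_path = [0.6, -0.3, 0.1, 0.05]                        # assumed echo path
mic = simulate_echo(far, true_path)                       # single-talk: echo only
residual, w_est = nlms_echo_cancel(far, mic)
```

In this idealized single-talk case the filter weights converge to the simulated echo path and the residual decays toward zero. With near-end speech present (double-talk), the error signal also contains that speech, which disturbs the weight update; this is one reason the traditional approach struggles in double-talk.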
While effective in simple acoustic environments, traditional methods struggle with double‑talk scenarios and complex reverberation.
Kuaishou's Deep AEC fuses conventional AEC with a deep learning model trained on massive real‑world data (≈1000 h clean speech, >100 h noise, 5000 device recordings) and extensive data augmentation, enabling robust echo removal even in challenging conditions.
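The article does not detail the augmentation pipeline. One step commonly used when synthesizing AEC training data is mixing near-end speech with an echo signal at a chosen signal-to-echo ratio (SER); the sketch below illustrates only that step, as an assumption rather than Kuaishou's actual procedure. The function name `mix_at_ser` and the toy signals are hypothetical.

```python
import math

def mix_at_ser(near_end, echo, ser_db):
    """Scale `echo` so the near-end-to-echo power ratio equals `ser_db`, then mix.

    Both inputs are equal-length lists of samples. Returns (mixture, scaled_echo).
    """
    p_near = sum(s * s for s in near_end) / len(near_end)
    p_echo = sum(s * s for s in echo) / len(echo)
    # Gain that makes 10*log10(p_near / p_scaled_echo) equal to ser_db.
    gain = math.sqrt(p_near / (p_echo * 10 ** (ser_db / 10)))
    scaled = [gain * s for s in echo]
    mixture = [n + e for n, e in zip(near_end, scaled)]
    return mixture, scaled

near = [1.0, -1.0, 1.0, -1.0]        # toy near-end signal
echo = [2.0, 2.0, -2.0, -2.0]        # toy echo signal
mixture, scaled = mix_at_ser(near, echo, ser_db=10.0)
```

Sweeping `ser_db` over a range of values would yield training mixtures from echo-dominated to speech-dominated conditions.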
The Deep AEC architecture introduces two key innovations: a cross‑network (CrossNet) that jointly estimates speech and interference while sharing information, and a novel loss function that improves network performance.
Additional enhancements handle background noise and strong reverberation, further improving audio quality.
Objective metrics show the system achieves an echo return loss enhancement (ERLE) of around 56 dB in single-talk and a PESQ gain of 0.8 points across various SNRs, while keeping processing latency under 40 ms.
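For reference, ERLE is conventionally defined as the ratio of microphone-signal power to residual power after cancellation, in decibels: ERLE = 10 log10(E[d^2] / E[e^2]), measured during single-talk when the microphone signal d contains only echo. A minimal sketch of that computation follows; the function name `erle_db` and the toy signals are illustrative, and the challenge's exact evaluation procedure is not described in the article.

```python
import math

def erle_db(mic, residual, eps=1e-12):
    """Echo return loss enhancement in dB: mic power over residual power."""
    p_mic = sum(d * d for d in mic) / len(mic)
    p_res = sum(e * e for e in residual) / len(residual)
    return 10.0 * math.log10((p_mic + eps) / (p_res + eps))

# Toy example: the residual amplitude is one tenth of the echo amplitude,
# so the power ratio is 100.
mic = [1.0] * 100
residual = [0.1] * 100
value = erle_db(mic, residual)
```

By this definition, an ERLE of 56 dB corresponds to the echo power being reduced by a factor of roughly 4 × 10^5.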
These technical advances secured Kuaishou's top results in the AEC Challenge and demonstrate the future potential of deep‑learning‑driven echo cancellation.
At the end of the article, Kuaishou invites talent to join its audio‑video technology team, highlighting opportunities to work on cutting‑edge media technologies.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.