How NetEase Cloud Communication Tackles Voice Reverberation with Adaptive Dual‑Mic Algorithms

This article explains the growing need for speech dereverberation in audio‑video conferencing, outlines the physical causes of reverberation, reviews historical research, and details NetEase Cloud's adaptive dual‑mic signal‑correlation approach, algorithm implementations, performance optimizations, and future directions.

NetEase Smart Enterprise Tech+

Voice Reverberation Overview

As audio‑video conferencing becomes ubiquitous, participants increasingly notice reverberation in environments such as large conference rooms, glass-walled rooms, and small, poorly insulated spaces. Keeping speech intelligible and comfortable to listen to has made dereverberation an urgent requirement.

Causes of Reverberation

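Reverberation arises when sound reflects off room surfaces and reaches the microphone after the direct sound. Its strength and decay time depend on how enclosed the room is, its size, the reflectivity of its surface materials, and the distance between the talker and the microphone.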
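To make the dependence on room size and surface materials concrete, here is a minimal sketch based on Sabine's formula, a classical textbook approximation that is not taken from the original article. The function name, the room dimensions, and the absorption coefficients are all illustrative assumptions.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time RT60 in seconds with Sabine's formula.

    surfaces is a list of (area_m2, absorption_coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


# Hypothetical 5 m x 4 m x 3 m room: 60 m^3 of volume, 94 m^2 of surface.
hard_room = sabine_rt60(60.0, [(94.0, 0.1)])  # reflective surfaces such as glass
soft_room = sabine_rt60(60.0, [(94.0, 0.3)])  # more absorptive surfaces
print(f"{hard_room:.2f} s vs {soft_room:.2f} s")
```

Under these assumed values the reflective room decays in about 1.03 s and the more absorptive one in about 0.34 s, which illustrates why glass rooms are a hard case.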

Research Development

Early studies focused on acoustic design for concert halls and classrooms.

Subsequent work examined reverberation’s impact on speech intelligibility.

Some researchers explored positive effects, such as enhanced naturalness and spatial perception, and applied artificial reverberation in entertainment and gaming.

Evaluation Metrics

Performance indicators for dereverberation are categorized based on application scenarios (e.g., handset communication, video conferencing, voice assistants).

Key Algorithms and Progress

Algorithms that evolved from linear prediction, such as the weighted prediction error (WPE) family.

Correlation‑suppression algorithms.

Planned integration of deep‑learning approaches.

AWPE Algorithm

The Adaptive Weighted Prediction Error (AWPE) algorithm is implemented and combined with noise reduction to improve communication quality.

Signal Model

Let $X_t^m$ denote the signal received by microphone $m$ at time $t$, $L_m$ the number of microphones, $h_k^m$ the impulse response from source $s$ to microphone $m$, and $n_t^m$ the additive noise component.

The model uses past microphone data to predict the late reverberation and takes $d_t^m$, the direct sound plus early reflections, as the target signal. The direct source signal alone is less commonly used as the target because early reflections can benefit perception.
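For reference, one common way to write this kind of model is sketched below. It is a standard textbook form, not necessarily the exact formulation used by NetEase. The symbols $K$ (impulse response length), $D$ (the boundary between early and late reflections, also used as the prediction delay), $\mathbf{g}$ (the prediction filter), and $\tilde{\mathbf{X}}_{t-D}$ (the stacked past microphone samples) are assumptions introduced here for illustration.

```latex
\begin{align}
X_t^m &= \sum_{k=0}^{K-1} h_k^m \, s_{t-k} + n_t^m \\
      &= \underbrace{\sum_{k=0}^{D-1} h_k^m \, s_{t-k}}_{d_t^m}
       + \underbrace{\sum_{k=D}^{K-1} h_k^m \, s_{t-k}}_{\text{late reverberation}}
       + n_t^m \\
\hat{d}_t^m &= X_t^m - \mathbf{g}^{H} \tilde{\mathbf{X}}_{t-D}
\end{align}
```

The last line expresses the idea of the prediction-error approach: the late reverberation is predicted from delayed microphone data and subtracted, leaving an estimate of the early part $d_t^m$.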

Correlation‑Based Noise Reduction Dereverberation

This approach assumes that late reverberation behaves like diffuse-field noise. The algorithm estimates the late-reverberation power from the correlation (coherence) between the two microphone signals, then applies spectral subtraction to obtain a per-frequency suppression gain.
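The idea can be sketched for a single frequency bin as follows. This is a deliberately simplified illustration, not the article's actual estimator: it assumes the diffuse component is uncorrelated between the two microphones, so the incoherent fraction of the power is treated as late reverberation. The function name and the parameters `alpha` (smoothing constant) and `floor` (minimum gain) are hypothetical.

```python
def coherence_gains(x1, x2, alpha=0.9, floor=0.1):
    """Per-frame suppression gains for one frequency bin.

    x1 and x2 are sequences of complex STFT values from the two microphones.
    """
    p11 = p22 = 1e-12  # smoothed auto power spectra (small start avoids 0/0)
    p12 = 0j           # smoothed cross power spectrum
    gains = []
    for a, b in zip(x1, x2):
        # Recursive (exponential) smoothing of the second-order statistics.
        p11 = alpha * p11 + (1 - alpha) * abs(a) ** 2
        p22 = alpha * p22 + (1 - alpha) * abs(b) ** 2
        p12 = alpha * p12 + (1 - alpha) * a * b.conjugate()
        # Magnitude coherence in [0, 1]: 1 for a coherent source, low for diffuse sound.
        coherence = min(1.0, abs(p12) / (p11 * p22) ** 0.5)
        # Assumption: the incoherent share of the power is late reverberation.
        late_ratio = 1.0 - coherence
        # Power spectral subtraction, with a gain floor to limit artifacts.
        gains.append(max((1.0 - late_ratio) ** 0.5, floor))
    return gains
```

Each gain would be multiplied onto the corresponding bin of the output channel. Practical estimators typically model the diffuse-field coherence as a function of frequency and microphone spacing rather than assuming it is zero.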

Implementation and Computational Optimization

Size the buffers by microphone count, number of history frames, and number of frequency bins to reduce processing time, and use the Woodbury matrix identity in the RLS update to avoid explicit matrix inversion.

Apply smoothing updates for statistical information.

Diagonalize matrices or convert them to real-valued form where possible to lower the computational load.
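The points above can be illustrated with a minimal single-channel, single-bin sketch of a weighted RLS recursion in the style of adaptive WPE. It is not NetEase's implementation. The function name, the default parameters, and the choice to estimate the signal variance by recursively smoothing past frame powers are assumptions made for illustration. The inverse correlation matrix `P` is updated with the rank-one form of the Woodbury identity, so no matrix is ever inverted.

```python
def adaptive_wpe_bin(x, taps=6, delay=2, forget=0.99, smooth=0.9, eps=1e-6):
    """Dereverberate one STFT bin (a list of complex frames) with weighted RLS."""
    g = [0j] * taps  # prediction filter
    # Inverse of the weighted correlation matrix, initialised to the identity.
    P = [[complex(i == j) for j in range(taps)] for i in range(taps)]
    power = 0.0      # smoothed |x|^2 computed from past frames only
    out = []
    for t, xt in enumerate(x):
        # Delayed history x[t-delay], x[t-delay-1], ... (zeros before the start).
        u = [x[t - delay - i] if t - delay - i >= 0 else 0j for i in range(taps)]
        # Subtract the predicted late reverberation (a priori error).
        d = xt - sum(gi.conjugate() * ui for gi, ui in zip(g, u))
        sigma2 = max(power, eps)
        Pu = [sum(P[i][j] * u[j] for j in range(taps)) for i in range(taps)]
        quad = sum(ui.conjugate() * pi for ui, pi in zip(u, Pu)).real
        k = [pi / (forget * sigma2 + quad) for pi in Pu]  # gain vector
        g = [gi + ki * d.conjugate() for gi, ki in zip(g, k)]
        # Rank-one Woodbury update; u^H P equals (P u)^H because P is Hermitian.
        P = [[(P[i][j] - k[i] * Pu[j].conjugate()) / forget for j in range(taps)]
             for i in range(taps)]
        # Smoothed statistics update for the next frame's variance estimate.
        power = smooth * power + (1 - smooth) * abs(xt) ** 2
        out.append(d)
    return out
```

A real system would run this recursion per frequency bin and stack the history of all microphones into `u`. The buffer dimensions then follow directly from the microphone count, the number of history frames (`taps`), and the number of bins.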

Results and Future Outlook

Current dereverberation achieves 800 ms–1 s latency with priority on speech fidelity; key tunable parameters are the forgetting factor and block count. Future work includes adaptive forgetting‑factor schemes and deep‑learning solutions that jointly handle dereverberation and noise reduction.
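As a rough guide to tuning the forgetting factor: in an exponentially weighted recursion, a frame that lies n steps in the past is weighted by the forgetting factor raised to the power n, so the effective memory is about 1 / (1 - forgetting factor) frames. The helper below is a hypothetical illustration, and the 10 ms frame hop is an assumed value, not one stated in the article.

```python
def effective_memory(forget, hop_ms=10.0):
    """Return the approximate memory of an exponentially weighted recursion.

    The result is a (frames, seconds) pair; hop_ms is the assumed frame hop.
    """
    if not 0.0 < forget < 1.0:
        raise ValueError("forgetting factor must lie strictly between 0 and 1")
    frames = 1.0 / (1.0 - forget)
    return frames, frames * hop_ms / 1000.0


frames, seconds = effective_memory(0.99)
print(f"about {frames:.0f} frames, {seconds:.1f} s")
```

A factor closer to 1 averages over more frames, which gives smoother estimates but slower tracking when the talker moves. That trade-off is what an adaptive forgetting-factor scheme tries to manage.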

References

Xiang, Teng, Jing Lu, and Kai Chen. "Multi‑channel adaptive dereverberation robust to abrupt change of target speaker position." JASA 145.3 (2019): EL250‑EL256.

Taniguchi, Toru, et al. "Generalized weighted‑prediction‑error dereverberation with varying source priors for reverberant speech recognition." IEEE WASPAA 2019.

Tang, Xinyu, et al. "A Time‑Varying Forgetting Factor‑Based QRRLS Algorithm for Multichannel Speech Dereverberation." IEEE ISSPIT 2020.

Schwarz, Andreas. "Dereverberation and Robust Speech Recognition Using Spatial Coherence Models." PhD dissertation, FAU, 2019.
