How to Boost Real-Time Audio Quality with Advanced AEC, AGC, and ANC Techniques
This article details a comprehensive redesign of acoustic echo cancellation (AEC), automatic gain control (AGC), and automatic noise control (ANC) for real-time communication. By combining WebRTC and Speex, the new design improves delay estimation, linear filtering, and non-linear processing, and demonstrates better performance than the original WebRTC-only solution.
0. Project Background
In real-time communication scenarios such as link-mic, audio must be pre-processed. The 3A technologies—Acoustic Echo Cancellation (AEC), Automatic Gain Control (AGC), and Automatic Noise Control (ANC)—are core to link-mic. The original solution, based entirely on the open-source WebRTC stack, suffered from several problems:
- For audio sampled above 16 kHz, processing falls back to a split-band scheme in which the high band is handled only in a simplified way.
- Echo leakage and audio drop-outs occur frequently.
- Configuration is complex and inconsistent across endpoints, and many audio modules are still experimental.
- Double-talk handling causes severe loss of near-end speech.
1. Improved Overall Scheme
To address the shortcomings of the pure WebRTC solution, this proposal redesigns the AEC module by combining WebRTC with Speex‑based reconstruction and tuning. The overall architecture is shown below.
The complete AEC solution consists of three modules:
- Delay estimation
- Linear filtering
- Non-linear residual-echo removal (post-processing)
2. Delay Estimation Module
Before AEC processing, the remote reference signal and the echo must be time‑aligned; otherwise echo cannot be removed. The delay estimation module is a core algorithm that ensures proper AEC operation.
Its workflow is illustrated in the following diagram.
After an FFT, the spectra of the far-end and near-end signals (far_spectrum, near_spectrum) are obtained, and each far-end spectrum is stored in a history buffer as a candidate match. The 32 most significant frequency bins (bins 12–43) are selected, and a threshold spectrum is computed; bins exceeding the threshold are set to 1 and the rest to 0, producing binary spectra. XOR-ing the near-end binary spectrum against each stored far-end binary spectrum yields a bit-difference count; the candidate with the highest similarity (lowest count) is chosen, and the corresponding delay is derived from its position in the history buffer.
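The matching step above can be sketched in a few lines. This is an illustrative toy (8 bins instead of WebRTC's 32, made-up threshold values), not the production delay estimator:

```python
# Hedged sketch of binary-spectrum delay estimation, loosely following the
# description above. Bin counts and thresholds here are illustrative.

def binary_spectrum(spectrum, threshold):
    """Pack a spectrum into bits: 1 where a bin exceeds its threshold."""
    bits = 0
    for i, (s, t) in enumerate(zip(spectrum, threshold)):
        if s > t:
            bits |= 1 << i
    return bits

def best_delay(far_history, near_binary):
    """Index of the stored far-end spectrum most similar to the near-end one.

    Similarity = fewest differing bits (lowest XOR popcount); the index in
    the history buffer corresponds to the candidate delay."""
    best_idx, best_dist = 0, None
    for idx, far_binary in enumerate(far_history):
        dist = bin(far_binary ^ near_binary).count("1")
        if best_dist is None or dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Toy example: two stored far-end spectra; the near-end matches the second.
threshold = [1.0] * 8
far_history = [
    binary_spectrum([2, 0, 2, 0, 2, 0, 2, 0], threshold),  # candidate delay 0
    binary_spectrum([0, 2, 0, 2, 0, 2, 0, 2], threshold),  # candidate delay 1
]
near = binary_spectrum([0, 2, 0, 2, 0, 2, 0, 2], threshold)
assert best_delay(far_history, near) == 1
```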
3. Linear Filtering Module
The linear filter uses Speex’s MDF (Multidelay‑Block Frequency‑domain) algorithm, comprising three parts: linear filter structure, double‑talk control, and optimal step‑size control.
3.1 Linear Filter Structure
The MDF structure implements an FIR filter in the frequency domain using an overlap‑and‑save block‑processing approach.
The MDF algorithm includes:
- Block-wise processing of the input signal with overlap-and-save convolution.
- FFT-based frequency-domain convolution, reducing complexity from O(N²) to O(N log₂ N).
- Segmentation of the FIR coefficients into multiple sub-filters, shortening block lengths and reducing filter latency.
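The key property behind the sub-filter segmentation is that splitting a long FIR into short partitions and summing their delayed outputs reproduces the full convolution exactly. The sketch below shows this with plain time-domain convolution; the real MDF performs each partition in the frequency domain via FFT and overlap-save:

```python
# Sketch of partitioned (multidelay) convolution: splitting the FIR h into
# sub-filters of length `block` and summing the delayed partial results
# equals one long convolution. The partition offsets become the "delays".

def convolve(x, h):
    """Plain full linear convolution, length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def partitioned_convolve(x, h, block):
    """Convolve x with h via sub-filters h[k:k+block], each delayed by k."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k in range(0, len(h), block):
        part = convolve(x, h[k:k + block])
        for i, v in enumerate(part):
            y[i + k] += v  # sub-filter output delayed by its offset k
    return y

x = [1.0, 2.0, -1.0, 0.5, 3.0]
h = [0.5, -0.25, 0.1, 0.3]
full = convolve(x, h)
parts = partitioned_convolve(x, h, block=2)
assert all(abs(a - b) < 1e-12 for a, b in zip(full, parts))
```

By linearity of convolution the two results agree term by term, which is why MDF can shorten its FFT blocks without changing the filter it implements.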
3.2 Double‑Talk Control
During double‑talk, the filter must maintain tracking performance. Speex MDF employs a dual‑filter architecture: an adaptive Background Filter and a non‑adaptive Foreground Filter. When the adaptive filter diverges, the system falls back to the foreground result and resets the background filter; when the background filter recovers, its parameters are copied to the foreground. This implicit double‑talk detection is illustrated below.
The decision is based on the relative powers of the two filters: Sff, the power of the foreground filter's error; See, the power of the background filter's error; and Dbf, the squared difference of the two filter outputs. When the background filter clearly reduces the error relative to the foreground and the output difference Dbf accounts for that improvement, its coefficients are copied to the foreground; if the background filter diverges excessively, it is reset to the foreground filter.
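The control flow of this implicit double-talk detection can be sketched as follows. Note the threshold `var_update` and the exact comparisons are simplifications for illustration, not the tuned constants and smoothed statistics used in Speex's mdf.c:

```python
# Hedged sketch of the dual-filter update decision. The real Speex logic
# uses several tuned constants and recursively smoothed powers; this
# simplified version only illustrates the three possible outcomes.

def dual_filter_decision(Sff, See, Dbf, var_update=0.5):
    """Decide the filter update for one frame.

    Sff: foreground-filter error power, See: background-filter error power,
    Dbf: power of the difference of the two filter outputs.
    var_update is an illustrative threshold, not the Speex constant."""
    if See < Sff and Dbf > var_update * (Sff - See):
        # Background clearly better and the difference explains the gain.
        return "copy_background_to_foreground"
    if See > Sff:
        # Background diverged (e.g. during double-talk): fall back.
        return "reset_background_to_foreground"
    return "keep_both"

assert dual_filter_decision(Sff=10.0, See=2.0, Dbf=5.0) == "copy_background_to_foreground"
assert dual_filter_decision(Sff=2.0, See=10.0, Dbf=5.0) == "reset_background_to_foreground"
```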
3.3 Optimal Step‑Size Control
The MDF uses a variable step size derived from the ratio of residual-echo power to error-signal power: the optimal step size in each frequency bin is μ_opt(f) = σ̂r²(f) / σ̂e²(f), where σ̂r² is the estimated residual-echo power and σ̂e² the error power.
Since the residual echo cannot be observed directly, a leakage factor η (0 ≤ η ≤ 1) relates it to the filter output power, σ̂r²(f) ≈ η · σ̂Y²(f); η itself is estimated by recursive averaging of the cross- and auto-power spectra of the error and the filter output.
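A per-bin sketch of this step-size rule, assuming simple first-order smoothing (the smoothing constant `beta`, the clamp `mu_max`, and folding the spectral smoothing into one update are assumptions for brevity):

```python
# Hedged sketch of leakage-based variable step size: the step size is the
# ratio of estimated residual-echo power (leakage * filter-output power)
# to error power, clamped for stability.

def update_step_size(eta, y_pow, e_pow, mu_max=0.5):
    """mu_opt(f) ~= eta * |Y(f)|^2 / |E(f)|^2, clamped to (0, mu_max]."""
    residual = eta * y_pow                 # estimated residual-echo power
    mu = residual / max(e_pow, 1e-12)      # optimal step size for this bin
    return min(mu, mu_max)

def update_leakage(eta, Pey, Pyy, beta=0.05):
    """Recursive averaging of the leakage estimate eta ~ <P_EY> / <P_YY>."""
    target = Pey / max(Pyy, 1e-12)
    return (1 - beta) * eta + beta * max(0.0, min(1.0, target))

# Strong leakage estimate and large filter output -> larger step size.
mu = update_step_size(eta=0.1, y_pow=4.0, e_pow=2.0)
assert abs(mu - 0.2) < 1e-12
```

During double-talk the near-end speech inflates the error power while the residual-echo estimate stays small, so μ_opt shrinks automatically and the filter stops adapting on corrupted frames.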
4. Non‑Linear Processing
Linear AEC cannot remove all echo due to adaptive filter limitations, poor speaker quality, and acoustic design, leaving nonlinear harmonic distortion. The Non‑Linear Processing (NLP) block eliminates this residual echo. NLP consists of spectral correlation calculation and spectral gain computation.
Correlation between far‑end and near‑end signals is used to estimate residual echo magnitude.
Key variables include dfw(n,f) (the near-end signal spectrum), xfw(n,f) (the far-end/echo signal spectrum), efw(n,f) (the linear-filter error spectrum), and γ (a smoothing factor, default 0.9). Derived metrics such as hNlXdAvg, hNlXdAvgWB, hNlDeAvg, and hNlXeAvg capture smoothed coherence between these signals, while near_level tracks near-end speech activity.
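The gain logic can be sketched from these quantities. This is a simplified illustration in the spirit of the coherence-based suppression described above; the exponent, the `min` combination, and the function names are assumptions, not the exact WebRTC tuning:

```python
# Hedged sketch of coherence-based residual-echo suppression: per-bin gains
# derived from near-end/error coherence (high -> little residual echo) and
# near-end/far-end coherence (high -> strong echo).

def smooth(prev, new, gamma=0.9):
    """First-order recursive smoothing used for all spectral statistics."""
    return gamma * prev + (1 - gamma) * new

def suppression_gain(coh_de, coh_xd, exponent=2.0):
    """Per-bin spectral gain in [0, 1].

    coh_de: coherence between near-end and error signal (speech survives).
    coh_xd: coherence between near-end and far-end signal (echo present)."""
    h_nl = min(coh_de, 1.0 - coh_xd)   # combine the two coherence cues
    h_nl = max(0.0, min(1.0, h_nl))
    return h_nl ** exponent            # exponent sharpens the suppression

# Mostly echo: near end tracks the far end, not the error signal.
assert suppression_gain(coh_de=0.1, coh_xd=0.9) < 0.05
# Mostly near-end speech: the error signal retains the near-end signal.
assert suppression_gain(coh_de=0.95, coh_xd=0.1) > 0.8
```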
Spectral gain computation is shown below.
5. Performance Demonstration
In pure‑echo scenarios, the improved AEC (dy_audio) outperforms the original WebRTC implementation, producing cleaner echo cancellation.
6. Conclusion and Outlook
The proposed scheme achieves better echo suppression in pure‑echo conditions and preserves near‑end speech during double‑talk, reducing audio drop‑out. However, performance degrades when echo energy is high, and music echo is less effectively removed than speech. Further research is needed to address these limitations.
Douyu Streaming
Official account of Douyu Streaming Development Department, sharing audio and video technology best practices.