How AI is Transforming Real-Time Audio Communication: Challenges and Solutions
This article explores the evolution of AI audio algorithms in real‑time communication, detailing current trends, technical hurdles such as computational complexity and data scarcity, and practical solutions including lightweight models, data augmentation, and hybrid AI‑traditional pipelines, illustrated with real‑world NetEase Cloud IM case studies.
Introduction
At QCon 2021 in Shanghai, NetEase Cloud IM VP Chen Gong introduced a session on "AI Era Fusion Communication Technology" featuring experts from NetEase Cloud IM, NetEase Audio‑Video Lab, and NetEase Cloud Music. This third installment focuses on the practice of AI audio algorithms in RTC.
Speaker
Hao Yiya, NetEase Cloud IM audio algorithm expert and IEEE reviewer, has published 15 papers, filed 7 patents, and contributed to projects for NIH hearing aids, Apple AirPods, Facebook Reality Labs AR/VR audio, and Zoom real‑time audio.
AI Audio Algorithm Trends
AI audio algorithms are emerging in the RTC field, but most production systems still rely on traditional signal‑processing methods that have been refined over decades. AI approaches remain largely experimental and simulation‑based.
In RTC, two needs drive the move toward AI audio: stronger noise reduction that still preserves speech, and better handling of complex, mixed‑signal environments where traditional DSP methods struggle.
According to Tsahi, AI algorithms are currently on par with traditional DSP in performance, but as computing power grows (supercomputer performance has increased roughly ten‑fold every five years since 2000), AI methods are expected to improve faster and eventually surpass traditional techniques.
Challenges of Applying AI Audio Algorithms in RTC
Computational Complexity: Large AI models demand significant processing power, which is problematic for low‑end devices that must handle real‑time audio streams within a 10 ms frame budget.
Generalization and Robustness: AI models must perform well across diverse scenarios, including entertainment environments with music signals and low‑SNR background noise, which differ from the simpler conference‑room use case.
Data Availability: High‑quality labeled audio data is scarce, especially for specialized tasks like whistle detection, requiring extensive data collection, augmentation, and cleaning.
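One common form of the augmentation mentioned above is mixing clean speech with noise at a chosen signal‑to‑noise ratio, which also helps cover the low‑SNR scenarios from the generalization item. The following is a minimal sketch, not NetEase's pipeline: the function names, the synthetic sine "speech", and the Gaussian "noise" are all assumptions made for illustration.

```python
import math
import random

def mean_power(signal):
    """Mean squared amplitude of a sequence of float samples."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`,
    then add it to `speech`. Returns (mixed, scaled_noise)."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # snr_db = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise

# Synthetic stand-ins for real recordings (hypothetical data).
rng = random.Random(0)
SAMPLE_RATE = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

mixed, scaled_noise = mix_at_snr(speech, noise, snr_db=0.0)
achieved_db = 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))
print(f"achieved SNR: {achieved_db:.2f} dB")
```

Sweeping `snr_db` over a range of values and over several noise types turns a small set of clean recordings into a much larger training set with known labels.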
Solution Approaches
Two example topics illustrate practical solutions:
AI Noise Reduction: Challenges include meeting real‑time constraints and managing CPU usage. Strategies involve selecting lightweight features, using GRU models instead of LSTM, compressing network layers, and employing NetEase's NENN inference framework to achieve 100‑200 µs of processing per 10 ms frame.
Voice Activity Detection (VAD): A comparison of traditional energy‑based VAD, statistical methods, and deep‑learning CNN‑based VAD shows that CNNs outperform the others in low‑SNR, non‑stationary noise conditions, though they still face complexity and generalization issues.
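Two of the figures in the noise‑reduction item can be checked with simple arithmetic: why a GRU is lighter than an LSTM, and what 100‑200 µs per 10 ms frame means as a share of one CPU core. The sketch below uses the textbook gate structure (an LSTM layer has four gate blocks, a GRU has three); the layer sizes are hypothetical and not taken from the talk, and some frameworks store two bias vectors per gate, which shifts the counts slightly.

```python
def recurrent_params(input_size, hidden_size, gates):
    """Parameter count for one recurrent layer with `gates` gate blocks.
    Each block has an input matrix, a recurrent matrix, and one bias vector."""
    per_gate = (hidden_size * input_size
                + hidden_size * hidden_size
                + hidden_size)
    return gates * per_gate

def lstm_params(input_size, hidden_size):
    return recurrent_params(input_size, hidden_size, gates=4)

def gru_params(input_size, hidden_size):
    return recurrent_params(input_size, hidden_size, gates=3)

def cpu_share(processing_us, frame_ms):
    """Fraction of one core used when each frame of `frame_ms` milliseconds
    takes `processing_us` microseconds to process."""
    return processing_us / (frame_ms * 1000.0)

# Hypothetical layer sizes, chosen only to make the comparison concrete.
INPUT_SIZE = 42
HIDDEN_SIZE = 96

lstm = lstm_params(INPUT_SIZE, HIDDEN_SIZE)
gru = gru_params(INPUT_SIZE, HIDDEN_SIZE)
print(f"LSTM: {lstm} params, GRU: {gru} params, ratio: {gru / lstm:.2f}")

# The 100-200 µs per 10 ms frame figure quoted above.
for us in (100, 200):
    print(f"{us} µs per 10 ms frame = {cpu_share(us, 10):.1%} of one core")
```

For equal layer sizes the GRU needs three quarters of the LSTM's parameters, and the quoted processing time corresponds to roughly 1 to 2 percent of a single core.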
Implementation Details
NetEase combines AI with traditional algorithms, e.g., using AI for non‑linear processing while retaining DSP for echo cancellation and delay estimation. This hybrid approach reduces computational load and improves performance.
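To illustrate the kind of DSP stage that stays in such a hybrid pipeline, here is a minimal sketch of delay estimation by cross‑correlation between the far‑end reference signal and the microphone capture. The source does not say which estimator NetEase uses; this is the textbook approach, and the function name, signal lengths, and noise level are assumptions.

```python
import random

def estimate_delay(reference, captured, max_delay):
    """Return the lag in samples (0..max_delay) at which `captured`
    correlates most strongly with `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        # Only compare samples that exist in both signals at this lag.
        n = min(len(reference), len(captured) - lag)
        score = sum(reference[k] * captured[k + lag] for k in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic test signal: the capture is a delayed, attenuated, noisy copy
# of the reference (hypothetical data standing in for a real echo path).
rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
TRUE_DELAY = 37
captured = [0.0] * TRUE_DELAY + [0.6 * x for x in reference]
captured = [c + rng.gauss(0.0, 0.05) for c in captured]

estimated = estimate_delay(reference, captured, max_delay=100)
print(f"true delay: {TRUE_DELAY} samples, estimated: {estimated} samples")
```

A deterministic stage like this costs a fixed, small amount of computation per frame, which is one reason to keep it in DSP and reserve the neural network for the non‑linear processing.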
Current AI products include AI noise reduction, AI audio scene detection, AI whistle detection, 3D audio, and AI echo cancellation (in development). AI Noise Reduction 2.0 aims for near‑lossless speech quality and is slated for release by year‑end.
3D Audio Case Studies
According to the speaker, NetEase Cloud IM is the only RTC provider offering 6DoF spatial audio. Applications include the FPS game "Wilderness Action", where teammates' voices carry directional cues, and the immersive event system "Yaotai", which provides real‑time spatial audio for virtual conferences.
Recommended Reading
Additional resources and demo videos are linked throughout the presentation.