Tagged articles

speech enhancement

8 articles · Page 1 of 1

May 13, 2026 · Artificial Intelligence

Three Simple Steps to Make AI‑Cloned Voices Sound Truly Like You

The article attributes 80% of AI voice‑cloning failures to poor recording quality, identifies three critical sample defects (noise contamination, high‑frequency loss, and invalid segments), and proposes a three‑step “Extract → Enhance → Select” pipeline built on BS‑RoFormer, DeepFilterNet3, and NISQA that raises similarity from 68% to 89%.

AI · Deep Learning · Speech synthesis

0 likes · 16 min read

DataFunSummit

Feb 4, 2024 · Mobile Development

Advanced Mobile Audio Recording Techniques in Quanjian K‑Song: Low Latency, High Fidelity, and Intelligent Audio Processing

The article details how Quanjian K‑Song has built a comprehensive mobile audio recording system since 2014, covering low‑latency capture, high‑quality sampling, lyric and vocal‑accompaniment alignment, in‑ear monitoring, pitch shifting, vocal enhancement, 3A processing (echo cancellation, noise suppression, and gain control), and AI‑driven scoring to deliver a professional karaoke experience on smartphones.

AI scoring · Audio Processing · karaoke technology

0 likes · 14 min read

Kuaishou Tech

Dec 28, 2023 · Artificial Intelligence

Kuaishou Audio Team Wins ICASSP 2024 SSI and PLC Challenges with Advanced Speech Enhancement and Packet Loss Concealment

The Kuaishou audio team secured first place in both the ICASSP 2024 Speech Signal Improvement and Audio Deep Packet Loss Concealment challenges by deploying a two‑stage GAN‑based speech enhancement system and a hybrid time‑frequency packet‑loss concealment model that dramatically improve real‑time communication quality.

Audio Processing · GAN · ICASSP 2024

0 likes · 8 min read

Douyu Streaming

Oct 20, 2021 · Artificial Intelligence

How DeepXi and MHANet Revolutionize Speech Enhancement with Multi‑Head Attention

DeepXi is a two‑stage deep learning framework for speech enhancement that estimates the a priori SNR and then applies an MMSE gain function, while the MHANet extension uses multi‑head attention to model long‑range dependencies. The article also covers training strategies, compression of the model into a GRU, deployment via TFLite, and the resulting low‑latency performance.

Deep Learning · GRU · Multi-Head Attention

0 likes · 8 min read

Douyu Streaming

Oct 15, 2021 · Artificial Intelligence

How End-to-End Deep Learning Boosts Real-Time Speech Enhancement

The article presents an end‑to‑end deep‑learning framework for speech enhancement, covering dataset creation, time‑domain feature extraction, a convolutional separation network, decoding, and training with an SI‑SIR loss under permutation invariant training (PIT), and reports a final SI‑SIR of 13 dB.

Deep Learning · PIT · SI-SIR

0 likes · 9 min read

NetEase Smart Enterprise Tech+

Aug 23, 2021 · Artificial Intelligence

How a Lightweight Neural Network Cuts Transient Noise in Real‑Time Audio

NetEase Cloud Communication’s Audio Lab presents a low‑complexity neural‑network denoising algorithm that effectively suppresses both stationary and transient noises while preserving speech quality, detailing its mathematical model, feature design, loss function, GRU‑based architecture, real‑time performance, and comparative evaluation against state‑of‑the‑art methods.

Neural Network · Real-time Processing · audio denoising

0 likes · 13 min read

JD Cloud Developers

Feb 10, 2021 · Artificial Intelligence

Three JD Tech AI Papers Shine at ICASSP 2021

At ICASSP 2021, JD Tech presented three AI research papers—introducing a Neural Kalman Filtering framework for speech enhancement, a cross‑utterance BERT‑based prosody modeling method for end‑to‑end speech synthesis, and a self‑supervised conversational query rewriting approach—each demonstrating superior performance over existing baselines on benchmark datasets.

AI research · ICASSP 2021 · prosody modeling

0 likes · 9 min read

Tencent Cloud Developer

Mar 19, 2020 · Artificial Intelligence

Real-Time Voice Communication Technologies and AI Enhancements in Tencent Meeting

Shang Shidong traces the shift from analog PSTN to IP‑based VoIP that underlies Tencent Meeting, covering H.323, SIP, RTP/UDP, and the Opus codec, and describes how AI‑driven super‑resolution, deep‑learning packet‑loss concealment, advanced noise reduction, and speech‑music classification improve audio quality. The talk also covers reference‑free MOS assessment and future integration with 5G‑enabled cloud, IoT, and WebRTC.

AI · Audio Processing · RTP

0 likes · 30 min read
