Tagged articles
7 articles
DataFunSummit
Feb 4, 2024 · Mobile Development

Advanced Mobile Audio Recording Techniques in Quanmin K‑Song: Low Latency, High Fidelity, and Intelligent Audio Processing

The article details how Quanmin K‑Song has built a comprehensive mobile audio recording system since 2014. It covers low‑latency capture, high‑quality sampling, alignment of lyrics, vocals, and accompaniment, in‑ear monitoring (ear return), pitch shifting, vocal enhancement, 3A processing (echo cancellation, noise suppression, and gain control), and AI‑driven scoring, all aimed at a professional karaoke experience on smartphones.

AI scoring · Audio Processing · Low latency
0 likes · 14 min read
Kuaishou Tech
Dec 28, 2023 · Artificial Intelligence

Kuaishou Audio Team Wins ICASSP 2024 SSI and PLC Challenges with Advanced Speech Enhancement and Packet Loss Concealment

The Kuaishou audio team secured first place in both the ICASSP 2024 Speech Signal Improvement and Audio Deep Packet Loss Concealment challenges by deploying a two‑stage GAN‑based speech enhancement system and a hybrid time‑frequency packet‑loss concealment model that dramatically improve real‑time communication quality.

Audio Processing · GAN · ICASSP 2024
0 likes · 8 min read
Douyu Streaming
Oct 20, 2021 · Artificial Intelligence

How DeepXi and MHANet Revolutionize Speech Enhancement with Multi‑Head Attention

DeepXi is a two‑stage deep learning framework for speech enhancement that estimates the a priori SNR and then applies an MMSE gain. The MHANet extension uses multi‑head attention to model long‑range dependencies. The article covers training strategies, compression of the model into a GRU, deployment via TFLite, and the resulting low‑latency performance.

Deep Learning · GRU · TFLite
0 likes · 8 min read
Douyu Streaming
Oct 15, 2021 · Artificial Intelligence

How End-to-End Deep Learning Boosts Real-Time Speech Enhancement

An end‑to‑end deep‑learning framework for speech enhancement is presented, detailing dataset creation, time‑domain feature extraction, a convolutional separation network, decoding, and training strategies using SI‑SIR loss with PIT, achieving a final SI‑SIR of 13 dB.

Deep Learning · PIT · SI-SIR
0 likes · 9 min read
NetEase Smart Enterprise Tech+
Aug 23, 2021 · Artificial Intelligence

How a Lightweight Neural Network Cuts Transient Noise in Real‑Time Audio

NetEase Cloud Communication’s Audio Lab presents a low‑complexity neural‑network denoising algorithm that suppresses both stationary and transient noise while preserving speech quality. The article details its mathematical model, feature design, loss function, GRU‑based architecture, and real‑time performance, and compares it against state‑of‑the‑art methods.

Neural Network · Real-time Processing · audio denoising
0 likes · 13 min read
JD Cloud Developers
Feb 10, 2021 · Artificial Intelligence

Three JD Tech AI Papers Shine at ICASSP 2021

At ICASSP 2021, JD Tech presented three AI research papers: a Neural Kalman Filtering framework for speech enhancement, a cross‑utterance BERT‑based prosody modeling method for end‑to‑end speech synthesis, and a self‑supervised conversational query rewriting approach. Each is reported to outperform existing baselines on benchmark datasets.

AI research · ICASSP 2021 · prosody modeling
0 likes · 9 min read
Tencent Cloud Developer
Mar 19, 2020 · Artificial Intelligence

Real-Time Voice Communication Technologies and AI Enhancements in Tencent Meeting

Shang Shidong outlines the shift from analog PSTN to IP‑based VoIP behind Tencent Meeting, covering H.323, SIP, RTP over UDP, and the Opus codec. He then describes how AI‑driven super‑resolution, deep‑learning packet‑loss concealment, advanced noise reduction, and speech‑music classification improve audio quality. The talk closes with reference‑free MOS assessment and future integration with 5G‑enabled cloud, IoT, and WebRTC.

AI · Audio Processing · RTP
0 likes · 30 min read