How JD AI’s Four Interspeech 2020 Papers Advance Speech Processing
JD AI Research Institute presented four accepted Interspeech 2020 papers—covering sound event localization, speech dereverberation, speaker verification, and an efficient WaveGlow vocoder—demonstrating significant advances in audio AI despite the conference’s shift to an online format due to COVID‑19.
Originally scheduled for Shanghai, Interspeech 2020 was moved online because of the COVID‑19 pandemic. The four accepted papers from JD AI Research Institute are summarized below.
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi‑task Learning
Sound event detection and localization are crucial for smart home and security applications but face challenges from noise, reverberation, and overlapping sources. This paper proposes a method that combines traditional acoustic beamforming with multi‑task deep learning, extracting directional source signals via fixed beams and providing rich spatial representations without prior source localization.
The approach computes direction‑of‑arrival vectors from inter‑power spectra, removing dependence on microphone array geometry, and designs separate networks for source localization and event detection. Evaluated on the DCASE2019 dataset, it achieved the best overall performance.
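The two signal-processing ingredients above — fixed beams and geometry-independent DOA features from inter-channel cross-power spectra — can be sketched in a few lines of NumPy. Function names and the delay-and-sum formulation are our illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def doa_features(stft_ch1, stft_ch2):
    # Cross-power spectrum between two channels: X1 * conj(X2).
    # Its phase encodes the inter-channel time delay, hence source
    # direction, without requiring the array geometry explicitly.
    cross = stft_ch1 * np.conj(stft_ch2)
    return np.angle(cross)

def fixed_beam(stfts, delays, freqs):
    # Delay-and-sum beam toward one fixed look direction: align each
    # channel with a phase shift exp(-j*2*pi*f*tau_m), then average.
    # stfts: (mics, freq_bins, frames); delays: (mics,); freqs: (freq_bins,)
    steering = np.exp(-2j * np.pi * delays[:, None] * freqs[None, :])
    return np.mean(stfts * steering[:, :, None], axis=0)
```

With per-direction delay vectors for several look directions, `fixed_beam` yields the bank of directional signals the event-detection network consumes.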
Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping
This work, a collaboration between JD AI Research Institute and the University of Texas at Dallas, extends the UNet‑style fully convolutional network for speech dereverberation. It replaces each skip connection with a dedicated convolutional block (SkipConvNet) and introduces an optimal‑smoothed power‑spectral preprocessing step.
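A toy single-channel rendition of the skip-connection idea: instead of copying an encoder feature map straight to the decoder, pass it through a small conv + ReLU block first. The kernel here is a placeholder; the real SkipConvNet uses learned multi-channel blocks:

```python
import numpy as np

def conv2d(x, kernel):
    # 'Same'-padded 2-D cross-correlation (what deep-learning frameworks
    # call convolution), implemented directly in NumPy for one channel.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def skip_conv(encoder_feat, kernel):
    # SkipConvNet-style skip path: transform the encoder feature with a
    # convolutional block (conv + ReLU) before it reaches the decoder,
    # rather than concatenating it unchanged as in a plain UNet.
    return np.maximum(conv2d(encoder_feat, kernel), 0.0)
```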
Experiments on the REVERB Challenge corpus show significant improvements in objective quality metrics and notable gains in speech recognition and speaker identification under reverberant conditions.
The JD AI Speaker Verification System for the FFSVC 2020 Challenge
Far‑field speaker verification suffers from complex acoustic environments. Leveraging the FFSVC2020 competition data (≈1100 h, 120 speakers), the system explores data augmentation, model architectures (TDNN, TDNN‑F, ResNet, Transformer), and scoring strategies.
1) Apply beamforming, channel switching, and dereverberation to convert far‑field recordings to near‑field.
2) Simulate room impulse responses and convolve them with near‑field data to add realistic reverberation.
3) Inject recorded environmental noises for additive‑noise augmentation.
4) Use data augmentation in both training and testing to increase diversity, yielding large performance gains.
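Steps 2 and 3 above can be sketched as a single augmentation function. The helper name, length handling, and SNR convention are our assumptions for illustration:

```python
import numpy as np

def augment(clean, rir, noise, snr_db):
    # Step 2: convolve near-field speech with a (simulated) room impulse
    # response to add reverberation; truncate to the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Step 3: scale the recorded environmental noise so that
    # 10*log10(P_speech / P_noise) equals the target SNR, then add it.
    p_sig = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10.0 ** (snr_db / 10.0)))
    return reverberant + scale * noise
```

Sampling the RIR and SNR at random per utterance gives the training-time diversity step 4 refers to.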
Fusing TDNN, ResNet, and Transformer systems with score normalization and a two‑stage scoring pipeline reduced minDCF by 0.2393 and EER by 3.16% relative to the baseline.
Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed
WaveGlow‑style neural vocoders are essential for high‑quality speech synthesis. The proposed Efficient WaveGlow replaces the WaveNet‑based affine coupling layers with FFTNet‑based transformations, employs group convolutions, and shares local conditioning across layers to reduce parameters.
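Both the original and the efficient vocoder rest on invertible affine coupling layers. A minimal sketch of the coupling math follows; in WaveGlow the `log_s` and `t` terms are predicted from `x_a` by a WaveNet, which Efficient WaveGlow replaces with an FFTNet‑style network (the standalone-argument form here is a simplification):

```python
import numpy as np

def affine_coupling_forward(x_a, x_b, log_s, t):
    # Half the channels (x_a) pass through untouched; the other half
    # receive an elementwise affine transform whose parameters depend
    # only on x_a, which is what makes the layer trivially invertible.
    return x_a, x_b * np.exp(log_s) + t

def affine_coupling_inverse(y_a, y_b, log_s, t):
    # Exact inverse: undo the affine transform using the same (log_s, t),
    # recomputable from y_a since y_a == x_a.
    return y_a, (y_b - t) * np.exp(-log_s)
```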
Compared with the original WaveGlow, Efficient WaveGlow cuts computational cost and model size by more than 12× while maintaining audio quality, achieving a 6× speedup on CPU and 5× on a P40 GPU.
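Part of that reduction follows directly from grouped convolutions, which can be illustrated with a simple parameter count (channel sizes and group count here are arbitrary examples, not the paper's configuration):

```python
def conv1d_params(c_in, c_out, kernel, groups=1):
    # Weight count of a 1-D convolution. A grouped convolution splits the
    # channels into `groups` independent sets, so each output channel sees
    # only c_in/groups inputs, cutting the weights by a factor of `groups`.
    # (Bias terms, +c_out, are omitted for clarity.)
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kernel

standard = conv1d_params(256, 256, 3)            # dense layer: 196,608 weights
grouped = conv1d_params(256, 256, 3, groups=8)   # 8 groups: 24,576 weights
```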
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
