Three JD Tech AI Papers Shine at ICASSP 2021

At ICASSP 2021, JD Tech presented three AI research papers: a neural Kalman filtering framework for speech enhancement, a cross‑utterance BERT‑based prosody modeling method for end‑to‑end speech synthesis, and a self‑supervised approach to conversational query rewriting. Each is reported to improve on existing baselines.

JD Cloud Developers

Feb 10, 2021

Neural Kalman Filtering for Speech Enhancement

The paper proposes a neural Kalman filtering framework that integrates neural networks with optimal filter theory, training the Kalman filter weights via supervised learning. It builds a recurrent neural network‑based temporal model of speech, predicts the long‑term envelope and the Wiener filter spectrum, and combines the two using analytically derived optimal weights. Experiments on the LibriSpeech, PNL 100 Nonspeech Sounds, and MUSAN datasets show improvements in SNR gain, PESQ, and STOI over U‑Net and CRNN‑based baselines.
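To make the idea of combining two estimates with optimal weights concrete, here is a minimal sketch of the textbook scalar Kalman update applied per frequency bin. It is not the paper's formulation: the function names `kalman_fuse` and `fuse_frame`, the use of plain lists for spectra, and the assumption that error variances are known and positive are all illustrative choices.

```python
def kalman_fuse(pred, pred_var, obs, obs_var):
    """Fuse a temporal-model prediction with an observation.

    Textbook scalar Kalman update. Assumes both error variances
    are positive (hypothetical inputs, not taken from the paper).
    """
    gain = pred_var / (pred_var + obs_var)   # weight given to the observation
    fused = pred + gain * (obs - pred)       # weighted combination of the two
    fused_var = (1.0 - gain) * pred_var      # uncertainty shrinks after fusion
    return fused, fused_var


def fuse_frame(pred_spec, pred_vars, wiener_spec, wiener_vars):
    """Apply the scalar update independently to every frequency bin."""
    return [
        kalman_fuse(p, pv, w, wv)[0]
        for p, pv, w, wv in zip(pred_spec, pred_vars, wiener_spec, wiener_vars)
    ]


if __name__ == "__main__":
    # Two bins: equal confidence in bin 0, low confidence in the prediction in bin 1.
    print(fuse_frame([1.0, 0.0], [1.0, 3.0], [3.0, 4.0], [1.0, 1.0]))
```

The gain moves toward 1 when the prediction is uncertain, so the output leans on the observation (here standing in for the Wiener filter estimate), and toward 0 when the prediction is trusted.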

Improving Prosody Modeling with Cross‑Utterance BERT Embeddings for End‑to‑End Speech Synthesis

This work extracts cross‑sentence features using a pre‑trained BERT model and feeds them into an end‑to‑end speech synthesis system to enhance prosody. Two usage strategies are explored: concatenating all context embeddings as a single input, and applying attention between each phoneme and the sequence of context embeddings to obtain weighted prosody representations. Evaluations on Chinese and English audiobook datasets demonstrate more natural and expressive synthesized speech, with listeners preferring the proposed method over baseline models.
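The two strategies can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's architecture: the helper names are hypothetical, scaled dot-product attention is assumed as the scoring function, and the phoneme vector is assumed to already share the dimensionality of the context embeddings (a real system would typically use learned projections).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_context(context_embs):
    """Strategy 1: flatten all context embeddings into one input vector."""
    return [value for emb in context_embs for value in emb]


def attend(phoneme_vec, context_embs):
    """Strategy 2: weight the context embeddings for one phoneme.

    Uses scaled dot-product attention (an assumption of this sketch).
    Returns the pooled context vector and the attention weights.
    """
    scale = math.sqrt(len(phoneme_vec))
    scores = [
        sum(q * c for q, c in zip(phoneme_vec, emb)) / scale
        for emb in context_embs
    ]
    weights = softmax(scores)
    dim = len(context_embs[0])
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, context_embs))
        for i in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    context = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-ins for sentence embeddings
    print(concat_context(context))
    print(attend([5.0, 0.0], context))
```

Concatenation gives every phoneme the same fixed context input, while attention lets each phoneme draw on the neighbouring sentences to a different degree.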

Conversational Query Rewriting with Self‑Supervised Learning

The paper introduces a self‑supervised approach for query rewriting in multi‑turn dialogue systems. By randomly deleting or replacing co‑occurring words in the user query, the model learns to reconstruct the original query using historical context, reducing reliance on annotated data. An enhanced model, Teresa, adds a keyword detection module based on TextRank and an intent‑consistency module that aligns intent distributions between original and rewritten queries, improving rewrite quality and preserving user intent.
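The self-supervised data construction can be illustrated with a small sketch. It assumes whitespace tokenization, treats any query token that also appears in the dialogue history as "co-occurring", and uses a fixed placeholder pronoun for replacement. The function name `make_training_pair` and its parameters are hypothetical, and details such as stop-word handling are omitted.

```python
import random


def make_training_pair(history, query, rng, p_corrupt=0.5, placeholder="it"):
    """Build a (corrupted query, original query) pair without annotation.

    Tokens of `query` that also occur in `history` are, with probability
    `p_corrupt`, either deleted or replaced by `placeholder`. A rewriting
    model can then be trained to recover `query` from the corrupted
    version plus the history.
    """
    history_vocab = {tok for turn in history for tok in turn.split()}
    corrupted = []
    for tok in query.split():
        if tok in history_vocab and rng.random() < p_corrupt:
            if rng.random() < 0.5:
                continue                    # delete the co-occurring word
            corrupted.append(placeholder)   # or replace it with a pronoun
        else:
            corrupted.append(tok)
    return " ".join(corrupted), query


if __name__ == "__main__":
    history = ["I want to buy the iPhone 12"]
    query = "how much is the iPhone 12"
    source, target = make_training_pair(history, query, random.Random(0))
    print(source, "->", target)
```

Because the target is always the untouched query, any multi-turn dialogue log can supply training pairs, which is what reduces the reliance on annotated rewrites.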

Overall, JD Tech has contributed over 350 papers to top AI conferences such as AAAI, IJCAI, CVPR, KDD, NeurIPS, ICML, ACL, and ICASSP, winning 19 first‑place awards, and continues to drive advancements in speech, vision, and machine learning to support digital transformation across industries.
