Tagged articles
9 articles
Meituan Technology Team
Apr 13, 2023 · Artificial Intelligence

Peak-First Regularization for Low-Latency Streaming Speech Recognition

The paper presents a low‑latency streaming speech‑recognition solution that reframes latency reduction as a knowledge‑distillation task, using a simple peak‑first regularization term to shift CTC output probabilities leftward and achieve up to 200 ms average latency reduction without harming word error rate.

CTC · Latency Reduction · Peak-First Regularization
0 likes · 21 min read
TiPaiPai Technical Team
Jun 18, 2021 · Artificial Intelligence

Mastering Text Recognition: Encoder & Decoder Strategies Explained

This article reviews modern text‑recognition systems, detailing how encoders such as CNN, CNN‑BiLSTM, and Transformer‑based models extract visual features, and how decoders like Position Attention, Transformer decoders, and RNN Seq2Seq align variable‑length text, while also discussing CTC loss and practical design choices.

CNN · CTC · Decoder
0 likes · 9 min read
Didi Tech
May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and smart‑customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.

ASR · CTC · Deep Learning
0 likes · 16 min read
TAL Education Technology
Feb 28, 2020 · Artificial Intelligence

TPNN Multi‑GPU Training and Mobile Optimization for Children's Acoustic Speech Recognition Models

This article describes the TPNN deep‑learning platform’s multi‑GPU acceleration, data‑parallel BMUF training, LSTM‑CTC acoustic modeling, and a suite of mobile‑side optimizations—including model pruning, 8‑bit quantization, low‑precision matrix multiplication and mixed‑precision computation—that together achieve over 92% recognition accuracy for children’s English speech on both server and mobile devices.

BMUF · CTC · Deep Learning
0 likes · 15 min read
DataFunTalk
Feb 3, 2020 · Artificial Intelligence

Advances in Speech Recognition: Concepts, Deep Learning Methods, and Didi’s Applications

This article presents a comprehensive overview of modern speech recognition technology, covering basic ASR concepts, classic acoustic and language models, deep‑learning approaches such as DNN‑HMM, CTC, attention‑based and transformer models, multimodal fusion, signal‑processing pipelines, and practical deployment considerations at Didi.

ASR · CTC · Deep Learning
0 likes · 15 min read
Alibaba Cloud Developer
Oct 11, 2016 · Artificial Intelligence

What Were the Key Speech AI Breakthroughs at Interspeech 2016?

The Interspeech 2016 conference in San Francisco showcased major advances in speech recognition, synthesis, far‑field processing, and language modeling, highlighting CTC extensions, deep CNN innovations, WaveNet’s generative audio, and new techniques for multi‑microphone acoustic modeling.

CTC · Deep Learning · Interspeech 2016
0 likes · 7 min read