Tagged articles

CTC

9 articles

Apr 13, 2023 · Artificial Intelligence

Peak-First Regularization for Low-Latency Streaming Speech Recognition

The paper presents a low‑latency streaming speech‑recognition solution that reframes latency reduction as a knowledge‑distillation task. A simple peak‑first regularization term shifts the spikes in the CTC output probabilities leftward (earlier in time), achieving up to 200 ms of average latency reduction without harming word error rate.

CTC · Knowledge Distillation · Latency Reduction

21 min read
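As a rough illustration of the peak‑first idea summarized above, the sketch below computes a frame‑shifted self‑distillation penalty in plain Python. It assumes (our reading, not taken from the article) that each frame's output distribution is pulled toward the distribution one frame later, with the later frame acting as a fixed teacher. The function names and the one‑frame shift are hypothetical choices, and a real implementation would operate on framework tensors with a stop‑gradient on the teacher side.

```python
import math
from typing import List

Distribution = List[float]


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same output symbols."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def peak_first_penalty(frame_probs: List[Distribution]) -> float:
    """Hypothetical frame-shifted self-distillation penalty.

    frame_probs[t] is the softmax output (blank + labels) at frame t.
    Assumption: frame t (student) is pulled toward frame t + 1 (teacher).
    In an autodiff framework the teacher side would be detached so that
    only the earlier frame moves, nudging output spikes earlier in time.
    """
    if len(frame_probs) < 2:
        return 0.0
    terms = [
        kl_divergence(frame_probs[t + 1], frame_probs[t])  # teacher = next frame
        for t in range(len(frame_probs) - 1)
    ]
    return sum(terms) / len(terms)
```

During training, a term like this would be added to the ordinary CTC loss with a weighting factor (a hypothetical hyperparameter here), so that the model still learns the transcript while its spikes drift toward earlier frames.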


Code DAO

Dec 10, 2021 · Artificial Intelligence

Deep Learning for Automatic Speech Recognition (ASR): From Mel Spectrograms to CTC Decoding

This article explains the end‑to‑end deep‑learning pipeline for speech‑to‑text, covering audio digitization, preprocessing with librosa, conversion to Mel spectrograms and MFCCs, data augmentation, a CNN‑RNN architecture, CTC loss, decoding strategies and evaluation with word error rate.

ASR · Beam Search · CTC

13 min read
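The CTC decoding step mentioned above can be illustrated with its simplest strategy, best‑path (greedy) decoding: take the most likely symbol at each frame, merge consecutive repeats, then drop blanks. This is a minimal sketch assuming the blank symbol has index 0 and a hypothetical two‑letter label map; beam search, listed among this article's tags, instead keeps several candidate prefixes alive rather than committing to one symbol per frame.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # 1. Most likely symbol at every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of the same symbol (a a a -> a).
    merged = [symbol for symbol, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps them distinct.
    return [symbol for symbol in merged if symbol != blank]


if __name__ == "__main__":
    vocab = {1: "a", 2: "b"}  # hypothetical label map
    path = [1, 1, 0, 1, 2, 2, 0]
    probs = [[0.8 if i == k else 0.1 for i in range(3)] for k in path]
    print("".join(vocab[k] for k in ctc_greedy_decode(probs)))  # prints "aab"
```

Here the frame‑wise path a a _ a b b _ decodes to "aab": the blank between the two a's is what prevents them from being merged into one.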


TiPaiPai Technical Team

Jun 18, 2021 · Artificial Intelligence

Mastering Text Recognition: Encoder & Decoder Strategies Explained

This article reviews modern text‑recognition systems, detailing how encoders such as CNN, CNN‑BiLSTM, and Transformer‑based models extract visual features, and how decoders like Position Attention, Transformer decoders, and RNN Seq2Seq align variable‑length text, while also discussing CTC loss and practical design choices.

CNN · CTC · Encoder

9 min read
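The CTC loss discussed in this article is the negative log of the total probability of all frame‑level alignments that collapse to the target sequence. The sketch below computes it with the standard forward (alpha) recursion over the blank‑extended target. It works in plain probability space for readability and assumes blank index 0; practical implementations use log‑space arithmetic to avoid underflow on long inputs.

```python
import math
from typing import Sequence


def ctc_loss(frame_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """-ln P(target | input) via the CTC forward recursion (probability space)."""
    if not frame_probs:
        raise ValueError("need at least one frame")
    # Blank-extended target: [blank, l1, blank, l2, ..., blank].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(frame_probs)
    alpha = [[0.0] * S for _ in range(T)]
    # A valid alignment starts with either the leading blank or the first label.
    alpha[0][0] = frame_probs[0][blank]
    if S > 1:
        alpha[0][1] = frame_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]  # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]  # advance by one position
            # Skip over a blank, allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * frame_probs[t][ext[s]]
    # A valid alignment ends on the last label or the trailing blank.
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return math.inf if total == 0.0 else -math.log(total)
```

For three frames with probabilities [0.5, 0.3, 0.2], [0.2, 0.5, 0.3] and [0.3, 0.2, 0.5] over {blank, 1, 2}, the target [1, 2] has five valid alignments whose probabilities sum to 0.302, so the loss is -ln 0.302 ≈ 1.197.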


Didi Tech

May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and intelligent customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.

ASR · CTC · Deep Learning

16 min read


TAL Education Technology

Feb 28, 2020 · Artificial Intelligence

TPNN Multi‑GPU Training and Mobile Optimization for Children's Acoustic Speech Recognition Models

This article describes the TPNN deep‑learning platform’s multi‑GPU acceleration, data‑parallel BMUF (block‑wise model‑update filtering) training, LSTM‑CTC acoustic modeling, and a suite of mobile‑side optimizations (model pruning, 8‑bit quantization, low‑precision matrix multiplication, and mixed‑precision computation) that together achieve over 92% recognition accuracy for children’s English speech on both server and mobile devices.

BMUF · CTC · Deep Learning

15 min read
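BMUF splits data‑parallel training into blocks: each worker trains independently from the current global model on its own data shard, and the averaged result is then filtered with block‑level momentum before becoming the new global model. The sketch below shows one global update with classical block momentum on plain Python lists. The function name, parameter names and default values are illustrative assumptions; the summary does not state TPNN's actual settings or whether it uses the Nesterov variant.

```python
from typing import List, Tuple

Vector = List[float]


def bmuf_update(
    global_w: Vector,
    worker_ws: List[Vector],
    prev_delta: Vector,
    block_momentum: float = 0.9,  # illustrative value, not from the article
    block_lr: float = 1.0,        # illustrative value, not from the article
) -> Tuple[Vector, Vector]:
    """One BMUF global update with classical block momentum (sketch).

    Each worker is assumed to have started this block from global_w and
    trained on its own shard; worker_ws holds the resulting weights.
    """
    n = len(worker_ws)
    # Average the workers' models parameter by parameter.
    avg = [sum(w[i] for w in worker_ws) / n for i in range(len(global_w))]
    # Block-level "gradient": how far the average moved from the global model.
    block_grad = [a - g for a, g in zip(avg, global_w)]
    # Filter with block momentum: delta = momentum * prev_delta + lr * block_grad.
    delta = [block_momentum * d + block_lr * bg for d, bg in zip(prev_delta, block_grad)]
    new_global = [g + d for g, d in zip(global_w, delta)]
    return new_global, delta
```

With block_momentum set to 0 and block_lr set to 1, the update reduces to plain model averaging; a nonzero momentum lets the global model keep moving in the direction of earlier blocks, which is what distinguishes BMUF from simple averaging.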


DataFunTalk

Feb 3, 2020 · Artificial Intelligence

Advances in Speech Recognition: Concepts, Deep Learning Methods, and Didi’s Applications

This article presents a comprehensive overview of modern speech recognition technology, covering basic ASR concepts, classic acoustic and language models, deep‑learning approaches such as DNN‑HMM, CTC, attention‑based and transformer models, multimodal fusion, signal‑processing pipelines, and practical deployment considerations at Didi.

ASR · CTC · Deep Learning

15 min read


Hulu Beijing

Apr 22, 2019 · Artificial Intelligence

How Has Speech Recognition Evolved from Traditional Methods to Modern Deep Learning?

This article reviews the fundamentals of automatic speech recognition, compares traditional MFCC‑GMM‑HMM pipelines with modern deep neural network approaches such as DNN‑HMM, LSTM‑CTC, and attention‑based models, and illustrates each evolution step with flowchart diagrams and key references.

ASR · CTC · DNN

11 min read


Liulishuo Tech Team

Oct 28, 2016 · Artificial Intelligence

Open‑sourcing kaldi‑ctc: Fast GPU‑Accelerated CTC End‑to‑End Speech Recognition

The article announces the open‑source release of kaldi‑ctc, a GPU‑accelerated CTC‑based end‑to‑end speech recognition toolkit built on Kaldi, warp‑ctc and cuDNN, highlighting its 5‑6× training speedup, a decoding real‑time factor (RTF) of 0.02, and performance comparisons on the LibriSpeech corpus.

ASR · CTC · Deep Learning

4 min read
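The real‑time factor quoted for kaldi‑ctc is, by the usual definition, processing time divided by the duration of the audio processed, so values below 1 mean faster than real time. The small helpers below (hypothetical names, general definition only, not necessarily how the article measured it) make the arithmetic behind an RTF of 0.02 explicit.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def decoding_time(rtf: float, audio_seconds: float) -> float:
    """Expected processing time for a given RTF and audio duration."""
    return rtf * audio_seconds


if __name__ == "__main__":
    # One hour (3,600 s) of audio decoded at RTF 0.02.
    print(decoding_time(0.02, 3600))
```

At an RTF of 0.02, one hour of audio would take about 72 seconds to decode, i.e. roughly 50 times faster than real time.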


Alibaba Cloud Developer

Oct 11, 2016 · Artificial Intelligence

What Were the Key Speech AI Breakthroughs at Interspeech 2016?

The Interspeech 2016 conference in San Francisco showcased major advances in speech recognition, synthesis, far‑field processing, and language modeling, highlighting CTC extensions, deep CNN innovations, WaveNet’s generative audio, and new techniques for multi‑microphone acoustic modeling.

CTC · Deep Learning · Interspeech 2016

7 min read
