Tagged articles

acoustic modeling

7 articles · Page 1 of 1

Feb 28, 2020 · Artificial Intelligence

TPNN Multi‑GPU Training and Mobile Optimization for Children's Acoustic Speech Recognition Models

This article describes the TPNN deep‑learning platform’s multi‑GPU acceleration, data‑parallel BMUF training, LSTM‑CTC acoustic modeling, and a suite of mobile‑side optimizations (model pruning, 8‑bit quantization, low‑precision matrix multiplication, and mixed‑precision computation) that together achieve over 92% recognition accuracy for children’s English speech on both server and mobile devices.

BMUF · CTC · Deep Learning

0 likes · 15 min read


Alibaba Cloud Developer

Nov 27, 2018 · Artificial Intelligence

How Linear Networks Enable Speaker‑Adaptive Speech Synthesis with Minimal Data

This article presents a linear‑network‑based speaker‑adaptation method for text‑to‑speech that achieves synthesis quality comparable to large‑scale training using only a few hundred target‑speaker utterances, and introduces a low‑rank‑plus‑diagonal compression to improve stability with scarce data.

Artificial Intelligence · Speech synthesis · acoustic modeling

0 likes · 9 min read


Alibaba Cloud Developer

Oct 31, 2018 · Artificial Intelligence

How Deep‑FSMN and Low Frame Rate Accelerate Speech Recognition

This article introduces the Deep‑FSMN (DFSMN) architecture and its integration with low‑frame‑rate (LFR) processing, showing how the combined LFR‑DFSMN acoustic model achieves higher accuracy, smaller model size, faster training, and lower latency than traditional BLSTM‑based speech recognition systems on both English and Chinese large‑vocabulary tasks.

.ai · DFSMN · acoustic modeling

0 likes · 12 min read


Meituan Technology Team

Oct 25, 2018 · Artificial Intelligence

Deep Learning System Design and Parallel Computing Solutions at Meituan

Meituan built a custom deep‑learning platform that combines data‑parallel and hybrid parallelism across multi‑GPU/cluster hardware, uses coarse‑grained scheduling and Kaldi‑derived acoustic algorithms, and supports fast NLU model hot‑updates, achieving near‑linear GPU scaling and 6–7× speedups over traditional solutions.

AI Infrastructure · NLU · acoustic modeling

0 likes · 13 min read


Tencent Cloud Developer

Sep 26, 2018 · Artificial Intelligence

Breakthroughs in AI: Deep Learning Applications in Speech Recognition

The talk reviews how massive speech data, faster GPUs/CPUs, and deep‑learning models such as DNN, LSTM, CNN, and end‑to‑end CTC have dramatically boosted speech‑recognition accuracy, while outlining remaining challenges like noise, accents, far‑field and multi‑speaker scenarios and describing Tencent Cloud’s related services.

.ai · acoustic modeling · neural networks

0 likes · 16 min read


Alibaba Cloud Developer

Jun 8, 2018 · Artificial Intelligence

How DFSMN Sets a New Record in Speech Recognition Accuracy and Speed

Alibaba's DAMO Academy has open‑sourced the Deep‑Feedforward Sequential Memory Network (DFSMN), a next‑generation speech‑recognition model that achieves a world‑record 96.04% accuracy on LibriSpeech, trains three times faster than LSTM, halves model size, and dramatically speeds up real‑time decoding.

DFSMN · Deep Learning · acoustic modeling

0 likes · 17 min read


Alibaba Cloud Developer

Mar 17, 2017 · Artificial Intelligence

How Improved Latency‑Controlled BLSTM Models Boost Online Speech Recognition Efficiency

This article explains how latency‑controlled BLSTM acoustic models were refined to accelerate online speech recognition while preserving accuracy, detailing the training strategy, computational trade‑offs, and two model enhancements that achieve up to 60% faster decoding with modest resource savings.

Deep Learning · Efficiency · LC-BLSTM

0 likes · 6 min read
