Tagged articles
7 articles
TAL Education Technology
Feb 28, 2020 · Artificial Intelligence

TPNN Multi‑GPU Training and Mobile Optimization for Children's Acoustic Speech Recognition Models

This article describes the TPNN deep‑learning platform’s multi‑GPU acceleration, data‑parallel BMUF training, and LSTM‑CTC acoustic modeling, along with a suite of mobile‑side optimizations: model pruning, 8‑bit quantization, low‑precision matrix multiplication, and mixed‑precision computation. Together these achieve over 92% recognition accuracy for children’s English speech on both server and mobile devices.

BMUF · CTC · Deep Learning
15 min read
Alibaba Cloud Developer
Nov 27, 2018 · Artificial Intelligence

How Linear Networks Enable Speaker‑Adaptive Speech Synthesis with Minimal Data

This article presents a linear‑network‑based speaker‑adaptation method for text‑to‑speech that achieves synthesis quality comparable to large‑scale training using only a few hundred target‑speaker utterances, and introduces a low‑rank‑plus‑diagonal compression to improve stability with scarce data.

Artificial Intelligence · Speech synthesis · acoustic modeling
9 min read
Alibaba Cloud Developer
Oct 31, 2018 · Artificial Intelligence

How Deep‑FSMN and Low Frame Rate Accelerate Speech Recognition

This article introduces the Deep‑FSMN (DFSMN) architecture and its integration with low‑frame‑rate (LFR) processing, showing how the combined LFR‑DFSMN acoustic model achieves higher accuracy, smaller model size, faster training, and lower latency than traditional BLSTM‑based speech recognition systems on both English and Chinese large‑vocabulary tasks.

AI · DFSMN · acoustic modeling
12 min read
Meituan Technology Team
Oct 25, 2018 · Artificial Intelligence

Deep Learning System Design and Parallel Computing Solutions at Meituan

Meituan built a custom deep‑learning platform that combines data‑parallel and hybrid parallelism across multi‑GPU/cluster hardware, uses coarse‑grained scheduling and Kaldi‑derived acoustic algorithms, and supports fast NLU model hot‑updates, achieving near‑linear GPU scaling and 6–7× speedups over traditional solutions.

AI Infrastructure · NLU · System Architecture
13 min read
Tencent Cloud Developer
Sep 26, 2018 · Artificial Intelligence

Breakthroughs in AI: Deep Learning Applications in Speech Recognition

The talk reviews how massive speech data, faster GPUs/CPUs, and deep‑learning models such as DNN, LSTM, CNN, and end‑to‑end CTC have dramatically boosted speech‑recognition accuracy, while outlining remaining challenges like noise, accents, far‑field and multi‑speaker scenarios and describing Tencent Cloud’s related services.

AI · Neural Networks · acoustic modeling
16 min read
Alibaba Cloud Developer
Jun 8, 2018 · Artificial Intelligence

How DFSMN Sets a New Record in Speech Recognition Accuracy and Speed

Alibaba's DAMO Academy has open‑sourced the Deep Feedforward Sequential Memory Network (DFSMN), a next‑generation speech‑recognition model that achieves a world‑record 96.04% accuracy on LibriSpeech, trains three times faster than LSTM, halves model size, and dramatically speeds up real‑time decoding.

DFSMN · Deep Learning · acoustic modeling
17 min read
Alibaba Cloud Developer
Mar 17, 2017 · Artificial Intelligence

How Improved Latency‑Controlled BLSTM Models Boost Online Speech Recognition Efficiency

This article explains how latency‑controlled BLSTM acoustic models were refined to accelerate online speech recognition while preserving accuracy, detailing the training strategy, computational trade‑offs, and two model enhancements that achieve up to 60% faster decoding with modest resource savings.

Deep Learning · LC-BLSTM · acoustic modeling
6 min read