How Alibaba’s DFSMN Model Pushes Speech Recognition Accuracy to 96.04%

Alibaba’s DAMO Academy unveiled the DFSMN speech‑recognition model, open‑sourced on GitHub, which sets a new 96.04% accuracy record on LibriSpeech, trains three times faster than LSTM, and powers real‑world demos like AI cashiers and metro ticket machines.

Alibaba Cloud Developer

Jun 8, 2018

Alibaba DAMO Academy’s Machine Intelligence Lab has released the next‑generation speech‑recognition model DFSMN, achieving a world‑record 96.04% accuracy on the LibriSpeech benchmark.
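
The article does not say how the accuracy figure is defined. A common convention in speech recognition is to report word error rate (WER) and treat accuracy as 100% minus WER; if that convention applies here (an assumption, not something the source states), 96.04% accuracy would correspond to a WER of 3.96%. The sketch below shows that arithmetic on a made-up transcript pair. The function names and sample sentences are illustrative only and are not taken from the DFSMN release.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy under the assumed convention: 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical voice order and a recognizer output with one wrong word.
    ref = "one large latte please"
    hyp = "one large late please"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")
    print(f"Accuracy: {word_accuracy(ref, hyp):.2%}")
```

Because WER divides by the reference length, a hypothesis with many inserted words can push WER above 100%, so accuracy defined this way can in principle go negative.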

The model is open‑sourced on GitHub (https://github.com/tramphero/kaldi). Compared with the widely used LSTM models, DFSMN is reported to train about three times faster and to recognize speech about twice as quickly, while also delivering higher recognition accuracy.

At the recent Yunqi (Apsara) Conference in Wuhan, an “AI cashier” equipped with DFSMN accurately handled voice orders in a noisy environment, processing 34 coffee orders in 49 seconds. The technology has also been deployed in Shanghai Metro ticket machines.

Professor Xie Lei, a leading speech‑recognition expert from Northwestern Polytechnical University, praised DFSMN as a breakthrough that significantly improves accuracy and represents one of the most impactful deep‑learning achievements in the field.
