How Alibaba’s DFSMN Model Pushes Speech Recognition Accuracy to 96.04%
Alibaba’s DAMO Academy has unveiled the DFSMN speech‑recognition model and open‑sourced it on GitHub. The model sets a new 96.04% accuracy record on LibriSpeech, trains three times faster than LSTM models, and already powers real‑world applications such as AI cashiers and metro ticket machines.
Alibaba DAMO Academy’s Machine Intelligence Lab has released the next‑generation speech‑recognition model DFSMN, achieving a world‑record 96.04% accuracy on the LibriSpeech benchmark.
The model is open‑sourced on GitHub (https://github.com/tramphero/kaldi) and, compared with the widely used LSTM models, offers faster training and higher recognition accuracy. Systems built on DFSMN train three times faster and recognize speech twice as fast.
At the recent Yunqi Conference in Wuhan, an “AI cashier” equipped with DFSMN accurately handled voice orders in a noisy environment, processing 34 coffee orders in 49 seconds. The technology has also been deployed in Shanghai Metro ticket machines.
Professor Xie Lei, a leading speech‑recognition expert from Northwestern Polytechnical University, praised DFSMN as a breakthrough that significantly improves accuracy and represents one of the most impactful deep‑learning achievements in the field.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
