How Alibaba’s Hierarchical Attention Model Beat Humans on SQuAD 2018
Alibaba’s new hierarchical‑fusion attention model achieved an 82.44% exact‑match score on the SQuAD benchmark in January 2018, edging past the human score of 82.304% and showing how large‑scale machine reading comprehension can be applied to real‑world e‑commerce services.
Alibaba’s AI Model Beats Humans on SQuAD 2018
On January 11, 2018, the Stanford‑run SQuAD leaderboard recorded, for the first time, a machine reading‑comprehension system that outscored the human benchmark on exact match: Alibaba’s model reached 82.44% against a human score of 82.304%.
Pranav Rajpurkar, the organizer of SQuAD, expressed excitement, noting that the Alibaba iDST team’s SLQA+ model was the first to surpass human performance on exact‑match scoring. Fuzzy‑match scoring (F1) remains a challenge: humans still lead there by about 2.5 points.
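SQuAD is a large‑scale dataset of more than 100,000 questions written against passages from over 500 Wikipedia articles. A system reads a short passage, answers several questions about it, and is scored on two metrics: Exact Match (EM), the share of predictions that match a reference answer exactly, and F1, which gives partial credit for word overlap between the prediction and the reference.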
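To make the two metrics concrete, here is a minimal Python sketch of how Exact Match and F1 can be computed for a single answer. It follows the general approach of the public SQuAD v1.1 evaluation script (lowercase, strip punctuation and English articles, then compare), but the function names are illustrative and this is not the official scorer.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    shared = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(pred_tokens)
    recall = shared / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction, references):
    """A question may have several reference answers; keep the best score."""
    return max(metric(prediction, ref) for ref in references)
```

For example, the prediction "in Paris France" scored against the reference "Paris" fails Exact Match but earns an F1 of 0.5 (precision 1/3, recall 1). This is why a system can trail on one metric while leading on the other.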
The competition attracts top research groups such as Google, Carnegie Mellon, Stanford, Microsoft Research, Allen Institute, IBM, and Facebook.
Alibaba’s breakthrough stems from a “hierarchical fusion attention” deep neural network that mimics human reading strategies: it jointly considers the passage and the question, repeatedly scans the text, and uses a fusion of local and global attention to produce clear answer boundaries while avoiding over‑focus on irrelevant details.
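The idea of reading the passage with the question in mind can be illustrated with a toy sketch. This is not Alibaba’s implementation: it assumes tiny hand‑written vectors in place of learned word representations, uses plain dot‑product attention, and fuses by simple concatenation. All function names are hypothetical, and the layering of local and global attention described above is not reproduced.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(passage, question):
    """For each passage token, build a question summary weighted by similarity."""
    dim = len(question[0])
    summaries = []
    for p_vec in passage:
        weights = softmax([dot(p_vec, q_vec) for q_vec in question])
        summary = [
            sum(w * q_vec[i] for w, q_vec in zip(weights, question))
            for i in range(dim)
        ]
        summaries.append(summary)
    return summaries


def fuse(p_vec, q_summary):
    """Concatenate the token, its question summary, their product and difference."""
    product = [a * b for a, b in zip(p_vec, q_summary)]
    difference = [a - b for a, b in zip(p_vec, q_summary)]
    return p_vec + q_summary + product + difference


# Toy inputs (assumed values): two passage tokens, two question tokens.
passage = [[1.0, 0.0], [0.0, 1.0]]
question = [[1.0, 0.0], [0.0, 1.0]]
fused = [fuse(p, s) for p, s in zip(passage, attend(passage, question))]
```

Each fused vector carries both the passage token and the part of the question it most resembles. A real model would feed such vectors into further layers that predict where the answer starts and ends.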
Chief NLP scientist Si Luo highlighted that the technology now delivers strong results on factual Wikipedia QA and that the next goal is to achieve true “understanding and reasoning” for broader content.
Internally, Alibaba has deployed the model for real‑world services. During the Double 11 shopping festival, the “Ali Xiaomi” (AliMe) chatbot uses the model to interpret promotion rules and answer user queries. It also powers intelligent Q&A on product‑detail pages, reducing service costs and boosting conversion rates.
The NLP team’s AliNLP platform processes more than 1.2 trillion calls per day, and the Alitranx translation system serves over 700 million daily requests in 20 languages. The team has previously won first place in competitions such as ACM CIKM 2016, IJCNLP 2017, and the NIST TAC 2017 English entity classification task.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
