How Alibaba’s Hierarchical Attention Model Beat Humans on SQuAD 2018

Alibaba’s hierarchical‑fusion attention model scored 82.44% exact match on the SQuAD benchmark in January 2018, edging past the human baseline of 82.304% and showing how large‑scale machine reading comprehension can be applied to real‑world e‑commerce services.

Alibaba Cloud Developer

Alibaba’s AI Model Beats Humans on SQuAD 2018

On January 11, 2018, a machine reading‑comprehension system topped the human baseline on the Stanford‑run SQuAD leaderboard for the first time: Alibaba’s model reached an exact‑match accuracy of 82.44%, against a human score of 82.304%.

Pranav Rajpurkar, the organizer of SQuAD, welcomed the result, noting that the Alibaba iDST team’s SLQA+ model was the first to surpass human performance on exact match. On the fuzzy‑match (F1) metric, humans still lead by about 2.5 points.

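SQuAD is a large‑scale dataset of 100,000 questions drawn from over 500 Wikipedia articles. A system reads a short passage and answers several questions about it; its answers are scored by Exact Match (the prediction is identical to a reference answer) and F1 (the token‑level overlap between prediction and reference).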
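The two metrics can be sketched in a few lines of Python. This is a minimal illustration modeled on the normalization commonly used for SQuAD v1.1 scoring (lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace); the function names are chosen here for illustration and are not taken from the article.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(truth)


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "in Paris, France" against the reference "Paris" fails Exact Match but still earns partial F1 credit, because one of its three tokens overlaps. That is why the article calls F1 the fuzzy‑match metric. When a question has several reference answers, the usual practice is to take the best score over all of them.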

The competition attracts top research groups such as Google, Carnegie Mellon, Stanford, Microsoft Research, Allen Institute, IBM, and Facebook.

Alibaba’s result rests on a “hierarchical fusion attention” deep neural network modeled on how people read: it considers the passage and the question jointly, scans the text repeatedly, and fuses local and global attention so that answer boundaries come out sharp and irrelevant details do not dominate.
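The general idea of attending from the passage to the question and then fusing the result back into the passage representation can be illustrated with a toy sketch. This is not Alibaba's architecture: the dot‑product attention, the fixed scalar gate, and the function names attend and fuse are all assumptions made for illustration. In a trained model the gate and the vectors would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(passage, question):
    """For each passage vector, build a softmax-weighted average of the
    question vectors, weighted by dot-product similarity."""
    dim = len(question[0])
    attended = []
    for p in passage:
        weights = softmax([dot(p, q) for q in question])
        attended.append(
            [sum(w * q[i] for w, q in zip(weights, question)) for i in range(dim)]
        )
    return attended


def fuse(original, attended, gate=0.5):
    """Blend each original vector with its question-aware counterpart.
    The scalar gate is a fixed assumption here, not a learned parameter."""
    return [
        [gate * a + (1 - gate) * o for o, a in zip(orig_vec, att_vec)]
        for orig_vec, att_vec in zip(original, attended)
    ]


# Hypothetical 2-dimensional embeddings for two passage tokens and one
# question token.
passage = [[1.0, 0.0], [0.0, 1.0]]
question = [[1.0, 0.0]]
fused = fuse(passage, attend(passage, question))
```

The fusion step keeps part of the original passage vector, so question‑aware information is added without overwriting what the token already encoded. A model along the lines the article describes would repeat this kind of step at several levels before predicting the start and end of the answer span.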

Chief NLP scientist Si Luo highlighted that the technology now delivers strong results on factual Wikipedia QA and that the next goal is to achieve true “understanding and reasoning” for broader content.

Internally, Alibaba has deployed the model for real‑world services. During the Double 11 shopping festival, the “Ali Xiaomi” chatbot uses the model to interpret activity rules and answer user queries. It also powers intelligent product‑detail Q&A, reducing service costs and boosting conversion rates.

The NLP team’s AliNLP platform processes more than 1.2 trillion calls per day, and the Alitranx translation system serves over 700 million daily requests in 20 languages. The team has previously won first place in competitions such as ACM CIKM 2016, IJCNLP 2017, and the NIST TAC 2017 English entity classification task.
