How Alibaba’s Enriched BERT Set a New Record in Open‑Domain QA
Alibaba’s AI team introduced a multi-stage system that ranks documents first and paragraphs second, built on an Enriched BERT model. The system topped the MS MARCO reading-comprehension leaderboard, surpassing previous state-of-the-art methods and even human performance on open-domain QA tasks.
The record came in the MS MARCO text reading-comprehension challenge, where the team achieved the top scores on both document ranking and open-domain question answering.
The approach uses cascade learning. In the first stage, simple features and a ranking model filter out irrelevant documents and paragraphs, producing a set of candidate texts. A deep, attention-based multi-document machine reading comprehension (MRC) model built on an Enriched BERT architecture then processes these candidates and extracts word-level answer spans.
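To make "word-level answer spans" concrete, here is a minimal sketch of how a span is commonly chosen in BERT-style readers: the model scores every token as a possible start and a possible end, and the highest-scoring valid pair wins. The function name `best_span`, the `max_len` cap, and the toy scores are illustrative assumptions, not details taken from Alibaba's system.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 30,  # assumed cap on answer length, in tokens
) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_len."""
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        last = min(s + max_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy example with hand-written scores (a real model would produce these).
tokens = ["the", "capital", "of", "france", "is", "paris", "."]
start = [0.1, 0.2, 0.0, 0.3, 0.1, 2.5, 0.0]
end = [0.0, 0.1, 0.0, 0.4, 0.2, 2.8, 0.3]

s, e, score = best_span(start, end)
answer = " ".join(tokens[s : e + 1])
print(answer)
```

The exhaustive search is quadratic in passage length but bounded by `max_len`, which is usually small enough that it does not dominate the cost of the model itself.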
To further boost performance, the model adds document extraction and paragraph extraction as auxiliary tasks. All three tasks share the same underlying Enriched BERT language model, which enables coarse-to-fine inference and iterative learning.
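One common way to train a shared encoder on a main task plus auxiliary tasks is to sum the per-task losses with weights. The sketch below assumes that setup: the cross-entropy form, the function names, and the weights `w_doc` and `w_para` are hypothetical choices for illustration, since the article does not state how the three objectives are combined.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(
    doc_logits: Sequence[float], doc_gold: int,
    para_logits: Sequence[float], para_gold: int,
    start_logits: Sequence[float], start_gold: int,
    end_logits: Sequence[float], end_gold: int,
    w_doc: float = 0.5,   # assumed weight for the document-extraction task
    w_para: float = 0.5,  # assumed weight for the paragraph-extraction task
) -> float:
    """Main answer-span loss plus weighted auxiliary losses."""
    answer = cross_entropy(start_logits, start_gold) + cross_entropy(end_logits, end_gold)
    doc = cross_entropy(doc_logits, doc_gold)
    para = cross_entropy(para_logits, para_gold)
    return answer + w_doc * doc + w_para * para


# With uniform two-way logits, every cross-entropy term equals log(2).
uniform = [0.0, 0.0]
loss = joint_loss(uniform, 0, uniform, 0, uniform, 0, uniform, 0)
print(loss)
```

Because all three heads would read from the same encoder, gradients from the document and paragraph tasks also shape the representation used for answer extraction, which is the usual motivation for sharing a language model across tasks.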
The system architecture consists of three core modules—document retrieval, paragraph retrieval, and answer extraction—each with its own ranking function (for efficiency) and extraction function (for effectiveness). The ranking functions filter out noise, while the extraction functions are jointly optimized with the final answer‑extraction module.
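The ranking side of this architecture can be sketched as a coarse-to-fine filter: a cheap scorer keeps a few documents, then a few paragraphs from each, and only those survivors would reach the expensive extraction model. Everything here is an assumption for illustration: the word-overlap scorer stands in for whatever features the real ranking functions use, `select_candidates` is a hypothetical name, and the neural extraction functions are not shown.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def overlap_score(question: str, text: str) -> float:
    """Fraction of question words that appear in the text (a stand-in feature)."""
    q = set(question.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def select_candidates(
    question: str,
    documents: List[List[str]],  # each document is a list of paragraphs
    scorer: Scorer = overlap_score,
    top_docs: int = 4,   # assumed cutoffs; tune per deployment
    top_paras: int = 2,
) -> List[str]:
    """Keep the best documents, then the best paragraphs within each."""
    kept_docs = sorted(
        documents,
        key=lambda d: scorer(question, " ".join(d)),
        reverse=True,
    )[:top_docs]
    candidates: List[str] = []
    for doc in kept_docs:
        ranked = sorted(doc, key=lambda p: scorer(question, p), reverse=True)
        candidates.extend(ranked[:top_paras])
    return candidates


question = "who founded alibaba"
docs = [
    ["alibaba was founded by jack ma in 1999", "the company is based in hangzhou"],
    ["bananas are rich in potassium", "apples grow on trees"],
]
picked = select_candidates(question, docs, top_docs=1, top_paras=1)
print(picked)
```

Tightening `top_docs` and `top_paras` lowers the cost of the later stages at the risk of discarding the passage that contains the answer, which is the efficiency versus effectiveness trade-off the modules are described as balancing.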
Experiments on the TriviaQA Web and DuReader benchmarks show that the proposed model outperforms previous state-of-the-art methods in every evaluated scenario, with the best results obtained when using two paragraphs from four documents. The auxiliary tasks also reduce the time cost of early ranking without noticeably harming answer quality.
Online testing with Alibaba’s “XiaoMi” customer-service chatbot (known in English as AliMe), which handles roughly two million queries a day, showed response times under 50 ms and a significant boost in effectiveness metrics.
Overall, by filtering out irrelevant “noise,” the model markedly improves on existing online QA systems while keeping efficiency and effectiveness balanced at each stage of the extraction pipeline.
Alibaba Cloud Developer
