How Alibaba’s AI Powers Machine Reading Comprehension in E‑Commerce
Alibaba’s AI assistant “Ali Xiaomì” is exploring machine reading comprehension to automatically understand e‑commerce rules and product information, leveraging deep learning models and datasets such as SQuAD, bAbI, and MCTest, while addressing challenges of long texts, answer granularity, and real‑world deployment.
Research Background
Ali Xiaomì is Alibaba's intelligent human‑machine interaction product for e‑commerce customer service. During Double‑11 2020 it handled 6.43 million intelligent service requests with a 95% intelligent resolution rate. Its question‑answering (QA) technology has evolved from retrieval over knowledge bases to deep semantic modeling.
More recently, Ali Xiaomì has begun exploring machine reading comprehension (MRC). The goal is for the QA product to answer questions directly from source documents, rather than relying only on hand‑curated question lists, and to improve service efficiency.
E‑Commerce Application Scenarios
Transaction Rule Interpretation
During large promotions such as Double‑11, users ask many questions about the event rules. Traditionally, knowledge operators read the rule documents and manually extract the questions users are likely to ask. With MRC, the system reads the rules directly and answers in natural language.
[Figure: an example rule excerpt about automatic order confirmation, with sample questions and answers.]
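The sketch below makes the rule-interpretation scenario concrete with a deliberately crude extractive baseline. It splits a rule text into sentences and returns the sentence that shares the most words with the question. This is not an MRC model, and it is not Ali Xiaomì's method. The rule text and the `answer_by_overlap` helper are hypothetical and were invented for illustration.

```python
import re
from typing import List

# Hypothetical rule text, invented for illustration (not an actual platform rule).
RULE_TEXT = (
    "After the seller ships the order, the buyer has 10 days to confirm receipt. "
    "If the buyer does not confirm receipt within that period, the order is confirmed automatically. "
    "The buyer may request one extension of the confirmation period before it expires."
)

def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def answer_by_overlap(question: str, context: str) -> str:
    """Return the context sentence sharing the most distinct words with the question."""
    q_words = set(tokenize(question))
    sentences = split_sentences(context)
    # max() keeps the first sentence on ties, so the result is deterministic.
    return max(sentences, key=lambda s: len(q_words & set(tokenize(s))))

print(answer_by_overlap("What happens if the buyer does not confirm receipt?", RULE_TEXT))
# prints: If the buyer does not confirm receipt within that period, the order is confirmed automatically.
```

A trained MRC model differs from this baseline in two ways. It replaces the word-overlap score with learned interaction between the question and the context. It also extracts a precise answer span instead of returning a whole sentence.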
Pre‑sale Product Consultation
Store Xiaomì serves millions of merchants. Shoppers often ask detailed questions whose answers already appear on the product detail page. MRC can read these pages and answer such questions automatically, which lowers service costs and can raise conversion rates.
Related Work Survey
Knowledge‑Base‑Based MRC
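The traditional NLP approach first extracts entities and attributes, then builds a knowledge graph, and finally retrieves answers from it. The pipeline runs four stages: entity detection, entity linking, attribute filling, and knowledge retrieval. Each stage consumes the previous stage's output, so errors propagate downstream. The components are also domain‑specific and hard to transfer to new domains.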
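The following heavily simplified sketch shows why such a pipeline propagates errors. If the detection stage misses an entity, the system returns no answer at all, however complete the knowledge graph is. The dictionary-based detector, the toy knowledge graph, and all function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy knowledge graph: entity id -> {attribute: value}.
KG: Dict[str, Dict[str, str]] = {
    "phone_x1": {"battery": "4000 mAh", "color": "black"},
}
# Surface forms known to the detector/linker (mention -> entity id).
ALIASES: Dict[str, str] = {"x1": "phone_x1", "phone x1": "phone_x1"}
# Question keywords mapped to attribute names ("attribute filling").
ATTRIBUTE_WORDS: Dict[str, str] = {"battery": "battery", "colour": "color", "color": "color"}

def detect_and_link(question: str) -> Optional[str]:
    """Stages 1-2: find a known mention (longest first) and link it to an entity id."""
    q = question.lower()
    for mention in sorted(ALIASES, key=len, reverse=True):
        if mention in q:
            return ALIASES[mention]
    return None

def fill_attribute(question: str) -> Optional[str]:
    """Stage 3: map a question keyword to an attribute name."""
    for word in question.lower().replace("?", " ").split():
        if word in ATTRIBUTE_WORDS:
            return ATTRIBUTE_WORDS[word]
    return None

def answer(question: str) -> Tuple[Optional[str], str]:
    """Stage 4: retrieve from the KG, reporting which stage failed, if any."""
    entity = detect_and_link(question)
    if entity is None:
        return None, "entity detection failed"
    attribute = fill_attribute(question)
    if attribute is None:
        return None, "attribute filling failed"
    value = KG.get(entity, {}).get(attribute)
    if value is None:
        return None, "knowledge retrieval failed"
    return value, "ok"

print(answer("How big is the battery of the Phone X1?"))  # ('4000 mAh', 'ok')
print(answer("How big is the battery of the X-1?"))       # (None, 'entity detection failed')
```

The alias and attribute dictionaries are the domain-specific part. Moving to a new product category means rebuilding them, which illustrates the domain-specificity problem.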
End‑to‑End MRC
Recent public datasets have driven rapid progress in end‑to‑end MRC. Representative datasets include:
Facebook bAbI reasoning dataset
Microsoft MCTest multiple‑choice dataset
DeepMind CNN/DailyMail cloze dataset
Facebook CBT cloze dataset
iFlytek & HIT Chinese cloze dataset
Stanford SQuAD span‑answer dataset
These datasets vary in size, language, and task type, providing benchmarks for various MRC models.
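Most of the models discussed next target SQuAD's span-based format. In this format, each answer is a substring of the passage, stored as its text plus a character offset (`answer_start`). The record below follows the SQuAD v1.1 JSON field names (`context`, `qas`, `id`, `question`, `answers`, `text`, `answer_start`), but the passage is invented. The `check_spans` helper is a hypothetical sanity check of the kind worth running on newly annotated or translated data.

```python
from typing import Dict, List

# One SQuAD v1.1-style paragraph entry (field names as in SQuAD; content invented).
paragraph: Dict = {
    "context": "Orders are confirmed automatically if the buyer does not confirm receipt in time.",
    "qas": [
        {
            "id": "q1",
            "question": "When are orders confirmed automatically?",
            "answers": [{"text": "if the buyer does not confirm receipt in time", "answer_start": 35}],
        },
        {
            "id": "q2",
            "question": "Who should confirm receipt?",
            # Deliberately off by two: the correct offset of "the buyer" is 38.
            "answers": [{"text": "the buyer", "answer_start": 40}],
        },
    ],
}

def check_spans(paragraph: Dict) -> List[str]:
    """Return ids of questions whose (text, answer_start) pair does not match the context."""
    context = paragraph["context"]
    bad: List[str] = []
    for qa in paragraph["qas"]:
        for ans in qa["answers"]:
            start = ans["answer_start"]
            if context[start:start + len(ans["text"])] != ans["text"]:
                bad.append(qa["id"])
                break
    return bad

print(check_spans(paragraph))  # ['q2']
```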
Models Based on SQuAD
Match‑LSTM with Answer Pointer – early SQuAD model using a pointer network to predict answer boundaries.
Bidirectional Attention Flow (BiDAF) – introduces bidirectional attention between question and context.
FastQAExt – lightweight model with binary and weighted word‑in‑question features.
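R‑Net – a state‑of‑the‑art model at the time of writing, with two interaction layers: gated attention‑based question‑passage matching, followed by self‑matching attention over the passage.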
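FastQA's word-in-question (wiq) features are simple enough to show directly. The binary feature marks each context token that also appears in the question. The weighted feature gives each context token a soft score based on its similarity to the question words.

This sketch follows our reading of Weissenborn et al. (2017):

- sim(i, j) = v_wiq · (x_j ⊙ q_i)
- wiq_w(j) sums a softmax over context positions j across all question words i.

As assumptions for illustration, v_wiq (learned in the model) is fixed to all ones, so the similarity reduces to a dot product, and the toy embeddings are invented.

```python
import math
from typing import Dict, List

def binary_wiq(context: List[str], question: List[str]) -> List[int]:
    """1 if the context token appears anywhere in the question, else 0."""
    q = set(question)
    return [1 if tok in q else 0 for tok in context]

def weighted_wiq(context: List[str], question: List[str],
                 emb: Dict[str, List[float]]) -> List[float]:
    """Per question word, softmax its similarity over context positions; sum over question words."""
    scores = [0.0] * len(context)
    for q_tok in question:
        # Assumption: v_wiq fixed to ones, so v . (x_j * q_i) equals dot(x_j, q_i).
        sims = [sum(a * b for a, b in zip(emb[c_tok], emb[q_tok])) for c_tok in context]
        m = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        z = sum(exps)
        for j, e in enumerate(exps):
            scores[j] += e / z
    return scores

EMB = {  # invented 2-d embeddings
    "refund": [1.0, 0.0], "return": [0.9, 0.1], "shipping": [0.0, 1.0],
    "fee": [0.1, 0.9], "policy": [0.5, 0.5],
}
context = ["return", "policy", "shipping", "fee"]
question = ["refund", "policy"]
print(binary_wiq(context, question))  # [0, 1, 0, 0]
w = weighted_wiq(context, question, EMB)
print([round(x, 3) for x in w])
```

The binary feature misses "return" even though it is close in meaning to "refund". The weighted feature ranks "return" above "shipping". Each per-question-word softmax sums to 1, so the weighted scores sum to the number of question words.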
Each model consists of embedding, encoding, interaction, and answer layers, often employing LSTM/bi‑LSTM, attention mechanisms, and pointer networks.
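In these models, the answer layer typically outputs two distributions over passage positions: one for the answer's start and one for its end. A common decoding step picks the pair (s, e) with s ≤ e < s + max_len that maximizes p_start[s] · p_end[e]. The probabilities below are made up, and `max_len` is an assumed hyperparameter. Individual papers differ in the details.

```python
from typing import List, Tuple

def best_span(p_start: List[float], p_end: List[float], max_len: int = 15) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing p_start[s] * p_end[e] subject to s <= e < s + max_len.

    Brute force over the allowed window: O(n * max_len) for a passage of n tokens.
    """
    if len(p_start) != len(p_end) or not p_start:
        raise ValueError("p_start and p_end must be non-empty and the same length")
    n = len(p_start)
    best = (0, 0, -1.0)
    for s in range(n):
        for e in range(s, min(n, s + max_len)):
            score = p_start[s] * p_end[e]
            if score > best[2]:
                best = (s, e, score)
    return best

tokens = ["confirm", "receipt", "within", "10", "days", "after", "shipping"]
p_start = [0.05, 0.05, 0.60, 0.20, 0.05, 0.03, 0.02]
p_end = [0.02, 0.03, 0.05, 0.10, 0.70, 0.05, 0.05]
s, e, score = best_span(p_start, p_end)
print(" ".join(tokens[s:e + 1]), round(score, 3))  # within 10 days 0.42
```

The s ≤ e constraint is why the two argmaxes cannot simply be taken independently. With p_start = [0.1, 0.1, 0.8] and p_end = [0.9, 0.05, 0.05], independent argmaxes would give start 2 and end 0, an invalid span. The constrained search instead returns (0, 0).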
Challenges and Practices in Business Scenarios
Chinese Dataset Construction – Build high‑quality annotated data, and compensate for the scarcity of public Chinese datasets by translating existing datasets at scale.
Model Business Optimization – Feed document structure (headings and section hierarchy) into the model inputs so that training can exploit it.
Model Simplification – Simplify expensive components (e.g., bi‑LSTM layers) to meet online latency requirements while keeping the accuracy loss under control.
Model Fusion – Combine deep learning with traditional methods to balance intelligence and controllability.
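Two of these practices can be sketched in a few lines, under stated assumptions:

- `flatten_with_headings` prefixes each passage with its heading path. This is one simple way to expose document structure to a model that only sees flat text.
- `fused_answer` accepts the MRC answer only above a confidence threshold. Otherwise it falls back to a curated FAQ lookup or hands off to a human agent. This is one way to balance intelligence against controllability.

Both functions, the threshold value, the stub model, and the rule text are hypothetical. The article does not describe Alibaba's actual implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

def flatten_with_headings(doc: Dict[str, object], path: Tuple[str, ...] = ()) -> List[str]:
    """Turn a nested {heading: text-or-subsection} dict into passages prefixed with their heading path."""
    passages: List[str] = []
    for heading, body in doc.items():
        here = path + (heading,)
        if isinstance(body, dict):
            passages.extend(flatten_with_headings(body, here))
        else:
            passages.append(" > ".join(here) + ": " + str(body))
    return passages

def fused_answer(question: str,
                 mrc_model: Callable[[str], Tuple[Optional[str], float]],
                 faq_lookup: Callable[[str], Optional[str]],
                 threshold: float = 0.5) -> Tuple[Optional[str], str]:
    """Use the MRC answer only when it is confident enough; otherwise fall back."""
    answer, confidence = mrc_model(question)
    if answer is not None and confidence >= threshold:
        return answer, "mrc"
    curated = faq_lookup(question)
    if curated is not None:
        return curated, "faq"
    return None, "handoff"  # escalate to a human agent

# Invented rule document and FAQ, for illustration only.
rules = {
    "Double-11 rules": {
        "Pre-sale": {"Deposit": "Deposit terms are listed on the event page."},
        "Order confirmation": "Orders are confirmed automatically after the confirmation period.",
    }
}
FAQ = {"can i get my deposit back?": "Please see the deposit terms on the event page."}

def stub_mrc(question: str) -> Tuple[Optional[str], float]:
    """Stand-in for a trained MRC model: returns (answer span, confidence)."""
    if "confirm" in question.lower():
        return "after the confirmation period", 0.9
    return None, 0.0

def faq_lookup(question: str) -> Optional[str]:
    """Exact-match lookup in the curated FAQ (case-insensitive)."""
    return FAQ.get(question.strip().lower())

for passage in flatten_with_headings(rules):
    print(passage)
print(fused_answer("When is my order confirmed?", stub_mrc, faq_lookup))  # ('after the confirmation period', 'mrc')
```

Raising `threshold` makes the system more conservative. More questions go to the curated FAQ or a human, which trades coverage for controllability.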
Conclusion
Machine reading comprehension is an active NLP research area that has advanced rapidly, especially through models developed on SQuAD. These models perform well on Wikipedia‑based QA, but real‑world e‑commerce questions are considerably more complex. Combining academic research with industrial needs, as Alibaba is doing, can produce valuable intelligent services.
References
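Weston et al., 2015. Towards AI‑Complete Question Answering: A Set of Prerequisite Toy Tasks.
Richardson et al., 2013. MCTest: A Challenge Dataset for the Open‑Domain Machine Comprehension of Text.
Hermann et al., 2015. Teaching Machines to Read and Comprehend.
Hill et al., 2015. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations.
iFlytek & HIT Chinese RC dataset.
Rajpurkar et al., 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text.
Wang and Jiang, 2016. Machine Comprehension Using Match‑LSTM and Answer Pointer.
Seo et al., 2016. Bidirectional Attention Flow for Machine Comprehension.
Weissenborn et al., 2017. Making Neural QA as Simple as Possible but not Simpler.
Wang et al., 2017. Gated Self‑Matching Networks for Reading Comprehension and Question Answering.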
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
