How Alibaba’s AI Powers Machine Reading Comprehension in E‑Commerce

Alibaba’s AI assistant “Ali Xiaomì” is exploring machine reading comprehension to understand e‑commerce rules and product information automatically. The work builds on deep learning models and public datasets such as SQuAD, bAbI, and MCTest, and addresses the challenges of long texts, answer granularity, and real‑world deployment.

Alibaba Cloud Developer

Research Background

Ali Xiaomì is Alibaba's intelligent human‑machine interaction product for e‑commerce services, handling 6.43 million smart service requests during Double‑11 2020 with a 95% intelligent resolution rate. Its QA technology has evolved from retrieval‑based knowledge bases to semantic deep modeling.

Ali Xiaomì has recently begun exploring machine reading comprehension (MRC) to make the QA product genuinely intelligent and to improve service efficiency.

E‑Commerce Application Scenarios

Transaction Rule Interpretation

During large events like Double‑11, users ask many questions about activity rules. Traditionally, knowledge operators read the rule documents and manually extract the questions users might ask. With MRC, the system reads the rule text itself and answers in natural language, without that manual step.

Figure: example rule excerpt with a question and answer about automatic order confirmation.

Pre‑sale Product Consultation

Store Xiaomì serves millions of merchants. Shoppers often ask detailed product questions whose answers already appear on the product detail pages. MRC can read these pages and answer directly, reducing service cost and increasing conversion.

Related Work Survey

Knowledge‑Base‑Based MRC

Traditional NLP pipelines extract entities and attributes, build a knowledge graph, and then retrieve answers from it. This involves entity detection, entity linking, attribute filling, and knowledge retrieval. Errors made in any stage propagate to the later ones, and the pipeline is specific to the domain it was built for.
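The error-propagation problem can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the four stages fail independently and that any single-stage error makes the final answer wrong. The per-stage accuracies are made-up illustrative numbers, not measurements from Ali Xiaomì.

```python
from math import prod

# Hypothetical per-stage accuracies, chosen only for illustration.
STAGE_ACCURACY = {
    "entity_detection": 0.95,
    "entity_linking": 0.90,
    "attribute_filling": 0.90,
    "knowledge_retrieval": 0.95,
}


def pipeline_accuracy(stage_accuracy):
    """End-to-end accuracy, assuming stage errors are independent and
    any single-stage error makes the final answer wrong."""
    return prod(stage_accuracy.values())


if __name__ == "__main__":
    print(f"end-to-end accuracy: {pipeline_accuracy(STAGE_ACCURACY):.3f}")
```

Under these assumptions, four stages that each look reliable on their own yield only about 73% end to end, which is one motivation for training a single end-to-end model instead.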

End‑to‑End MRC

Recent public datasets have driven rapid progress in end‑to‑end MRC. Representative datasets include:

Facebook bAbI reasoning dataset

Microsoft MCTest multiple‑choice dataset

DeepMind CNN/DailyMail cloze dataset

Facebook CBT cloze dataset

iFlytek & HIT Chinese cloze dataset

Stanford SQuAD span‑answer dataset

These datasets vary in size, language, and task type, providing benchmarks for various MRC models.
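To illustrate the span-answer task type, here is a small sketch of a record laid out like SQuAD v1.1, where the answer is a span of the context located by a character offset. The rule text, question, and answer are invented for this example, and the scoring functions are simplified: the official SQuAD evaluation also strips punctuation and articles before comparing.

```python
from collections import Counter

# Made-up rule text and answer, laid out like a SQuAD v1.1 record: the
# answer is a span of the context, located by its character offset.
CONTEXT = "Orders are confirmed automatically 10 days after shipment."
ANSWER_TEXT = "10 days after shipment"
RECORD = {
    "context": CONTEXT,
    "question": "When is an order confirmed automatically?",
    "answers": [
        {"text": ANSWER_TEXT, "answer_start": CONTEXT.index(ANSWER_TEXT)}
    ],
}


def gold_span(record):
    """Recover the gold answer from the context using its character offset."""
    answer = record["answers"][0]
    start = answer["answer_start"]
    return record["context"][start:start + len(answer["text"])]


def exact_match(prediction, gold):
    """Case-insensitive exact match (simplified normalization)."""
    return prediction.strip().lower() == gold.strip().lower()


def token_f1(prediction, gold):
    """Token-level F1 between a predicted span and the gold span."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = gold_span(RECORD)
    print(gold)
    print(exact_match("10 days after shipment", gold))
    print(round(token_f1("10 days", gold), 3))
```

A partially correct prediction such as "10 days" fails exact match but still earns partial credit under token-level F1, which is why span datasets usually report both metrics.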

Models Based on SQuAD

Match‑LSTM with Answer Pointer – early SQuAD model using a pointer network to predict answer boundaries.

Bidirectional Attention Flow (BiDAF) – introduces bidirectional attention between question and context.

FastQAExt – lightweight model with binary and weighted word‑in‑question features.

R‑Net – a top‑ranked SQuAD model at the time of writing, with two interaction layers: question‑passage matching followed by passage self‑matching.

Each model consists of embedding, encoding, interaction, and answer layers, often employing LSTM/bi‑LSTM, attention mechanisms, and pointer networks.
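The answer layer of a pointer-style model can be sketched as follows. It assumes the earlier layers have already produced one start score and one end score per context token, and it picks the span that maximizes the product of start and end probabilities, subject to the start not coming after the end. The scores, the tokens, and the `max_len` cap are illustrative assumptions, not values from any of the models above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end, probability) for the most likely answer span.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    p_start = softmax(start_scores)
    p_end = softmax(end_scores)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            prob = ps * p_end[j]
            if prob > best[2]:
                best = (i, j, prob)
    return best


# Made-up context tokens and per-token scores for illustration.
TOKENS = ["orders", "are", "confirmed", "automatically",
          "10", "days", "after", "shipment"]
START_SCORES = [0.1, 0.0, 0.2, 0.3, 4.0, 0.5, 0.2, 0.1]
END_SCORES = [0.0, 0.1, 0.1, 0.2, 0.5, 1.0, 0.3, 3.5]

if __name__ == "__main__":
    start, end, prob = best_span(START_SCORES, END_SCORES)
    print(" ".join(TOKENS[start:end + 1]), round(prob, 3))
```

Searching over (start, end) pairs jointly, rather than taking the two argmax positions separately, guarantees a well-formed span even when the highest end score falls before the highest start score.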

Challenges and Practices in Business Scenarios

Chinese Dataset Construction – Build high‑quality annotated data; supplement scarce public Chinese datasets via large‑scale translation.

Model Business Optimization – Incorporate document structure (headings, hierarchy) into inputs to improve training.

Model Simplification – Simplify costly components (e.g., bi‑LSTM layers) to meet online latency requirements while limiting the loss in accuracy.

Model Fusion – Combine deep learning with traditional methods to balance intelligence and controllability.
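One way to feed document structure into the model, as mentioned under Model Business Optimization, is to prefix each paragraph with the path of headings above it. The sketch below is a hypothetical illustration: the nested-dictionary layout, the `flatten_with_headings` helper, the separator, and the rule text are all assumptions made for this example, not the format Ali Xiaomì uses.

```python
def flatten_with_headings(node, path=(), separator=" > "):
    """Return one input string per paragraph, prefixed with its heading path.

    `node` is a dict with a "title" and optional "paragraphs" and "children".
    """
    path = path + (node["title"],)
    prefix = separator.join(path)
    inputs = [f"{prefix}: {text}" for text in node.get("paragraphs", [])]
    for child in node.get("children", []):
        inputs.extend(flatten_with_headings(child, path, separator))
    return inputs


# Made-up rule document with a two-level heading hierarchy.
RULES = {
    "title": "Double-11 Rules",
    "children": [
        {
            "title": "Orders",
            "children": [
                {
                    "title": "Automatic confirmation",
                    "paragraphs": [
                        "Orders are confirmed automatically "
                        "10 days after shipment."
                    ],
                }
            ],
        },
        {
            "title": "Refunds",
            "paragraphs": ["Refund requests are reviewed by the merchant."],
        },
    ],
}

if __name__ == "__main__":
    for line in flatten_with_headings(RULES):
        print(line)
```

With the heading path in the input, a paragraph that never repeats the word "order" or "refund" in its own text still carries the topic it belongs to.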

Conclusion

Machine reading comprehension is an active NLP task that is advancing quickly, driven largely by models built on SQuAD. These models achieve strong results on Wikipedia‑style QA, but real‑world e‑commerce questions are more complex. Integrating academic research with industrial needs, as Alibaba does, can create valuable intelligent services.

References

Weston et al., 2015. Towards AI‑Complete Question Answering.

Richardson et al., 2013. MCTest.

Hermann et al., 2015. Teaching Machines to Read and Comprehend.

Hill et al., 2015. The Goldilocks Principle.

Cui et al., 2016. Consensus Attention‑based Neural Networks for Chinese Reading Comprehension (iFlytek & HIT Chinese RC dataset).

Rajpurkar et al., 2016. SQuAD.

Wang et al., 2016. Machine Comprehension Using Match‑LSTM and Answer Pointer.

Seo et al., 2016. Bidirectional Attention Flow.

Weissenborn et al., 2017. Making Neural QA as Simple as Possible but not Simpler.

Wang et al., 2017. Gated Self‑Matching Networks.
