How Tmall’s “Most Concerned” Feature Uses AI to Match Reviews with Consumer Questions

The article explains how Tmall’s new “Most Concerned” module leverages NLP techniques, fastText embeddings, Bi‑LSTM classifiers, and a custom clustering algorithm to filter, group, and link consumer questions with relevant product reviews, improving the shopping experience across many product categories.

Alibaba Cloud Developer

Jan 22, 2019

Overview

Tmall’s mobile client recently launched a “Most Concerned” feature that, when users search for a product category (e.g., refrigerators), displays a module listing frequently asked questions such as “Is it noisy?” or “Does it consume a lot of power?”. Clicking a question shows detailed information and related product reviews.

Problem Statement

To build this module, several challenges must be solved:

Question Selection: Keep only generic questions applicable to a product category and discard item‑specific or vague queries.

Duplicate Question Merging: Consolidate semantically identical questions (e.g., “Is it noisy?” vs. “Is the noise loud?”) into a single representative.

Question‑Comment Association: Map each review to the questions it can answer, recognizing that a single review may address multiple questions or none at all.

Data Sources

tbods.s_macross_feed – contains all user‑submitted questions and answers from the “Ask Everyone” module.

search_kg.s_kg_all_comment_for_ha3 – stores all product comments.

Additional tables include tbcdm.dim_tb_itm (product catalog), search_ats.ali_seller_matrix_open_d (seller scores), and a category‑keyword dictionary.

Preprocessing

Noisy characters and punctuation are removed from questions; empty, invalid, or default comments are filtered out. Low‑frequency questions, low‑sales items, and low‑rating sellers are also excluded to improve data quality.
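The cleaning and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the regular expression, the lowercasing, the `min_count` threshold, and the `DEFAULT_COMMENTS` set are all assumptions, since the article does not give the exact rules.

```python
import re
from collections import Counter

# Assumption: anything that is neither a word character (letters, digits,
# CJK characters, underscore) nor whitespace counts as noise.
_NOISE = re.compile(r"[^\w\s]")

# Hypothetical example of a system-generated default comment.
DEFAULT_COMMENTS = {"This user did not leave a comment."}

def clean_question(text):
    """Remove punctuation and other noisy characters, collapse whitespace, lowercase."""
    stripped = _NOISE.sub("", text)
    return " ".join(stripped.split()).lower()

def filter_questions(questions, min_count=2):
    """Clean questions and keep only those asked at least `min_count` times."""
    counts = Counter(clean_question(q) for q in questions)
    counts.pop("", None)  # drop questions that consisted only of noise
    return {q: n for q, n in counts.items() if n >= min_count}

def filter_comments(comments):
    """Drop empty comments and known default comments."""
    kept = []
    for comment in comments:
        comment = comment.strip()
        if comment and comment not in DEFAULT_COMMENTS:
            kept.append(comment)
    return kept
```

Filtering by item sales and seller rating would be a join against the catalog and seller-score tables and is left out here.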

Algorithm

Word Embedding

Pretrained fastText Chinese word vectors (trained on Wikipedia) 【1】 are used as embeddings.
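As an illustration of how such vectors can be consumed, the sketch below parses fastText's text `.vec` format (a `count dim` header line followed by one `token v1 ... v_dim` line per token) without any third-party dependency. The function names `load_vec` and `embed` are hypothetical, and mapping unknown tokens to a zero vector is an assumption, not something the article specifies.

```python
import io

def load_vec(handle, limit=None):
    """Parse a fastText text-format vector file from an open text handle."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<token count> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors, dim

def embed(tokens, vectors, dim):
    """Look up each token; unknown tokens get a zero vector (an assumption)."""
    zero = [0.0] * dim
    return [vectors.get(token, zero) for token in tokens]

# Tiny in-memory stand-in for a real downloaded .vec file.
sample = io.StringIO("2 3\n噪音 0.1 0.2 0.3\n大 1.0 0.0 -1.0\n")
vectors, dim = load_vec(sample)
sequence = embed(["噪音", "大", "吗"], vectors, dim)
```

With a real file, the handle would come from `open(path, encoding="utf-8")`, and `limit` keeps only the first, most frequent tokens to bound memory use.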

Question Filtering

A Bi‑LSTM encoder extracts a sentence representation, which is passed through dropout and an MLP that predicts whether the question should be filtered out. The model was trained on more than 5,000 manually labeled questions and reaches over 95% accuracy on a held‑out test set.
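A model of this shape might look like the following PyTorch sketch. The article does not name a framework or any hyperparameters, so PyTorch, the class name `QuestionFilter`, the layer sizes, and the dropout rate are all assumptions. For brevity the sketch ignores padding when reading the final LSTM states; a fuller version would pack the padded sequences first.

```python
import torch
from torch import nn

class QuestionFilter(nn.Module):
    """Bi-LSTM encoder -> dropout -> MLP -> one logit ("filter this question")."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, dropout=0.5):
        super().__init__()
        # In practice the embedding weights would be initialised from the
        # pretrained word vectors; here they are random.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)            # (batch, seq, embed_dim)
        _, (h_n, _) = self.encoder(embedded)            # h_n: (2, batch, hidden_dim)
        sentence = torch.cat([h_n[0], h_n[1]], dim=1)   # forward + backward final states
        return self.mlp(self.dropout(sentence)).squeeze(-1)  # (batch,) logits

model = QuestionFilter(vocab_size=50, embed_dim=8, hidden_dim=4)
model.eval()  # disable dropout for inference
batch = torch.randint(1, 50, (3, 6))   # 3 questions, 6 token ids each
probs = torch.sigmoid(model(batch))    # probability that each question is filtered
```

Training would pair the logits with `nn.BCEWithLogitsLoss` and the manually labeled keep/filter decisions.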

Question Clustering

A symmetric Bi‑LSTM‑based classifier determines whether two questions share the same meaning, a task similar to Quora Question Pairs 【2】. It uses Luong attention 【3】 and a second‑layer Bi‑LSTM, and outputs the probability that the pair is a duplicate. Over 10,000 question pairs were manually labeled for training and evaluation.

A custom clustering algorithm then processes questions in descending order of frequency. Each question joins an existing cluster only if the classifier judges it a duplicate of every member of that cluster; otherwise it starts a new cluster.
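The clustering rule can be sketched as a greedy loop. The function name `cluster_questions` is hypothetical, and two details are assumptions the article leaves open: a question joins the first qualifying cluster, and frequency ties are broken alphabetically. The `is_duplicate` callable stands in for the pairwise classifier, for example by thresholding its output probability.

```python
def cluster_questions(question_counts, is_duplicate):
    """Greedy clustering over questions, most frequent first.

    question_counts: dict mapping question text -> frequency.
    is_duplicate:    callable (a, b) -> bool, a stand-in for the classifier.
    """
    ordered = sorted(question_counts, key=lambda q: (-question_counts[q], q))
    clusters = []
    for question in ordered:
        for cluster in clusters:
            # Join only if the question duplicates every current member.
            if all(is_duplicate(question, member) for member in cluster):
                cluster.append(question)
                break
        else:
            clusters.append([question])  # no cluster qualified: start a new one
    return clusters

# Toy stand-in for the classifier: questions with the same topic label are duplicates.
topic = {
    "is it noisy": "noise",
    "is the noise loud": "noise",
    "does it use much power": "power",
    "is it power hungry": "power",
}
counts = {
    "is it noisy": 40,
    "is the noise loud": 25,
    "does it use much power": 30,
    "is it power hungry": 5,
}
clusters = cluster_questions(counts, lambda a, b: topic[a] == topic[b])
```

Because questions are visited in descending frequency, the first element of each cluster is its most frequent question and is a natural choice of representative. Requiring a match against every member keeps clusters tight even though the pairwise judgments are not guaranteed to be transitive.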

Question‑Comment Association

Because comments are longer than typical “Ask Everyone” answers, a rule‑based keyword matching approach is used to retrieve comments that answer a given question, favoring higher precision over recall.
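A rule-based matcher of this kind could be as simple as the sketch below. The article does not publish its actual rules, so the function name `match_comments`, the keyword lists, and the `min_hits` parameter are illustrative assumptions. Substring matching is used so that the same code also works on unsegmented Chinese text.

```python
def match_comments(question_keywords, comments, min_hits=1):
    """For each question, collect the comments containing at least
    `min_hits` of that question's keywords."""
    matches = {question: [] for question in question_keywords}
    for comment in comments:
        text = comment.lower()
        for question, keywords in question_keywords.items():
            hits = sum(1 for keyword in keywords if keyword.lower() in text)
            if hits >= min_hits:
                matches[question].append(comment)
    return matches

# Hypothetical keyword rules for two refrigerator questions.
rules = {
    "Is it noisy?": ["noise", "noisy", "quiet"],
    "Does it consume a lot of power?": ["power", "electricity"],
}
comments = [
    "Very quiet and barely uses any electricity.",
    "Delivery was fast.",
    "A bit noisy at night.",
]
result = match_comments(rules, comments)
```

A single comment can land under several questions or under none, which matches the association requirement in the problem statement. Raising `min_hits` trades recall for precision.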

Future Work

Deploy the module on the main Taobao app to reach more users.

Generate question‑relevant comments automatically to expand coverage.

Adopt advanced reading‑comprehension models such as BERT to improve accuracy.

References

【1】 https://fasttext.cc/docs/en/pretrained-vectors.html

【2】 https://www.kaggle.com/c/quora-question-pairs

【3】 https://arxiv.org/abs/1508.04025
