
Text Mining for User Research: Architecture, Labeling, and Application Cases at JD.com

The presentation explains how JD.com uses large-scale text mining and NLP techniques, including data cleaning, multi-level labeling, and sentiment classification with models such as TextCNN, RoBERTa, and USE, to turn unstructured customer feedback into actionable product insights across e-commerce scenarios.

DataFunTalk

The talk opens by stressing that customer opinions are central to market success. Traditional behavior analysis cannot capture subjective feedback, and unstructured text now accounts for about 80% of data, which leaves a large opportunity for insight extraction.

JD.com processes billions of daily user interactions, extracting millions of reviews, queries, and service messages. The proposed pipeline first cleans and tokenizes raw text, splits long comments into short sentences, and incorporates business‑specific knowledge bases to enrich labeling.
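The cleaning and sentence-splitting steps described above can be sketched as follows. This is a minimal illustration, not the pipeline from the talk: the regular expressions, the function names `clean` and `split_sentences`, and the choice of boundary punctuation are all assumptions. Tokenization and the knowledge-base lookup are left out.

```python
import re
from typing import List

# Hypothetical cleaning rules; the talk does not list the actual ones.
_TAG_RE = re.compile(r"<[^>]+>")   # leftover HTML tags
_SPACE_RE = re.compile(r"\s+")     # runs of whitespace
# Sentence boundaries: Chinese or Western terminators, or a period followed
# by whitespace or end of text (so "3.5 mm" is not split).
_BOUNDARY_RE = re.compile(r"[。！？!?；;]+|\.(?:\s+|$)")


def clean(text: str) -> str:
    """Drop HTML tags and collapse whitespace."""
    return _SPACE_RE.sub(" ", _TAG_RE.sub(" ", text)).strip()


def split_sentences(comment: str) -> List[str]:
    """Split one cleaned comment into short, non-empty sentences."""
    parts = _BOUNDARY_RE.split(clean(comment))
    return [p.strip() for p in parts if p.strip()]


print(split_sentences("Great sound!<br>The 3.5 mm jack is loose. Delivery was late"))
# ['Great sound', 'The 3.5 mm jack is loose', 'Delivery was late']
```

Splitting before classification means each short sentence can carry its own sentiment and attribute label, so a mixed review does not have to be forced into a single class.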

Sentiment is classified as neutral, positive, or negative. Negative feedback is further divided into service-related issues (handled with a TextCNN multi-class model and USE similarity) and product-related issues (handled with a RoBERTa model and USE similarity), producing a hierarchical label structure with up to five levels of granularity.
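The routing logic implied by this hierarchy can be sketched as below. The classifiers are passed in as plain callables, and the toy keyword rules only stand in for the trained models (TextCNN, RoBERTa). How the talk's system decides between service and product issues is not stated, so `domain_clf` and all label names here are assumptions.

```python
from typing import Callable, Dict, List

Classifier = Callable[[str], str]


def label_sentence(
    sentence: str,
    sentiment_clf: Classifier,
    domain_clf: Classifier,
    issue_clfs: Dict[str, Classifier],
) -> List[str]:
    """Return a label path, e.g. ['negative', 'service', 'delivery']."""
    sentiment = sentiment_clf(sentence)
    if sentiment != "negative":
        return [sentiment]            # only negative feedback is refined
    domain = domain_clf(sentence)     # assumed values: "service" or "product"
    issue = issue_clfs[domain](sentence)
    return [sentiment, domain, issue]


# Toy keyword rules standing in for trained models (illustration only).
def toy_sentiment(s: str) -> str:
    return "negative" if ("late" in s or "broke" in s) else "positive"


def toy_domain(s: str) -> str:
    return "service" if "courier" in s else "product"


toy_issues = {
    "service": lambda s: "delivery",
    "product": lambda s: "durability",
}

print(label_sentence("the courier was late", toy_sentiment, toy_domain, toy_issues))
# ['negative', 'service', 'delivery']
```

Deeper levels of the hierarchy would extend the path in the same way, with one more classifier lookup per level.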

The labeling workflow involves two rounds: an initial multi‑class annotation to define attribute categories, followed by a binary verification step, with USE‑based similarity and clustering used to expand and refine tags, improving efficiency and addressing data imbalance.
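One way to picture the similarity-based tag expansion is the sketch below: unlabeled sentences that are close enough to an already-labeled seed sentence are proposed for that tag, and an annotator then confirms or rejects each proposal in the binary verification round. The `embed` function is a bag-of-words stand-in for a sentence encoder such as USE, and the 0.5 threshold and all names are assumptions, not details from the talk.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts instead of a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_tags(
    seeds: Dict[str, List[str]],
    candidates: List[str],
    threshold: float = 0.5,  # assumed cut-off
) -> List[Tuple[str, str, float]]:
    """Suggest (sentence, tag, similarity) triples for binary verification."""
    seed_vecs = [(tag, embed(s)) for tag, examples in seeds.items() for s in examples]
    if not seed_vecs:
        return []
    proposals = []
    for sentence in candidates:
        vec = embed(sentence)
        tag, sim = max(((t, cosine(vec, sv)) for t, sv in seed_vecs), key=lambda x: x[1])
        if sim >= threshold:
            proposals.append((sentence, tag, sim))
    return proposals


seeds = {"battery": ["battery drains too fast"], "delivery": ["delivery arrived late"]}
for proposal in propose_tags(seeds, ["battery drains quickly", "nice color"]):
    print(proposal)
```

Because annotators only answer yes or no on proposed pairs, rare tags can be grown from a handful of seeds, which is one plausible reading of how the approach helps with data imbalance.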

Model performance is evaluated on standardized test sets so that accuracy and recall are computed automatically, reducing manual effort. The resulting structured insights are productized for business users, enabling scenario-based analyses such as NPS factor decomposition, demand insight through user segmentation, and search-term gap analysis.
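The automated evaluation step can be illustrated with a small helper that compares predictions against a fixed, labeled test set. The function name `evaluate` and the example labels are hypothetical; the talk does not describe its evaluation code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate(gold: List[str], pred: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-label recall on a labeled test set."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("test set is empty")
    support = Counter(gold)                                   # examples per label
    hits = Counter(g for g, p in zip(gold, pred) if g == p)   # correct per label
    accuracy = sum(hits.values()) / len(gold)
    recall = {label: hits[label] / n for label, n in support.items()}
    return accuracy, recall


gold = ["negative", "negative", "positive", "neutral"]
pred = ["negative", "positive", "positive", "neutral"]
accuracy, recall = evaluate(gold, pred)
print(accuracy)  # 0.75
print(recall)    # {'negative': 0.5, 'positive': 1.0, 'neutral': 1.0}
```

Reporting recall per label matters when classes are imbalanced, since a model can score high accuracy while missing most examples of a rare complaint type.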

Case studies include a headphone/earphone pilot, where attribute-level sentiment scores identified quality issues and led to a packaging redesign and a 27% NPS increase, and an SSD example, where adding screws as a gift reduced complaints about installation.
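The talk does not define how an attribute-level sentiment score is computed. One plausible definition, used here purely as an assumption, is the share of positive mentions minus the share of negative mentions for each attribute, which gives a value between -1 and 1.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def attribute_scores(labels: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score each attribute as (positive - negative) / all mentions.

    `labels` holds (attribute, sentiment) pairs, one per labeled sentence.
    This scoring rule is an illustrative assumption, not the talk's formula.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for attribute, sentiment in labels:
        counts[attribute][sentiment] += 1
    return {
        attribute: (c["positive"] - c["negative"]) / sum(c.values())
        for attribute, c in counts.items()
    }


labeled = [
    ("packaging", "negative"),
    ("packaging", "negative"),
    ("packaging", "positive"),
    ("sound", "positive"),
]
for attribute, score in attribute_scores(labeled).items():
    print(attribute, round(score, 2))
```

Attributes with strongly negative scores and many mentions would be the natural candidates for follow-up, in the way the packaging issue was surfaced in the pilot.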

The discussion also references the Kano model for demand classification and illustrates how combining user reviews and search data can reveal high‑interest, low‑satisfaction product areas, guiding strategic product development.
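Combining search and review data to find high-interest, low-satisfaction areas can be sketched as a simple filter. Here interest is approximated by an attribute's share of search volume and satisfaction by its share of positive review mentions. Both proxies, the thresholds, and the function name `find_gaps` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def find_gaps(
    search_counts: Dict[str, int],
    review_stats: Dict[str, Tuple[int, int]],  # attribute -> (positive, total)
    min_interest: float = 0.2,       # assumed threshold
    max_satisfaction: float = 0.6,   # assumed threshold
) -> List[Tuple[str, float, float]]:
    """Return (attribute, interest, satisfaction), highest interest first."""
    total_searches = sum(search_counts.values())
    if total_searches == 0:
        return []
    gaps = []
    for attribute, count in search_counts.items():
        positive, total = review_stats.get(attribute, (0, 0))
        if total == 0:
            continue  # no review evidence for this attribute
        interest = count / total_searches
        satisfaction = positive / total
        if interest >= min_interest and satisfaction <= max_satisfaction:
            gaps.append((attribute, interest, satisfaction))
    return sorted(gaps, key=lambda g: g[1], reverse=True)


search_counts = {"noise cancelling": 500, "battery": 300, "color": 100}
review_stats = {"noise cancelling": (40, 100), "battery": (90, 100), "color": (10, 100)}
for gap in find_gaps(search_counts, review_stats):
    print(gap)
```

In this toy data only "noise cancelling" is flagged: it draws the most searches while fewer than half of its review mentions are positive.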

Finally, the speaker highlights that the value of data depends on three pillars (data volume, algorithmic capability, and application scenarios) and that deep learning must be integrated with business logic and decision-making to realize its full potential.

e-commerce, Big Data, AI, sentiment analysis, NLP, text mining, User Research