UGC Sentiment Analysis Solutions and Applications in Taobao
This article presents a comprehensive overview of Taobao's user‑generated content (UGC) sentiment analysis pipeline, covering background, task definition, challenges, model architecture—including RoBERTa‑based extraction, sentiment‑knowledge pre‑training, and graph augmentation—personalized impression ranking, business impact cases, and future research directions.
Taobao generates massive daily user comments, making it difficult for consumers to browse all reviews; therefore, efficiently summarizing user opinions through UGC sentiment analysis is essential.
The UGC sentiment analysis task involves extracting product attributes, sentiment words, and their polarity (positive, negative, neutral) from comments, which can be used to generate aggregated viewpoints for users.
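To make the input/output contract concrete, here is a toy rule-based sketch of the triple-extraction task. The lexicon, attribute list, and window heuristic are illustrative assumptions; the production system uses the neural extractor described below, not rules.

```python
# Toy sketch of the task: review text in, (attribute, sentiment word,
# polarity) triples out. Lexicon and attribute set are made up.
LEXICON = {
    "fast": "positive",
    "comfortable": "positive",
    "slow": "negative",
}
ATTRIBUTES = {"delivery", "fabric", "logistics"}

def extract_triples(review: str):
    """Return (attribute, sentiment_word, polarity) triples for a toy review."""
    tokens = review.lower().replace(",", " ").replace(".", " ").split()
    triples = []
    for i, tok in enumerate(tokens):
        if tok in ATTRIBUTES:
            # naive heuristic: take the first sentiment word within
            # two tokens after the attribute mention
            for j in range(i + 1, min(len(tokens), i + 3)):
                if tokens[j] in LEXICON:
                    triples.append((tok, tokens[j], LEXICON[tokens[j]]))
                    break
    return triples

print(extract_triples("Delivery was fast, fabric is comfortable"))
# → [('delivery', 'fast', 'positive'), ('fabric', 'comfortable', 'positive')]
```

A real extractor must also handle implicit attributes and long-distance dependencies, which is exactly why the pipeline uses a sequence-labeling model rather than window rules.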
Key challenges include domain‑specific attribute variations, long‑tail expressions, imbalanced sentiment distributions, and cross‑domain polarity differences.
The proposed pipeline first feeds UGC paragraphs into an attribute‑and‑sentiment‑word extraction model based on a RoBERTa backbone fine‑tuned on e‑commerce data, followed by a BiLSTM layer, domain‑expert networks, attention‑based dynamic sharing, and a CRF layer to output (attribute, sentiment word, polarity) triples.
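The "domain-expert networks with attention-based dynamic sharing" component can be sketched as a gated mixture: each domain expert emits a feature vector per token, and a softmax gate mixes them into one shared representation. The dimensions, gate scores, and two-expert setup below are illustrative assumptions, not the production architecture.

```python
import math

# Sketch of attention-based dynamic sharing across domain experts:
# mix per-expert feature vectors with softmax attention weights.
def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mix_experts(expert_outputs, gate_scores):
    """Weighted sum of per-expert feature vectors using attention weights."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, vec in zip(weights, expert_outputs):
        for d in range(dim):
            mixed[d] += w * vec[d]
    return mixed

# Two hypothetical experts (e.g. apparel vs. electronics) vote on a token.
apparel = [1.0, 0.0]
electronics = [0.0, 1.0]
mixed = mix_experts([apparel, electronics], gate_scores=[2.0, 0.0])
print(mixed)  # leans toward the apparel expert
```

In the full model this mixed representation would sit between the BiLSTM outputs and the CRF decoding layer, letting domains share parameters where their data agrees and specialize where it does not.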
Extracted results are then normalized: implicit attributes are completed, semantically similar aspect‑sentiment pairs are clustered, and each normalized pair is classified for sentiment polarity.
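The normalization step can be sketched as mapping surface attribute variants onto canonical attributes and grouping triples by (canonical attribute, polarity). The synonym table below is a made-up stand-in for the semantic clustering the pipeline actually performs.

```python
# Toy viewpoint normalization: canonicalize attribute variants,
# then group sentiment words by (canonical attribute, polarity).
CANONICAL = {
    "logistics": "delivery",
    "shipping": "delivery",
    "delivery": "delivery",
    "cloth": "fabric",
    "fabric": "fabric",
}

def normalize(triples):
    """Group (attribute, word, polarity) triples under canonical attributes."""
    grouped = {}
    for attr, word, polarity in triples:
        canon = CANONICAL.get(attr, attr)  # unknown attributes pass through
        grouped.setdefault((canon, polarity), []).append(word)
    return grouped

triples = [
    ("logistics", "fast", "positive"),
    ("shipping", "quick", "positive"),
    ("cloth", "rough", "negative"),
]
print(normalize(triples))
# → {('delivery', 'positive'): ['fast', 'quick'], ('fabric', 'negative'): ['rough']}
```

Each grouped bucket then corresponds to one aggregated viewpoint candidate for display.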
For online display, a viewpoint generation and aggregation module consolidates similar opinions into a unified view, while an active‑learning loop continuously improves the model with hard examples.
The sentiment‑knowledge‑enhanced pre‑training incorporates general sentiment lexicons and e‑commerce‑specific knowledge via additional embeddings and sentiment masking, improving downstream performance (macro‑F1 from 0.9306 to 0.9543).
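The sentiment-masking idea can be sketched as follows: instead of masking random tokens as in standard masked language modeling, preferentially mask tokens found in a sentiment lexicon so the model must recover sentiment knowledge from context. The lexicon and masking policy below are toy assumptions.

```python
# Illustrative sentiment-word masking for MLM-style pre-training:
# mask lexicon hits and keep them as prediction targets.
SENTIMENT_LEXICON = {"great", "terrible", "comfortable", "slow"}

def sentiment_mask(tokens, mask_token="[MASK]"):
    """Mask every sentiment-lexicon token; return masked tokens and labels."""
    masked, labels = [], []
    for tok in tokens:
        if tok in SENTIMENT_LEXICON:
            masked.append(mask_token)
            labels.append(tok)      # the model is trained to predict this
        else:
            masked.append(tok)
            labels.append(None)     # no loss on unmasked positions
    return masked, labels

masked, labels = sentiment_mask(["the", "fabric", "is", "great"])
print(masked)  # → ['the', 'fabric', 'is', '[MASK]']
```

A production recipe would typically mask only a fraction of lexicon hits per sequence and mix in ordinary random masking, but the targeting principle is the same.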
A sentiment graph is constructed by linking similar attributes, sentiment words, and aspect‑sentiment pairs, providing knowledge augmentation that boosts performance on long‑tail and scarce negative cases.
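A minimal sketch of such a graph: nodes are attributes, sentiment words, or (aspect, sentiment) pairs, and undirected edges link similar items; neighbor lookup then lets a rare expression borrow signal from better-covered ones. The node names and `SentimentGraph` class are hypothetical.

```python
from collections import defaultdict

# Minimal similarity graph over attributes, sentiment words, and
# aspect-sentiment pairs, supporting neighbor lookup for augmentation.
class SentimentGraph:
    def __init__(self):
        self.adj = defaultdict(set)

    def link(self, a, b):
        # undirected "similar-to" edge between any two nodes
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node):
        return sorted(self.adj[node])

g = SentimentGraph()
g.link("logistics", "delivery")                       # similar attributes
g.link(("delivery", "slow"), ("logistics", "tardy"))  # similar aspect-sentiment pairs
print(g.neighbors("logistics"))  # → ['delivery']
```

During training or inference, features of a long-tail node can be augmented with those of its neighbors, which is how the graph helps on scarce negative cases.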
Personalized impression ranking uses a DIN model that leverages user demographics, product features, interaction history, and extracted impression words to tailor displayed impression tags.
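The core DIN mechanism, scoring a candidate item by attending over the user's interaction history, can be sketched with toy tag embeddings. The tags, 2-d embeddings, and dot-product attention below are illustrative assumptions; the production model learns embeddings and uses a richer attention unit plus the demographic and product features mentioned above.

```python
import math

# Hedged sketch of DIN-style personalized tag ranking: weight history
# tags by similarity to the candidate, then score the candidate against
# the resulting attention-pooled interest vector.
EMB = {
    "fast delivery": [1.0, 0.0],
    "good fabric": [0.0, 1.0],
    "true to size": [0.2, 0.9],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def din_score(candidate, history):
    """Attention-weighted interest score of `candidate` given clicked history."""
    c = EMB[candidate]
    weights = [math.exp(dot(c, EMB[h])) for h in history]  # unnormalized attention
    total = sum(weights)
    interest = [0.0, 0.0]
    for w, h in zip(weights, history):
        for d in range(2):
            interest[d] += (w / total) * EMB[h][d]
    return dot(c, interest)

# A user whose clicks are all about fabric and fit:
history = ["good fabric", "true to size"]
ranked = sorted(EMB, key=lambda t: din_score(t, history), reverse=True)
print(ranked[0])  # a fabric/fit-related tag ranks first for this user
```

The key design point carried over from DIN is that the user representation is recomputed per candidate: the same history yields different interest vectors for "fast delivery" and "good fabric".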
Business experiments show significant gains: impression tag pCTR ↑456%, UCTR ↑250%; search SRP IPV ↑0.55%, transaction volume ↑0.31%; various UI placements see PV and CTR improvements ranging from 0.16% to 2.7%.
Future work includes improving negative sentiment detection, handling complex multi‑aspect sentences, and developing end‑to‑end triple extraction models to reduce pipeline error accumulation.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.