UGC Sentiment Analysis Solutions and Applications in Taobao
This article presents a comprehensive overview of Taobao's user‑generated content (UGC) sentiment analysis pipeline, covering background, task definition, challenges, model architecture—including RoBERTa‑based extraction, sentiment‑knowledge pre‑training, and graph augmentation—personalized impression ranking, business impact cases, and future research directions.
Taobao generates massive daily user comments, making it difficult for consumers to browse all reviews; therefore, efficiently summarizing user opinions through UGC sentiment analysis is essential.
The UGC sentiment analysis task involves extracting product attributes, sentiment words, and their polarity (positive, negative, neutral) from comments, which can be used to generate aggregated viewpoints for users.
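To make the input/output contract concrete, here is a toy rule-based sketch of the triple-extraction task. The lexicon, attribute list, and window heuristic are illustrative assumptions; the production system uses the neural extractor described below, not rules.

```python
# Toy sketch of the task: review text in, (attribute, sentiment word,
# polarity) triples out. Lexicon and attribute set are made up.
LEXICON = {
    "fast": "positive",
    "comfortable": "positive",
    "slow": "negative",
}
ATTRIBUTES = {"delivery", "fabric", "logistics"}

def extract_triples(review: str):
    """Return (attribute, sentiment_word, polarity) triples for a toy review."""
    tokens = review.lower().replace(",", " ").replace(".", " ").split()
    triples = []
    for i, tok in enumerate(tokens):
        if tok in ATTRIBUTES:
            # naive heuristic: take the first sentiment word within
            # two tokens after the attribute mention
            for j in range(i + 1, min(len(tokens), i + 3)):
                if tokens[j] in LEXICON:
                    triples.append((tok, tokens[j], LEXICON[tokens[j]]))
                    break
    return triples

print(extract_triples("Delivery was fast, fabric is comfortable"))
# → [('delivery', 'fast', 'positive'), ('fabric', 'comfortable', 'positive')]
```

A real extractor must also handle implicit attributes and long-distance dependencies, which is exactly why the pipeline uses a sequence-labeling model rather than window rules.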
Key challenges include domain‑specific attribute variations, long‑tail expressions, imbalanced sentiment distributions, and cross‑domain polarity differences.
The proposed pipeline first feeds UGC paragraphs into an attribute‑and‑sentiment‑word extraction model based on a RoBERTa backbone fine‑tuned on e‑commerce data, followed by a BiLSTM layer, domain‑expert networks, attention‑based dynamic sharing, and a CRF layer to output (attribute, sentiment word, polarity) triples.
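The "domain-expert networks with attention-based dynamic sharing" component can be sketched as a gated mixture: each domain expert emits a feature vector per token, and a softmax gate mixes them into one shared representation. The dimensions, gate scores, and two-expert setup below are illustrative assumptions, not the production architecture.

```python
import math

# Sketch of attention-based dynamic sharing across domain experts:
# mix per-expert feature vectors with softmax attention weights.
def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mix_experts(expert_outputs, gate_scores):
    """Weighted sum of per-expert feature vectors using attention weights."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, vec in zip(weights, expert_outputs):
        for d in range(dim):
            mixed[d] += w * vec[d]
    return mixed

# Two hypothetical experts (e.g. apparel vs. electronics) vote on a token.
apparel = [1.0, 0.0]
electronics = [0.0, 1.0]
mixed = mix_experts([apparel, electronics], gate_scores=[2.0, 0.0])
print(mixed)  # leans toward the apparel expert
```

In the full model this mixed representation would sit between the BiLSTM outputs and the CRF decoding layer, letting domains share parameters where their data agrees and specialize where it does not.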
Extracted results are then normalized: implicit attributes are completed, semantically similar aspect‑sentiment pairs are clustered, and each normalized pair is classified for sentiment polarity.
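The normalization step can be sketched as mapping surface attribute variants onto canonical attributes and grouping triples by (canonical attribute, polarity). The synonym table below is a made-up stand-in for the semantic clustering the pipeline actually performs.

```python
# Toy viewpoint normalization: canonicalize attribute variants,
# then group sentiment words by (canonical attribute, polarity).
CANONICAL = {
    "logistics": "delivery",
    "shipping": "delivery",
    "delivery": "delivery",
    "cloth": "fabric",
    "fabric": "fabric",
}

def normalize(triples):
    """Group (attribute, word, polarity) triples under canonical attributes."""
    grouped = {}
    for attr, word, polarity in triples:
        canon = CANONICAL.get(attr, attr)  # unknown attributes pass through
        grouped.setdefault((canon, polarity), []).append(word)
    return grouped

triples = [
    ("logistics", "fast", "positive"),
    ("shipping", "quick", "positive"),
    ("cloth", "rough", "negative"),
]
print(normalize(triples))
# → {('delivery', 'positive'): ['fast', 'quick'], ('fabric', 'negative'): ['rough']}
```

Each grouped bucket then corresponds to one aggregated viewpoint candidate for display.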
For online display, a viewpoint generation and aggregation module consolidates similar opinions into a unified view, while an active‑learning loop continuously improves the model with hard examples.
The sentiment‑knowledge‑enhanced pre‑training incorporates general sentiment lexicons and e‑commerce‑specific knowledge via additional embeddings and sentiment masking, improving downstream performance (macro‑F1 from 0.9306 to 0.9543).
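The sentiment-masking idea can be sketched as follows: instead of masking random tokens as in standard masked language modeling, preferentially mask tokens found in a sentiment lexicon so the model must recover sentiment knowledge from context. The lexicon and masking policy below are toy assumptions.

```python
# Illustrative sentiment-word masking for MLM-style pre-training:
# mask lexicon hits and keep them as prediction targets.
SENTIMENT_LEXICON = {"great", "terrible", "comfortable", "slow"}

def sentiment_mask(tokens, mask_token="[MASK]"):
    """Mask every sentiment-lexicon token; return masked tokens and labels."""
    masked, labels = [], []
    for tok in tokens:
        if tok in SENTIMENT_LEXICON:
            masked.append(mask_token)
            labels.append(tok)      # the model is trained to predict this
        else:
            masked.append(tok)
            labels.append(None)     # no loss on unmasked positions
    return masked, labels

masked, labels = sentiment_mask(["the", "fabric", "is", "great"])
print(masked)  # → ['the', 'fabric', 'is', '[MASK]']
```

A production recipe would typically mask only a fraction of lexicon hits per sequence and mix in ordinary random masking, but the targeting principle is the same.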
A sentiment graph is constructed by linking similar attributes, sentiment words, and aspect‑sentiment pairs, providing knowledge augmentation that boosts performance on long‑tail and scarce negative cases.
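A minimal sketch of such a graph: nodes are attributes, sentiment words, or (aspect, sentiment) pairs, and undirected edges link similar items; neighbor lookup then lets a rare expression borrow signal from better-covered ones. The node names and `SentimentGraph` class are hypothetical.

```python
from collections import defaultdict

# Minimal similarity graph over attributes, sentiment words, and
# aspect-sentiment pairs, supporting neighbor lookup for augmentation.
class SentimentGraph:
    def __init__(self):
        self.adj = defaultdict(set)

    def link(self, a, b):
        # undirected "similar-to" edge between any two nodes
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node):
        return sorted(self.adj[node])

g = SentimentGraph()
g.link("logistics", "delivery")                       # similar attributes
g.link(("delivery", "slow"), ("logistics", "tardy"))  # similar aspect-sentiment pairs
print(g.neighbors("logistics"))  # → ['delivery']
```

During training or inference, features of a long-tail node can be augmented with those of its neighbors, which is how the graph helps on scarce negative cases.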
Personalized impression ranking uses a DIN model that leverages user demographics, product features, interaction history, and extracted impression words to tailor displayed impression tags.
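The core DIN mechanism, scoring a candidate item by attending over the user's interaction history, can be sketched with toy tag embeddings. The tags, 2-d embeddings, and dot-product attention below are illustrative assumptions; the production model learns embeddings and uses a richer attention unit plus the demographic and product features mentioned above.

```python
import math

# Hedged sketch of DIN-style personalized tag ranking: weight history
# tags by similarity to the candidate, then score the candidate against
# the resulting attention-pooled interest vector.
EMB = {
    "fast delivery": [1.0, 0.0],
    "good fabric": [0.0, 1.0],
    "true to size": [0.2, 0.9],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def din_score(candidate, history):
    """Attention-weighted interest score of `candidate` given clicked history."""
    c = EMB[candidate]
    weights = [math.exp(dot(c, EMB[h])) for h in history]  # unnormalized attention
    total = sum(weights)
    interest = [0.0, 0.0]
    for w, h in zip(weights, history):
        for d in range(2):
            interest[d] += (w / total) * EMB[h][d]
    return dot(c, interest)

# A user whose clicks are all about fabric and fit:
history = ["good fabric", "true to size"]
ranked = sorted(EMB, key=lambda t: din_score(t, history), reverse=True)
print(ranked[0])  # a fabric/fit-related tag ranks first for this user
```

The key design point carried over from DIN is that the user representation is recomputed per candidate: the same history yields different interest vectors for "fast delivery" and "good fabric".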
Business experiments show significant gains: impression tag pCTR ↑456%, UCTR ↑250%; search SRP IPV ↑0.55%, transaction volume ↑0.31%; various UI placements see PV and CTR improvements ranging from 0.16% to 2.7%.
Future work includes improving negative sentiment detection, handling complex multi‑aspect sentences, and developing end‑to‑end triple extraction models to reduce pipeline error accumulation.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.