
How Machine Learning Can Clean Up Low‑Quality E‑Commerce Product Materials

This article explains a machine‑learning‑driven system that automatically detects and classifies poor‑quality e‑commerce product materials—such as misleading titles, exaggerated benefits, and over‑promotion—to protect consumers, reduce platform risk, and improve conversion rates during major sales events.

Alibaba Cloud Developer

Jan 16, 2019


Project Overview

Product material, including trademarks, images, slogans, and promotional text, is a key reference for consumers and a decisive factor in conversion. Because the platform lists a massive number of items, low-quality material can mislead shoppers and damage their experience. The project aims to combat over-marketing and false benefit claims by applying machine learning to material governance.

Impact

Low‑quality material harms consumer browsing and purchasing, and also creates legal and reputational risks for the platform. The initiative enforces a zero‑tolerance policy, pushes merchants to correct material, and removes non‑compliant items to safeguard user experience and reduce platform risk.

Classification of Poor‑Quality Material

Generally, poor‑quality material falls into three categories:

Abnormal Short Titles: Titles that are unusually short (6-10 characters) and contain misleading or click-bait content.

Abnormal Benefit Points: Benefit statements that are nonsensical, overly exaggerated, or contain false discounts.

Over-Promotion: Discount claims (e.g., “full-store 50% off”) that are not actually applicable to the merchant.

Solution Overview

The solution consists of three stages: challenge identification, feature extraction, and model‑based classification. Business teams define the types of poor‑quality material, and feature engineering extracts relevant characteristics for detection.
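The article does not list the features it extracts. As a minimal sketch, assuming simple hand-crafted features tied to the three categories above, the following turns one title or benefit string into numeric signals a model could consume. `HYPE_TERMS`, `DISCOUNT_PATTERN`, `MaterialFeatures`, and `extract_features` are hypothetical names; only the 6-10 character short-title range comes from the article.

```python
import re
from dataclasses import dataclass

# Hypothetical hype-term list; in practice the business team would supply these.
HYPE_TERMS = ("best ever", "lowest price", "100% effective", "miracle")

# Matches discount claims such as "50% off" or the Chinese "5折" (50% of the original price).
DISCOUNT_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:%\s*off|折)", re.IGNORECASE)


@dataclass
class MaterialFeatures:
    length: int
    in_short_title_range: bool
    exclamation_count: int
    discount_claims: int
    hype_term_hits: int


def extract_features(text: str, short_min: int = 6, short_max: int = 10) -> MaterialFeatures:
    """Turn one title or benefit string into simple, hypothetical features."""
    stripped = text.strip()
    lowered = stripped.lower()
    return MaterialFeatures(
        length=len(stripped),
        in_short_title_range=short_min <= len(stripped) <= short_max,
        # Count both ASCII and full-width exclamation marks.
        exclamation_count=stripped.count("!") + stripped.count("！"),
        discount_claims=len(DISCOUNT_PATTERN.findall(stripped)),
        hype_term_hits=sum(term in lowered for term in HYPE_TERMS),
    )


print(extract_features("Full-store 50% off!!"))
```

Hand-crafted features like these are easy for business teams to audit, which suits a governance setting where flagged items must be explained to merchants.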

Technical Scheme

The system is composed of an input layer, data‑preprocessing layer, model layer, result layer, and processing layer.

Input Layer: Automatically selects a data-ingestion method (message sync, DB sync, ODPS sync, or OCR for images) based on the data source.

Data Preprocessing Layer: Uses NLP to tokenize titles and benefit points, then performs part-of-speech tagging.

Model Layer (Recognition): Includes edit-distance similarity, cosine similarity, and TF-IDF models.

Model Layer (Classification): Employs longest common substring, longest common subsequence, and FastText models.

Result Layer: Labels identified items as abnormal short titles, abnormal benefit points, promotional slogans, and so on.

Processing Layer: For offline results, triggers alerts and pushes merchants to modify their material; for online results, intercepts material in real time without exposing the specific reasons.
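The classification layer mentions longest common substring and longest common subsequence without saying how they are applied. One plausible use, which is our assumption, is measuring how much a new piece of material overlaps a known low-quality template. The sketch below computes both lengths with standard dynamic programming; `template_overlap` and its min-length normalization are hypothetical.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence(a: str, b: str) -> int:
    """Length of the longest shared subsequence (characters need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def template_overlap(text: str, template: str) -> tuple[float, float]:
    """Hypothetical scores: both lengths divided by the shorter string's length."""
    if not text or not template:
        return 0.0, 0.0
    shorter = min(len(text), len(template))
    return (longest_common_substring(text, template) / shorter,
            longest_common_subsequence(text, template) / shorter)


print(template_overlap("full store 50% off", "store 50% off today"))
```

The substring score rewards verbatim copying of a template phrase, while the subsequence score also catches templates padded with extra characters.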

Algorithm Models

TF‑IDF Model

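Represents each piece of text as a vector of term weights, where a term's weight is its term frequency (how often it appears in the text) multiplied by its inverse document frequency (which down-weights terms that appear in many texts). Two texts can then be compared with cosine similarity.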

Cosine similarity is the cosine of the angle between two vectors, cos θ = (A · B) / (‖A‖ ‖B‖). Because TF-IDF weights are non-negative, the value ranges from 0 to 1, and values close to 1 indicate high similarity.
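A minimal sketch of TF-IDF plus cosine similarity, assuming texts have already been tokenized by the preprocessing layer. The smoothed IDF, ln((1 + N) / (1 + df)) + 1, is one common variant (scikit-learn's default), not necessarily the formula the original system used; `tfidf_vectors` and `cosine_similarity` are illustrative names.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Map each tokenized document to a sparse {term: tf-idf weight} vector."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append({
            # tf = relative frequency; idf = ln((1 + N) / (1 + df)) + 1
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|), computed over sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    ["full", "store", "50%", "off"],
    ["store", "50%", "off", "today"],
    ["organic", "cotton", "shirt"],
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # three shared terms: well above zero
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

Storing vectors as dictionaries keeps them sparse, which matters when the vocabulary spans every title on the platform but each title has only a handful of terms.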

FastText Model

FastText takes a sequence of words as input and outputs the probability that the text belongs to each class. The model is trained on labeled samples, learning word vectors that are averaged into a text representation, and then predicts categories for new material.
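To make the input-to-probability flow concrete, here is a toy fastText-style classifier in NumPy: average the word embeddings, apply a linear layer, then softmax, trained with plain SGD on cross-entropy. It is a sketch of the architecture only, not the fastText library, and it omits fastText's word n-gram features and hierarchical softmax. `TinyFastText` and the label names are hypothetical; a production system would more likely call the real `fasttext` package (e.g. `fasttext.train_supervised`).

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = np.exp(logits - logits.max())  # subtract max for numerical stability
    return shifted / shifted.sum()


class TinyFastText:
    """fastText-style classifier: averaged word embeddings -> linear layer -> softmax."""

    def __init__(self, vocab, labels, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {word: i for i, word in enumerate(vocab)}
        self.labels = list(labels)
        self.embeddings = rng.normal(scale=0.1, size=(len(vocab), dim))
        self.weights = rng.normal(scale=0.1, size=(dim, len(self.labels)))
        self.bias = np.zeros(len(self.labels))

    def _forward(self, tokens):
        # Out-of-vocabulary tokens are simply ignored in this sketch.
        ids = [self.index[t] for t in tokens if t in self.index]
        if ids:
            hidden = self.embeddings[ids].mean(axis=0)
        else:
            hidden = np.zeros(self.embeddings.shape[1])
        return ids, hidden, softmax(hidden @ self.weights + self.bias)

    def predict_proba(self, tokens):
        _, _, probs = self._forward(tokens)
        return {label: float(p) for label, p in zip(self.labels, probs)}

    def train_step(self, tokens, label, lr=0.5):
        """One SGD step on the cross-entropy loss for a single labeled sample."""
        ids, hidden, probs = self._forward(tokens)
        if not ids:
            return
        grad_logits = probs.copy()
        grad_logits[self.labels.index(label)] -= 1.0  # softmax + cross-entropy gradient
        grad_hidden = self.weights @ grad_logits  # computed before the weights change
        self.weights -= lr * np.outer(hidden, grad_logits)
        self.bias -= lr * grad_logits
        # Each token's embedding receives an equal share of the averaged gradient;
        # np.subtract.at accumulates correctly if a token appears more than once.
        np.subtract.at(self.embeddings, ids, lr * grad_hidden / len(ids))


vocab = ["full", "store", "50%", "off", "organic", "cotton"]
labels = ["normal", "over_promotion", "abnormal_benefit"]  # hypothetical labels
model = TinyFastText(vocab, labels)
sample = ["full", "store", "50%", "off"]
for _ in range(100):
    model.train_step(sample, "over_promotion")
print(model.predict_proba(sample))
```

Averaging embeddings ignores word order, which is what makes fastText fast; the real library partly recovers local order with word n-gram features.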

Edit Distance Similarity

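Edit distance measures the minimum number of single-character insertions, deletions, or substitutions required to transform one string into another. Similarity is calculated as 1 − (edit distance / max(length1, length2)), so identical strings score 1. For strings A and B, let edit(i,j) be the edit distance between the first i characters of A and the first j characters of B, and let f(i,j) = 0 if the i-th character of A equals the j-th character of B and f(i,j) = 1 otherwise. The recurrence is: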

If i = 0 and j = 0, edit(i,j)=0.

If i = 0 and j > 0, edit(i,j)=j.

If i > 0 and j = 0, edit(i,j)=i.

If i ≥ 1 and j ≥ 1, edit(i,j)=min{edit(i‑1,j)+1, edit(i,j‑1)+1, edit(i‑1,j‑1)+f(i,j)}.
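The edit(i,j) recurrence translates directly into a dynamic program, where f(i,j) is 0 when the two characters match and 1 otherwise (the standard Levenshtein cost). This sketch keeps only two rows of the table and adds the similarity formula; the function names are illustrative. Python strings iterate by Unicode code point, so Chinese titles are compared character by character.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row from the edit(i, j) recurrence."""
    prev = list(range(len(b) + 1))  # edit(0, j) = j
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)  # edit(i, 0) = i
        for j in range(1, len(b) + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,      # delete a[i-1]
                          curr[j - 1] + 1,  # insert b[j-1]
                          prev[j - 1] + f)  # substitute (free when characters match)
        prev = curr
    return prev[len(b)]


def edit_similarity(a: str, b: str) -> float:
    """1 - edit distance / max(len(a), len(b)); two empty strings count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
print(edit_similarity("kitten", "sitting"))
```

Keeping two rows instead of the full (len(a)+1) × (len(b)+1) table reduces memory to O(len(b)), which helps when comparing many titles against a reference set.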

Recognition Results

The system successfully identifies abnormal short titles and benefit points, classifying them into sub‑categories such as low‑quality titles, promotional slogans, and over‑promotion.

Future Outlook

Machine intelligence will continue to empower material governance. Future work includes generative models, reinforcement learning, machine reading, and sentiment analysis, with the aims of further automating detection and real-time interception, improving the consumer experience, and ensuring that merchants need to fill in material only once.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

