Machine Learning Applications for Product Data Quality and Knowledge Graph Construction at JD.com
At the 2nd China Big Data International Summit 2017, JD’s chief architect presented how machine‑learning techniques are applied across e‑commerce to improve product data quality, ensure compliance, resolve image‑text mismatches, automate category identification, restructure titles, and build a multi‑dimensional product knowledge graph.
During the 2nd China Big Data International Summit 2017 in Shanghai, JD’s chief architect He Xiaofeng shared the company’s extensive use of machine‑learning methods to extract commercial value from massive product data.
The talk covered several key scenarios. JD uses ML to clean and verify product information and to detect prohibited content, with models for pornographic image detection, price OCR, semantic understanding of forbidden words, and adaptive QR‑code detection. It also applies fully convolutional networks for end‑to‑end image layout analysis, separating text from background.
To address inconsistencies between product images and textual attributes, JD built a text‑image mismatch verification system that leverages a curated attribute dictionary and supervised learning to align titles, sales attributes, and extended attributes, while also extracting visual features from product images for high‑confidence labeling.
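The dictionary side of such a check can be illustrated with a minimal sketch. It covers only the lookup step, not the supervised model or the visual features mentioned in the talk. The dictionary contents and the names `ATTRIBUTE_DICT`, `extract_attributes`, and `find_mismatches` are hypothetical and do not come from JD's system.

```python
# Hypothetical attribute dictionary: attribute name -> known values.
ATTRIBUTE_DICT = {
    "color": {"red", "blue", "black", "white"},
    "material": {"cotton", "leather", "wool"},
}


def extract_attributes(title):
    """Return the dictionary values that appear as tokens in the title."""
    tokens = set(title.lower().split())
    found = {}
    for name, values in ATTRIBUTE_DICT.items():
        hits = values & tokens
        if hits:
            found[name] = hits
    return found


def find_mismatches(title, declared):
    """Compare declared attributes with values found in the title.

    Returns {attribute: (declared_value, values_found_in_title)} for every
    attribute where the title mentions a known value that differs from the
    declared one. Attributes the title does not mention are not flagged.
    """
    found = extract_attributes(title)
    mismatches = {}
    for name, value in declared.items():
        title_values = found.get(name)
        if title_values and value.lower() not in title_values:
            mismatches[name] = (value, sorted(title_values))
    return mismatches


result = find_mismatches(
    "Red cotton T-shirt for men",
    {"color": "blue", "material": "cotton"},
)
print(result)
```

Here the declared color conflicts with the title while the material agrees, so only the color is flagged. A production system would likely need tokenization suited to Chinese titles and a learned model to handle synonyms and ambiguous values.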
For product titles, JD developed a title‑attribute understanding and re‑composition pipeline that reduces keyword stuffing, improves display completeness, and offers compliance services during upload.
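One small piece of title re‑composition, removing stuffed duplicate keywords, can be sketched as follows. This is an illustrative simplification under the assumption that stuffing shows up as repeated whitespace‑separated words; the function name `recompose_title` and the `max_words` limit are hypothetical, and the pipeline described in the talk also relies on attribute understanding, which is not shown here.

```python
def recompose_title(title, max_words=12):
    """Drop case-insensitive duplicate words, keep first occurrences in order,
    and cap the result at max_words words."""
    seen = set()
    kept = []
    for word in title.split():
        key = word.lower()
        if key in seen:
            continue  # repeated keyword: skip it
        seen.add(key)
        kept.append(word)
    return " ".join(kept[:max_words])


print(recompose_title("Phone Case phone case iPhone Case Shockproof case"))
```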
Automatic category recognition was achieved by enhancing a CBOW‑based model (BTC) with dropout and training tricks. The system recommends the correct category from merchant‑provided titles and reached 99% classification accuracy during large‑scale promotional events.
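The general shape of a CBOW‑style classifier is to average the word embeddings of a title, optionally apply dropout to that averaged vector during training, and pass it through a linear layer followed by softmax. The sketch below shows only the forward pass with hand‑set toy values. The embeddings, weights, category names, and the placement of dropout are all assumptions for illustration and are not details of JD's BTC model.

```python
import math
import random

# Toy, hand-set parameters (hypothetical; a real model would learn these).
EMBEDDINGS = {
    "phone": [1.0, 0.0],
    "case": [0.8, 0.2],
    "shirt": [0.0, 1.0],
    "cotton": [0.1, 0.9],
}
WEIGHTS = [[2.0, -1.0], [-1.0, 2.0]]  # one row of weights per category
CATEGORIES = ["electronics", "apparel"]


def average_embeddings(tokens, embeddings, dim):
    """CBOW-style representation: mean of the known word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def dropout(vec, rate, rng):
    """Inverted dropout: zero each unit with probability `rate` (0 <= rate < 1)
    and rescale the survivors. Use rate=0.0 at inference time."""
    if rate <= 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(title, rate=0.0, rng=None):
    """Return (best_category, probabilities) for a whitespace-tokenized title."""
    dim = len(WEIGHTS[0])
    hidden = average_embeddings(title.lower().split(), EMBEDDINGS, dim)
    hidden = dropout(hidden, rate, rng or random.Random(0))
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in WEIGHTS]
    probs = softmax(scores)
    best = max(range(len(CATEGORIES)), key=probs.__getitem__)
    return CATEGORIES[best], probs


label, probs = predict_category("Cotton shirt")
print(label, probs)
```

Averaging makes the representation independent of word order, which keeps the model cheap enough to score very large numbers of merchant titles. Dropout is only active when `rate` is greater than zero, as it would be during training.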
JD also constructed a multi‑dimensional product knowledge graph by extracting information from product detail pages, user reviews, and customer service chats, using OCR to capture text from images, and applying supervised and unsupervised techniques to mine key phrases, sentiment, and functional attributes from reviews.
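As a rough illustration of the unsupervised side of review mining, the sketch below counts how many reviews contain each two‑word phrase after filtering stopwords. It is a deliberately simple frequency heuristic, not the method presented in the talk; the stopword list and the names `candidate_phrases` and `top_phrases` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "is", "and", "could", "be", "a", "it", "this", "to"}


def candidate_phrases(review):
    """Return adjacent word pairs in which neither word is a stopword."""
    words = re.findall(r"[a-z']+", review.lower())
    return [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]


def top_phrases(reviews, k=3):
    """Rank phrases by the number of reviews that mention them."""
    counts = Counter()
    for review in reviews:
        # set() so that a phrase counts once per review (document frequency)
        counts.update(set(candidate_phrases(review)))
    return counts.most_common(k)


reviews = [
    "The battery life is great",
    "Great battery life and fast shipping",
    "Battery life could be better",
]
print(top_phrases(reviews, k=1))  # [('battery life', 3)]
```

Phrases surfaced this way could then be attached to a product node as candidate attributes, with sentiment handled by a separate supervised model.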
Together, these machine‑learning applications improve data quality, search accuracy, and user experience, and enable a rich product knowledge ecosystem that drives commercial value for JD.com.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
