
JD.com Infrastructure Team Wins SIGIR 2018 E‑commerce Product Categorization Competition

The JD.com Infrastructure team won the SIGIR 2018 e‑commerce competition with a multi‑level deep‑learning model that tackled severe class imbalance, vocabulary mismatch between training and test data, and hierarchical labels. The model raised F1 from 0.74 to 0.84 over a TextCNN baseline, and the team's paper was accepted by SIGIR.

JD Retail Technology

In July, the JD.com Mall Infrastructure Department announced that its team won the SIGIR 2018 Global E‑commerce Competition, and the related competition paper has been accepted by SIGIR.

SIGIR is a premier international forum for information retrieval research, first held in 1978. The 2018 conference, the 41st edition, featured an e‑commerce data competition that attracted teams from Walmart Labs, Flipkart, Carnegie Mellon University, Yahoo, and many leading universities worldwide.

The competition task was to predict the product category from a short product title.

The main challenges were:

Extreme class imbalance – some categories have tens of thousands of titles while others have only one or two.

Vocabulary mismatch between offline training data (≈220k unique words) and online test data (≈580k unique words), with many new words appearing only in the test set.

Complex hierarchical labels, ranging from three‑level "A>B>C" tags up to eight levels.

A large number of categories – up to 3,008 distinct classes, or about 1,600 when considering hierarchical levels.
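To make these challenges concrete, here is a minimal sketch (not the team's code) of how one might measure class imbalance, label depth, and the share of test-set words unseen in training. The toy titles, the label strings, and the helper names vocabulary, oov_rate, and label_depth are all assumptions made for illustration; only the "A>B>C" path format comes from the competition description.

```python
from collections import Counter

# Toy data, invented for illustration; labels use the "A>B>C" path format.
train = [
    ("red cotton t-shirt", "1>10>100"),
    ("blue cotton t-shirt", "1>10>100"),
    ("green cotton t-shirt", "1>10>100"),
    ("usb charging cable", "2>20"),
]
test_titles = ["cotton hoodie", "usb-c charging cable"]


def vocabulary(titles):
    """Return the set of lower-cased, whitespace-separated tokens."""
    return {tok for title in titles for tok in title.lower().split()}


def oov_rate(train_titles, test_titles):
    """Fraction of unique test words that never occur in the training titles."""
    train_vocab = vocabulary(train_titles)
    test_vocab = vocabulary(test_titles)
    return len(test_vocab - train_vocab) / len(test_vocab)


def label_depth(label):
    """Number of levels in a hierarchical label such as '1>10>100'."""
    return len(label.split(">"))


# Class imbalance: how many titles each full label path has.
class_counts = Counter(label for _, label in train)
# Hierarchy depth: how many labels exist at each depth.
depths = Counter(label_depth(label) for _, label in train)
# Vocabulary mismatch: share of test words absent from training.
rate = oov_rate([title for title, _ in train], test_titles)

print(class_counts.most_common())
print(dict(depths))
print(f"OOV rate: {rate:.2f}")
```

On a real dataset the same three measurements show which labels need rebalancing and how much of the test vocabulary a training-only embedding would miss.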

The JD.com team (team name "tiger") addressed these issues by:

Merging training and test data to learn joint word embeddings, enabling semantic handling of unseen test‑set words.

Applying oversampling and data‑augmentation techniques to mitigate label imbalance.

Extracting a category tree from the training labels, splitting the data into eight subsets (one per hierarchy level), training two different classifiers for each level, and using the tree to guide the prediction path.
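Two of these steps, oversampling rare labels and following the category tree at prediction time, can be sketched as below. This is an illustrative simplification under stated assumptions, not the team's implementation: the function names, the toy labels, and the keyword-based scorer are hypothetical. The scorer stands in for the trained per-level classifiers, and the greedy top-down decoding assumes every training label ends at a leaf of the tree.

```python
import random
from collections import defaultdict


def oversample(examples, min_count, seed=0):
    """Duplicate random examples of rare labels until each has min_count."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for title, label in examples:
        by_label[label].append((title, label))
    out = []
    for items in by_label.values():
        out.extend(items)
        shortfall = min_count - len(items)
        if shortfall > 0:
            out.extend(rng.choices(items, k=shortfall))
    return out


def build_tree(labels):
    """Map each path prefix to the set of child nodes seen in training."""
    children = defaultdict(set)
    for label in labels:
        parts = label.split(">")
        for depth, node in enumerate(parts):
            children[tuple(parts[:depth])].add(node)
    return children


def predict_path(title, children, score):
    """Greedy top-down decoding: at each level pick the best-scoring child
    of the current node, so only paths present in the tree are produced."""
    path = ()
    while path in children:
        candidates = sorted(children[path])  # sorted for deterministic ties
        best = max(candidates, key=lambda node: score(title, path, node))
        path += (best,)
    return ">".join(path)


# Hypothetical stand-in for the per-level classifiers: keyword overlap.
KEYWORDS = {
    "1": {"shirt", "hoodie"}, "10": {"cotton"},
    "100": {"shirt"}, "101": {"hoodie"},
    "2": {"usb"}, "20": {"cable"},
}


def keyword_score(title, path, node):
    return len(KEYWORDS.get(node, set()) & set(title.lower().split()))


examples = [
    ("red cotton shirt", "1>10>100"),
    ("blue cotton shirt", "1>10>100"),
    ("grey cotton shirt", "1>10>100"),
    ("cotton hoodie", "1>10>101"),
    ("usb cable", "2>20"),
]
balanced = oversample(examples, min_count=3)
tree = build_tree(label for _, label in examples)
prediction = predict_path("black cotton hoodie", tree, keyword_score)
print(len(balanced), prediction)
```

Restricting each step to the children of the current node is what lets the tree "guide" the prediction: a level-3 classifier never has to choose among categories that belong to a different level-2 parent.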

Compared with the commonly used TextCNN baseline, their multi‑level deep‑learning model raised the F1 score from 0.74 to 0.84 – an improvement of about ten percentage points.

The associated paper, titled "Multi‑level Deep Learning based E‑commerce Product Categorization," has been accepted by SIGIR and is publicly listed.

The techniques demonstrated in the competition will soon be deployed in JD.com’s Category Cube system, providing an API for merchants and customers and delivering greater commercial value and improved user experience.

e-commerce, deep learning, SIGIR, jd.com, multilevel classification, product categorization
Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
