Multi‑Label Image Recognition for 58.com: Algorithm Design, Data Construction, and Model Optimization
This article presents a comprehensive study of multi‑label image recognition applied to 58.com’s business scenarios, covering problem motivation, dataset construction, evaluation metrics, mainstream deep‑learning methods, an asymmetric‑loss‑based optimization pipeline, and practical output schemes for recommendation and retrieval.
Image recognition is a fundamental task in computer vision, but most traditional algorithms can only detect a single object per image, limiting their usefulness when multiple objects appear. Multi‑label recognition addresses this gap by predicting all present categories, enabling richer semantic understanding for downstream services such as recommendation, advertising, and content moderation.
Business background: 58.com processes billions of images daily across categories such as local services, real estate, recruitment, and automotive. These images contain many semantically similar items, and extracting multi-label information can improve post recommendation, ad placement, and illegal-content detection.
Technical background: Multi-label methods fall into localization-based (detection/segmentation) and classification-based approaches. Classification methods avoid costly bounding-box annotation and scale better to large label spaces.
Typical open datasets: MS-COCO (80 classes, 200k+ annotations), PASCAL VOC2012 (20 classes), Open Images V6 (19,958 classes, 9.17M annotations), NUS-WIDE (5,018 classes). These datasets provide benchmarks for evaluating multi-label models.
Evaluation metrics: mean average precision (mAP), in both macro and micro variants, measures per-class ranking quality, while Hamming loss counts the fraction of incorrectly predicted label slots.
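To make the two metrics concrete, here is a minimal NumPy sketch of per-class average precision, macro mAP, and Hamming loss; the function names and array layout (samples in rows, classes in columns) are illustrative choices, not from the original article.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean precision at each positive, ranked by score."""
    order = np.argsort(-scores)           # rank samples by descending score
    labels = labels[order]
    hits = np.cumsum(labels)              # positives seen up to each rank
    ranks = np.arange(1, len(labels) + 1)
    precisions = hits / ranks
    return precisions[labels == 1].mean() # average precision at positive ranks

def macro_map(score_matrix, label_matrix):
    """Macro mAP: unweighted mean of per-class AP over columns."""
    return np.mean([average_precision(score_matrix[:, c], label_matrix[:, c])
                    for c in range(label_matrix.shape[1])])

def hamming_loss(pred, true):
    """Fraction of (sample, label) slots predicted incorrectly."""
    return np.mean(pred != true)
```

For example, ranking scores `[0.9, 0.8, 0.3]` against labels `[1, 0, 1]` gives precisions 1 and 2/3 at the two positives, so AP = 5/6; binarized predictions that flip one of four label slots give a Hamming loss of 0.25.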
State-of-the-art algorithms: CNN+LSTM approaches (e.g., CVPR 2018 RLSD) encode region proposals and model inter-region dependencies with an LSTM. Cross-modality attention with semantic graph embedding (AAAI 2020) fuses CNN features with class embeddings to capture label correlations. Graph convolutional networks (CVPR 2019 ML-GCN) learn label-specific classifiers using a GCN over a label co-occurrence graph. Loss-function improvements such as Asymmetric Loss (Alibaba, 2020) address the extreme imbalance between positive and negative labels.
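The Asymmetric Loss idea can be sketched in a few lines of NumPy. Per the original paper, positives use focal weighting `(1 - p)^γ+` while negatives get both a stronger focusing exponent `γ-` and a hard probability shift `m` (probability clipping), so easy negatives contribute almost nothing. The default hyperparameters below follow the paper's common setting; this is an illustrative sketch, not 58.com's production code.

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    clip=0.05, eps=1e-8):
    """Asymmetric Loss (ASL) for multi-label classification.

    Negatives are both probability-shifted (margin `clip`) and more strongly
    down-weighted (gamma_neg > gamma_pos), countering the extreme
    positive/negative imbalance of large label spaces.
    """
    p = 1.0 / (1.0 + np.exp(-logits))      # per-label sigmoid probability
    p_m = np.clip(p - clip, 0.0, 1.0)      # shifted probability for negatives
    loss_pos = targets * (1 - p) ** gamma_pos * np.log(p + eps)
    loss_neg = (1 - targets) * p_m ** gamma_neg * np.log(1 - p_m + eps)
    return -np.mean(loss_pos + loss_neg)
```

With these defaults, a confidently correct prediction incurs a near-zero loss, while a confidently wrong one is penalized heavily on the positive term but only mildly on easy negatives.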
Technical solution for 58.com:
Data construction: use a pre-trained Open Images model to generate initial multi-label tags, filter rare tags (<0.05% frequency), remove low-co-occurrence tags via a co-occurrence matrix, and merge semantically similar tags using WordNet, yielding a ~720-class taxonomy covering 17 domains.
Model optimization: replace the final softmax with a sigmoid layer for multi-label output and adopt Asymmetric Loss (ASL) to re-weight positive and negative samples; switch the backbone from ResNet-101 to ResNeXt-50 and use global max pooling instead of global average pooling to improve inference speed with minimal mAP loss.
Output schemes: provide two outputs: (1) a set of predicted label IDs for recommendation and similarity-based retrieval, and (2) a 2048-dimensional feature embedding for nearest-neighbor search and anomaly detection.
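The two output schemes can be sketched as a single inference step: global max pooling over the backbone's spatial features yields the 2048-d embedding, and independent per-label sigmoids thresholded at a cutoff yield the label-ID set. The 0.5 threshold, the random stand-in classifier weights, and the 49-cell spatial grid below are all hypothetical; only the sigmoid head, max pooling, and the two output types come from the article.

```python
import numpy as np

NUM_CLASSES, EMBED_DIM, THRESHOLD = 720, 2048, 0.5   # threshold is an assumption

rng = np.random.default_rng(0)
# Stand-in for trained classifier weights mapping the embedding to class logits.
W = rng.normal(scale=0.01, size=(EMBED_DIM, NUM_CLASSES))

def predict(feature_map):
    """feature_map: (H*W, EMBED_DIM) spatial features from the backbone."""
    embedding = feature_map.max(axis=0)              # global max pooling -> 2048-d
    probs = 1.0 / (1.0 + np.exp(-embedding @ W))     # independent sigmoid per label
    label_ids = np.flatnonzero(probs > THRESHOLD)    # output (1): label-ID set
    return label_ids, embedding                      # output (2): retrieval embedding
```

The label-ID set feeds tag-based recommendation directly, while the embedding is what a nearest-neighbor index (e.g., cosine similarity over stored vectors) would consume for retrieval and anomaly detection.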
Extensive experiments on internal 58.com data (500M+ images) and public benchmarks (MS-COCO, PASCAL VOC) demonstrate that the proposed pipeline achieves higher mAP than baseline methods while reducing training memory and inference latency. Visualizations show effective activation regions and successful top-k retrieval examples.
Conclusion and outlook: The customized multi-label solution efficiently builds large-scale annotated data, trains fast and accurate models, and offers flexible output formats for various business scenarios. Future work includes integrating additional modalities, refining feature encodings, and expanding applications to recommendation, anomaly detection, and advertising.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.