
Designing a Scalable Image Classification System for Prohibited Item Detection in a Second‑hand E‑commerce Platform

This article describes how a second‑hand e‑commerce company built a fast, modular image‑classification pipeline using small binary classifiers, EfficientNet‑b0, and active‑learning‑driven data annotation to detect prohibited items while keeping inference latency around 200 ms and dramatically reducing labeling costs.


1. Introduction

Zhuanzhuan, a second‑hand e‑commerce platform, needs to ensure that user‑uploaded product listings do not contain prohibited content such as cash, weapons, or cigarettes. Early detection relied on text keywords, but text can be easily altered, prompting a shift to image‑based detection. Dozens of prohibited categories must be recognized, and the system must support rapid iteration, low cost, and flexible addition of new classes.

2. Business Challenges

Unlike binary content moderation (e.g., porn vs. non‑porn), prohibited‑item detection involves many mutable categories that product teams may add at any time. The massive daily influx of images leads to high false‑positive rates, making precision improvement difficult. The main pain points are:

- Improving recall/precision for a specific class must not degrade other classes.
- Adding a new class traditionally requires re‑labeling the entire historical dataset (≈200 k images), which is labor‑intensive.
- Complex online scenarios cause high false‑positive rates, hindering accuracy optimization.

These challenges motivate the solution described below.

3. Solution

3.1 Overall Model Architecture Design

Instead of a single multi‑class model, we train an independent binary classifier for each prohibited category. At inference time, all small models are merged into a single large model; because parameters are not shared, updating one class does not affect others, and new classes can be added without re‑labeling historic data.
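To make the idea concrete, here is a minimal sketch (not Zhuanzhuan's production code; the class and method names are illustrative) of merging independent binary heads: each category owns its own parameters, so registering or retraining one head never shifts the outputs of the others.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryHead:
    """One independent binary classifier (here a toy linear model)."""
    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def predict(self, features):
        # Probability that the image contains this category's item.
        return sigmoid(features @ self.weights + self.bias)

class MergedModel:
    """Runs every registered head on the same feature vector."""
    def __init__(self):
        self.heads = {}

    def add_category(self, name, head):
        # New classes plug in without touching existing heads,
        # so no historical data needs re-labeling.
        self.heads[name] = head

    def predict(self, features):
        return {name: head.predict(features)
                for name, head in self.heads.items()}

model = MergedModel()
model.add_category("cash", BinaryHead([1.0, -1.0], 0.0))
x = np.array([2.0, 1.0])
p_before = model.predict(x)["cash"]
model.add_category("weapons", BinaryHead([0.5, 0.5], -1.0))
p_after = model.predict(x)["cash"]   # unchanged by the new head
```

Because no parameters are shared, `p_before` and `p_after` are identical: adding the "weapons" head cannot perturb the "cash" head.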

Two new issues arise:

1. Will the lightweight models achieve performance comparable to the original large multi‑class model?
2. As the number of classes grows, the combined model's inference time and size increase.

We compared EfficientNet‑b0 (small model) with InceptionResNetV2 (original large model). EfficientNet‑b0 matched or exceeded the large model's performance, because each binary model only has to focus on a single class.

We also measured latency and model size for different numbers of integrated small models (batch size = 5, input = 300×300, GPU = TITAN V):

| Model | Integrated models | Latency | Model size |
| --- | --- | --- | --- |
| InceptionResNetV2 | 1 | 115 ms | 219 MB |
| EfficientNet‑b0 | 12 | 132 ms | 198 MB |
| EfficientNet‑b0 | 20 | 208 ms | 330 MB |
| EfficientNet‑b0 | 30 | 302 ms | 494 MB |
| EfficientNet‑b0 | 40 | 484 ms | 659 MB |

With 12 classes, latency and size are comparable to the original model. To keep latency at roughly 200 ms, we cap the integrated model at 20 classes, which fits within our resource budget. The strategy flow is shown below:
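The capacity decision above is a simple budget check; a back‑of‑envelope sketch (using the TITAN V figures from the table, with an illustrative budget value) looks like this:

```python
def max_models_within_budget(measurements, budget_ms):
    """Largest ensemble size whose measured latency fits the budget.

    measurements: list of (num_models, latency_ms) pairs.
    Returns 0 if no measured configuration fits.
    """
    feasible = [n for n, latency in measurements if latency <= budget_ms]
    return max(feasible) if feasible else 0

# EfficientNet-b0 measurements from the table
# (batch size 5, 300x300 input, TITAN V).
efficientnet_b0 = [(12, 132), (20, 208), (30, 302), (40, 484)]

cap = max_models_within_budget(efficientnet_b0, 210)  # -> 20
```

With a ~210 ms budget the largest measured configuration that fits is 20 models, matching the cap chosen above; in practice one would re-measure rather than interpolate, since latency does not grow perfectly linearly with ensemble size.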

3.2 Data Annotation Strategy

Detecting prohibited items in a massive daily stream requires high precision; otherwise human reviewers are overwhelmed. Initially we labeled ~200 k negative samples to achieve the needed accuracy, which was costly and slowed iteration. We adopted an active‑learning pipeline to drastically reduce the number of required negative samples.

Active learning, a sub‑field of machine learning (also known as query learning or optimal experimental design), iteratively selects the most informative unlabeled samples for annotation.

The workflow is:

1. Annotators label a small initial set (≈10 k negatives and many positives).
2. Train a binary classifier.
3. Evaluate accuracy; if insufficient, continue.
4. Use the model to score unlabeled images and select those with prediction confidence around 0.5 (high uncertainty) as high‑value samples.
5. Annotators label these high‑value samples and add them to the training set.
6. Repeat steps 2–5 until performance meets requirements.
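The selection rule in step 4 is classic uncertainty sampling: rank unlabeled images by how close the model's predicted probability is to 0.5. A minimal sketch (the scores here are synthetic; in the real pipeline they come from the current binary classifier) could be:

```python
import numpy as np

def select_uncertain(scores, k):
    """Return indices of the k samples whose predicted probability
    is closest to 0.5, i.e. where the model is least certain."""
    scores = np.asarray(scores, dtype=float)
    uncertainty = np.abs(scores - 0.5)   # 0 = maximally uncertain
    return np.argsort(uncertainty)[:k].tolist()

# Predicted probabilities on five unlabeled images.
scores = [0.01, 0.49, 0.95, 0.55, 0.2]
picked = select_uncertain(scores, 2)  # indices of the two most uncertain
```

Only the selected indices go to annotators, which is how the loop concentrates labeling effort on the samples the model actually learns the most from.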

With this loop we reduced the required negative samples from ~200 k to ~20 k—a ten‑fold reduction—saving labeling effort and shortening model development cycles.

4. Conclusion and Outlook

The presented architecture and active‑learning‑driven annotation pipeline enable fast, high‑quality responses to business needs for prohibited‑item detection. Future work will focus on handling a larger number of categories and exploring open‑set classification techniques.

Jia Yunlong, Senior Algorithm Engineer, responsible for risk‑control algorithms at Zhuanzhuan.

Tags: e-commerce, image classification, AI, model architecture, active learning, prohibited items
Written by

Zhuanzhuan Tech

A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
