PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2 is a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval. It delivers 91% recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Baidu Geek Talk

Image recognition is a fundamental task in computer vision, widely used in applications such as face verification and retail product identification. Deploying it in practice, however, runs into several challenges: frequent class updates, fine‑grained discrimination, data collection costs, and semantic gaps in open‑domain detection.

PP‑ShiTuV2, integrated in PaddleX, addresses these issues by combining three modules: a subject detection module that extracts all foreground objects, an image‑feature module that encodes detected subjects into deep feature vectors, and a vector‑retrieval module that matches vectors against a feature database.
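The retrieval step can be illustrated with a minimal sketch. It assumes cosine similarity over L2-normalized features, which is a common choice but is not confirmed by this article as what PP-ShiTuV2 uses internally. The function names, labels, and toy 3-dimensional vectors are all hypothetical and are not part of the PaddleX API.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]

def retrieve(query, gallery, top_k=1):
    """Return the top_k (label, score) pairs for one query feature.

    gallery is a list of (label, feature) pairs. These names are
    hypothetical and only illustrate the idea of vector retrieval.
    """
    q = l2_normalize(query)
    scored = []
    for label, feat in gallery:
        g = l2_normalize(feat)
        score = sum(a * b for a, b in zip(q, g))  # cosine similarity
        scored.append((label, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy features standing in for the deep feature vectors of gallery images.
gallery = [
    ("cola", [1.0, 0.0, 0.0]),
    ("green_tea", [0.0, 1.0, 0.0]),
    ("orange_juice", [0.0, 0.0, 1.0]),
]

# In the real pipeline this vector would come from the image-feature module.
query_feature = [0.1, 0.9, 0.2]

best_label, best_score = retrieve(query_feature, gallery, top_k=1)[0]
print(best_label)
```

Because recognition reduces to a nearest-neighbor lookup, adding a new class only requires adding its feature vectors to the gallery rather than retraining the model, which is how this design copes with frequent class updates.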

The system achieves a recall@1 of 91.03% on the AliProducts dataset and, on an internal open‑domain benchmark, improves on the previous PP‑ShiTuV2_rec model by more than 20 percentage points. Inference times were measured on an NVIDIA Tesla T4 GPU (FP32) and an Intel Xeon Gold 5117 CPU (8 threads, FP32).
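For readers unfamiliar with the metric, recall@1 is the fraction of queries whose top-ranked retrieved item carries the correct label. The sketch below shows that calculation on made-up labels. The exact evaluation protocol used for AliProducts is not described in this article, so the function is a generic illustration only.

```python
def recall_at_1(predicted_labels, true_labels):
    """Fraction of queries whose top-1 retrieved label matches the ground truth."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and ground truth must have the same length")
    if not true_labels:
        return 0.0
    hits = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return hits / len(true_labels)

# Hypothetical top-1 predictions and ground-truth labels for four queries.
preds = ["cola", "green_tea", "cola", "orange_juice"]
truth = ["cola", "green_tea", "orange_juice", "orange_juice"]

score = recall_at_1(preds, truth)
print(f"recall@1 = {score:.2%}")
```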

Compared with the Grounding DINO model, PP‑ShiTuV2 shows superior performance on fine‑grained product and beverage brand recognition, as illustrated by the side‑by‑side visual results.

Installation

# PaddlePaddle, CPU build
python -m pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# PaddlePaddle, GPU build (CUDA 11.8)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# PaddlePaddle, GPU build (CUDA 12.3)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
# PaddleX (install after one of the PaddlePaddle builds above)
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/whl/paddlex-3.0.0b2-py3-none-any.whl

Quick start

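from paddlex import create_pipeline

# Load the PP-ShiTuV2 general image recognition pipeline.
pipeline = create_pipeline(pipeline="PP-ShiTuV2")

# Build the feature index from the dataset directory and its gallery.txt file.
index_data = pipeline.build_index("drink_dataset_v2.0/", "drink_dataset_v2.0/gallery.txt")

# Run recognition on the test images against that index.
output = pipeline.predict("./drink_dataset_v2.0/test_images/", index=index_data)
for res in output:
    res.print()                   # print the recognition result
    res.save_to_img("./output/")  # save the result image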

The demo uses the public drink_dataset_v2.0 (download link provided) to build an index and run predictions.

Fine‑tuning / secondary development

python main.py -c paddlex/configs/general_recognition/PP-ShiTuV2_rec.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./dataset/Inshop_examples

Additional command‑line options specify the GPU devices (e.g., -o Global.device=gpu:0,1) and the number of training epochs (e.g., -o Train.epochs_iters=10). All hyperparameters can also be edited in the YAML configuration file.

Overall, PP‑ShiTuV2 provides a ready‑to‑use, high‑performance pipeline for generic image recognition, suitable for both rapid prototyping and production deployment.
