PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX
PP‑ShiTuV2 is a PaddleX pipeline that chains subject detection, deep feature encoding, and vector retrieval. It reports 91.03 % recall@1 on AliProducts and a gain of more than 20 percentage points over the previous model on an internal open‑domain benchmark, runs on both GPU and CPU, and comes with simple installation, quick‑start code, and full fine‑tuning support.
Image recognition is a fundamental computer vision task, used in applications such as face verification and retail product identification. Deploying it in practice is hard for several reasons: classes change frequently, categories are fine‑grained, data collection is costly, and open‑domain detection suffers from semantic gaps.
PP‑ShiTuV2, integrated in PaddleX, addresses these issues by combining three modules: a subject detection module that extracts all foreground objects, an image‑feature module that encodes detected subjects into deep feature vectors, and a vector‑retrieval module that matches vectors against a feature database.
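The vector‑retrieval step can be illustrated with a minimal NumPy sketch. This is not the PaddleX implementation: the function names, the cosine‑similarity metric, the `score_thresh` parameter, and the hand‑made 3‑D vectors standing in for encoder output are all assumptions made for illustration. It shows why a retrieval design copes with frequent class updates: adding a class only means adding vectors to the gallery, with no retraining of the encoder.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def retrieve(query_feats, gallery_feats, gallery_labels, score_thresh=0.5):
    """Return one (label, score) pair per query.

    The label is None when the best cosine similarity is below score_thresh,
    i.e. the query is treated as an unknown object.
    """
    q = l2_normalize(np.asarray(query_feats, dtype=np.float32))
    g = l2_normalize(np.asarray(gallery_feats, dtype=np.float32))
    sims = q @ g.T                 # shape: (num_queries, num_gallery)
    best = sims.argmax(axis=1)     # index of the nearest gallery vector
    results = []
    for i, j in enumerate(best):
        score = float(sims[i, j])
        label = gallery_labels[j] if score >= score_thresh else None
        results.append((label, score))
    return results


# Toy gallery: hypothetical vectors standing in for real encoder output.
gallery_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gallery_labels = ["cola", "green_tea", "orange_juice"]

# The first query is close to "cola"; the second is equally far from everything.
query_feats = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.5]]
matches = retrieve(query_feats, gallery_feats, gallery_labels, score_thresh=0.8)
print(matches)
```

In this toy run the first query matches "cola" and the second falls below the threshold, so it is reported as unknown rather than forced into a class.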
The system achieves a recall@1 of 91.03 % on the AliProducts dataset and improves over the previous PP‑ShiTuV2_rec model by more than 20 percentage points on an internal open‑domain benchmark. Inference time is measured on NVIDIA Tesla T4 (FP32) and Intel Xeon Gold 5117 (8 threads, FP32).
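For readers unfamiliar with the metric, recall@1 is the fraction of query images whose single nearest gallery neighbour carries the correct label. The sketch below computes it on toy data; the function name, the cosine‑similarity ranking, and the example vectors are assumptions for illustration, and the exact evaluation protocol behind the AliProducts figure is not described here.

```python
import numpy as np


def recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose nearest gallery vector has the same label."""
    q = np.asarray(query_feats, dtype=np.float64)
    g = np.asarray(gallery_feats, dtype=np.float64)
    # Normalize rows so the dot product below is cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    nearest = (q @ g.T).argmax(axis=1)
    hits = [gallery_labels[j] == y for j, y in zip(nearest, query_labels)]
    return sum(hits) / len(hits)


# Toy data: two gallery classes and four queries, one of which is mislabeled
# by nearest-neighbour search (the third query is closer to "a" than to "b").
gallery_feats = [[1.0, 0.0], [0.0, 1.0]]
gallery_labels = ["a", "b"]
query_feats = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
query_labels = ["a", "b", "b", "a"]

score = recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels)
print(f"recall@1 = {score:.2f}")  # 3 of 4 queries are correct
```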
Compared with the Grounding DINO model, PP‑ShiTuV2 shows superior performance on fine‑grained product and beverage brand recognition, as illustrated by the side‑by‑side visual results.
Installation
# cpu
python -m pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# gpu (CUDA 11.8)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# gpu (CUDA 12.3)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
# PaddleX
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/whl/paddlex-3.0.0b2-py3-none-any.whl
Quick start
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="PP-ShiTuV2")
index_data = pipeline.build_index("drink_dataset_v2.0/", "drink_dataset_v2.0/gallery.txt")
output = pipeline.predict("./drink_dataset_v2.0/test_images/", index=index_data)
for res in output:
    res.print()
    res.save_to_img("./output/")
The demo uses the public drink_dataset_v2.0 (download link provided) to build an index and run predictions.
Fine‑tuning / secondary development
python main.py -c paddlex/configs/general_recognition/PP-ShiTuV2_rec.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./dataset/Inshop_examples
Additional command‑line options allow specifying GPU devices (e.g., -o Global.device=gpu:0,1) and training epochs (-o Train.epochs_iters=10). All hyper‑parameters can be edited in the YAML configuration file.
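When training runs are launched from a script, the overrides mentioned above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of PaddleX: `build_train_command` and its parameters are names invented here, and it only composes the `-o` options already shown in this section.

```python
import shlex


def build_train_command(config, dataset_dir, device=None, epochs=None):
    """Compose the main.py training command as an argument list.

    Hypothetical helper: only the -o overrides shown in the article are used.
    """
    cmd = [
        "python", "main.py", "-c", config,
        "-o", "Global.mode=train",
        "-o", f"Global.dataset_dir={dataset_dir}",
    ]
    if device is not None:
        cmd += ["-o", f"Global.device={device}"]
    if epochs is not None:
        cmd += ["-o", f"Train.epochs_iters={epochs}"]
    return cmd


cmd = build_train_command(
    "paddlex/configs/general_recognition/PP-ShiTuV2_rec.yaml",
    "./dataset/Inshop_examples",
    device="gpu:0,1",
    epochs=10,
)
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the PaddleX repository root; building it as a list rather than a string avoids shell‑quoting mistakes.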
Overall, PP‑ShiTuV2 provides a ready‑to‑use, high‑performance pipeline for generic image recognition, suitable for both rapid prototyping and production deployment.
Baidu Geek Talk