FastDeploy: One-Click AI Model Deployment Across GPUs, CPUs, and Edge Devices
FastDeploy is an open‑source toolkit that standardizes AI model APIs and enables developers to deploy vision, NLP, and speech models on diverse hardware—including GPUs, CPUs, Jetson, ARM, and various NPUs—using just three lines of code or a single command, while delivering end‑to‑end performance optimizations.
Overview
FastDeploy standardizes model APIs and provides ready‑to‑run demos for a wide range of hardware and deployment scenarios, supporting both online (service) and offline (local inference) modes.
Supported Scenarios and Hardware
FastDeploy runs on NVIDIA GPUs, x86 CPUs, Jetson devices, ARM CPUs, Rockchip NPU, Amlogic NPU, NXP NPU and other platforms. It covers computer‑vision, natural‑language‑processing and speech tasks, including image classification, object detection, OCR, face recognition, pose estimation, text classification and speech synthesis.
Inference back‑ends integrated include Paddle Inference, TensorRT, OpenVINO, ONNX Runtime, Paddle Lite and RKNN, allowing a single codebase to target multiple accelerators. Models from Paddle suites (PaddleClas, PaddleDetection, PaddleSeg, PaddleOCR, PaddleNLP, PaddleSpeech) as well as PyTorch and ONNX models are supported.
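As a rough summary of these pairings, the hardware‑to‑back‑end mapping can be written down as a small lookup table. This is illustrative and non‑exhaustive; the exact set available depends on how the FastDeploy package was built:

```python
# Illustrative hardware -> back-end pairings described above.
# Non-exhaustive: the available set depends on the FastDeploy build.
BACKENDS_BY_HARDWARE = {
    "nvidia_gpu":   ["Paddle Inference", "TensorRT", "ONNX Runtime"],
    "x86_cpu":      ["Paddle Inference", "OpenVINO", "ONNX Runtime"],
    "jetson":       ["Paddle Inference", "TensorRT"],
    "arm_cpu":      ["Paddle Lite"],
    "rockchip_npu": ["RKNN"],
}

def backends_for(hardware: str) -> list[str]:
    """Return the candidate back-ends for a hardware target ([] if unknown)."""
    return BACKENDS_BY_HARDWARE.get(hardware, [])
```

This is what lets a single codebase target multiple accelerators: the back‑end is selected per target at run time rather than hard‑coded.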
Simple Python API
Deploying a model typically takes three lines of Python. Examples for a Paddle model (PP‑YOLOE) and an ONNX model (YOLOv7):
import fastdeploy as fd
import cv2
# PP‑YOLOE (Paddle) model
model = fd.vision.detection.PPYOLOE(
"model.pdmodel", "model.pdiparams", "infer_cfg.yml")
img = cv2.imread("test.jpg")
result = model.predict(img)
# YOLOv7 (ONNX) model
model = fd.vision.detection.YOLOv7("model.onnx")
img = cv2.imread("test.jpg")
result = model.predict(img)

Switching the runtime backend is performed by configuring a RuntimeOption object:
option = fd.RuntimeOption()
option.use_cpu() # select CPU
option.use_openvino_backend() # select OpenVINO backend
model = fd.vision.detection.PPYOLOE(
"model.pdmodel", "model.pdiparams", "infer_cfg.yml",
    runtime_option=option)

Performance Optimizations
FastDeploy optimizes the full inference pipeline, including pre‑ and post‑processing, and provides one‑click model compression with negligible accuracy loss. CUDA‑accelerated preprocessing and high‑performance back‑ends can reduce YOLO series latency from ~41 ms to ~25 ms on a GPU.
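To put the quoted figures in perspective, the drop from ~41 ms to ~25 ms works out to roughly a 1.6× speedup, i.e. about 39% lower latency, or ~24 → 40 frames per second:

```python
# Back-of-the-envelope check of the latency numbers quoted above.
baseline_ms = 41.0   # YOLO-series pipeline before optimization (~41 ms)
optimized_ms = 25.0  # with CUDA preprocessing + fast back-end (~25 ms)

speedup = baseline_ms / optimized_ms           # 1.64x
latency_cut = 1 - optimized_ms / baseline_ms   # ~39% less latency
fps_before = 1000 / baseline_ms                # ~24 FPS
fps_after = 1000 / optimized_ms                # 40 FPS

print(f"{speedup:.2f}x speedup, {latency_cut:.0%} lower latency, "
      f"{fps_before:.0f} -> {fps_after:.0f} FPS")
```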
Quick‑Start Guides
1. CPU/GPU Deployment (YOLOv7)
Install the GPU package, clone the repository and download a sample model and image:
pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/yolov7/python/
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7.onnx
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

Run inference on different devices:
# CPU inference
python infer.py --model yolov7.onnx --image 000000014439.jpg --device cpu
# GPU inference
python infer.py --model yolov7.onnx --image 000000014439.jpg --device gpu
# GPU with TensorRT acceleration
python infer.py --model yolov7.onnx --image 000000014439.jpg --device gpu --use_trt True

2. Jetson Deployment (YOLOv7)
Build FastDeploy for Jetson, then compile and run the C++ demo:
git clone https://github.com/PaddlePaddle/FastDeploy && cd FastDeploy
mkdir build && cd build
cmake .. -DBUILD_ON_JETSON=ON -DENABLE_VISION=ON -DCMAKE_INSTALL_PREFIX=${PWD}/install
make -j8 && make install
source fastdeploy_init.sh
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7.onnx
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
cd examples/vision/detection/yolov7/cpp
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=${FASTDEPLOY_DIR}
make -j
./infer_demo yolov7.onnx 000000014439.jpg

3. RK3588 Deployment (PicoDet)
Convert a Paddle static graph to ONNX, optimize it, export to RKNN and run inference:
# Clone FastDeploy
git clone https://github.com/PaddlePaddle/FastDeploy.git
# Download example assets
wget https://bj.bcebos.com/fastdeploy/models/rknn2/picodet_s_416_coco_npu.zip
unzip -q picodet_s_416_coco_npu.zip
# Convert Paddle model to ONNX
paddle2onnx --model_dir picodet_s_416_coco_npu \
--model_filename model.pdmodel \
--params_filename model.pdiparams \
--save_file picodet_s_416_coco_npu.onnx \
--enable_dev_version True
# Optimize ONNX model
python -m paddle2onnx.optimize --input_model picodet_s_416_coco_npu.onnx \
--output_model picodet_s_416_coco_npu.onnx \
--input_shape_dict "{'image':[1,3,416,416]}"
# Export to RKNN
python tools/rknpu2/export.py --config_path tools/rknpu2/config/RK3588/picodet_s_416_coco_npu.yaml
# Run inference
python3 infer.py --model_file ./picodet_3588/picodet_3588.rknn \
--config_file ./picodet_3588/deploy.yaml \
    --image images/000000014439.jpg

Repository
Source code, documentation and additional benchmarks are available at:
https://github.com/PaddlePaddle/FastDeploy
Architects' Tech Alliance