DataFunSummit
Sep 1, 2025 · Artificial Intelligence
How We Cut ERNIE Model Resource Use by 75% with Pruning, Structured Slimming, and ONNX Runtime
In this engineering guide we diagnose a heavyweight ERNIE‑Base text‑classification service consuming 128 CPU cores and 96 GB of RAM, then apply a three‑step optimization: model selection, structured pruning with PaddleSlim, and engine migration to ONNX Runtime. The result is a 75% reduction in resource usage while keeping recall above 99.5% and improving inference speed by more than 20%.
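To make the structured-pruning step concrete, here is a minimal, self-contained sketch of the idea behind channel-level structured pruning: score each output channel of a convolution kernel by its L1 norm and keep only the highest-scoring channels. This is an illustrative NumPy example, not PaddleSlim's actual API; the function names, the toy tensor shape, and the 50% prune ratio are all assumptions for demonstration.

```python
import numpy as np

def l1_channel_scores(weight):
    # weight: (out_channels, in_channels, kh, kw) conv kernel.
    # Score each output channel by the L1 norm of its filter.
    return np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)

def select_channels(weight, prune_ratio):
    # Keep the top-scoring (1 - prune_ratio) fraction of channels,
    # preserving their original order.
    scores = l1_channel_scores(weight)
    n_keep = max(1, int(round(weight.shape[0] * (1 - prune_ratio))))
    return np.sort(np.argsort(scores)[-n_keep:])

# Toy example: 8 output channels, prune 50% of them.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))
kept = select_channels(w, 0.5)
pruned_w = w[kept]
print(pruned_w.shape)  # (4, 4, 3, 3)
```

In a real pipeline the downstream layer's input channels must be sliced to match, and the pruned model is fine-tuned to recover accuracy; PaddleSlim automates both steps.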
AI model optimization · ONNX Runtime · PaddleSlim
11 min read
