Tag

Quantization Aware Training


Baidu Geek Talk
Jun 26, 2023 · Artificial Intelligence

INT8 Quantization for Baidu Search Semantic Models (ERNIE)

Baidu applied large-scale INT8 quantization to its ERNIE search semantic models, achieving an inference speedup of over 25% with less than 1% degradation in relevance metrics. The approach selectively quantizes the less-sensitive fully-connected layers and combines automated calibration, hyper-parameter tuning, and techniques such as quantization-aware training (QAT) and SmoothQuant. The work also paves the way for lower-bit quantization and token pruning.
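As a rough illustration of the "selectively quantize less-sensitive layers" idea in the summary, here is a minimal sketch in plain Python. It is not Baidu's pipeline: the symmetric per-tensor scheme, the error proxy, the layer names, and the 0.05 threshold are all assumptions made for the example. A production system would judge sensitivity by the drop in relevance metrics rather than by a weight-error proxy.

```python
import random
import statistics

def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]

def sensitivity(values):
    """Hypothetical proxy: mean absolute round-trip error over median |weight|."""
    codes, scale = quantize_int8(values)
    restored = dequantize(codes, scale)
    mean_err = sum(abs(a - b) for a, b in zip(values, restored)) / len(values)
    typical = statistics.median(abs(v) for v in values)
    return mean_err / typical if typical > 0 else 0.0

def select_layers(layers, threshold):
    """Return names of layers whose proxy sensitivity is below the threshold."""
    return [name for name, w in layers.items() if sensitivity(w) < threshold]

# Toy weights (illustrative only): one well-behaved layer, one with an outlier
# that stretches the quantization range and wipes out the small weights.
rng = random.Random(0)
layers = {
    "fc_smooth": [rng.uniform(-1, 1) for _ in range(1024)],
    "fc_outlier": [rng.uniform(-0.01, 0.01) for _ in range(1023)] + [50.0],
}

to_quantize = select_layers(layers, threshold=0.05)
print(to_quantize)
```

In this toy setup only the well-behaved layer passes the threshold. The outlier layer is left in floating point, which is the kind of case that outlier-aware techniques such as SmoothQuant are meant to address.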

ERNIE · INT8 quantization · Performance Optimization
15 min read