Tag

INT8 quantization


Architecture & Thinking
Jun 30, 2023 · Artificial Intelligence

How INT8 Quantization Supercharges Baidu's Search Models: Techniques and Insights

This article covers the rapid evolution of Baidu's semantic search models, the heavy GPU consumption they entail, and how large‑scale INT8 quantization, combined with sensitivity analysis, calibration data augmentation, hyper‑parameter auto‑tuning, and methods such as Quantization‑Aware Training (QAT) and SmoothQuant, improves inference performance while preserving business metrics.

ERNIE · INT8 quantization · Performance Optimization
17 min read
Baidu Geek Talk
Jun 26, 2023 · Artificial Intelligence

INT8 Quantization for Baidu Search Semantic Models (ERNIE)

Baidu applied large‑scale INT8 quantization to its ERNIE search semantic models, achieving an inference speedup of over 25% with less than 1% degradation in relevance metrics. The approach selectively quantizes the less‑sensitive fully‑connected layers and relies on automated calibration, hyper‑parameter tuning, and techniques such as QAT and SmoothQuant; it also paves the way for lower‑bit quantization and token pruning.

ERNIE · INT8 quantization · Performance Optimization
15 min read
iQIYI Technical Product Team
Nov 5, 2021 · Artificial Intelligence

Accelerating 4K Video Super‑Resolution with TensorRT: iQIYI’s Optimization and Production Practices

iQIYI optimized a 4K video super-resolution model with TensorRT, using graph splitting, operator fusion, custom CUDA kernels, and INT8 quantization. The result was a tenfold speedup (≈180 ms per 1080p frame), demonstrating the potential of deep customization for large‑scale production.

INT8 quantization · TensorRT · deep learning
17 min read