Tagged articles

QAT

5 articles · Page 1 of 1

Feb 16, 2026 · Artificial Intelligence

A New Extreme Quantization Tool for Large Models: AngelSlim’s 2‑Bit Compression

AngelSlim is a full-stack large-model compression suite that uses quantization-aware training to shrink a 1.8B-parameter LLM to 2-bit precision with less than 4% accuracy loss. It supports a wide range of models as well as speculative decoding, and provides end-to-end deployment instructions for MacBook M4 and server environments.

AngelSlim · GGUF · Large Language Models

0 likes · 13 min read

AI Algorithm Path

Aug 23, 2025 · Artificial Intelligence

Understanding QAT: Quantization‑Aware Training with PyTorch

This article explains the principles of model quantization, compares post‑training quantization (PTQ) and quantization‑aware training (QAT), details the QAT workflow in PyTorch—including fake quantization, gradient handling, and code examples—and offers practical tips for achieving high‑accuracy int8/int4 models.

Fake Quantization · PyTorch · QAT

0 likes · 15 min read

AI Algorithm Path

Apr 22, 2025 · Artificial Intelligence

Understanding LLM Quantization: GPTQ, QAT, AWQ, GGUF, and GGML Explained

The article walks through the fundamentals of large-language-model quantization, starting from a concrete int8 example and then explaining the GPTQ, GGUF/GGML, QAT, and AWQ methods in detail, with step-by-step code snippets, formulas, calibration procedures, and performance observations for each technique.

AWQ · GGML · GGUF

0 likes · 15 min read

Meituan Technology Team

Sep 22, 2022 · Artificial Intelligence

Quantization Deployment Scheme for YOLOv6: Methods, Optimizations, and Performance Evaluation

The paper proposes a full quantization pipeline for YOLOv6 that combines a re-parameterization optimizer, partial PTQ, channel-wise distillation, graph-scale merging, and GPU-offloaded preprocessing. With it, an INT8 model retains ~42% mAP while delivering a throughput increase of over 200% and a 40% QPS gain versus FP16.

Channel Distillation · Model Deployment · PTQ

0 likes · 16 min read

JD Tech Talk

Jan 6, 2021 · Backend Development

JDDLB Architecture and QAT SSL/TLS Hardware Acceleration Optimization

This article details the overall architecture of JD.com Data Science's JDDLB load balancer, its high‑performance and high‑availability features, and presents a comprehensive performance comparison of SSL/TLS offloading using Intel QAT acceleration cards, including async processing, user‑space driver zero‑copy implementation, crash analysis, and process‑level engine scheduling.

Hardware Offload · NGINX · Performance Optimization

0 likes · 13 min read
