Tag

speculative-sampling

1 view collected around this technical thread.

DataFunSummit
Nov 4, 2024 · Artificial Intelligence

Performance Optimization Techniques for Large Model Inference Frameworks

This article outlines four key optimization areas for large model inference frameworks—quantization, speculative sampling, TTFT/TPOT improvements, and communication optimization—detailing specific techniques, experimental results, and practical benefits such as reduced memory usage, lower latency, and higher throughput.

AI · Performance · inference optimization
0 likes · 12 min read
Xiaohongshu Tech REDtech
Oct 11, 2024 · Artificial Intelligence

Harmonized Speculative Sampling (HASS): Aligning Training and Decoding for Efficient Large Language Model Inference

HASS aligns training and decoding contexts and objectives for speculative sampling, using harmonized objective distillation and multi-step context alignment, achieving 2.81–4.05× speedup and 8%–20% improvement over EAGLE‑2 while preserving generation quality in real-world deployments at Xiaohongshu.
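For readers new to the tag, the baseline speculative sampling loop that HASS builds on can be sketched as follows. This is a minimal toy illustration, not HASS itself: `draft_probs`, `target_probs`, and the three-token `VOCAB` are hypothetical stand-ins for a small draft model and a large target model. The draft model proposes `k` tokens cheaply; each is accepted with probability `min(1, p/q)`, and on the first rejection a token is resampled from the normalized residual `max(p - q, 0)`, which preserves the target distribution exactly.

```python
import random

# Toy vocabulary and hand-written "models" (hypothetical stand-ins).
# Each returns a probability distribution over the vocabulary.
VOCAB = ["a", "b", "c"]

def draft_probs(context):
    # Cheap draft model q(x | context); fixed here for illustration.
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def target_probs(context):
    # Expensive target model p(x | context); fixed here for illustration.
    return {"a": 0.5, "b": 0.4, "c": 0.1}

def sample(dist):
    # Draw one token from a {token: probability} dict.
    r, acc = random.random(), 0.0
    for tok, p in dist.items():
        acc += p
        if r < acc:
            return tok
    return tok  # guard against floating-point rounding

def speculative_step(context, k=4):
    """Draft k tokens with the cheap model, then verify against the target.

    A drafted token x is kept with probability min(1, p(x)/q(x)); on the
    first rejection we resample from the normalized residual
    max(p - q, 0), which keeps the output distributed exactly as p.
    """
    # Phase 1: draft k tokens autoregressively with the cheap model.
    drafted, ctx = [], list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        x = sample(q)
        drafted.append((x, q))
        ctx.append(x)

    # Phase 2: accept/reject each drafted token against the target model.
    accepted, ctx = [], list(context)
    for x, q in drafted:
        p = target_probs(ctx)
        if random.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)
            ctx.append(x)
        else:
            residual = {t: max(p[t] - q[t], 0.0) for t in VOCAB}
            z = sum(residual.values())
            accepted.append(sample({t: v / z for t, v in residual.items()}))
            break  # stop at the first rejection
    return accepted

print(speculative_step(["<s>"]))
```

Each call yields between 1 and `k` tokens per target-model verification pass, which is where the speedup comes from; HASS improves the acceptance rate by aligning the draft model's training objective and context with this decoding procedure.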

AI · HASS · Inference Acceleration
0 likes · 11 min read