Baobao Algorithm Notes
Jul 11, 2024 · Artificial Intelligence
Why Separate Prefill and Decode? A Deep Dive into DistServe’s Split Inference Architecture
This article walks through the two-stage LLM inference pipeline, introduces the TTFT and TPOT metrics, explains the motivation for prefill-decode disaggregation, compares split and merged architectures experimentally, and details DistServe's optimization techniques and parallel-strategy modeling.
DistServe · Goodput · LLM inference
28 min read
