Alibaba Cloud Big Data AI Platform
Sep 19, 2023 · Artificial Intelligence
BladeLLM: Ultra‑Long Context LLM Inference via RaggedAttention & AutoTuner
BladeLLM, Alibaba Cloud’s large‑model inference engine, pushes the limits of LLMs by supporting ultra‑long contexts of up to 70K tokens. It combines a novel RaggedAttention mechanism with a DNN‑based AutoTuner to deliver high performance, memory efficiency, and low‑latency inference across diverse workloads.
AI infrastructure · AutoTuner · LLM inference
