Alibaba Cloud Big Data AI Platform
Sep 19, 2023 · Artificial Intelligence
BladeLLM: Ultra‑Long Context LLM Inference via RaggedAttention & AutoTuner
BladeLLM, Alibaba Cloud’s large‑model inference engine, pushes the limits of LLMs by supporting ultra‑long contexts of up to 70K tokens. It combines a novel RaggedAttention mechanism with a DNN‑based AutoTuner to deliver high performance, memory efficiency, and low‑latency inference across diverse workloads.
AI infrastructure · AutoTuner · LLM inference
