AI Algorithm Path
Feb 24, 2025 · Artificial Intelligence

Flash-MLA: Boosting LLM Inference Speed on Nvidia Hopper GPUs

Flash-MLA is an open-source GPU kernel for Nvidia Hopper GPUs that accelerates Multi-head Latent Attention (MLA), whose compressed KV cache cuts memory usage by up to 93.3%. Delivering up to 580 TFLOPS of compute, it dramatically speeds up large-language-model inference while lowering cost.

DeepSeek · Flash-MLA · GPU Optimization