
DeepSeek Announces FlashMLA: An Efficient Multi-head Latent Attention Decoding Kernel for Hopper GPUs

DeepSeek opened its OpenSourceWeek with FlashMLA, a multi-head latent attention (MLA) decoding kernel optimized for NVIDIA Hopper GPUs. Building on FlashAttention and CUTLASS, it targets large-model inference performance, with DeepSeek reporting up to 30% higher compute utilization and, in some workloads, doubled decoding speed.


During the first day of DeepSeek’s OpenSourceWeek, the company announced the release of FlashMLA, an open-source multi-head latent attention (MLA) decoding kernel specifically tuned for NVIDIA Hopper architecture GPUs such as the H800.

FlashMLA aims to accelerate the decoding phase of large language models by optimizing variable‑length sequence handling, drawing inspiration from FlashAttention 2/3 and NVIDIA’s CUTLASS library.
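To make the MLA idea concrete: rather than caching full per-head keys and values for every past token, the model caches one small latent vector per token and up-projects it into keys and values at attention time, shrinking the KV cache that dominates decoding cost. The following is a minimal NumPy sketch of that scheme, not DeepSeek's implementation; all dimensions, weight names, and the random projections are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, d_head, d_latent = 64, 4, 16, 8  # illustrative sizes

# Shared down-projection to a latent, and per-head up-projections
# (random stand-ins for trained weights).
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.standard_normal((n_heads, d_latent, d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((n_heads, d_latent, d_head)) / np.sqrt(d_latent)
W_q = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(x_t, latent_cache):
    """One decoding step: append the new token's latent, then attend.

    latent_cache holds one d_latent vector per past token -- far smaller
    than a full per-head K/V cache of 2 * n_heads * d_head per token.
    """
    latent_cache.append(x_t @ W_down)            # (d_latent,)
    C = np.stack(latent_cache)                   # (T, d_latent)
    out = np.zeros(n_heads * d_head)
    for h in range(n_heads):
        q = x_t @ W_q[h]                         # (d_head,)
        K = C @ W_uk[h]                          # (T, d_head) from latents
        V = C @ W_uv[h]                          # (T, d_head) from latents
        attn = softmax(K @ q / np.sqrt(d_head))  # (T,)
        out[h * d_head:(h + 1) * d_head] = attn @ V
    return out

cache = []
for t in range(5):                               # decode 5 tokens
    y = decode_step(rng.standard_normal(d_model), cache)

print(len(cache), y.shape)  # -> 5 (64,): one cached latent per token
```

FlashMLA's contribution is doing this kind of decoding-time attention efficiently on Hopper hardware, with paged KV caches and variable-length batches; the sketch above only shows why the latent cache is cheap.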

The project quickly attracted attention, gathering over 3.6k stars on GitHub within a few hours of its launch. The source code is available at https://github.com/deepseek-ai/FlashMLA.

According to DeepSeek, FlashMLA can achieve up to a 30% increase in compute utilization on Hopper GPUs, and in certain workloads it can double the inference speed compared to previous implementations.

Key related resources include the FlashAttention repository (https://github.com/dao-AILab/flash-attention/) and the CUTLASS library (https://github.com/nvidia/cutlass), which provide the underlying efficient attention algorithms and GPU kernel primitives.

For a detailed technical walkthrough and usage examples, readers are directed to community posts and discussions, such as the analysis shared by a Zhihu contributor (@TopGeeky).

Tags: Artificial Intelligence, Open-source, DeepSeek, GPU, MLA, FlashMLA, Hopper
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
