
DeepSeek Announces FlashMLA: An Efficient Multi-head Latent Attention Decoding Kernel for Hopper GPUs

DeepSeek opened its OpenSourceWeek with FlashMLA, a multi-head latent attention (MLA) decoding kernel optimized for NVIDIA Hopper GPUs. Building on FlashAttention and CUTLASS, it targets large-model inference performance, with DeepSeek reporting up to 30% higher compute utilization and, in some workloads, doubled decoding speed.


During the first day of DeepSeek’s OpenSourceWeek, the company announced the release of FlashMLA, an open-source multi-head latent attention (MLA) decoding kernel specifically tuned for NVIDIA Hopper architecture GPUs such as the H800.

FlashMLA aims to accelerate the decoding phase of large language models by optimizing variable‑length sequence handling, drawing inspiration from FlashAttention 2/3 and NVIDIA’s CUTLASS library.
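To make the MLA idea concrete: rather than caching full per-head keys and values for every past token, the model caches one small latent vector per token and up-projects it into keys and values at attention time, shrinking the KV cache that dominates decoding cost. The following is a minimal NumPy sketch of that scheme, not DeepSeek's implementation; all dimensions, weight names, and the random projections are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, d_head, d_latent = 64, 4, 16, 8  # illustrative sizes

# Shared down-projection to a latent, and per-head up-projections
# (random stand-ins for trained weights).
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.standard_normal((n_heads, d_latent, d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((n_heads, d_latent, d_head)) / np.sqrt(d_latent)
W_q = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(x_t, latent_cache):
    """One decoding step: append the new token's latent, then attend.

    latent_cache holds one d_latent vector per past token -- far smaller
    than a full per-head K/V cache of 2 * n_heads * d_head per token.
    """
    latent_cache.append(x_t @ W_down)            # (d_latent,)
    C = np.stack(latent_cache)                   # (T, d_latent)
    out = np.zeros(n_heads * d_head)
    for h in range(n_heads):
        q = x_t @ W_q[h]                         # (d_head,)
        K = C @ W_uk[h]                          # (T, d_head) from latents
        V = C @ W_uv[h]                          # (T, d_head) from latents
        attn = softmax(K @ q / np.sqrt(d_head))  # (T,)
        out[h * d_head:(h + 1) * d_head] = attn @ V
    return out

cache = []
for t in range(5):                               # decode 5 tokens
    y = decode_step(rng.standard_normal(d_model), cache)

print(len(cache), y.shape)  # -> 5 (64,): one cached latent per token
```

FlashMLA's contribution is doing this kind of decoding-time attention efficiently on Hopper hardware, with paged KV caches and variable-length batches; the sketch above only shows why the latent cache is cheap.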

The project quickly attracted attention, gathering over 3.6k stars on GitHub within a few hours of its launch. The source code is available at https://github.com/deepseek-ai/FlashMLA.

According to DeepSeek, FlashMLA can achieve up to a 30% increase in compute utilization on Hopper GPUs, and in certain workloads it can double the inference speed compared to previous implementations.

Key related resources include the FlashAttention repository (https://github.com/dao-AILab/flash-attention/) and the CUTLASS library (https://github.com/nvidia/cutlass), which provide the underlying efficient attention algorithms and GPU kernel primitives.

For a detailed technical walkthrough and usage examples, readers are directed to community posts and discussions, such as the analysis shared by a Zhihu contributor (@TopGeeky).

Tags: Artificial Intelligence, Open-source, DeepSeek, GPU, MLA, FlashMLA, Hopper
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
