Tagged articles
5 articles
Page 1 of 1
PaperAgent
PaperAgent
Jan 21, 2026 · Artificial Intelligence

Inside DeepSeek’s FlashMLA Update: What’s New in the MODEL1 Architecture

DeepSeek’s recent FlashMLA update introduces the new MODEL1, featuring a tighter KV-Cache layout, an extra two-stage cache, and a fixed 512×512 head dimension, with four code changes detailed in a public GitHub commit and illustrated by comparative diagrams.

AI ArchitectureDeepSeekFlashMLA
0 likes · 3 min read
Inside DeepSeek’s FlashMLA Update: What’s New in the MODEL1 Architecture
IT Services Circle
IT Services Circle
Feb 27, 2025 · Artificial Intelligence

DeepSeek Announces FlashMLA: An Efficient Multi‑Layer Attention Decoding Kernel for Hopper GPUs

DeepSeek’s OpenSourceWeek introduced FlashMLA, a GPU‑optimized MLA decoding kernel for Hopper GPUs that leverages FlashAttention and CUTLASS to dramatically improve large‑model inference performance, with early adoption showing up to 30% higher compute utilization and doubled speed in some scenarios.

DeepSeekFlashMLAGPU
0 likes · 3 min read
DeepSeek Announces FlashMLA: An Efficient Multi‑Layer Attention Decoding Kernel for Hopper GPUs
NewBeeNLP
NewBeeNLP
Feb 27, 2025 · Industry Insights

How DeepSeek’s Open‑Source Tools Exploit China‑Specific H800 GPUs to Boost AI Performance

The article analyzes DeepSeek’s three open‑source projects—FlashMLA, DeepEP, and DeepGEMM—showing how they optimize for the China‑only NVIDIA H800 GPU, contrast this with the abundant hardware resources of Western AI firms, and highlight the growing demand for talent that masters both AI models and GPU hardware.

AI hardwareDeepEPDeepGEMM
0 likes · 7 min read
How DeepSeek’s Open‑Source Tools Exploit China‑Specific H800 GPUs to Boost AI Performance
Alibaba Cloud Big Data AI Platform
Alibaba Cloud Big Data AI Platform
Feb 25, 2025 · Artificial Intelligence

Accelerate DeepSeek‑V2‑Lite Deployment with FlashMLA: A Step‑by‑Step Guide

This tutorial walks users through installing FlashMLA, integrating it with the vLLM framework, downloading the DeepSeek‑V2‑Lite‑Chat model, benchmarking various MLA implementations, and running a local inference demo that shows FlashMLA’s speed advantage on long‑sequence generation.

DeepSeekFlashMLAInferenceOptimization
0 likes · 16 min read
Accelerate DeepSeek‑V2‑Lite Deployment with FlashMLA: A Step‑by‑Step Guide