Jun 12, 2026 · Artificial Intelligence

MiniMax Open-Sources MSA: High-Performance Attention Kernels Optimized for NVIDIA SM100

MiniMax Sparse Attention (MSA) is an open-source library of high-performance dense and block-sparse attention operators for NVIDIA SM100 GPUs. It combines a Jinja-based csrc JIT stack with the Cutlass Python DSL (CuTe-DSL), and it supports low-precision quantization, paging, and migration from existing dense attention code.

AI Kernels · CuTe-DSL · Cutlass

