A Comprehensive Guide to Major Attention Mechanisms: From MHA and GQA to MLA, Sparse and Hybrid Architectures
This article reviews and compares the most important attention variants used in modern large language models, including multi-head attention (MHA), grouped-query attention (GQA), multi-head latent attention (MLA), sparse and sliding-window attention, gated attention, and hybrid designs. For each, it details the motivation, the memory trade-offs, representative architectures, and experimental findings.
