A Comprehensive Guide to Major Attention Mechanisms: From MHA and GQA to MLA, Sparse and Hybrid Architectures
This article reviews and compares the most important attention variants used in modern large language models, including multi-head attention (MHA), grouped-query attention (GQA), multi-head latent attention (MLA), sparse and sliding-window attention, gated attention, and hybrid designs. For each, it details the motivation, the memory trade-offs, representative architectures, and experimental findings.
