Architecture Digest
Feb 24, 2025 · Artificial Intelligence
MoBA: Mixture of Block Attention for Long‑Context Large Language Models
The article introduces MoBA, a Mixture‑of‑Block‑Attention mechanism that applies Mixture‑of‑Experts principles to transformer attention. Through sparse, trainable block selection, it enables efficient long‑context processing in large language models while maintaining performance comparable to full attention, and it allows seamless switching between full and sparse attention modes.
LLM · Mixture of Experts · MoBA
12 min read
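To make the summary concrete, here is a minimal sketch of the block‑selection idea: keys and values are grouped into blocks, each query is routed to the top‑k blocks whose mean‑pooled keys score highest against it, and attention is computed only over those blocks. The function name, shapes, and the single‑head, unmasked setup are illustrative assumptions, not the paper's implementation (which also handles causal masking and multi‑head attention).

```python
import torch
import torch.nn.functional as F

def moba_style_attention(q, k, v, block_size=4, top_k=2):
    """Toy block-sparse attention: each query attends only to its
    top-k key/value blocks, chosen by scoring the query against
    mean-pooled block keys. Shapes: q, k, v are (seq_len, d)."""
    seq_len, d = k.shape
    n_blocks = seq_len // block_size

    # Group keys/values into blocks and mean-pool keys to get one
    # "gate" vector per block (the routing signal).
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, d)
    gate_keys = k_blocks.mean(dim=1)                      # (n_blocks, d)

    # Score every query against every block; keep the top-k blocks.
    gate_scores = q @ gate_keys.T                         # (seq_len, n_blocks)
    top_blocks = gate_scores.topk(top_k, dim=-1).indices  # (seq_len, top_k)

    out = torch.zeros_like(q)
    for i in range(q.shape[0]):
        # Attend only over keys/values from the selected blocks.
        sel_k = k_blocks[top_blocks[i]].reshape(-1, d)    # (top_k*block_size, d)
        sel_v = v_blocks[top_blocks[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ sel_k.T / d ** 0.5, dim=-1)
        out[i] = attn @ sel_v
    return out

# Example: 16 tokens, 8-dim head; each query attends to 2 of 4 blocks.
q, k, v = (torch.randn(16, 8) for _ in range(3))
print(moba_style_attention(q, k, v).shape)  # torch.Size([16, 8])
```

Because the routing scores are computed from the same queries and keys used for attention, the block selection is trainable end to end; setting top_k equal to the number of blocks recovers (dense) full attention, which is the intuition behind switching between the two modes.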