Shi's AI Notebook
Mar 16, 2026 · Artificial Intelligence
What Attention Actually Does in MiniMind: Tracing Q/K/V, Shape Changes, and Context Fusion
This article walks through MiniMind's Attention.forward implementation: why Q, K, and V are created, how tensors are reshaped for multi-head attention, the roles of the mask, the KV cache, and GQA, and how each token aggregates information from the entire context.
attention · deep learning · gqa
21 min read
