BestHub
Shi's AI Notebook
Mar 16, 2026 · Artificial Intelligence

What Attention Actually Does in MiniMind: Tracing Q/K/V, Shape Changes, and Context Fusion

This article walks through MiniMind's Attention.forward implementation, explaining why Q, K, and V are created, how tensors are reshaped for multi‑head attention, the roles of the causal mask, the KV cache, and grouped‑query attention (GQA), and how each token aggregates information from the entire preceding context.

attention · deep learning · gqa
0 likes · 21 min read
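The mechanics summarized in the abstract above (Q/K/V projections, reshaping into heads, KV‑head sharing under GQA, causal masking, and per‑token context aggregation) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not MiniMind's actual code: the function name `gqa_attention` and the weight matrices `Wq`, `Wk`, `Wv` are hypothetical, and rotary embeddings and the KV cache are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa_attention(x, Wq, Wk, Wv, n_heads, n_kv_heads):
    """Sketch of causal grouped-query attention for one sequence.

    x: (seq, dim); Wq: (dim, dim); Wk, Wv: (dim, n_kv_heads * head_dim).
    Hypothetical helper for illustration, not MiniMind's implementation.
    """
    seq, dim = x.shape
    head_dim = dim // n_heads

    # Project to Q/K/V, then split the last axis into heads and move
    # the head axis first: (seq, dim) -> (n_heads, seq, head_dim).
    q = (x @ Wq).reshape(seq, n_heads, head_dim).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, n_kv_heads, head_dim).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, n_kv_heads, head_dim).transpose(1, 0, 2)

    # GQA: each KV head is shared by a group of query heads, so repeat
    # the KV heads until they match the number of query heads.
    rep = n_heads // n_kv_heads
    k = np.repeat(k, rep, axis=0)
    v = np.repeat(v, rep, axis=0)

    # Scaled dot-product scores: (n_heads, seq, seq).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)

    # Causal mask: token i may only attend to positions <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Each output row is a weighted average of V rows: context fusion.
    out = softmax(scores) @ v  # (n_heads, seq, head_dim)

    # Merge heads back: (seq, dim).
    return out.transpose(1, 0, 2).reshape(seq, dim)
```

Note how the causal mask makes token 0's output depend only on its own value vector, while later tokens blend value vectors from everything before them, which is the "context fusion" the article traces.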