Wu Shixiong's Large Model Academy
Aug 20, 2025 · Artificial Intelligence

Mastering Large‑Model Interview Questions: MHA, KV‑Cache, Scaled Dot‑Product, and Speculative Decoding

This guide walks through common large‑model interview challenges, including a hands‑on implementation of multi‑head attention with KV‑cache, the mathematical reason for scaling by sqrt(dₖ), a concise speculative decoding algorithm, and systematic debugging steps for NaN loss during training.

KV Cache · Large Model Interview · Multi‑Head Attention
0 likes · 14 min read