Architects' Tech Alliance
Aug 21, 2024 · Fundamentals
Inside NVIDIA’s Streaming Multiprocessor: How GPUs Execute Parallel Workloads
This article gives a detailed technical overview of the Streaming Multiprocessor (SM) in modern GPUs. It covers the SM's microarchitecture, the instruction fetch and decode pipeline, warp scheduling, SIMT stack handling, scoreboard mechanisms, and the strategies used to hide memory latency and keep parallel execution efficient.
GPU · SIMT · Scoreboard
17 min read
