Tagged articles

warp scheduling

2 articles · Page 1 of 1

Dec 7, 2025 · Fundamentals

CUDA Optimization Basics: Understanding GPU Architecture and Warp Scheduling

This article explains the fundamentals of CUDA performance tuning. It covers GPU architectures from Kepler to Volta, the role of the streaming multiprocessor (called SMX in Kepler), warp schedulers, registers, and the memory hierarchy, and gives practical guidance on launch configuration, latency hiding, and thread‑block sizing to maximize throughput.

CUDA · GPU architecture · memory latency

21 min read


Architects' Tech Alliance

Aug 21, 2024 · Fundamentals

Inside NVIDIA’s Streaming Multiprocessor: How GPUs Execute Parallel Workloads

This article gives a detailed technical overview of the Streaming Multiprocessor (SM) in modern GPUs. It explains the SM micro‑architecture, the instruction fetch and decode pipeline, warp scheduling, SIMT stack handling, scoreboard mechanisms, and strategies for hiding memory latency to maximize parallel execution efficiency.

GPU · SIMT · Scoreboard

17 min read
