Alibaba Cloud Developer
Sep 8, 2025 · Fundamentals
How to Profile GPU Kernels with PTX Probes: From CUDA Basics to Custom Instrumentation
This article walks through GPU performance analysis. It starts with CUDA architecture fundamentals, demonstrates matrix multiplication optimization, and explains PTX assembly. It then introduces Neutrino, a framework for programmable GPU probes that enable fine-grained custom instrumentation and detailed timing measurements of kernel execution.
CUDA · GPU · Neutrino
