Alibaba Cloud Developer
Alibaba Cloud Developer
Sep 8, 2025 · Fundamentals

How to Profile GPU Kernels with PTX Probes: From CUDA Basics to Custom Instrumentation

This article walks through GPU performance analysis, starting with CUDA architecture fundamentals, demonstrating matrix multiplication optimization, explaining PTX assembly, and introducing the Neutrino framework for programmable GPU probes that enable fine‑grained, custom instrumentation and detailed timing measurements of kernel execution.

CUDAGPUNeutrino
0 likes · 45 min read
How to Profile GPU Kernels with PTX Probes: From CUDA Basics to Custom Instrumentation