What’s New in PyTorch 2.10? Deep Dive into GPU and CUDA Enhancements

PyTorch 2.10 brings broad upgrades for AMD ROCm, Intel XPU, and NVIDIA CUDA, adds new Torch XPU APIs, and expands Python 3.14 support. It also delivers performance‑focused improvements such as horizontally fused kernels and strengthened quantization. The release is available from the official GitHub release page.

21CTO

PyTorch 2.10, the latest release of the widely used deep‑learning framework, is now available.

On AMD hardware, the ROCm build gains grouped GEMM support, implemented both through a regular GEMM fallback and through CK (Composable Kernel). It also adds torch.cuda._compile_kernel and load_inline support on Windows, and adds the GFX1150/GFX1151 RDNA 3.5 GPUs to the hipBLASLt‑supported GEMM list. Further ROCm additions include scaled_mm v2 support, AOTriton‑backed scaled_dot_product_attention, improved heuristics for pointwise kernels, code‑generation support for fast_tanhf, and other ROCm‑specific enhancements.
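A grouped GEMM computes many independent matrix products, whose shapes may differ from group to group, as one operation. A dedicated grouped kernel handles all groups in a single launch, while a "regular GEMM fallback" simply loops over the groups. The pure‑Python sketch below shows only that fallback idea. The function names `gemm` and `grouped_gemm_fallback` are hypothetical, and this is not PyTorch's or CK's implementation.

```python
# Conceptual sketch of a grouped GEMM with a "regular GEMM" fallback.
# Pure Python, no PyTorch/ROCm calls; all names are hypothetical.
from typing import List

Matrix = List[List[float]]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Regular GEMM: returns a @ b for an (m x k) and a (k x n) matrix."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def grouped_gemm_fallback(a_list: List[Matrix], b_list: List[Matrix]) -> List[Matrix]:
    """Grouped GEMM: one independent product per group; shapes may differ by group.

    A dedicated grouped kernel would process every group in one launch.
    This fallback just loops and issues one regular GEMM per group.
    """
    if len(a_list) != len(b_list):
        raise ValueError("a_list and b_list must have the same number of groups")
    return [gemm(a, b) for a, b in zip(a_list, b_list)]


if __name__ == "__main__":
    a_groups = [[[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]
    b_groups = [[[3.0], [4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(grouped_gemm_fallback(a_groups, b_groups))
    # [[[11.0]], [[5.0, 6.0], [7.0, 8.0]]]
```

The fallback is simple and works for any shapes, but it pays one launch per group. That per‑group overhead is what a fused grouped kernel avoids.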

Intel GPU support receives several upgrades: new Torch XPU APIs; the ATen operators scaled_mm and scaled_mm_v2; the _weight_int8pack_mm operator; and SYCL support in the PyTorch C++ extension API, which enables custom operators on Windows. The release also includes additional performance tweaks for Intel hardware.
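Operators such as _weight_int8pack_mm belong to the family of weight‑only int8 matmuls: weights are stored as int8 with a floating‑point scale per output channel, and activations stay in floating point. The pure‑Python reference below illustrates that general technique under an assumed layout (an N x K weight with one scale per row n). It does not reproduce the exact layout, dtype, or rounding rules of the PyTorch operator. `quantize_per_channel` and `int8_weight_mm` are hypothetical names.

```python
# Sketch of weight-only int8 quantized matmul (per-output-channel scales).
# Pure Python reference of the general technique, not PyTorch's kernel.
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_channel(w: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric int8 quantization of an (N x K) weight, one scale per output row n."""
    w_q: List[List[int]] = []
    scales: List[float] = []
    for row in w:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        w_q.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return w_q, scales


def int8_weight_mm(x: Matrix, w_q: List[List[int]], scales: List[float]) -> Matrix:
    """Approximates x @ w.T using int8 weights.

    Because each output column n shares one scale, the scale is applied once
    after the integer-weight dot product instead of dequantizing every weight.
    """
    return [
        [scales[n] * sum(xv * qv for xv, qv in zip(x_row, w_q[n])) for n in range(len(w_q))]
        for x_row in x
    ]


if __name__ == "__main__":
    w = [[0.5, -1.0], [2.0, 0.0]]  # N=2 output channels, K=2
    x = [[1.0, 2.0]]               # M=1 row of activations
    w_q, scales = quantize_per_channel(w)
    print(int8_weight_mm(x, w_q, scales))  # close to the exact result [[-1.5, 2.0]]
```

Factoring the per‑channel scale out of the dot product keeps the inner loop on int8 weights. This is why the scheme saves memory bandwidth while adding only a small rounding error.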

On the NVIDIA side, CUDA support expands with templated kernels, pre‑compiled kernel support, automatic inclusion of CUDA headers, support for the cuda‑python CUDA stream protocol, improved CUDA 13 compatibility, nested memory pools, and CUTLASS‑based matmuls on Thor.
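The cuda‑python stream protocol lets libraries hand CUDA streams to each other without importing each other's stream classes. A producer exposes a `__cuda_stream__` method, and a consumer calls it to obtain the raw stream handle. The sketch below assumes the convention in which the method returns a `(version, handle)` tuple with version 0. Check the cuda‑python documentation for the exact contract. `HypotheticalStream` and `stream_handle` are illustrative names only, and the handle is a fake integer rather than a real `cudaStream_t`.

```python
# Sketch of the __cuda_stream__ protocol shape (assumed: returns (version, handle)).
# No GPU or CUDA library is used; HypotheticalStream is not a real PyTorch class.
from typing import Tuple


class HypotheticalStream:
    """Stand-in for a framework's stream object."""

    def __init__(self, handle: int) -> None:
        self._handle = handle  # would be the raw cudaStream_t address in practice

    def __cuda_stream__(self) -> Tuple[int, int]:
        # Assumed protocol: (protocol version, raw stream handle as an int).
        return (0, self._handle)


def stream_handle(obj: object) -> int:
    """Consumer side: accept any object implementing the stream protocol."""
    proto = getattr(obj, "__cuda_stream__", None)
    if proto is None:
        raise TypeError(f"{type(obj).__name__} does not implement __cuda_stream__")
    version, handle = proto()
    if version != 0:
        raise ValueError(f"unsupported __cuda_stream__ version {version}")
    return handle


if __name__ == "__main__":
    print(hex(stream_handle(HypotheticalStream(0x1234))))  # 0x1234
```

This is duck typing: the consumer only checks for the method and never imports the producer's stream type, so any library that implements the protocol can interoperate.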

Other notable changes: torch.compile now supports Python 3.14, with experimental support for the free‑threaded Python 3.14 build. Torch Inductor's combo kernels now perform horizontal fusion, combining independent kernels into a single launch to reduce kernel‑launch overhead. Debugging features are enhanced, and quantization functionality is strengthened.
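Horizontal fusion packs several independent kernels, such as pointwise ops on unrelated tensors, into one launch, so launch overhead is paid once rather than once per op. The toy model below only simulates launches in pure Python to show the indexing idea: each global index is mapped back to the sub‑kernel and local element it belongs to. It is not how Inductor generates code. In Inductor the feature is, as far as we know, opted into through `torch._inductor.config.combo_kernels`; treat that flag name as an assumption to verify against your installed version. All names in the sketch are hypothetical.

```python
# Toy simulation of horizontal fusion ("combo kernels"): several independent
# pointwise ops share a single simulated launch. Pure Python; names are hypothetical.
import bisect
from typing import Callable, List, Sequence

Op = Callable[[float], float]


class LaunchCounter:
    """Counts simulated kernel launches."""

    def __init__(self) -> None:
        self.launches = 0


def run_separately(ops: Sequence[Op], inputs: Sequence[List[float]],
                   counter: LaunchCounter) -> List[List[float]]:
    """Unfused: one simulated launch per op."""
    outputs = []
    for op, xs in zip(ops, inputs):
        counter.launches += 1
        outputs.append([op(x) for x in xs])
    return outputs


def run_combo(ops: Sequence[Op], inputs: Sequence[List[float]],
              counter: LaunchCounter) -> List[List[float]]:
    """Horizontally fused: one simulated launch covers every op.

    The combined index space is the concatenation of all inputs; each global
    index is mapped back to (sub-kernel, local index) via the start offsets.
    """
    if len(ops) != len(inputs):
        raise ValueError("need one input list per op")
    counter.launches += 1
    offsets: List[int] = []
    total = 0
    for xs in inputs:
        offsets.append(total)
        total += len(xs)
    outputs = [[0.0] * len(xs) for xs in inputs]
    for gid in range(total):
        k = bisect.bisect_right(offsets, gid) - 1  # last sub-kernel starting at or before gid
        local = gid - offsets[k]
        outputs[k][local] = ops[k](inputs[k][local])
    return outputs


if __name__ == "__main__":
    ops = [lambda v: v + 1.0, lambda v: v * 2.0, abs]
    inputs = [[1.0, 2.0], [3.0], [-4.0, 5.0]]
    sep, combo = LaunchCounter(), LaunchCounter()
    assert run_separately(ops, inputs, sep) == run_combo(ops, inputs, combo)
    print(sep.launches, combo.launches)  # 3 1
```

Both paths produce identical results; only the number of launches differs, which is the overhead horizontal fusion targets.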

You can download PyTorch 2.10 and view the full release notes at the official GitHub page.

https://github.com/pytorch/pytorch/releases/tag/v2.10.0
Written by 21CTO

21CTO (21CTO.com) offers developers a community, training, and services, and positions itself as a go‑to learning and service platform.