Tagged articles
2 articles
Page 1 of 1
Network Intelligence Research Center (NIRC)
Network Intelligence Research Center (NIRC)
Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.

CUDACUTLASSGEMM
0 likes · 6 min read
How to Build High‑Performance GEMM with NVIDIA CUTLASS
Alibaba Cloud Infrastructure
Alibaba Cloud Infrastructure
Mar 22, 2023 · Artificial Intelligence

CUTLASS Extreme Performance Optimization and Its Application in Alibaba's Recommendation System

At the GTC conference, the talk presents Alibaba Cloud’s heterogeneous computing platform and introduces the Open Deep Learning API (ODLA), then details how CUTLASS‑based operator fusion dramatically accelerates attention and MLP layers in large‑scale recommendation models, achieving multi‑fold performance gains in production.

CUTLASSDeep LearningGPU computing
0 likes · 5 min read
CUTLASS Extreme Performance Optimization and Its Application in Alibaba's Recommendation System