Tagged articles
5 articles
AI Explorer
Mar 3, 2026 · Artificial Intelligence

ByteDance & Tsinghua Reveal AI‑Powered CUDA Agent for Self‑Evolving Kernels

ByteDance and Tsinghua University have built CUDA Agent, an AI compiler that automatically writes and optimizes GPU kernels. It delivers up to twice the performance and signals a shift in which AI‑generated low‑level code could reshape hardware‑software competition.

AI compiler · ByteDance · CUDA
6 min read
Network Intelligence Research Center (NIRC)
Sep 3, 2025 · Artificial Intelligence

Understanding AI Compilers: A TVM Example

The article explains how AI compilers transform high‑level models into efficient hardware code, using TVM to illustrate operator optimization, automated scheduling, and the end‑to‑end compilation workflow, with concrete code examples and performance considerations.

AI compiler · Deep Learning · TVM
8 min read
DataFunSummit
Feb 2, 2025 · Artificial Intelligence

BladeDISC++: A Dynamic‑Shape AI Compiler for Memory‑Peak Optimization in Deep Learning Training

The article introduces BladeDISC++, a dynamic‑shape AI compiler from Alibaba Cloud PAI, and explains the memory‑peak challenges of dynamic‑shape deep‑learning workloads. It describes the compiler's symbolic‑shape graph and its joint compile‑time/runtime optimizations, including operation fusion, operation scheduling, and just‑in‑time rematerialization, and presents Llama2 experiments showing significant GPU memory savings and throughput gains.

AI compiler · BladeDISC · Llama2
15 min read
Alibaba Cloud Big Data AI Platform
Jan 17, 2025 · Artificial Intelligence

How BladeDISC++ Cuts Memory Peaks for Dynamic‑Shape Deep Learning Models

This article explains the challenges of dynamic‑shape deep learning workloads and introduces BladeDISC++, an AI compiler that uses symbolic shape graphs, operation scheduling, and just‑in‑time auto‑rematerialization to dramatically reduce GPU memory peaks while maintaining training throughput.

AI compiler · BladeDISC++ · LLM training
16 min read
DataFunSummit
Sep 8, 2023 · Artificial Intelligence

AI Compiler Forum at DataFun Summit 2023: Tile-Based Deep Learning Compilation, Graph Scheduling for Domain‑Specific Accelerators, and Triton on Hopper

The DataFun Summit 2023 AI Compiler Forum gathered leading researchers to present techniques for tile‑based deep learning compilation, efficient graph scheduling for domain‑specific accelerators, and large‑model deployment, along with the latest advances in OpenAI Triton on NVIDIA Hopper, offering practical insights for AI system developers.

AI compiler · Graph Scheduling · Hardware acceleration
8 min read