Tagged articles

AI compiler

5 articles

Mar 3, 2026 · Artificial Intelligence

ByteDance & Tsinghua Reveal AI‑Powered CUDA Agent for Self‑Evolving Kernels

ByteDance and Tsinghua University have created the CUDA Agent, an AI compiler that automatically writes and optimizes GPU kernels, delivering up to double the performance and signaling a shift in which AI‑generated low‑level code could reshape hardware‑software competition.

AI compiler · ByteDance · CUDA

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

Sep 3, 2025 · Artificial Intelligence

Understanding AI Compilers: A TVM Example

The article explains how AI compilers transform high‑level models into efficient hardware code, using TVM to illustrate operator optimization, automated scheduling, and the end‑to‑end compilation workflow, with concrete code examples and performance considerations.

AI compiler · Deep Learning · TVM

0 likes · 8 min read

DataFunSummit

Feb 2, 2025 · Artificial Intelligence

BladeDISC++: A Dynamic‑Shape AI Compiler for Memory‑Peak Optimization in Deep Learning Training

The article introduces BladeDISC++, a dynamic‑shape AI compiler from Alibaba Cloud PAI, and explains the memory‑peak challenges of dynamic‑shape deep‑learning workloads. It describes the compiler's symbolic‑shape graph and its joint compile‑time/runtime optimizations (operation fusion, operation scheduling, and just‑in‑time rematerialization), then presents Llama2 experiments showing significant GPU memory savings and throughput gains.

AI compiler · BladeDISC · Llama2

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Jan 17, 2025 · Artificial Intelligence

How BladeDISC++ Cuts Memory Peaks for Dynamic‑Shape Deep Learning Models

This article explains the challenges of dynamic‑shape deep learning workloads and introduces BladeDISC++, an AI compiler that uses symbolic shape graphs, operation scheduling, and just‑in‑time auto‑rematerialization to dramatically reduce GPU memory peaks while maintaining training throughput.

AI compiler · BladeDISC++ · LLM training

0 likes · 16 min read

DataFunSummit

Sep 8, 2023 · Artificial Intelligence

AI Compiler Forum at DataFun Summit 2023: Tile-Based Deep Learning Compilation, Graph Scheduling for Domain‑Specific Accelerators, and Triton on Hopper

The DataFun Summit 2023 AI Compiler Forum gathered leading researchers to present techniques for tile‑based deep learning compilation, efficient graph scheduling for domain‑specific accelerators, large‑model deployment, and the latest advances in OpenAI Triton on NVIDIA Hopper, offering practical insights for AI system developers.

AI compiler · Graph Scheduling · Large Model Deployment

0 likes · 8 min read
