Tagged articles
3 articles
Huawei Cloud Developer Alliance
Sep 30, 2025 · Artificial Intelligence

Boost AI Model Performance: Master Host‑Device Scheduling on Ascend Platforms

This article explains how CPUs and Ascend AI processors cooperate as host and device, compares sink and host scheduling modes, defines Host‑Bound and Device‑Bound models, and presents optimization techniques such as tiling cache, multi‑core concurrency, and small‑shape operator handling that dramatically improve AI model execution efficiency.

AI · Model Scheduling · dynamic shape
12 min read
DataFunSummit
Feb 2, 2025 · Artificial Intelligence

BladeDISC++: A Dynamic‑Shape AI Compiler for Memory‑Peak Optimization in Deep Learning Training

The article introduces BladeDISC++, a dynamic‑shape AI compiler from Alibaba Cloud PAI, and explains the memory‑peak challenges of dynamic‑shape deep‑learning workloads. It describes the compiler's symbolic‑shape graph and its joint compile‑time/runtime optimizations (operation fusion, scheduling, and just‑in‑time rematerialization), and presents Llama2 experiments showing significant GPU memory savings and throughput gains.

AI compiler · BladeDISC · Llama2
15 min read
Alibaba Cloud Big Data AI Platform
Jan 17, 2025 · Artificial Intelligence

How BladeDISC++ Cuts Memory Peaks for Dynamic‑Shape Deep Learning Models

This article explains the challenges of dynamic‑shape deep learning workloads and introduces BladeDISC++, an AI compiler that uses symbolic shape graphs, operation scheduling, and just‑in‑time auto‑rematerialization to dramatically reduce GPU memory peaks while maintaining training throughput.

AI compiler · BladeDISC++ · LLM training
16 min read