Baidu Intelligent Cloud Tech Hub
Nov 20, 2025 · Artificial Intelligence

Boost Multimodal Model Training Efficiency with Offline Sequence Packing and Mixed‑Modality Data

Baidu's Baige team introduces an extended multimodal data loader, automated ShareGPT-format conversion, and offline sequence packing that together double token throughput, cut SFT training time by up to 6×, and improve GPU utilization and training stability for large vision-language models.

AI infrastructure · AIAK · GPU efficiency
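The packing idea the Baige summary describes can be illustrated with a minimal sketch. This is not Baidu's implementation; it assumes a simple greedy first-fit strategy that packs variable-length samples into fixed-capacity bins so each packed sequence approaches the model's maximum length, reducing padding waste:

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit offline packing (illustrative only).

    lengths: token count of each sample.
    Returns one list of sample indices per packed bin; the token
    lengths in each bin sum to at most max_len.
    """
    bins = []  # each bin: [remaining_capacity, [sample indices]]
    # Longest-first placement tends to fill bins more tightly.
    for idx, n in sorted(enumerate(lengths), key=lambda x: -x[1]):
        for b in bins:
            if b[0] >= n:          # sample fits in an existing bin
                b[0] -= n
                b[1].append(idx)
                break
        else:                       # no bin fits: open a new one
            bins.append([max_len - n, [idx]])
    return [b[1] for b in bins]
```

Running offline, a pass like this replaces many short, heavily padded batches with fewer near-full ones, which is where the throughput gain comes from.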
Alibaba Cloud Big Data AI Platform
Jul 16, 2025 · Artificial Intelligence

ChunkFlow: Accelerating Long‑Context Model Fine‑Tuning Up to 4.5× Faster

The paper introduces ChunkFlow, an efficient training framework for variable-length and ultra-long-sequence datasets that powers Qwen models. By reorganizing data into fixed-size chunks and employing a state-aware scheduler, it achieves up to a 4.53× speedup over Megatron-LM and more than 2× end-to-end performance gains.

AI performance · ChunkFlow · Distributed Training
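The chunk reorganization the ChunkFlow summary mentions can be sketched in a few lines. This is a simplified illustration, not the paper's code: long sequences are split into fixed-size chunk descriptors so every training step sees a uniform chunk size, and a state-aware scheduler (not shown) would carry activation/optimizer state across the chunks belonging to one long sequence:

```python
def to_chunks(sample_lengths, chunk_size):
    """Split each sample into fixed-size chunks (illustrative only).

    Returns (sample_id, start, end) descriptors; the final chunk of
    a sample may be shorter than chunk_size.
    """
    chunks = []
    for sid, n in enumerate(sample_lengths):
        for start in range(0, n, chunk_size):
            chunks.append((sid, start, min(start + chunk_size, n)))
    return chunks
```

For example, `to_chunks([10, 3], 4)` yields `[(0, 0, 4), (0, 4, 8), (0, 8, 10), (1, 0, 3)]`: the 10-token sample becomes three chunks, keeping per-step memory and compute bounded regardless of sequence length.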
ITPUB
Jun 25, 2021 · Artificial Intelligence

How Alibaba’s Low‑Carbon M6 Model Trains a Trillion‑Parameter AI with 80% Less Energy

Alibaba's DAMO Academy unveiled the low-carbon M6 multimodal model: a trillion-parameter AI trained on just 480 V100 GPUs, cutting energy use by over 80% and training 11× faster than prior trillion-parameter efforts, and already powering e-commerce and manufacturing design tools.

GPU efficiency · M6 · Mixture of Experts