Boost Multimodal Model Training Efficiency with Offline Sequence Packing and Mixed‑Modality Data
Baidu's Baige team introduces an extended multimodal data loader, automated ShareGPT format conversion, and offline sequence packing techniques that together double token throughput, reduce SFT training time by up to 6x, and improve GPU utilization and stability for large vision-language models.
