Tagged articles
4 articles
Page 1 of 1
Baidu Intelligent Cloud Tech Hub
Baidu Intelligent Cloud Tech Hub
Jun 2, 2026 · Artificial Intelligence

Halving Training Time: LoongForge Full‑Stack Optimizations Boost GR00T N1.6 Throughput 2.3×

LoongForge applies system‑level optimizations—async data prefetch, fine‑grained communication‑compute overlap via a Megatron distributed optimizer, and per‑microbatch CUDA Graph scheduling—to the GR00T N1.6 Vision‑Language‑Action model, delivering up to 2.3× higher training throughput and a 56.6% reduction in overall training time on an 8×A800 cluster.

CUDA GraphDistributed TrainingGR00T N1.6
0 likes · 14 min read
Halving Training Time: LoongForge Full‑Stack Optimizations Boost GR00T N1.6 Throughput 2.3×
Baidu Geek Talk
Baidu Geek Talk
May 25, 2026 · Artificial Intelligence

Accelerating Multimodal Model Training: LoongForge's DP Load‑Balancing Optimization Explained

The article analyzes how data‑parallel (DP) load imbalance hampers large‑scale multimodal model training, details LoongForge's two‑stage adaptive data‑reallocation method that builds a precise compute‑cost model and dynamically redistributes samples, and presents experimental results showing up to 10% throughput gains on massive DP clusters.

DP load balancingData ParallelDistributed Training
0 likes · 16 min read
Accelerating Multimodal Model Training: LoongForge's DP Load‑Balancing Optimization Explained
Baidu Geek Talk
Baidu Geek Talk
May 13, 2026 · Artificial Intelligence

LoongForge Boosts Multimodal Training Speed by 45% on GPU and Kunlun XPU

LoongForge, Baidu Baige’s open‑source full‑modal training framework, unifies LLM, VLM and VLA workloads, runs unchanged on NVIDIA GPUs and Kunlun XPU, and delivers 15‑45% end‑to‑end speedups with up to 90% linear scaling on 5,000‑plus card clusters, while simplifying model integration via YAML.

AI InfrastructureGPUKunlun XPU
0 likes · 23 min read
LoongForge Boosts Multimodal Training Speed by 45% on GPU and Kunlun XPU
Baidu Intelligent Cloud Tech Hub
Baidu Intelligent Cloud Tech Hub
Apr 24, 2026 · Artificial Intelligence

LoongForge: Open‑Source Multimodal Training Framework Runs on GPU and Kunlun XPU with 45% Speedup

LoongForge is an open‑source, Megatron‑based multimodal training framework that unifies LLM, VLM, VLA and diffusion models, runs seamlessly on NVIDIA GPUs and Baidu Kunlun XPU, and delivers 15%‑45% end‑to‑end training acceleration while scaling linearly on thousands of cards.

GPUKunlun XPULoongForge
0 likes · 23 min read
LoongForge: Open‑Source Multimodal Training Framework Runs on GPU and Kunlun XPU with 45% Speedup