Multimodal Training — 6 Technical Articles

Apr 24, 2026 · Artificial Intelligence

LoongForge: Open‑Source Multimodal Training Framework Runs on GPU and Kunlun XPU with 45% Speedup

LoongForge is an open‑source, Megatron‑based multimodal training framework that unifies LLM, VLM, VLA and diffusion models, runs seamlessly on NVIDIA GPUs and Baidu Kunlun XPU, and delivers 15%‑45% end‑to‑end training acceleration while scaling linearly on thousands of cards.

GPU · Kunlun XPU · LoongForge

0 likes · 23 min read

Baidu Intelligent Cloud Tech Hub

Nov 20, 2025 · Artificial Intelligence

Boost Multimodal Model Training Efficiency with Offline Sequence Packing and Mixed‑Modality Data

Baidu's Baige team introduces an extended multimodal data loader, automated ShareGPT format conversion, and offline sequence packing, which together double token throughput, reduce SFT training time by up to a factor of six, and improve GPU utilization and stability for large vision‑language models.

AI infrastructure · AIAK · GPU efficiency

0 likes · 7 min read

Baidu Intelligent Cloud Tech Hub

Nov 4, 2025 · Artificial Intelligence

How Baidu’s Baige Accelerates Multimodal Video Training with Context Parallelism

Baidu Baige’s enhanced veRL framework dramatically boosts video frame rates and resolution limits, cuts training time, reduces memory usage, and improves model accuracy by leveraging context parallelism and optimized attention on Ampere GPUs for multimodal mixed‑training scenarios.

AI acceleration · Context Parallelism · Multimodal Training

0 likes · 6 min read

HyperAI Super Neural

Oct 27, 2025 · Artificial Intelligence

MIT’s Open‑Source BoltzGen Achieves nM‑Level Affinity for 66% of Targets Across Molecular Types

BoltzGen, an all‑atom generative model released by MIT and collaborators, unifies protein folding and binder design with a geometric continuous representation and a flexible design language, training on multimodal datasets and demonstrating nM‑level affinity for 66% of 26 diverse targets including proteins, nanobodies, peptides and small molecules.

BoltzGen · Generative AI · Multimodal Training

0 likes · 12 min read

Amap Tech

Oct 3, 2025 · Artificial Intelligence

How OmniNav Unifies Multi‑Task Embodied Navigation with a Fast‑Slow Dual System

OmniNav introduces a unified framework for embodied navigation that simultaneously handles instruction‑goal, object‑goal, point‑goal, and frontier‑based exploration tasks using a fast visual‑language‑driven policy and a slow memory‑augmented planner, achieving state‑of‑the‑art performance and real‑world 5 Hz deployment.

Multimodal Training · Vision Language Model · continuous control

0 likes · 9 min read

AI Algorithm Path

Aug 3, 2025 · Artificial Intelligence

Inside Meta’s PerceptionLM: A Deep Dive into Open‑Source Vision‑Language Models

The article provides a detailed analysis of Meta’s PerceptionLM, an open‑source perception language model built on Llama 3, describing its vision encoder, projector, dynamic tiling, three‑stage training pipeline, model variants, and competitive performance on image and video benchmarks.

Dynamic Tiling · Llama3 · Multimodal Training

0 likes · 10 min read
