Xiaohongshu Tech REDtech
Jan 2, 2025 · Artificial Intelligence
Xiaohongshu's Self-developed RLHF System for Multimodal Large Language Models: Design, Optimization, and Performance
Xiaohongshu's team unveiled a self-developed RLHF system for training multimodal large language models. The system combines heterogeneous and homogeneous network architectures, extensive PPO optimizations, and Medusa speculative sampling. It achieves throughput gains of over 50%, lowers hardware requirements, and improves performance on zero-shot benchmarks by 5–20%.
Medusa · PPO · PRM