Machine Heart
Jun 19, 2026 · Artificial Intelligence
GoLongRL Goes Open Source: 23K Samples, 9 Task Types, and the End of the Long‑Context RL Desert
GoLongRL is a fully open‑source long‑context reinforcement‑learning pipeline. It includes a 23K‑sample RLVR dataset covering nine capability‑oriented tasks and a TMN‑Reweight optimizer for heterogeneous multitask training, and it reports SOTA performance on 4B and 30B models, surpassing leading baselines.
GoLongRL · SOTA evaluation · TMN-Reweight
13 min read
