Jun 19, 2026 · Artificial Intelligence

GoLongRL Open‑Source: 23K Samples, 9 Task Types, and the End of the Long‑Context RL Desert

GoLongRL is a fully open‑source long‑context reinforcement‑learning pipeline. It combines a 23K‑sample RLVR dataset covering nine capability‑oriented tasks with a TMN‑Reweight optimizer for heterogeneous multitask training, and it reports SOTA performance on 4B and 30B models, surpassing leading baselines.

GoLongRL · SOTA evaluation · TMN‑Reweight

