
AReaL‑boba: Open‑Source Reinforcement Learning Training Framework v0.2 with SOTA Performance

The Ant Research Institute and Tsinghua University's Wu Yi team released AReaL‑boba 0.2, an open‑source reinforcement‑learning training framework that dramatically speeds up large‑scale model training, achieves state‑of‑the‑art mathematical reasoning results, and provides all code, data, and scripts for reproducible research.


Key Highlights

Training Speed and Efficiency Breakthrough

Ultra‑fast training throughput: by integrating the SGLang inference framework (also used by xAI), AReaL‑boba improves training speed by 35%, 60%, and 73% for 1.5B, 7B, and 32B models respectively.

Massive distributed support: 128 H800 GPUs can train a 1.5B model in one day, and 256 H800 GPUs can train a 7B model in two days.

Mathematical Reasoning Performance (SOTA)

7B model sets a new open‑source community record: starting from R1‑Distill‑Qwen‑7B as the base model, large‑scale RL training achieves the best math‑reasoning scores in its size class—61.9 on AIME 2024 and 48.3 on AIME 2025.

Full‑Process Open Verification

All training data (AReaL‑boba‑106k), training scripts, and evaluation scripts are released to ensure reproducibility.

Low‑Cost Replication of Large‑Model Effects

Through data distillation, a 32B model (fine‑tuned from R1‑Distill‑Qwen‑32B) reaches 78.8 on AIME 2024 with only 200 training samples—nearly matching QwQ‑32B's 78.9—at a training cost of about $200. The table below compares scores:

| Model | AIME 2024 |
| --- | --- |
| R1‑Distill‑Qwen‑32B | 72.6 |
| QwQ‑32B | 78.9 |
| AReaL‑boba‑SFT‑32B | 78.8 |
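The low‑cost replication above uses plain supervised fine‑tuning (SFT) on distilled responses, whose objective is the standard next‑token cross‑entropy. A minimal numpy sketch of that per‑token loss—illustrative only, with hypothetical names, not AReaL's actual implementation:

```python
import numpy as np

def sft_token_loss(logits, target_ids):
    """Average next-token cross-entropy over a response.

    logits: (seq_len, vocab_size) scores from the model.
    target_ids: (seq_len,) indices of the distilled teacher tokens.
    """
    # Log-softmax over the vocabulary, numerically stabilized.
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the target token at each position.
    return -logp[np.arange(len(target_ids)), target_ids].mean()
```

With only 200 samples, the entire cost reduces to a few epochs of this objective on curated teacher outputs, which is why the reported budget is so small.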

Open‑Source Commitment

No‑restriction release: framework code, training data (including the full 106k dataset and 200 distilled samples), model weights, and documentation are all open‑source.

Community‑driven: PPO hyper‑parameters, reward function design, regularization strategies, and plans for asynchronous training and dataset upgrades are publicly shared.
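For context on the shared PPO hyper‑parameters: PPO training typically maximizes the clipped surrogate objective, where `clip_eps` is one of the key knobs being tuned. A minimal numpy sketch—illustrative only, with hypothetical function names, not AReaL's actual code:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (to be maximized).

    logp_new / logp_old: per-token log-probs under the current and
    behavior policies; advantages: estimated per-token advantages.
    """
    ratio = np.exp(logp_new - logp_old)          # importance ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic bound: take the smaller of the two surrogates.
    return np.minimum(unclipped, clipped).mean()
```

The clipping range, reward shaping, and regularization terms around this objective are exactly the design choices the team documents for the community.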

Get Started

GitHub repository: https://github.com/inclusionAI/AReaL

Hugging Face collection: https://huggingface.co/collections/inclusionAI/areal-boba-67e9f3fa5aeb74b76dcf5f0a

Technical details: https://github.com/inclusionAI/AReaL/blob/main/blog/AReaL_v0_2.md

Training data: https://huggingface.co/datasets/inclusionAI/AReaL-boba-Data/blob/main/AReaL-boba-106k.jsonl

The AReaL team aims to democratize reinforcement‑learning technology, hoping the framework becomes as commonplace for AI developers as a cup of boba tea, enabling the community to explore the limitless possibilities of intelligent systems.

Tags: performance · AI · Large Models · training framework
Written by AntTech

Technology is the core driver of Ant's future creation.